This morning I discovered a few cases of my Index Management policy failing. When I try to manually retry the policy (i.e., clicking the RETRY POLICY button), I receive the following error message in a pop-up dialog:
Failed to retry: [kubernetes_cluster-kube-system-2020-03-27, RemoteTransportException[[v4m-es-data-2][10.254.5.253:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[v4m-es-data-2][10.254.5.253:9300][indices:data/write/bulk[s][p]]]; nested: EsRejectedExecutionException[rejected execution of processing of [3086976][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[.opendistro-ism-config][0]] containing [update {[.opendistro-ism-config][_doc][MJkR_h7-TaedLJIqWCuWGA], doc_as_upsert[false], doc[index {[.opendistro-ism-config][_doc][MJkR_h7-TaedLJIqWCuWGA], source[{"managed_index":{"last_updated_time":1585339823801,"enabled":true,"enabled_time":1585339823800}}]}], scripted_upsert[false], detect_noop[true]}], target allocation id: oBVL06_cTPKMGECPPq0dUQ, primary term: 4 on EsThreadPoolExecutor[name = v4m-es-data-2/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@44ed9f55[Running, pool size = 1, active threads = 1, queued tasks = 200, completed tasks = 1848255]]];]
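The relevant part of that error seems to be the EsRejectedExecutionException: the write thread pool on v4m-es-data-2 has a pool size of 1 and its queue of 200 is completely full, so the ISM update request is being rejected. For what it's worth, this is a rough sketch of how I've been watching write-queue pressure and rejections on the data nodes (the endpoint and credentials below are placeholders, not my actual values):

```python
# Rough sketch: poll the _cat/thread_pool API to watch write-queue
# depth and rejection counts per node. Endpoint/credentials are placeholders.
import requests

ES_URL = "https://localhost:9200"   # placeholder; my cluster's endpoint differs
AUTH = ("admin", "admin")           # placeholder credentials

resp = requests.get(
    f"{ES_URL}/_cat/thread_pool/write",
    params={"v": "true", "h": "node_name,name,active,queue,rejected,completed"},
    auth=AUTH,
    verify=False,                    # self-signed certs in this deployment
)
resp.raise_for_status()
print(resp.text)
```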
I should also mention that Fluent Bit is experiencing problems communicating with Elasticsearch in this cluster. I see many messages in the Fluent Bit log about failed attempts to send data to ES, although there are also many messages indicating that Fluent Bit eventually succeeded on the 2nd or 3rd attempt. I see no messages in the Elasticsearch logs that appear to correlate with the Fluent Bit activity, so I’m not sure whether the events are related. Coincidentally (or not?), things started going bad right around 8 PM EDT, which corresponds to midnight GMT/UTC…which I believe is when all of my date-based indexes “roll over”.
As things stand now, I don’t think the ES cluster is in a healthy state, but I’d like to figure out why before I blindly redeploy everything the same way again. Any assistance is appreciated.
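In case it helps, this is roughly what I’ve been running to keep an eye on overall cluster health and pending tasks while I investigate (again, the endpoint and credentials are placeholders for my setup):

```python
# Rough sketch: check overall cluster status, unassigned shards, and
# backed-up pending cluster tasks. Endpoint/credentials are placeholders.
import requests

ES_URL = "https://localhost:9200"   # placeholder endpoint
AUTH = ("admin", "admin")           # placeholder credentials

health = requests.get(f"{ES_URL}/_cluster/health", auth=AUTH, verify=False).json()
print("status:", health["status"])
print("unassigned shards:", health["unassigned_shards"])
print("pending tasks:", health["number_of_pending_tasks"])
```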