Understand why a new index was rolled over

It is not clear to me why a new index was rolled over.
Here is my policy:

{
    "policy": {
        "policy_id": "standard_policy",
        "description": "Default policy",
        "last_updated_time": 1585640752093,
        "schema_version": 1,
        "error_notification": null,
        "default_state": "rollover",
        "states": [
            {"name": "rollover",
                "actions": [
                    {"retry": {
                            "count": 20,
                            "backoff": "constant",
                            "delay": "1h"},
                        "rollover": {
                            "min_size": "25gb",
                            "min_index_age": "7d"}
                    }
                ],
                "transitions": [
                    {"state_name": "search",
                        "conditions": { "min_index_age": "7d" } },
                    {"state_name": "search",
                        "conditions": { "min_size": "23gb"} }
                ]
            },
            {"name": "search",
                "actions": [
                    { "timeout": "24h",
                        "retry": { "count": 5,
                            "backoff": "constant",
                            "delay": "1h" },
                        "force_merge": {"max_num_segments": 1}}
                ],
                "transitions": [{"state_name": "delete",
                        "conditions": {"min_index_age": "30d"}}]
            },
            { "name": "delete",
                "actions": [
                    {"retry": { "count": 20,
                            "backoff": "constant",
                            "delay": "1h"},
                        "delete": {} }
                ],
                "transitions": []
            }
        ]
    } }

So, if I understand correctly, a new index should be created by rollover only when either "min_size": "25gb" or "min_index_age": "7d" is met.
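The expectation above (roll over as soon as either condition is met) can be sketched as a small check. This is a minimal illustration, not ISM's code: `should_rollover` is a hypothetical helper, and it assumes min_size is compared against the total primary-shard store size, with "25gb" meaning 25 × 1024³ bytes.

```python
from datetime import timedelta

GIB = 1024 ** 3  # Elasticsearch byte-size units are binary: "1gb" = 1024^3 bytes


def should_rollover(index_age: timedelta, primary_size_bytes: int,
                    min_index_age: timedelta = timedelta(days=7),
                    min_size_bytes: int = 25 * GIB) -> bool:
    """Hypothetical helper: True when ANY of the configured conditions is met."""
    age_met = index_age >= min_index_age
    # Assumption: size is measured on primary shards only (replicas excluded).
    size_met = primary_size_bytes >= min_size_bytes
    return age_met or size_met


# aws-logs-000004 as described in this post: about 2 days old, about 10gb of primaries.
print(should_rollover(timedelta(days=2), 10 * GIB))  # False
print(should_rollover(timedelta(days=8), 10 * GIB))  # True
```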
But here is what I have right now:

health status index                                                   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   aws-logs-000004                                         wTdOSjH-T7WKDk_gg8Xaxw   3   1   28116489       280742       20gb           10gb
green  open   aws-logs-000005                                         TeGlVKXRTj-DDzCq_NxwHA   3   1    9607375         1416      6.6gb          3.3gb

aws-logs-000004 has "creation_date": "1585559545398", which is Monday, 30 March 2020, 09:12:25.398 UTC, i.e. two days ago. Its size is also much smaller than 25 GB.
So my question is: is there any way to find out why aws-logs-000005 was created by a rollover when aws-logs-000004 was not yet old enough and its size was well below the threshold?
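The creation_date index setting is an epoch timestamp in milliseconds. A small sketch to double-check the conversion uses integer millisecond arithmetic, so no float rounding is involved; `creation_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def creation_datetime(creation_date_ms: int) -> datetime:
    """Convert an index creation_date (epoch milliseconds) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=creation_date_ms)


created = creation_datetime(1585559545398)  # value of aws-logs-000004
print(created.isoformat())    # 2020-03-30T09:12:25.398000+00:00
print(created.strftime("%A")) # Monday
```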

Hi @andrii,

Could you check your logs for any errors?
Will look into this now, thanks.

I didn't find any errors in the logs. What is weird, though, is that for index aws-logs-000004 I see only the initialization message in the ISM history index:


So there are no messages related to the rollover action.

Could it be caused by retrying a failed index?
I had several indices in the failed state; unfortunately, I don't remember exactly which ones. I executed:
POST _opendistro/_ism/retry/*-00000*

Hi @andrii,

Thanks for the info, I will look into the retry API. Did you call it on all your indices or just the ones that had failed? It is expected that the audit history index may not contain everything. We write a document into that index after the plugin does some work on behalf of the user, and we do it on a best-effort basis: if indexing the document fails on every attempt, we don't fail the job. That being said, if the indexing did fail, you would see an error in the logs.

Looking through the source code, it seems possible for the index creation_date to come back as -1; I'm not sure how that happens and will have to look into it more. I'll put out some fixes for that case shortly, with more logging so it's more apparent why a transition or rollover happens.
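To see why a creation_date of -1 would matter, here is a sketch that assumes the index age is computed as the current time minus creation_date, both in epoch milliseconds. With -1 the index looks roughly 50 years old, so a "7d" min_index_age condition would be met immediately. The helper `min_index_age_met` and the reference time are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def min_index_age_met(creation_date_ms: int, now_ms: int, min_index_age: timedelta) -> bool:
    # Assumption: age = now - creation_date, both in epoch milliseconds.
    age = timedelta(milliseconds=now_ms - creation_date_ms)
    return age >= min_index_age


# Hypothetical "now": 1 April 2020 12:00 UTC, about two days after aws-logs-000004 was created.
now_ms = (datetime(2020, 4, 1, 12, 0, tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)

print(min_index_age_met(1585559545398, now_ms, timedelta(days=7)))  # False: real creation date
print(min_index_age_met(-1, now_ms, timedelta(days=7)))             # True: -1 looks ancient
```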

It would also be great if you could provide a bit more information:

  1. How many nodes are in your cluster?
  2. How are you creating these indices?
  3. How are you applying the policies?

Hi @dbbaughe
I called the retry API for a couple of indices. Here is the exact request:

POST _opendistro/_ism/retry/*-00000*
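Note that the wildcard *-00000* matches every index whose name contains -00000, including aws-logs-000004 and aws-logs-000005, not only indices that were in the failed state. The sketch below approximates Elasticsearch's wildcard expansion with Python's fnmatch (an assumption; fnmatch also treats [ ] as character classes, which these names don't contain). aws-logs-000010 and other-index are hypothetical names added for contrast.

```python
from fnmatch import fnmatchcase

# First two names come from the _cat output above; the last two are hypothetical.
indices = ["aws-logs-000004", "aws-logs-000005", "aws-logs-000010", "other-index"]
pattern = "*-00000*"

matched = [name for name in indices if fnmatchcase(name, pattern)]
print(matched)  # ['aws-logs-000004', 'aws-logs-000005']
```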

With regards to cluster questions:

  1. Cluster configuration: 3 master nodes, 3 ingest nodes, 7 data nodes (4 hot, 3 warm)

  2. All indices are created by Kafka Connect connectors that read data from topics and push it into Elasticsearch

  3. All policies are applied via an index template. Here is an example of the template settings:

      "settings": {
        "default_pipeline": "aws_routing_pipeline",
        "number_of_shards": 3,
        "number_of_replicas" : 1,
        "refresh_interval" : "60s",
        "index.mapping.ignore_malformed": "true",
        "index.unassigned.node_left.delayed_timeout": "10m",
        "index.routing.allocation.require.box_type": "hot",
        "opendistro.index_state_management.rollover_alias": "aws-alias",
        "opendistro.index_state_management.policy_id": "standard_policy"
      }

Thanks, I will try to replicate this. Once the changes listed above are in, I'll let you know, and if possible you can try the updated plugin, which should help explain what happened if it ever happens again.

Thanks a lot @dbbaughe

Hi @andrii,

Could you also call the GET /_stats API and check what size it reports?
We had an issue where a user indexed only 1 document and the index rolled over, even though the condition was set to 5 documents.
This happened because the mapping contained a nested type, which created 10 internal documents from that 1 document and triggered the rollover. This appears to be the default behavior of the native rollover API too.

It's also possible that the size reported by the _cat API differs from the value we use from the stats API.
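The nested-mapping effect can be illustrated with a rough count, assuming each object under a field mapped as nested is indexed as its own hidden document alongside the parent. The field name `events` and the helper `lucene_doc_count` are hypothetical, and only top-level nested fields are handled.

```python
def lucene_doc_count(doc: dict, nested_fields: set[str]) -> int:
    """Approximate internal doc count: 1 parent + 1 hidden doc per nested object."""
    count = 1  # the top-level document itself
    for field in nested_fields:
        value = doc.get(field, [])
        # A single object under a nested field still becomes one hidden document.
        objects = value if isinstance(value, list) else [value]
        count += len(objects)
    return count


# One user-visible document carrying 9 nested objects.
doc = {"message": "hello", "events": [{"id": i} for i in range(9)]}
print(lucene_doc_count(doc, {"events"}))  # 10
```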

Hi @dbbaughe
This doesn't seem to be my case: I see the same numbers in the _cat and _stats APIs.
Here is GET _stats output for one of such indices:
"_all" : { "primaries" : { "docs" : { "count" : 28116489, "deleted" : 280769 }, "store" : { "size_in_bytes" : 10802690099 },
And here is the _cat/indices output:
green open aws-logs-000004 wTdOSjH-T7WKDk_gg8Xaxw 3 1 28116489 280769 20.1gb 10gb
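The stats numbers above can be checked against the 25gb threshold directly. This sketch completes the posted _stats fragment to valid JSON (the closing braces are added) and assumes ISM compares the primary store size; the 20.1gb store.size from _cat is roughly twice this because it includes the replica.

```python
import json

# The GET _stats fragment posted above, with closing braces added.
stats = json.loads("""
{"_all": {"primaries": {"docs": {"count": 28116489, "deleted": 280769},
                        "store": {"size_in_bytes": 10802690099}}}}
""")

GIB = 1024 ** 3  # "25gb" in a policy means 25 * 1024^3 bytes
primary_bytes = stats["_all"]["primaries"]["store"]["size_in_bytes"]

print(f"{primary_bytes / GIB:.2f} GiB")  # 10.06 GiB
print(primary_bytes >= 25 * GIB)         # False: well below min_size
```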

Hi @andrii,

I will merge this soon: https://github.com/opendistro-for-elasticsearch/index-management/pull/170
I will also backport it to previous versions and release new individual ISM zip updates that you can install into your distribution. This change at least makes it explicit why a transition or rollover occurs, logs it, and fixes the -1L creation date if that was in fact the issue.

It definitely seems that knowing which values we use for these condition checks would be useful to end users. I'll look into including both the condition values set by the user and the actual values used in the comparison in the “info” map of each index while it is on one of these actions.

Thinking something like:

"info": {
    "message": "Attempting to rollover",
    "conditions": {
        "min_index_age": {
            "value": false,
            "condition": "7d",
            "current": "6.35d"
        },
        "min_size": {
            "value": false,
            "condition": "25gb",
            "current": "20.46gb"
        }
    }
}
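As an illustration of the proposed (not yet implemented) format, here is a sketch that assembles such an "info" map from the observed index age and primary size. `build_rollover_info` is a hypothetical helper; like the proposal, "value" is false while a condition is not yet met.

```python
import json
from datetime import timedelta

GIB = 1024 ** 3


def build_rollover_info(index_age: timedelta, primary_bytes: int,
                        min_index_age: timedelta, min_size_bytes: int) -> dict:
    """Hypothetical helper mirroring the proposed "info" layout."""
    return {
        "message": "Attempting to rollover",
        "conditions": {
            "min_index_age": {
                "value": index_age >= min_index_age,
                "condition": f"{min_index_age.days}d",
                "current": f"{index_age / timedelta(days=1):.2f}d",
            },
            "min_size": {
                "value": primary_bytes >= min_size_bytes,
                "condition": f"{min_size_bytes // GIB}gb",
                "current": f"{primary_bytes / GIB:.2f}gb",
            },
        },
    }


# Example values taken from the proposal: 6.35 days old, 20.46gb of primaries.
info = build_rollover_info(timedelta(days=6.35), int(20.46 * GIB), timedelta(days=7), 25 * GIB)
print(json.dumps(info, indent=2))
```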

Hi @dbbaughe,

It would be great to have such info. Thanks a lot.
Will this fix be available in Open Distro 1.6.0? I plan to migrate from 1.4.0 to 1.6.0 but can wait until a fix is released.

Hi @andrii ,

These changes will not be in the 1.6.0 release, as that release happened yesterday or today, I believe. However, the plugin itself will be updated with these changes for all versions, which means that, depending on how you use Open Distro, you can update the Index Management plugin to the newer version. For example, if you use Docker, you can uninstall Index Management 1.6.0.0 and install Index Management 1.6.0.1 yourself to get the changes now.

Note that I'm still not sure this fixes your problem; it's just the only cause I could immediately see. These changes do, however, add a log entry containing all the condition values whenever a rollover or transition evaluates to true, which should help figure out why if it ever happens again.

The more descriptive condition messages in the “info” map will be a separate, later change, as they are more involved.

Hi @dbbaughe,

Thanks a lot for your help

Hello @dbbaughe
Do you have an ETA for when the ISM zip update will be available? I'm still facing this issue without any clue why it happens, so your changes would be very helpful.

Hey @andrii,

The changes that log the condition values whenever a rollover or transition happens have been merged. There hasn't been an official new release yet, but you can take the updated zip from any of the CI runs on GitHub.

You should see an “Artifact” at that link above, which contains the latest build of master. Let me know if that helps; otherwise I can build one for you and link it.

Hi @dbbaughe

Found it. I will try it, thanks a lot.