Problem with automatically adds policy

Hi.
I have a problem with automatic policy enforcement:
If create a new policy specifying “ism_template”
{
“policy”: {
“policy_id”: “autodel_7days”,
“description”: “delete index after 7 days”,
“default_state”: “norm”,
“states”: [
{
“name”: “norm”,
“actions”: ,
“transitions”: [
{
“state_name”: “del”,
“conditions”: {
“min_index_age”: “7d”
}
}
]
},
{
“name”: “del”,
“actions”: [
{
“delete”: {}
}
],
“transitions”:
}
],
“ism_template”: {
“index_patterns”: [
“testindex*”
],
“priority”: 50
}
}
}

after creating a new index, the policy is not applied:
PUT testindex1
GET _opendistro/_ism/explain/testindex1

{
“testindex1”: {
“index.opendistro.index_state_management.policy_id”: null
}
}

at the same time everything works as it should on the test cluster.
production cluster contains 19 policies that are applied to 4000 indexes. 2 weeks ago, after the transition to 1.13.1, all policies were updated and for a while they were assigned normally. but now only old policies (created 2 weeks ago) are assigned, all newly created policies behave as in the example above.
as if they do not perceive the “ism_template” block.

Any ideas what could have gone wrong?

for templates “_simulate_index” (POST _index_template/_simulate_index/testindex) allows you to check which template will be assigned to the new index, maybe there is a similar command for checking the policy?

Hi @mouse,

When you create an index that matches the index pattern do you see anything in the logs on the active master node? That is the one that should be parsing the cluster changed event and seeing there is a new index and trying to match it to a policy to create the managed index job.

@ dbbaughe
There are no error messages, I see only this:

[2021-03-31T04: 51: 07,094] [INFO] [c.a.o.j.s.JobSweeper] [vlg-elk-mst01] Running full sweep
[2021-03-31T04: 53: 42,526] [INFO] [oecmMetadataCreateIndexService] [vlg-elk-mst01] [testindex99] creating index, cause [api], templates [mf_default_ism_historyday], shards [1 ]/[one]
[2021-03-31T04: 53: 47,291] [INFO] [oecraAllocationService] [vlg-elk-mst01] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[testindex99] [0]]]).
[2021-03-31T04: 56: 07,094] [INFO] [c.a.o.j.s.JobSweeper] [vlg-elk-mst01] Running full sweep
[2021-03-31T04: 56: 21,179] [INFO] [oecmMetadataCreateIndexService] [vlg-elk-mst01] [elk_vault_audit_log-test-2021.03.30] creating index, cause [api], templates [mf_vault], shards [1 ]/[one]
[2021-03-31T04: 56: 21,773] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_vault_audit_log-test-2021.03.30] will be managed by policy [day_90_nodel]
[2021-03-31T04: 56: 24.929] [INFO] [oecraAllocationService] [vlg-elk-mst01] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[elk_vault_audit_log-test-2021.03 .30] [0]]]).

created two indices: testindex99 - falls under the pattern of the new policy (the message above), and elk_vault_audit_log-test-2021.03.30 - under the existing one.

As a result of the tests, it turned out that if I now update the “day_90_nodel” policy, it will also stop being assigned automatically

It was not easy, but I solved the problem.
I started to remove template assignment from all policies (set “ism_template”: null)
When there were only two politicians with templates left, they began to work out en masse. At the same time, I did not create new indexes. Log from the master:

[2021-03-31T07: 31: 07,107] [INFO] [c.a.o.j.s.JobSweeper] [vlg-elk-mst01] Running full sweep
[2021-03-31T07: 32: 43,259] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-utaginfra-perfmon-2021.03.30] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-techxa-vda-perfmon-test-2] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_dc-logon-test-99] will be managed by policy [autodel_7days]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [security-auditlog-2021.03.30] will be managed by policy [autodel_7days]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-techxa-vda-perfmon-2021.03.31] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-techxa-vda-perfmon-test-1] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-techxa-vda-perfmon-test-3] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-utaginfra-perfmon-2021.03.31] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-techxa-vda-perfmon-2021.03.30] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [security-auditlog-2021.03.31] will be managed by policy [autodel_7days]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_nfvi_ethernet-2021.03.31] will be managed by policy [autodel_7days]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_techresources-techxa-vda-perfmon-test-4] will be managed by policy [day_60-30]
[2021-03-31T07: 32: 43,260] [INFO] [c.a.o.i.i.ManagedIndexCoordinator] [vlg-elk-mst01] Index [elk_nfvi_ethernet-2021.03.30] will be managed by policy [autodel_7days]

Probably there were some problems with the intersection of patterns, but no errors appeared when saving policies, and all policies were manually assigned to the indexes without errors.

UPD:
I started restoring templates in policies anew, checking how it work after each and again stumbled upon the problem.
It occurs on exactly 11 policy containing index templates. At the same time, there is no difference what kind of index pattern in 11 policy, it simply does not work.
If I clear the pattern at 10 (set “ism_template”: null) then 11 works without problems.
The question arises: are there any limits on the number of policies containing index_patterns?

Is this a resource issue with your cluster? How many nodes on your production cluster?

@tony

my error is clearly reproduced on the production cluster (18 data nodes, 3 master, 6 ingest, 14 coordination. 12 cores CPU + 64 GB RAM for each) and on the test cluster (3 data nodes, 1 master, 2 ingest. 4 cores CPU +8 GB RAM for each) as follows:

The same policies are consistently created, those
“policy_1” … “policy_11” with patterns “test_1- *” … “test_11- *”
the text does not matter, for example:
{
“policy”: {
“policy_id”: “policy_1”,
“description”: “delete index after 7 days”,
“default_state”: “norm”,
“states”: [
{
“name”: “norm”,
“actions”: ,
“transitions”: [
{
“state_name”: “del”,
“conditions”: {
“min_index_age”: “7d”
}
}
]
},
{
“name”: “del”,
“actions”: [
{
“delete”: {}
}
],
“transitions”:
}
],
“ism_template”: {
“index_patterns”: [
“test_1- *”
],
“priority”: 50
}
}
}

after creating each policy, an index is created
PUT test_X-1
For the first 10, the policy is assigned, for the 11, no. Moreover, if, after creating 11 policies, you change any of the previously created ones, they also cease to be assigned

{
test_1-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_1”,
“index”: “test_1-1”,
“index_uuid”: “X1n6d3_YSLymYWHJVqTBow”,
“policy_id”: “policy_1”
},
test_2-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_2”,
“index”: “test_2-1”,
“index_uuid”: “h9olWdoPTYygyqjdlYVx2Q”,
“policy_id”: “policy_2”
},
test_3-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_3”,
“index”: “test_3-1”,
“index_uuid”: “t_Ejz_kMSna69o1iC3-o0A”,
“policy_id”: “policy_3”
},
test_4-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_4”,
“index”: “test_4-1”,
“index_uuid”: “q2WmtiqjQLysYZ-2miSXdQ”,
“policy_id”: “policy_4”
},
test_5-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_5”,
“index”: “test_5-1”,
“index_uuid”: “VziB0jHET5OqeapZbuOBew”,
“policy_id”: “policy_5”
},
test_6-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_6”,
“index”: “test_6-1”,
“index_uuid”: “Wn_cwBCpR9KwtiIhK7kyuw”,
“policy_id”: “policy_6”
},
test_7-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_7”,
“index”: “test_7-1”,
“index_uuid”: “jgyoskK6Ts6Da0lBZcUz8Q”,
“policy_id”: “policy_7”
},
test_8-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_8”,
“index”: “test_8-1”,
“index_uuid”: “0jSSQiImRfejIJgNsFR88A”,
“policy_id”: “policy_8”
},
test_9-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_9”,
“index”: “test_9-1”,
“index_uuid”: “tceow8_ZQVCylAO-LNUrHA”,
“policy_id”: “policy_9”
},
test_10-1”: {
“index.opendistro.index_state_management.policy_id”: “policy_10”,
“index”: “test_10-1”,
“index_uuid”: “Nkwvq42USK60HLi0BbbP4A”,
“policy_id”: “policy_10”
},
"test_11-1": {
“index.opendistro.index_state_management.policy_id”: null
}

}

@mouse I was more checking it wasn’t some kind of sharding issue.

I have just created 11 polices (additional to my existing 3 policies) and then created 11 indices with each using a different policy and all 11 indices had a policy set against them. I was testing on a dual node cluster using 1.13.1.

@mouse There is no limit on the templates that I am aware of, will also pull in the eng who worked on it to check.

@bowenlan Can you review the thread and see if you have any idea on what could cause the issue? And maybe try to reproduce with the patterns @mouse listed and see?

@mouse one thing I can think of is:
before checking if the newly created index matches with any ISM template, we will first do a check between ISM template lastUpdatedTime and index creationDate code, if the template got updated after the index created, it will not be used to do match check on this index anymore.

you can check template lastUpdatedTime by doing a GET policy call.

@bowenlan
I created policies yesterday

GET _opendistro/_ism/policies/policy_11
“last_updated_time” : 1617187160398, …

all response

{
“_id” : “policy_11”,
“_version” : 1,
“_seq_no” : 130,
“_primary_term” : 5,
“policy” : {
“policy_id” : “policy_11”,
“description” : “delete index after 7 days”,
“last_updated_time” : 1617187160398,
“schema_version” : 1,
“error_notification” : null,
“default_state” : “norm”,
“states” : [
{
“name” : “norm”,
“actions” : ,
“transitions” : [
{
“state_name” : “del”,
“conditions” : {
“min_index_age” : “7d”
}
}
]
},
{
“name” : “del”,
“actions” : [
{
“delete” : { }
}
],
“transitions” :
}
],
“ism_template” : {
“index_patterns” : [
“test_11-*”
],
“priority” : 50,
“last_updated_time” : 1616992631502
}
}
}

index just created
PUT test_11-2

GET _opendistro/_ism/explain/test_11-2
{
“test_11-2” : {
“index.opendistro.index_state_management.policy_id” : null
}
}

GET test_11-2/_settings
{
“test_11-2” : {
“settings” : {
“index” : {
"creation_date" : “1617233667816”,
“number_of_shards” : “1”,
“number_of_replicas” : “1”,
“uuid” : “0IztZX-USemwrB3KkIM9xA”,
“version” : {
“created” : “7100299”
},
“provided_name” : “test_11-2”
}
}
}
}

creation date of index more than “last_updated_time” of policy
1617233667816 vs 1617187160398

It should also be noted that the same policy (policy_11) starts working without any changes if I delete a block “ism_template”: {“index_patterns […]}” in any of the 10 previously created ones.

@bowenlan @dbbaughe
Noticed one more feature, on some data nodes I constantly see a lot of such messages in the log

log

[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_telegraf-arm-2021.01.24
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_telegraf-arm-2021.01.24
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_telegraf-arm-2021.01.23
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_telegraf-arm-2021.01.23
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_vault_audit_log-2021.01.29
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_vault_audit_log-2021.01.29
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_techresources-utaginfra-eventlog-2021.01.24
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_techresources-utaginfra-eventlog-2021.01.23
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_techresources-utaginfra-eventlog-2021.01.23
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_telegraf-arm-2021.01.29
[2021-04-02T04:10:25,060][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_telegraf-arm-2021.01.29
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_telegraf-arm-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_techresources-utaginfra-eventlog-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_telegraf-linux-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_telegraf-arm-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_telegraf-linux-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_techresources-utaginfra-eventlog-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_syslog-cicd-lgs-2021.01.28
[2021-04-02T04:10:25,335][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Finished executing attempt_transition for elk_syslog-cicd-lgs-2021.01.28
[2021-04-02T04:10:25,623][INFO ][c.a.o.i.i.ManagedIndexRunner] [vlg-elk-data01] Executing attempt_transition for elk_syslog-arm-2021.01.31

The interval of their appearance coincides with the option
“opendistro.index_state_management.job_interval”
but there are nodes that contain the primary shards of indexies whith ISM policy but do not contain such messages in the log.
The logging level on all nodes of the cluster is the same.

Can you share information in which case these messages are written to the log and should they be on all data nodes?

And, sorry, I didn’t write in the first post I am using ODFE 1.13.1 but this is not a clean install, it was constantly updated from one of the first versions (0.8.0 or 0.9.0). Maybe some artifacts have accumulated?

Hi @mouse,

Those logs show up on the nodes executing the internal jobs that manage the individual indices you have applied a policy to. The way it is distributed across nodes is to take the set of nodes that contain a shard of the .opendistro-ism-config index and split the job executions across these nodes. So it’s perfectly fine to not see these logs on some data nodes.

" It should also be noted that the same policy (policy_11) starts working without any changes if I delete a block “ism_template”: {“index_patterns […]}” in any of the 10 previously created ones."

This looks like the default search size of 10 documents. @bowenlan Is the internal search for finding the matching ISM templates not correctly getting all policies but instead only the top 10?

I think that is the issue, the internal search for ISM templates (policies) is not specifying the size so it’s defaulting to only getting 10 of them.

@mouse Thanks for finding this issue! Would you be able to create a bug report for us on GitHub - opendistro-for-elasticsearch/index-management: 🗃 Open Distro for Elasticsearch Index Management

1 Like

@dbbaughe
Done

2 Likes