Catch 22 : downgrade to reindex : fails to start ES

Hi,
We were running 7.9.1, but after a machine power outage ES did not start because of
an index created with version 5.6. The error message recommends downgrading to 6.x and reindexing.

So I replaced the package with dpkg -i elasticsearch-oss-6.8.13.deb, and
now I get this error message at startup:
failed to read [id:116, file:/dynga/es-iou2/elasticsearch/nodes/0/_state/node-116.st

    [2020-11-04T10:41:59,031][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [iou2.uninett.no] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: ElasticsearchException[java.io.IOException: failed to read [id:116, file:/dynga/es-iou2/elasticsearch/nodes/0/_state/node-116.st]]; nested: IOException[failed to read [id:116, file:/dynga/es-iou2/elasticsearch/nodes/0/_state/node-116.st]]; nested: XContentParseException[[-1:36] [node_meta_data] unknown field [node_version], parser not found];

Should I also downgrade some or all of the support packages to do the necessary reindexing?

    opendistro-alerting opendistro-anomaly-detection opendistro-index-management opendistro-job-scheduler opendistro-knn
    opendistro-performance-analyzer opendistro-security opendistro-sql opendistroforelasticsearch

It turns out that after trying a couple of other versions of elasticsearch-oss, the 7.9.1 version eventually accepted the index and came up. Case dismissed!

After a reboot, the problem is back. There has been no change in software in the last weeks. So how can ES get into a state where, after a restart, it suddenly does not accept indices that have been up and running? I noticed there is one field for the version an index was created on and one for the current version. Could it be that the startup code makes wrong guesses?

We have both newer and older indices. How can we get our data back ?

Can you show the message where it recommends downgrading?

I am wondering if perhaps the message is incorrect and the problem really has nothing to do with downgrading. Perhaps you just have some indexes that got unrecoverably corrupted during the power failure.

java.lang.IllegalStateException: The index [[uninett6/9HwC3NFKT7m0Ut7SK6M7Qw]] was created with version [5.6.10] but the minimum compatible version is [6.0.0-beta1]. It should be re-indexed in Elasticsearch 6.x before upgrading to 7.9.1.
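When several old indices are involved, it can help to pull the index name and versions out of such lines in the log. A minimal sketch, assuming the message wording matches the line quoted above (it may differ between ES versions); the function name `parse_incompatible_index` is made up for this example:

```python
import re

# Pattern modelled on the message quoted in this thread; other ES versions
# may word the message differently, so treat this as an assumption.
PATTERN = re.compile(
    r"The index \[\[(?P<name>[^/\]]+)/(?P<uuid>[^\]]+)\]\] was created with version "
    r"\[(?P<created>[^\]]+)\] but the minimum compatible version is \[(?P<minimum>[^\]]+)\]"
)


def parse_incompatible_index(line):
    """Return name, uuid, created and minimum version as a dict, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


message = (
    "java.lang.IllegalStateException: The index [[uninett6/9HwC3NFKT7m0Ut7SK6M7Qw]] "
    "was created with version [5.6.10] but the minimum compatible version is "
    "[6.0.0-beta1]. It should be re-indexed in Elasticsearch 6.x before upgrading to 7.9.1."
)
info = parse_incompatible_index(message)
print(info)
```

The index UUID is the useful part here, since that is how the index can be recognised on disk.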

Did you ever actually run the version 5.6.10 like it claims?

Yes, that may have been the case; we started at about 4 a few years ago.

(I cooperate with okvittem on this sometimes “grumpy” ES cluster.)

The ES cluster in question is a single-node cluster. It seems to have a folder in its data path for both a node 0 and a node 1. Folders and files for all operational indices are located under node 0. Some other legacy/zombie indices exist under node 1. Occasionally, after a reboot and restart, ES seems to discover these legacy indices and starts complaining about too old index versions. Removing the node 1 files made ES start. However, ES then creates a new node 1 folder hierarchy (with no indices).

… so it may seem that the question is: Why does our system insist on having a node 1 data folder in addition to the operational node 0 ?
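To see which node folders exist and how many index folders each one holds, something like the following read-only sketch could be used. It assumes the layout `<data path>/nodes/<N>/indices/<index uuid>/`, which matches the paths in the log lines earlier in this thread; the function name `summarize_node_dirs` is made up for this example:

```python
from pathlib import Path


def summarize_node_dirs(data_path):
    """Map each nodes/<N> folder name to the index folder names found under it."""
    nodes_dir = Path(data_path) / "nodes"
    summary = {}
    if not nodes_dir.is_dir():
        return summary
    for node_dir in sorted(nodes_dir.iterdir()):
        if not node_dir.is_dir():
            continue
        indices_dir = node_dir / "indices"
        if indices_dir.is_dir():
            # Each subfolder is assumed to be one index, named by its UUID.
            summary[node_dir.name] = sorted(
                p.name for p in indices_dir.iterdir() if p.is_dir()
            )
        else:
            summary[node_dir.name] = []
    return summary


if __name__ == "__main__":
    # Data path taken from the log lines earlier in this thread.
    for node, indices in summarize_node_dirs("/dynga/es-iou2/elasticsearch").items():
        print(f"node {node}: {len(indices)} index folder(s)")
```

The script only reads directory names and changes nothing in the data path.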

We now seem to have realized why our ES installation sometimes insists on adding and/or starting as node 1 even though it is a single-node system (and should start as node 0).

It turned out we had node.max_local_storage_nodes=3 set in elasticsearch.yml. As a result, when ES found node 0 locked (e.g. after a “rough” restart of some sort), it assumed some other ES process was running and booted up as node 1 instead.

Setting node.max_local_storage_nodes=1 forced ES to fail at startup if node 0 resources were locked.
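A quick way to check whether a config file carries this setting is sketched below. It assumes the setting is written as a flat `key: value` line; nested YAML forms are not handled, and the sample file content and the function name `read_flat_setting` are made up for this example:

```python
import re

SETTING = "node.max_local_storage_nodes"


def read_flat_setting(yml_text, key=SETTING):
    """Return the value of the first flat 'key: value' line, or None if absent.

    Commented-out lines are ignored because the key must start the line.
    """
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*:\s*([^#\s]+)")
    for line in yml_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Hypothetical elasticsearch.yml content, for illustration only.
example = """
cluster.name: example
node.max_local_storage_nodes: 3
"""
value = read_flat_setting(example)
if value is not None and int(value) > 1:
    print(f"{SETTING} is {value}: a locked node 0 can make ES start as node 1")
```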

If there is no relevant reason for node 0 being locked, removing all *.lock files in all sub-folders of the nodes/0 folder in ES’s data path enables ES to start node 0 again.
(Ref: “Elasticsearch: Failed to obtain node locks”, post #12 by rahulnama, Discuss the Elastic Stack)
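Before removing anything, it may be worth listing the lock files first, with ES stopped. A minimal sketch that only lists and never deletes; the function name `find_lock_files` is made up for this example, and the path is the one from the log lines earlier in this thread:

```python
from pathlib import Path


def find_lock_files(node_dir):
    """Return every *.lock file below node_dir, sorted by path; nothing is deleted."""
    root = Path(node_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.lock") if p.is_file())


if __name__ == "__main__":
    # Adjust the path to the node folder of your own data path.
    for lock in find_lock_files("/dynga/es-iou2/elasticsearch/nodes/0"):
        print(lock)
```

Deleting is deliberately left as a manual step, so the list can be reviewed first.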

NOTE: This type of “hacking” inside ES’s data path should be done with care, or rather not at all (according to the ES developers).