Why does Elasticsearch perform indexing every 'n' seconds when there are no writes?

Hello All,

I have a basic question regarding Elasticsearch.

As per the documentation: By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
Reference: Refresh API | Elasticsearch Guide [7.14] | Elastic

Also, as per the documentation: When a document is stored, it is indexed and fully searchable in near real-time, within 1 second.
Reference: Data in: documents and indices | Elasticsearch Guide [7.14] | Elastic

So when a write happens, indexing happens. When no writes are happening and the documents are already indexed, why does Elasticsearch still index the existing documents every second?

Hey Deepak,

AFAIK, the refresh does not apply only to newly indexed data; it also tries to prune tombstones (previously deleted documents) along the way. So even if nothing is being indexed, routine index maintenance may still be necessary. Hope that makes sense.
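
If the periodic refresh on an idle index is a concern, the interval can be changed per index through the `index.refresh_interval` setting, which both Elasticsearch and OpenSearch expose on the `_settings` endpoint. Below is a minimal sketch that only builds the HTTP request and does not send it. The host `http://localhost:9200`, the index name `my-index`, and the helper name `build_refresh_request` are assumptions made for illustration.

```python
import json
import urllib.request


def build_refresh_request(base_url: str, index: str, interval: str) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request that sets refresh_interval."""
    body = {"index": {"refresh_interval": interval}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, for illustration only.
req = build_refresh_request("http://localhost:9200", "my-index", "30s")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To apply it against a running cluster: urllib.request.urlopen(req)
```

A value such as "30s" makes refreshes less frequent, and "-1" disables the periodic refresh entirely, at the cost of new documents taking longer to become searchable.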

Best Regards,
Andriy Redko

Hi Andriy,

Thank you very much for your answer.

Your answer completely makes sense.

Thanks and regards,
Deepak.

Hi Andriy,

For my edification, does inserting or updating a single document cause the entire index to be re-created, or is the index simply updated in an optimized way? I believe it would be the latter.

Also, is adding new records and updating the index an O(1) or an O(n) operation?

Thanks in advance for your answer.

Thanks and Regards,
Deepak

Hi Deepak,

For my edification, does inserting or updating a single document cause the entire index to be re-created, or is the index simply updated in an optimized way? I believe it would be the latter.

OpenSearch uses Apache Lucene under the hood. The index operation (ingesting a new document) is straightforward, but an update is not: it is equivalent to a soft delete of the old document followed by an insert of the new one. I am not sure what you mean by “entire Index to be re-created” though; could you elaborate a bit?
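
The "soft delete plus insert" behaviour can be sketched with a toy model. This is not Lucene's actual API or data layout, only an illustration of the idea that existing data is never rewritten in place. The class `ToyIndex` and all of its fields are hypothetical.

```python
class ToyIndex:
    """Toy append-only index: immutable 'segments' plus a set of soft-deleted entries."""

    def __init__(self):
        self.segments = []    # each segment is a list of (doc_id, source) tuples
        self.deleted = set()  # (segment_no, position) pairs marked as deleted
        self.live = {}        # doc_id -> (segment_no, position) of the current version

    def index(self, doc_id, source):
        # A write only appends a new small segment; existing segments are untouched.
        old = self.live.get(doc_id)
        if old is not None:
            self.deleted.add(old)  # update = soft-delete the old version ...
        self.segments.append([(doc_id, source)])  # ... plus insert the new one
        self.live[doc_id] = (len(self.segments) - 1, 0)

    def get(self, doc_id):
        seg, pos = self.live[doc_id]
        return self.segments[seg][pos][1]


idx = ToyIndex()
idx.index("1", {"title": "first"})
idx.index("2", {"title": "second"})
idx.index("1", {"title": "first, updated"})  # update of an existing document

print(len(idx.segments))  # three segments: nothing was rebuilt
print(idx.deleted)        # the old version of doc "1" is only marked as deleted
print(idx.get("1"))
```

In this model the old version of the document stays in its segment until some later cleanup step removes it, which is why an update does more work than a plain insert.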

Also, is adding new records and updating the index an O(1) or an O(n) operation?

I don’t think O() notation strictly applies here. The cost is a function of the index settings (mappings / replication / shards / routing / …), whether the operation is bulk or non-bulk, and the document size. Intuitively, though, indexing new documents should be more lightweight than updating existing ones.
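
On the bulk versus non-bulk point: the `_bulk` endpoint accepts newline-delimited JSON, with one action line followed by one document line per operation, so many documents travel in a single request. A minimal sketch of building such a body follows. The index name, the documents, and the helper name `bulk_body` are made up for illustration, and nothing is sent to a cluster.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for the _bulk endpoint from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                                       # document line
    return "\n".join(lines) + "\n"  # the body must end with a newline


# Hypothetical index name and documents.
body = bulk_body("my-index", [("1", {"title": "first"}), ("2", {"title": "second"})])
print(body)
```

The body would be sent with a POST to `_bulk` using the `application/x-ndjson` content type.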

Best Regards,
Andriy Redko

Apologies for using the word "re-create"; let me rephrase. I wanted to know whether inserting a single document causes the entire index to be rebuilt.

My question about O(1) vs. O(n) was to confirm that the time required to index a new document stays the same regardless of whether the index already holds 10 or a million documents. In other words, does the time required to index new documents not increase when I have a million records in the index?

Thank you for clarifying. The index won’t be rebuilt, and the number of documents already in the index should not have a noticeable impact on inserting a new document. That holds in general, but there are exceptions: for example, when the index has a rollover policy, inserting a new document may lead to the creation of a new index.
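
To illustrate why the existing document count matters so little, here is a toy inverted index (a deliberate simplification, not Lucene's implementation, and it ignores background work such as segment merges). Adding a document touches only the posting lists of the terms in that document, no matter how many documents are already stored. `ToyInvertedIndex` is a hypothetical name.

```python
from collections import defaultdict


class ToyInvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)  # term -> list of doc ids
        self.doc_count = 0

    def add(self, text):
        """Add one document; return how many posting lists were touched."""
        doc_id = self.doc_count
        self.doc_count += 1
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].append(doc_id)  # append only, no rebuild
        return len(terms)


small, large = ToyInvertedIndex(), ToyInvertedIndex()
for i in range(10):
    small.add(f"document number {i}")
for i in range(10_000):
    large.add(f"document number {i}")

new_doc = "refresh makes new documents searchable"
# Both calls touch the same number of posting lists despite the size difference.
print(small.add(new_doc), large.add(new_doc))
```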

Thank you very much. That was a great help.

Regards,
Deepak