Understanding and controlling index auto merge behavior

Hi,
In my index, I need to update a specific field.
I perform the update by document ID, and I only update a single field in the document.
Example:

POST fileinstances/_update/updatedfileinstance
{
  "doc" : 
  {
    "lastUpdated" : "2021-05-31 11:20:00"
  }
}

I try to minimize the updates because every update (as I understand it) creates a new document version and marks the previous version as deleted.
This means that over time, my index will keep growing.

As I read here, index auto merge can reduce index size and optimize it:

Merging reduces the number of segments in each shard by merging some of them together, and also frees up the space used by deleted documents. Merging normally happens automatically, but sometimes it is useful to trigger a merge manually.

My questions:

  1. What is the schedule of the auto merge? Where can I see it and how do I control it?
  2. I read somewhere that auto-merge is not performed for an index larger than 5GB. Is that correct? If so, can I increase the threshold to 50GB?
  3. If I keep updating documents, does it prevent auto-merge from running in the background?
  4. I have an _ism policy performing an index ‘rollover’ after the 50GB size limit is reached. How do I force a merge on the old index once it becomes read-only? Is the following policy correct?
PUT _opendistro/_ism/policies/fileinstances_policy
{
  "policy": {
    "description": "fileinstances rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_size": "50gb"
            }
          },
          {
            "force_merge": {
              "max_num_segments": 1
            }
          }
        ],
        "transitions": []
      }
    ]
  }
}

Thank you,
Ori.

hi,
can anyone answer the questions above?

Hey @orid,

Won’t have the answer to all your questions as my depth on the merge policy is rather limited right now, but to start:

Yes, when you do updates to the doc it is deleting the original and creating a new one internally. This is automatically cleaned up by the internal merge policy as it will merge two segments together and remove the deleted documents.
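
To watch this happening, one option is the cat indices API, which reports live and deleted document counts side by side. This is only a sketch, assuming the index is named `fileinstances` as in the question:

```
GET _cat/indices/fileinstances?v&h=index,docs.count,docs.deleted,store.size
```

If `docs.deleted` goes down over time while updates continue, background merges are reclaiming the deleted documents.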

Merging still happens for indices over 5GB; where it stops happening is when the individual segments go over 5GB. Indices are made up of shards, which in turn are made up of segments. You should be able to look at your segment sizes using the node stats API to see what you’re currently dealing with.
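
For a per-segment view, the cat segments API lists each segment with its size and deleted-document count. The 5GB ceiling most likely corresponds to the `index.merge.policy.max_merged_segment` index setting, whose default is 5gb. That mapping is an assumption to verify against the settings output on your own cluster, and the `10gb` value below is purely illustrative:

```
# One row per segment, with its size and deleted-document count
GET _cat/segments/fileinstances?v&h=index,shard,prirep,segment,docs.count,docs.deleted,size

# Effective index settings, including defaults such as index.merge.policy.*
GET fileinstances/_settings?include_defaults=true

# Hypothetical change: raise the size ceiling for merged segments
PUT fileinstances/_settings
{
  "index.merge.policy.max_merged_segment": "10gb"
}
```

Larger merged segments make each merge more expensive in disk I/O, so a value such as 50gb is something to test on a non-production index rather than assume safe.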

Yes, that policy will force merge after rolling over. Based on your description though, note that the index did not become read-only, because you’re still doing updates to it. So I wouldn’t necessarily force merge down to 1 segment: the moment you do any update, it will tombstone one of the documents and create a new segment for the new document. It is then very unlikely that the large segment you force merged down to will ever be considered for the auto merge, because it is so large compared to the newly created tiny segments.
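
If the goal is only to reclaim space from deleted documents on an index that still receives updates, a lighter alternative to merging down to 1 segment is a force merge restricted to segments that contain deletes. A sketch, again assuming the index name from the question:

```
# Only rewrite segments that contain deleted documents
POST fileinstances/_forcemerge?only_expunge_deletes=true
```

This leaves the segment count alone and only rewrites segments that carry enough deleted documents, so it avoids producing a single oversized segment.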

Some background reading which could be helpful: