How to get insight into the heap usage

I’m wondering how to get a detailed/better overview of what is consuming how much memory in the heap?

The issue is that the heap usage is steadily growing:

This eventually leads to an OOM exception, even though there’s plenty of field data cache content that could be dropped.
The cluster consists of 3 nodes with one index and ~1000 primary shards + 1 replica. The heap is configured to use 90GB of RAM. This means there are about 7.4 shards per 1GB of heap.
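
For context, the per-node heap and the total shard count can be checked with something like the following (a sketch using the standard _cat APIs; the host, credentials, and TLS flags are placeholders for a cluster with the security plugin):

curl -sk -u admin:admin "https://localhost:9200/_cat/nodes?v&h=name,heap.current,heap.percent,heap.max"
curl -sk -u admin:admin "https://localhost:9200/_cat/shards?h=index" | wc -l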

Matthias

Have you looked at PerfTop? https://github.com/opendistro-for-elasticsearch/perftop (a client for the Open Distro Performance Analyzer)

NodeAnalysis has JVM metrics that might be useful.
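
If I remember the CLI correctly, a NodeAnalysis session is started roughly like this (treat the binary name, dashboard argument, and flags as assumptions; they may differ between PerfTop releases):

./perf-top-linux --dashboard NodeAnalysis --endpoint localhost:9600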

No, not yet. I wanted to avoid installing additional plugins.

PerfTop runs outside the cluster, and if you use Open Distro, the cluster already has everything else it needs to support PerfTop.

Actually it’s ES 7.10.2 + opensecurity.

What’s opensecurity?

I mean, I use only this plugin:

Gotcha. Just Open Distro security. I would definitely look at adding Performance Analyzer and using PerfTop; it was pretty much built to help solve this type of issue.

I’m trying to install it. The plugin is listening on port 9600 but it’s not returning any data. Must it be installed on all nodes?
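
For what it’s worth, one way to check whether the agent on a node is actually serving data is to query the metrics endpoint on port 9600 directly (a sketch of the Open Distro Performance Analyzer metrics API; the metric names and aggregations here are just examples):

curl -s "http://localhost:9600/_opendistro/_performanceanalyzer/metrics?metrics=Heap_Used,CPU_Utilization&agg=max,avg&dim=&nodes=all"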

@sezuan2 I believe it does, yup. @sruti1312 If you have any extra info here, it might be helpful.

Gonna be tricky, since I have a dedicated master node on some hosts. The problem seems to be that data collection can’t be enabled:

...--data '{"enabled": "true"}'  "https://localhost:9200/_opendistro/_performanceanalyzer/config?pretty"
{
  "performanceAnalyzerEnabled" : false,
  "rcaEnabled" : false,
  "loggingEnabled" : false,
  "shardsPerCollection" : 0,
  "batchMetricsEnabled" : false,
  "batchMetricsRetentionPeriodMinutes" : 7
}
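
For comparison, the cluster-wide enable call documented for Open Distro looks roughly like this (a sketch; note the cluster/config path and the boolean payload, and the credentials/TLS flags are placeholders):

curl -sk -u admin:admin -XPOST -H 'Content-Type: application/json' --data '{"enabled": true}' "https://localhost:9200/_opendistro/_performanceanalyzer/cluster/config?pretty"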

While I’m still struggling to get the Performance Analyzer activated, here’s an observation:

The heap usage correlates with the number of segments (or documents, but I guess it’s segments). The fullest cluster has more than 80,000 segments. Not sure whether this is a lot or not.
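
The per-node segment count and the heap those segments account for can be pulled from the node stats (a sketch; credentials/TLS flags are placeholders):

curl -sk -u admin:admin "https://localhost:9200/_nodes/stats/indices/segments?pretty&filter_path=nodes.*.name,nodes.*.indices.segments"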

When I close and reopen the index, a lot of memory is freed, more than the query cache and the ‘field data memory’ account for.
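
(The close/open cycle itself is just the standard index APIs, with ‘data’ being the index name from below:)

curl -sk -u admin:admin -XPOST "https://localhost:9200/data/_close"
curl -sk -u admin:admin -XPOST "https://localhost:9200/data/_open"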

Performance Analyzer is working now, but I fear it doesn’t give any insight into what is consuming the heap.

For instance, when I query the memory usage of the index:

index         dc   fm  sm   svmm    sfbm    qcm  siwm
data  2198684956 73gb 1gb 19.9mb 585.3mb 18.8mb 2.2gb
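
Those look like the _cat/indices column aliases: dc = docs.count, fm = fielddata.memory_size, sm = segments.memory, svmm = segments.version_map_memory, sfbm = segments.fixed_bitset_memory, qcm = query_cache.memory_size, siwm = segments.index_writer_memory. The output can be reproduced with something like (a sketch; credentials/TLS flags are placeholders):

curl -sk -u admin:admin "https://localhost:9200/_cat/indices/data?v&h=index,dc,fm,sm,svmm,sfbm,qcm,siwm"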

Since this is a 3-node cluster, this would be about 25.5GB per node. The question is: why are 120GB not sufficient?

Are you running PerfTop’s NodeAnalysis dashboard? I think that has heap usage insight.

Re-reading this thread, I’m wondering if you have too much heap. I’m not an expert in this area, tbh, but I dug up this article about having too much heap (greater than 32GB).

I guess I have no choice but to use more than 32GB if 32GB is not sufficient. I understand that crossing the ~32GB threshold disables compressed object pointers, but using considerably more memory should compensate for that. I would also happily accept long garbage collection pauses over nodes dying from OOM :slight_smile:
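
Whether compressed oops are actually in use can be checked via the nodes info API, as far as I know (a sketch; credentials/TLS flags are placeholders); it is also printed in the Elasticsearch startup log:

curl -sk -u admin:admin "https://localhost:9200/_nodes/jvm?pretty&filter_path=nodes.*.name,nodes.*.jvm.using_compressed_ordinary_object_pointers"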

Maybe this helps others:

It turned out that limiting the field data cache mitigates the problem. According to the manual, this cache grows unbounded by default until the circuit breaker saves the day, and it has to be cleared manually.
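
Concretely, the limit is the static indices.fielddata.cache.size setting in elasticsearch.yml (the 20% below is only an illustrative value, not a recommendation), and the cache can also be dropped by hand with the clear-cache API:

# elasticsearch.yml on each node (requires a restart; unbounded by default)
indices.fielddata.cache.size: 20%

curl -sk -u admin:admin -XPOST "https://localhost:9200/data/_cache/clear?fielddata=true"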

While this might work for certain indices, it doesn’t seem to work for my index. But the initial question still remains: it’s totally unclear to me why Elasticsearch requires more than 100GB of heap to operate. Even when I add up all the caches and all the memory that the segments require, I don’t get a value that comes anywhere close to the amount of memory that is actually needed.

Is it possible to see the memory requirements of read and write requests? Since bulk requests cannot be used for $reasons, maybe all the individual requests are driving up the heap usage?
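
The closest thing I know of is the circuit breaker stats, which report per node the estimated bytes held by in-flight HTTP/transport requests and by request-scoped data structures (a sketch; credentials/TLS flags are placeholders):

curl -sk -u admin:admin "https://localhost:9200/_nodes/stats/breaker?pretty"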
