How to get insight into heap usage

sezuan2 · November 4, 2021, 1:43pm

I’m wondering how to get a more detailed overview of what is consuming how much memory in the heap.

The issue is that the heap usage is steadily growing:

This eventually leads to an OOM exception, even though there’s plenty of field cache data that could be dropped.
The cluster consists of 3 nodes with one index and ~1000 primary shards + 1 replica. The heap is configured to use 90GB of RAM. This means there are about 7.4 shards per 1GB of heap.
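For anyone who wants to redo the shards-per-heap figure with their own numbers, here is a minimal Python sketch. It assumes the shard copies are spread evenly across the nodes and treats the ~1000 primaries as exactly 1000; the function name and its parameters are made up for illustration.

```python
def shards_per_gb_heap(primaries, replicas_per_primary, nodes, heap_gb_per_node):
    """Average number of shard copies hosted per GB of heap on a single node."""
    # Every primary exists once plus once per configured replica.
    total_shards = primaries * (1 + replicas_per_primary)
    # Assumption: the cluster balances shard copies evenly across nodes.
    shards_per_node = total_shards / nodes
    return shards_per_node / heap_gb_per_node


# Numbers from the post: ~1000 primaries, 1 replica, 3 nodes, 90GB heap each.
print(round(shards_per_gb_heap(1000, 1, 3, 90), 1))  # 7.4
```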

Matthias

searchymcsearchface · November 4, 2021, 1:54pm

Have you looked at PerfTop? GitHub - opendistro-for-elasticsearch/perftop: 📈 PerfTop: A client for the Open Distro Performance Analyzer

NodeAnalysis has JVM metrics that might be useful.

sezuan2 · November 4, 2021, 2:07pm

No, not yet. I wanted to avoid installing additional plugins.

searchymcsearchface · November 4, 2021, 2:20pm

PerfTop is outside the cluster and if you use Open Distro, it has everything else it needs to support PerfTop.

sezuan2 · November 4, 2021, 2:22pm

Actually it’s ES 7.10.2 + opensecurity.

searchymcsearchface · November 4, 2021, 2:46pm

What’s opensecurity?

sezuan2 · November 4, 2021, 2:48pm

I mean, I use only this plugin:

searchymcsearchface · November 4, 2021, 2:55pm

Gotcha. Just Open Distro security. I would definitely look at adding Performance Analyzer and using PerfTop; it was pretty much built to help solve this type of issue.

sezuan2 · November 4, 2021, 3:35pm

I’m trying to install it. The plugin is listening on port 9600 but it’s not returning any data. Must it be installed on all nodes?

searchymcsearchface · November 4, 2021, 4:28pm

@sezuan2 I believe it does, yup. @sruti1312 If you have any extra info here, it might be helpful.

sezuan2 · November 4, 2021, 4:35pm

Gonna be tricky, since I have a dedicated master node on some hosts. The problem seems to be that data collection can’t be enabled:

...--data '{"enabled": "true"}'  "https://localhost:9200/_opendistro/_performanceanalyzer/config?pretty"
{
  "performanceAnalyzerEnabled" : false,
  "rcaEnabled" : false,
  "loggingEnabled" : false,
  "shardsPerCollection" : 0,
  "batchMetricsEnabled" : false,
  "batchMetricsRetentionPeriodMinutes" : 7
}
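A quick way to see from such a response which switches are still off is a small Python sketch like the one below. It only parses the JSON shown above; the helper name `disabled_features` is hypothetical. It also prints a request body that carries a JSON boolean instead of the string "true" used in the command. Whether that difference matters for this endpoint is an assumption that has not been verified in this thread.

```python
import json


def disabled_features(config_response):
    """Return the boolean flags that are false in a Performance Analyzer config response."""
    config = json.loads(config_response)
    # Only real booleans count; numeric fields such as shardsPerCollection are skipped.
    return [name for name, value in config.items() if value is False]


# The response body quoted in the post above.
response = """
{
  "performanceAnalyzerEnabled" : false,
  "rcaEnabled" : false,
  "loggingEnabled" : false,
  "shardsPerCollection" : 0,
  "batchMetricsEnabled" : false,
  "batchMetricsRetentionPeriodMinutes" : 7
}
"""

print(disabled_features(response))

# Assumption: a body with a JSON boolean rather than the string "true".
print(json.dumps({"enabled": True}))
```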

sezuan2 · November 5, 2021, 7:29am

While I’m still struggling to get the Performance Analyzer activated here’s an observation:

The heap size correlates with the number of segments (or documents, but I guess it’s segments). The fullest cluster has more than 80000 segments. Not sure if this is a lot or not.

When I close and reopen the index, a lot of memory is freed, more than the query cache and the ‘field data memory’ account for.

sezuan2 · November 5, 2021, 9:04am

Performance Analyzer is working now, but I fear it doesn’t give any insight into what is consuming the heap.

For instance, when I query the memory usage of the index:

index         dc   fm  sm   svmm    sfbm    qcm  siwm
data  2198684956 73gb 1gb 19.9mb 585.3mb 18.8mb 2.2gb

Since this is a 3-node cluster, this would be about 25.5GB per node. The question is, why are 120GB not sufficient?
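The per-node estimate can be reproduced with a short Python sketch that adds up the memory columns of the row above. It assumes the sizes are 1024-based, that the figures are cluster-wide totals, and that they are spread evenly over the 3 nodes; the helper `to_bytes` is hypothetical and not part of any Elasticsearch client.

```python
UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "b": 1}


def to_bytes(size):
    """Convert a human-readable size such as '585.3mb' into bytes."""
    text = size.strip().lower()
    # Two-letter suffixes are checked before the bare 'b' suffix.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {size!r}")


# Memory columns of the 'data' index row quoted above (dc is a document count, not memory).
row = {"fm": "73gb", "sm": "1gb", "svmm": "19.9mb",
       "sfbm": "585.3mb", "qcm": "18.8mb", "siwm": "2.2gb"}

total_gb = sum(to_bytes(value) for value in row.values()) / 1024**3
per_node_gb = total_gb / 3  # assumption: evenly spread over the 3 nodes

print(f"total: {total_gb:.1f} GB, per node: {per_node_gb:.1f} GB")
```

This comes out at roughly 76.8 GB in total, or about 25.6 GB per node, which is in line with the estimate above and still far below the configured heap.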

searchymcsearchface · November 5, 2021, 1:21pm

Are you running perftop NodeAnalysis? I think that has heap usage insight.

Re-reading this thread, I’m wondering if you have too much heap. I’m not an expert in this area tbh, but I dug up this article about having too much heap (greater than 32gb).

sezuan2 · November 6, 2021, 5:13am

I guess I’ve no choice but to use more than 32gb if 32gb is not sufficient. I understand that crossing the threshold at around 32gb disables compressed object pointers; however, using far more memory should compensate for that. I would also happily accept long-running garbage collections in exchange for nodes no longer dying from OOM.

sezuan2 · November 15, 2021, 7:28am

Maybe this might help others:

It turned out that limiting the field data cache mitigates the problem. According to the manual, this cache grows unbounded until the circuit breaker saves the day, and the cache must then be cleared manually.

While this might work for certain indices, it doesn’t seem to work for my index. But the initial question still remains: it’s totally unclear to me why Elasticsearch requires more than 100gb of heap to operate. Even when I add up all caches and all the memory that the segments require, I do not get a value that comes anywhere close to the memory actually needed.

Is it possible to see the memory requirements of read and write requests? Since bulk requests cannot be used for $reasons, maybe this is increasing the heap usage?
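For reference, the field data cache limit mentioned above is the static node setting `indices.fielddata.cache.size` in `elasticsearch.yml`, and clearing the cache by hand goes through the clear cache API. Below is a minimal Python sketch that only builds that request without sending it. The host is taken from the curl command earlier in the thread and the index name `data` from the table; authentication and TLS handling for the security plugin are left out, and the function name is hypothetical.

```python
from urllib.request import Request


def fielddata_clear_request(base_url, index):
    """Build (but do not send) a request that clears only the field data cache of one index."""
    url = f"{base_url.rstrip('/')}/{index}/_cache/clear?fielddata=true"
    return Request(url, method="POST")


# Assumed values: host from the earlier curl command, index name from the table.
request = fielddata_clear_request("https://localhost:9200", "data")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` together with whatever credentials and certificates the cluster requires.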
