Performance and Sizing Help and Insights

Question:
We are looking for some guidance or some commentary on our OpenDistro cluster which is deployed onto our Kubernetes infrastructure via Helm. I have noted the current configuration for the cluster below.
While our cluster seems stable, any slight bump in the night quickly destabilizes it. We recently had an issue where two (2) of the client containers were destroyed and recreated. This caused the masters to disconnect and the cluster to become unstable and go yellow. Several shards had to be reinitialized, which took ~36 hours to complete. During the yellow period we saw the ingest rate drop by ~20%.
On previous occasions we noticed that garbage collection on the client nodes was impacting the ingest rate: it dropped by 10-15% per day until the client containers were destroyed and redeployed (individually and serially).
We feel as though we are at the edge of a cliff with the current configuration of the cluster: a slight gust of wind is enough to tip it over into yellow or red.
We are hoping that others can provide some insight into the next steps toward further stabilizing the cluster and enabling solid scaling for ingest. We would like to stop shooting in the dark when deciding where in the cluster to devote resources and attention. We have not found any published examples of clusters sustaining this ingest rate on Kubernetes.

Version:

  • OpenDistro: 1.1.0
  • Elasticsearch: 7.1.1

Data Type:

  • Syslog

Index Configuration:

  • Single daily index
    • Shards: 10
    • Replica: 1
  • Ingest Rate
    • Documents per Day: ~6,800,000,000
    • Size per Day: 1.8TB
  • Retention
    • Indices closed after: 7 days
    • Indices deleted after: 30 days
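For context, here is a rough back-of-the-envelope sketch of what the index configuration above implies per shard, using only the figures from this post (the ~50 GB-per-shard figure in the comment is the commonly cited Elasticsearch guideline, not a hard limit):

```python
# Shard-size arithmetic for a single daily index, using the numbers above.
daily_size_gb = 1800   # ~1.8 TB of primary data per day
primary_shards = 10
replicas = 1

per_primary_gb = daily_size_gb / primary_shards          # size of each primary shard
total_daily_gb = daily_size_gb * (1 + replicas)          # daily footprint incl. replicas

print(f"per-primary shard size: {per_primary_gb:.0f} GB")        # 180 GB (vs. the ~50 GB guideline)
print(f"daily footprint incl. replicas: {total_daily_gb:.0f} GB")  # 3600 GB
```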

Architecture:

  1. Data:
    • Number: 10
    • CPU: 4
    • Memory: 32G
    • Heap: 16G
  2. Master:
    • Number: 5
    • CPU: 4
    • Memory: 16G
    • Heap: 8G
  3. Client:
    • Number: 5
    • CPU: 2
    • Memory: 8G
    • Heap: 4G
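To make the load on this topology concrete, a quick sketch of the per-node indexing rate and the disk footprint over the retention window, assuming ingest is spread evenly across the day and across the data nodes, and that closed indices still occupy disk until deletion at 30 days:

```python
# Back-of-the-envelope load figures from the configuration above.
docs_per_day = 6_800_000_000
daily_tb = 1.8
data_nodes = 10
replicas = 1
retention_days = 30   # indices deleted after 30 days

docs_per_sec = docs_per_day / 86_400          # cluster-wide average
per_node_docs_per_sec = docs_per_sec / data_nodes
disk_tb = daily_tb * (1 + replicas) * retention_days

print(f"cluster ingest: ~{docs_per_sec:,.0f} docs/sec")
print(f"per data node:  ~{per_node_docs_per_sec:,.0f} docs/sec")
print(f"disk footprint over retention: ~{disk_tb:.0f} TB incl. replicas")
```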

Storage:

  • Storage Backed:
    • NFS mounts to data nodes
    • Disks: 7.2K

Garbage Collection:

  • JAVA_OPTS: "-XX:-UseConcMarkSweepGC -XX:-UseCMSInitiatingOccupancyOnly -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=75"