Cluster monitoring

As the fork is moving on, I think there’s a gap which needs to be filled: an alternative to the X-Pack cluster monitoring API and, possibly, a Kibana app like the original Kibana Monitoring app.
Managing a cluster without a real-time overview of the cluster itself is not simple so this feature would be a killer one IMHO.
Btw: IMHO it is a SHAME Elastic did not include such a basic feature in the Apache 2.0 codebase instead of X-Pack (but the same can also be said about the security features covered now by ODFE and previously by Floragunn’s SearchGuard).
@searchymcsearchface what do you think about this one? Do you think it would be desirable and feasible?

For years now we’ve been running one of the Elasticsearch Prometheus exporters, Prometheus, and Grafana to monitor our clusters. I’ve played with the X-Pack monitoring and Metricbeat, but still prefer and rely on the Prometheus monitoring.

Of course that’s a lot more to set up than just flipping a switch in Kibana, but ES self-monitoring is a bad idea in production (you’re blind when Kibana is down because requests are timing out, etc), so you really need a separate cluster to send metrics to, which is even more to set up and run.

I’d love to see ES-fork natively publish metrics for Prometheus/OpenMetrics. I may even take this on once the fork is stable; the standalone exporters are a bit of a mess.

Hi Retzkek, interesting point!
I agree that having external monitoring instead of self-monitoring increases resilience, but having ES + exporters + Prometheus + Grafana just for monitoring ES could be “too much” for the average user IMHO.
If we’re talking about large clusters or multiple clusters then relying on self-monitoring could lead to issues, but for the average ES installation I still think a self-contained solution would be a boon; indeed, the main strength of X-Pack’s monitoring API + Kibana app is that it works “by default” without having to put a lot of pieces together.

just my 2 cents :wink:

I’ll be honest, I’m not super familiar with that x-pack feature. Does this overlap functionally with Performance Analyzer and PerfTop? [Note: I’m not debating inclusion/exclusion in the core, just trying to understand the desire.]

Yeah, architecturally, monitoring a tool with itself leaves some holes; on the other hand, at some point you have monitoring inception. But to @tvc_apisani’s point, easy often wins over perfect.

Hi,
I think it does not overlap with PA/PT… they serve different purposes: PA/PT is more focused on performance tuning, while the X-Pack feature offers a view of the cluster itself (nodes, indices, etc.) without dealing too much with internal details (although some are offered, such as the shards of each index…).
Moreover, the X-Pack feature gives you an at-a-glance view of your cluster from a web app, whereas PA/PT works via CLI access.

Just to recap: I think PA/PT and monitoring are complementary… indeed the original ES stack was lacking a tool like PA/PT, and ODFE filled that gap perfectly.

There’s certainly room for both the easy and the good, I wasn’t so much disagreeing as offering another viewpoint. PA/PT look really interesting, and seem to capture the sorts of stats we monitor, but for people like us who already have Prometheus and Grafana it’s much preferable to integrate the ES monitoring.

monitoring inception

It’s easy: Prometheus monitors ES, and in turn sends its stats to Graphite, which sends its stats to InfluxDB, which we scrape stats from into TimescaleDB, and then we use Metricbeat to monitor the Postgres database with ES. What could be simpler? :smile:

Hi,

I think the question is whether the fork should include the monitoring app/UI, which is not just about charts and dashboards, but also metric and other data collection, plus alert rule triggering and alert notifications. I don’t think the stats API that exposes ES metrics needs to change. If that changed, all monitoring solutions, whether SaaS solutions or OSS tools, would break and need adjustments to start collecting metrics from a different API that exposes metrics in a different format.

Otis

Sematext Cloud - Full Stack Observability - https://sematext.com/
Monitoring for infra, logs, frontend, APIs, websites, uptime

Hi Otis,
AFAIK the monitoring feature of X-Pack is based on some standard ES stats plus some specific ones added by X-Pack itself… but I might be wrong.

AFAIK there is no appetite to change any end-user exposed APIs for your exact reason. Breaking the world is no fun.

Just for clarification: I’ve never suggested any compatibility-breaking change. If the monitoring feature were added, it should mimic the protocol/interface of the one in X-Pack in order to achieve and maintain interoperability with Elastic’s ES and its ecosystem.

I think at a minimum we should have something we can use to integrate with existing monitoring systems.

Since the OpenMetrics format is based on the Prometheus format, and many monitoring systems, including SaaS solutions like Datadog, support it, I suggest including the “Prometheus Exporter Plugin” with the Fork distribution: GitHub - vvanholl/elasticsearch-prometheus-exporter: Prometheus exporter plugin for Elasticsearch. It’s Apache 2 licensed, so there shouldn’t be any legal issue with having it as part of the Fork.

What do you think, guys?

Re X-pack. I only see X-Pack APIs — Elasticsearch 7.11.0 documentation which doesn’t really show any special monitoring APIs exposed via X-Pack. I know Sematext makes use of the standard ES stats API for collecting all kinds of ES metrics. X-Pack exposes other APIs, for other aspects of cluster monitoring, but not for retrieving metrics, from what I can tell.

Otis
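
For reference, a minimal sketch of collecting metrics from the standard stats API discussed above, assuming an unauthenticated cluster reachable at `http://localhost:9200`. `GET /_nodes/stats` and its top-level `nodes` map are part of the standard API; the helper names and the trimmed sample response are illustrative only.

```python
import json
from urllib.request import urlopen


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch the standard node stats; base_url is an assumption for this sketch."""
    with urlopen(f"{base_url}/_nodes/stats") as resp:
        return json.load(resp)


def flatten_numeric(obj, prefix=""):
    """Flatten nested stats into {'dotted.path': number}, skipping non-numeric leaves."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_numeric(value, path))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[path] = value
    return flat


def metrics_per_node(stats):
    """Map each node name (falling back to its id) to its flattened numeric stats."""
    return {
        node.get("name", node_id): flatten_numeric(node)
        for node_id, node in stats["nodes"].items()
    }


if __name__ == "__main__":
    # Heavily trimmed, hand-written stand-in for a real _nodes/stats response.
    sample = {
        "nodes": {
            "abc123": {
                "name": "node-1",
                "jvm": {"mem": {"heap_used_in_bytes": 1048576}},
                "indices": {"docs": {"count": 42}},
            }
        }
    }
    print(metrics_per_node(sample))
```

Any collector built this way keeps working as long as the stats API itself stays unchanged, which is the compatibility point made above.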

So X-Pack, on the ES side, only offers a collector for the standard metrics? If so, things should be simpler than originally expected.

Hey @shamil ,

any specific reason you are suggesting GitHub - vvanholl/elasticsearch-prometheus-exporter: Prometheus exporter plugin for Elasticsearch rather than GitHub - prometheus-community/elasticsearch_exporter: Elasticsearch stats exporter for Prometheus? The latter works quite well and integrates easily with Prometheus -> Grafana.

I’d also like to hear others’ opinions on best practices here.

Best regards,

Hi @GezimSejdiu,
looking at GitHub, it seems vvanholl’s exporter is up to date and aligned with ES releases, while justwatchcom’s one seems a bit abandoned, with a lot of pending PRs and issues and a last release in Aug 2019. Is there anyone with hands-on experience with the former or the latter?

I’ve no experience with justwatchcom, but those are different in how they operate. The GitHub - vvanholl/elasticsearch-prometheus-exporter is actually a native plugin for ES, unlike the other one. Which means it runs as part of the ES process and not alongside. That’s why I suggested to have it included as part of the Fork distribution.

Besides that, I really like the plugin: it has all the required metrics, and we use it extensively in our 14 production clusters.

To sum up, there are two ways of monitoring an Elasticsearch cluster:

  1. Prometheus way: with the Elasticsearch exporter, Kibana exporter, Logstash exporter, Grafana and Alertmanager

PROs:

  • no cluster overhead
  • scalability
  • free Grafana dashboards for the whole stack
  • in place if you have a decent Docker Swarm or K8s architecture
  • already available
  • Prometheus is better than ES for managing metrics

CONs:

  • more infrastructure components
  • how much history do you need? (Prometheus usually keeps 1-2 weeks or 1 month; for more you need to add Thanos)
  2. Implement a solution similar to X-Pack monitoring.
    The fast path would be a thread that, every n seconds, indexes the node state into an index via a pipeline; then create the dashboards for the metrics and use OD alerting for the alerts.
    The first step is easy initially, but you then need to add writing to an external cluster and other features, and that will take some time.

PROs:

  • embedded in Elasticsearch
  • ES way to do things
  • best way for small clusters
  • we can reuse OD alerting

CONs:

  • covers only Elasticsearch (otherwise you need to implement the same things for the other components, Kibana/Logstash)
  • light overhead on the node

IMHO I understand the need for solution 2, but in my experience the most useful one is solution 1, because Elasticsearch doesn’t live on its own in the infrastructure ecosystem, and Prometheus/Grafana are very useful for monitoring all the other components (Docker, microservices, …)
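
The “fast path” of solution 2 can be sketched roughly as follows: a loop that, every n seconds, collects node state and ships it as a `_bulk` request body. This only illustrates the idea and is not X-Pack's collector; the `cluster-monitoring-` index prefix and the `collect`/`ship` callbacks are hypothetical, and in practice `ship` would POST the body to `_bulk` (optionally naming an ingest pipeline), possibly on a separate monitoring cluster.

```python
import json
import time
from datetime import datetime, timezone


def build_bulk_body(node_stats, index_name, timestamp):
    """Build an NDJSON _bulk body: one action line plus one document per node."""
    lines = []
    for node_id, stats in node_stats.items():
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({"@timestamp": timestamp, "node_id": node_id, **stats}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def run_collector(collect, ship, interval_seconds=10, iterations=None):
    """Call collect() every interval_seconds and pass the bulk body to ship().

    iterations=None loops forever; a number makes the sketch easy to try out.
    """
    count = 0
    while iterations is None or count < iterations:
        ts = datetime.now(timezone.utc).isoformat()
        # Hypothetical daily index name, e.g. cluster-monitoring-2021.02.03
        index_name = "cluster-monitoring-" + ts[:10].replace("-", ".")
        ship(build_bulk_body(collect(), index_name, ts))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)


if __name__ == "__main__":
    # Stand-in callbacks: a fixed stats dict and print() instead of an HTTP POST.
    run_collector(lambda: {"node-1": {"heap_used_in_bytes": 1048576}}, print,
                  interval_seconds=0, iterations=1)
```

Pointing `ship` at a different cluster is what covers the external-writing step mentioned above.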

I’d agree but with some notes:

  1. the Prometheus solution is not homogeneous: ES and Kibana have exporters as Java plugins, while Logstash has an external exporter which requires a standalone instance (e.g. using Docker). The embedded solution, assuming it mimics the X-Pack protocol, would look the same across the whole stack.
  2. the Prometheus solution does “break” compatibility with ELK: a user migrating from ELK+X-Pack to ODFE would assume monitoring works “by default”, without having to install external tools. With the first solution, ODFE would operate one way and standard ELK another. This could be a problem or not at all, depending on the ODFE project’s goals as a whole. It is also strictly related to the target audience of the project: small users could find the Prometheus solution cumbersome, while tech-savvy users (already comfortable with Grafana, Prometheus and so on) would find it preferable.
    Maybe the most preferable solution would be having both, but I know it could be overkill.

Thanks a lot for your prompt reply and also for sharing your good experience with GitHub - vvanholl/elasticsearch-prometheus-exporter: Prometheus exporter plugin for Elasticsearch. Indeed, being embedded with ES as a plugin has some benefits, and integrating that with the new ES-Fork will have the potential to provide such metrics out of the box without us setting up another service – outside of the stack. As of now, I find it more scalable (depending on the setup) using GitHub - prometheus-community/elasticsearch_exporter: Elasticsearch stats exporter for Prometheus as I do not need to touch the base ODFE setup by installing an additional plugin. If that becomes part of the core setup, I can easily switch to the new one.

Looking forward to this fully integrated observability for ODFE (in addition to GitHub - opendistro-for-elasticsearch/perftop: 📈 PerfTop: A client for the Open Distro Performance Analyzer, of course) on a new ES-Fork.

Best,