How many data nodes, shards for this setup? ~100GB/day


I want to log System and Security events from Windows servers (currently working with Winlogbeat, with daily indices for System and Security events). I also want to log Cisco devices (currently working as Cisco syslog --> Logstash OSS --> Elasticsearch) and some Trend Micro products (syslog).

Note: the setup described above is only for testing, with 1-3 devices. In production there will be 1000+ devices.

It is roughly 100GB a day. How many data nodes is optimal? I know that 3 master nodes are recommended, but how many data nodes? And how many primary shards should I have?

Hoping for an answer.

This is an older question, and the OP may already have an answer, but I thought I would share a few thoughts for those who might find it later.

There are a few things you need to know to size a cluster for logging use-cases…

  1. average indexed document size

This is not the same as the size of the raw log, and depends on the degree to which the log is parsed and how many values are extracted into individual fields. It can be determined by querying the ES REST API and doing a bit of math.

  2. peak ingest rate

There must be enough nodes to handle the peak ingest rate, or back pressure can result in lost data.

  3. average ingest rate over 24 hours

This is used to calculate the total volume of index data per day.

  4. retention period for searchable data

If data must be searchable for longer periods, it may be necessary to use a hot/warm architecture or a similar multi-tier strategy.
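For the first item, the "bit of math" could look something like the sketch below. It assumes you have already fetched the JSON returned by `GET /<index>/_stats` and divides the primary store size by the primary document count. The function name and the sample numbers are made up for illustration, and the result is only a rough estimate (deleted documents that have not been merged away still take up space, for example).

```python
def average_indexed_doc_size(stats: dict) -> float:
    """Rough average on-disk bytes per document, primaries only.

    `stats` is assumed to be the parsed JSON body of GET /<index>/_stats.
    """
    primaries = stats["_all"]["primaries"]
    doc_count = primaries["docs"]["count"]
    if doc_count == 0:
        raise ValueError("index holds no documents")
    return primaries["store"]["size_in_bytes"] / doc_count


# Hypothetical, heavily trimmed response; the numbers are invented.
sample_stats = {
    "_all": {
        "primaries": {
            "docs": {"count": 1_000_000},
            "store": {"size_in_bytes": 350_000_000},
        }
    }
}

print(average_indexed_doc_size(sample_stats))  # 350.0
```

Run it against a day's index that has finished rolling over, so the number is not skewed by a half-written index.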

I will start with an assumption… logs are an average of 250 bytes, and the indexed size is 350 bytes.

100GB of logs would be 400 million logs per day, or 4630 logs/second. If we assume the peak is 50% greater than the average, the peak would be 6945/s. With appropriate hardware, in a multi-node cluster, the ingest rate per node is around 15000 logs/sec. So the peak ingest rate isn’t going to be an issue when sizing a cluster.
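If you want to plug in your own numbers, the ingest arithmetic above can be sketched like this. The 250 byte log size, the 50% peak factor and the 15000 logs/sec per node are the assumptions from this post, not measured values, and the variable names are just for illustration.

```python
import math

GB = 10**9                      # decimal gigabytes, as used in this post

raw_bytes_per_day = 100 * GB    # ~100GB of raw logs per day
avg_raw_log_bytes = 250         # assumed average raw log size
peak_factor = 1.5               # assumed: peak is 50% above average
per_node_ingest = 15_000        # assumed logs/sec one well-specced node can index

logs_per_day = raw_bytes_per_day // avg_raw_log_bytes
avg_rate = round(logs_per_day / 86_400)       # logs per second, averaged over 24h
peak_rate = round(avg_rate * peak_factor)     # logs per second at peak
nodes_for_ingest = math.ceil(peak_rate / per_node_ingest)

print(logs_per_day, avg_rate, peak_rate, nodes_for_ingest)
# 400000000 4630 6945 1
```

A single node could absorb the peak under these assumptions, which is why storage rather than ingest ends up driving the node count.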

400 million logs per day at an average indexed size of 350 bytes per log results in 140GB of data per day. Add to that one replica for redundancy, which gives us 280GB per day. The maximum recommended storage volume for a node to which data is actively written is 6-8TB. However, to avoid filling the disk completely, Elasticsearch will not allocate new shards on a volume that is over 85% of its capacity. So even with an 8TB volume per node, the real capacity available for Elasticsearch data is 6.8TB.

A three-node cluster would thus provide 20.4TB of usable storage, or roughly 73 days of retention at 280GB per day. Additional nodes would be required to extend the retention period beyond 73 days.
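The storage and retention numbers can be reproduced with a small sketch like the one below. Everything in it is an assumption taken from this post (350 byte indexed size, one replica, 8TB volumes, the 85% allocation threshold, three nodes), and the function name is made up, so adjust the inputs to your own measurements.

```python
def retention_days(logs_per_day: int,
                   indexed_bytes_per_log: int,
                   replicas: int,
                   nodes: int,
                   volume_tb: float,
                   usable_fraction: float = 0.85) -> float:
    """Days of data a cluster can hold before its usable disk is full."""
    # Daily footprint in GB: primaries plus one copy per replica.
    daily_gb = logs_per_day * indexed_bytes_per_log * (1 + replicas) / 1e9
    # Usable capacity in GB, staying under the 85% allocation threshold.
    usable_gb = nodes * volume_tb * usable_fraction * 1000
    return usable_gb / daily_gb


days = retention_days(
    logs_per_day=400_000_000,
    indexed_bytes_per_log=350,
    replicas=1,
    nodes=3,
    volume_tb=8,
)
print(f"{days:.1f} days")  # 72.9 days
```

Changing `nodes` or `volume_tb` shows quickly how many nodes a longer retention period would need.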

At these relatively low ingest rates, you shouldn’t require dedicated master nodes, as long as you don’t cut corners on the hardware. SSD storage is a MUST! 8 CPU cores and 64GB RAM is a minimum. 12 cores and 96GB would handle complex queries better, and 16 cores and 128GB better still. Beyond that there are diminishing returns, and it is better to add more nodes rather than bigger nodes.

NOTE: there are a lot of potential exceptions here. But in most cases this should serve you well.
