K-NN Resource Usage

Hey! I was wondering if there was any guidance that could be provided with regard to resource usage for Elasticsearch clusters running this plugin. I'm interested in exploring, given N documents, M graph links, and vectors of length d:

  1. How much should we scale our cluster to accommodate efficient queries?
  2. What SLAs can we provide with regard to the number of documents we can support while still serving performant queries?
  3. Are the underlying query graphs stored only on data nodes? And is the graphMemoryUsage statistic the best metric for tracking memory consumption increases? When we run GET _cat/indices and look at tm for a given index, the measurement reads 1064kb, while the per-node graph_memory_usage from GET _opendistro/_knn/stats ranges from 0 to 3171kb depending on the node. Which value is best for measuring memory consumption?
  4. If a cluster has multiple indices, each with their own k-NN vectors, how is the knn.memory.circuit_breaker.limit value measured? Is it on a per-graph basis?

Thanks in advance for any guidance!

Hi @dnock
1. How much should we scale our cluster to accommodate efficient queries?

In order to achieve efficient queries, all of the graphs will need to fit into the available memory.

Available memory = (RAM - Elasticsearch Max Heap Size) * Circuit Breaker Limit (i.e. 0.5 for 50%)

To estimate the amount of memory your graphs will take up, we use the following formula:

Total graph memory = 1.1 * (8*M + 4*dimension) * number of vectors (including replicas)

For efficient queries:

Total graph memory < Available memory
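To make the arithmetic concrete, here is a minimal sketch of both formulas in Python. The node size (64 GB RAM, 32 GB heap) and the index shape (10M vectors, d=128, M=16) are hypothetical numbers chosen for illustration, not recommendations.

```python
def available_memory_bytes(ram_bytes, heap_bytes, circuit_breaker_limit=0.5):
    """Memory the k-NN graphs may use: (RAM - JVM heap) * circuit breaker limit."""
    return (ram_bytes - heap_bytes) * circuit_breaker_limit

def graph_memory_bytes(num_vectors, dimension, m):
    """Estimated HNSW graph size: 1.1 * (8*M + 4*d) bytes per vector,
    where num_vectors includes replicas."""
    return 1.1 * (8 * m + 4 * dimension) * num_vectors

GB = 1024 ** 3
# Hypothetical data node: 64 GB RAM, 32 GB heap, default 50% circuit breaker.
available = available_memory_bytes(64 * GB, 32 * GB)
# Hypothetical index: 10M vectors (including replicas), d=128, M=16.
needed = graph_memory_bytes(10_000_000, 128, 16)
print(f"available: {available / GB:.1f} GB, needed: {needed / GB:.1f} GB")
print("fits:", needed < available)
```

With these numbers the graphs need roughly 6.6 GB against 16 GB of available memory, so queries should stay efficient; if `needed` exceeds `available`, you would scale out or up until it doesn't.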

2. What SLAs can we provide with regard to the number of documents we can support while still serving performant queries?
It depends on the index setup and configuration, such as the number of shards. Here are some numbers from our experiments:

Dataset: 150M vectors with 128 dimensions across different indices
Algorithm parameters: m=16, efSearch=1024, efConstruction=1024
Data nodes: 6 × m5.12xlarge
Master nodes: 3 × m5.xlarge

Latencies:
tp50: 22ms
tp90: 40ms
tp99: 90ms

We have done performance analysis for different vector dimensions and collection sizes. We need to formalize the results and present them in a consumable manner; we are prioritizing the effort to bring this into the performance tuning doc.

3. Are the underlying query graphs stored only on data nodes? And is the graphMemoryUsage statistic the best metric for tracking memory consumption increases? When we run GET _cat/indices and look at tm for a given index, the measurement reads 1064kb, while the per-node graph_memory_usage from GET _opendistro/_knn/stats ranges from 0 to 3171kb depending on the node. Which value is best for measuring memory consumption?
Yes, the underlying graphs are stored only on data nodes. No graphs will be stored on dedicated masters.

Yes, the graphMemoryUsage statistic is the best metric for tracking memory consumption increases. Also keep an eye on the cache capacity reached and circuit breaker triggered metrics: they indicate that the cache has filled up and that higher search latencies will follow.

_cat/indices does not track the memory the graphs use; GET _opendistro/_knn/stats should be preferred for measuring memory consumption.
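As an illustration of why the per-node numbers vary, here is a sketch that sums graph_memory_usage across nodes from a stats response. The response below is hypothetical and heavily abbreviated (real responses carry many more fields); the node names and KB values are made up to mirror the numbers quoted above.

```python
# Hypothetical, abbreviated response from GET _opendistro/_knn/stats;
# only the per-node graph_memory_usage (in KB) is shown.
stats_response = {
    "nodes": {
        "node-1": {"graph_memory_usage": 3171},
        "node-2": {"graph_memory_usage": 1064},
        "node-3": {"graph_memory_usage": 0},
    }
}

# Cluster-wide graph memory is the sum across data nodes. A node that holds
# no k-NN shards (or hasn't loaded its graphs yet) reports 0, which is why
# the per-node values range from 0 upward.
total_kb = sum(n["graph_memory_usage"] for n in stats_response["nodes"].values())
print(f"total graph memory: {total_kb} KB")
```

So when sizing a cluster, compare the summed per-node values (or each node's value against that node's available memory), rather than the _cat/indices figures.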

4. If a cluster has multiple indices, each with their own k-NN vectors, how is the knn.memory.circuit_breaker.limit value measured? Is it on a per-graph basis?
knn.memory.circuit_breaker.limit applies to the total memory of all of the k-NN indices, not per index. For example, if you have 10 indices, each with 5 GB of graphs and the circuit breaker limit is set to a value that allows 40 GB of available memory, only 8 of those indices would fit in the cache.
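A worked version of that example, as a rough sketch (the real cache evicts and reloads graphs dynamically; this only shows the capacity arithmetic):

```python
# 10 hypothetical indices, 5 GB of graphs each, with a circuit breaker
# limit that allows 40 GB of total graph memory.
index_graph_gb = [5] * 10
limit_gb = 40

loaded = 0
used = 0
for size in index_graph_gb:
    if used + size > limit_gb:
        break  # limit reached; the remaining graphs don't fit in the cache
    used += size
    loaded += 1

print(f"{loaded} of {len(index_graph_gb)} indices fit in the cache")
```

The limit is a single cluster-wide budget shared by every k-NN index, which is why only 8 of the 10 indices fit here.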

Jack

Thanks for the response! This is incredibly helpful! I have one more question about the knn.memory.circuit_breaker.limit value. So if we've configured the heap size for our Elasticsearch cluster, knn.memory.circuit_breaker.limit lets us configure the percentage of the remaining memory we'd like to allow our HNSW graphs to consume [(total RAM - Xmx) * knn.memory.circuit_breaker.limit] on our data nodes?

Correct. We set it by default to 50% of the memory remaining after accounting for the JVM heap, because Lucene and Elasticsearch use off-heap memory for other things, such as the file system cache, and we want to leave space for that.