Anomaly detection with term aggregations

amir · April 24, 2020, 6:45am

I’ve been doing some research on the Anomaly Detection plugin and comparing its results with a separate analysis performed in Python. The use case was searching for security anomalies, such as the number of 401 and 403 statuses per IP/user. For aggregating feature data I used single-value aggregations such as value_count and cardinality. The results are quite satisfactory.
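For illustration, the single-value aggregations mentioned above could be built roughly like this in Python. This is only a sketch: the field names `status` and `client_ip` and the feature names are hypothetical, and how the filter and aggregations get attached to a detector depends on the plugin version.

```python
import json

# Hypothetical field names; adjust to the actual index mapping.
STATUS_FIELD = "status"
IP_FIELD = "client_ip"


def single_value_features():
    """Build two single-value aggregations of the kind used as features."""
    return {
        # value_count: how many values of the status field were seen
        "auth_failure_count": {"value_count": {"field": STATUS_FIELD}},
        # cardinality: approximate number of distinct client IPs
        "distinct_ips": {"cardinality": {"field": IP_FIELD}},
    }


def auth_failure_filter():
    """Terms query that keeps only 401 and 403 responses."""
    return {"terms": {STATUS_FIELD: [401, 403]}}


if __name__ == "__main__":
    body = {
        "size": 0,
        "query": auth_failure_filter(),
        "aggs": single_value_features(),
    }
    print(json.dumps(body, indent=2))
```

Each aggregation returns exactly one number per interval, which is why it fits as a feature without further processing.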

I’ve got a couple more use cases I would like to try out but I’m not sure if Anomaly Detection supports such functionality. For example, I would like to perform term aggregations on features and search for anomalies within the count of terms. Specifically, I would like to provide the following input to the model:

[ { "country": "US", "doc_count": 1000 }, { "country": "CA", "doc_count": 700 }, { "country": "FR", "doc_count": 2 } ]

The expected behavior here would be to flag an anomaly for France, which is not a country visitors usually come from.
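To make the expectation concrete, here is a rough offline check in Python over per-country counts. It is not how the plugin works; it is a simple share-of-total threshold, and both the function name `rare_terms` and the 1% cutoff are assumptions chosen for illustration.

```python
def rare_terms(buckets, key="country", min_share=0.01):
    """Return the terms whose share of all documents is below min_share.

    `buckets` is a list of dicts shaped like terms-aggregation buckets,
    each with a term under `key` and a `doc_count`.
    """
    total = sum(b["doc_count"] for b in buckets)
    if total == 0:
        return []  # nothing to compare against
    return [b[key] for b in buckets if b["doc_count"] / total < min_share]


if __name__ == "__main__":
    buckets = [
        {"country": "US", "doc_count": 1000},
        {"country": "CA", "doc_count": 700},
        {"country": "FR", "doc_count": 2},
    ]
    print(rare_terms(buckets))  # prints ['FR']
```

A fixed threshold like this ignores history, so a country that is always rare would be flagged on every run; comparing each term's count against its own past values would be closer to the behavior I have in mind.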

Is this functionality currently available with the plugin? If not, is it planned for implementation?

ylwu · June 4, 2020, 12:23am

This is anomaly detection based on cardinality, which is already in our plans. For reference, here is the duplicate GitHub question: Term aggregation in custom query for features · Issue #88 · opendistro-for-elasticsearch/anomaly-detection · GitHub

amir · June 4, 2020, 12:02pm

That’s great to hear. Thank you for the response, Yaliang.
