Opendistro KNN giving different scores for the same query vector

I am using “cosinesimil” as the knn.space_type in Opendistro Elasticsearch version 7.8.0. I indexed 3 documents with an attribute of type knn_vector (the only other attribute supplied was a status term with value 1). The 3 vectors for these docs were (2,2), (2,1) and (2,3).

Surprisingly, when I search for the vector (1,1) [with a post filter of status = 1] using the query:

{
  "size": 1,
  "query": {
    "knn": {
      "embedding": { // attribute embedding is of type knn_vector
        "vector": [1, 1],
        "k": 1
      }
    }
  },
  "post_filter": {
    "term": {"status": 1}
  }
}

I am getting different recalled vectors each time I execute the query. Sometimes I get the doc with vector (2,1) with _score 0.5, while other times I get the doc with vector (2,2) with _score 1.0.

Questions:

  1. Why am I getting different recalled vectors when executing the same query?
  2. Why is the _score for the doc with vector (2,1) coming back from Elasticsearch as 0.5? Only cosinesimil should influence the score, and the cosinesimil between (2,1) and (1,1) is around 0.95.
  3. When I use the same ES query with k = 3 and size = 3 and execute it multiple times, I sometimes get the doc (2,1) with _score 0.5 and other times with _score 0.95 (the actual cosinesimil). Why?

Hi @utpal

0.5 would be the score given by L2. I am not sure what is causing this error.
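To make the arithmetic behind that remark concrete, here is a minimal Python sketch that compares the two space types on the vectors from the question. It assumes the plugin converts a distance d into a score as 1 / (1 + d), and that the cosinesimil distance is 1 - cosine similarity; the helper names are hypothetical and this is not the plugin's actual code.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.dist(a, b)


def score_from_distance(distance):
    # ASSUMPTION: score = 1 / (1 + distance); not taken from the plugin source.
    return 1.0 / (1.0 + distance)


query = (1, 1)
docs = [(2, 2), (2, 1), (2, 3)]

for doc in docs:
    cos = cosine_similarity(query, doc)
    # ASSUMPTION: cosinesimil distance is 1 - cosine similarity.
    cos_score = score_from_distance(1.0 - cos)
    l2_score = score_from_distance(l2_distance(query, doc))
    print(doc, "cos:", round(cos, 4),
          "cosinesimil score:", round(cos_score, 4),
          "l2 score:", round(l2_score, 4))
```

Under these assumptions, (2,1) is at L2 distance 1 from (1,1), which gives exactly 0.5, and it is also the nearest neighbor under L2. Under cosinesimil the nearest neighbor is (2,2) with score 1.0. That matches both results reported in the question, which would be consistent with some searches being evaluated with L2 instead of cosinesimil.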

After checking out the code at tag v1.9.0.0 and running it locally with ./gradlew run, I was unable to reproduce the issue with the following commands:

export HOST_NAME=localhost:9200

curl -X PUT "${HOST_NAME}/myindex" -H 'Content-Type: application/json' -d'
{
  "settings" : {
    "number_of_shards" :   1,
    "number_of_replicas" : 0,
    "index": {
        "knn": true,
        "knn.space_type": "cosinesimil"
    }
  },
  "mappings": {
      "properties": {
        "my_vector": {
          "type": "knn_vector",
          "dimension": 2
        }
      }
  }
}
'

{"acknowledged":true,"shards_acknowledged":true,"index":"myindex"}

curl -X POST "${HOST_NAME}/myindex/_doc" -H 'Content-Type: application/json' -d'
{
  "my_vector" : [2, 1],
  "status": 1
}
'
{"_index":"myindex","_type":"_doc","_id":"nEgo_3QBbqF0R1iJ2SSG","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

curl -X PUT "${HOST_NAME}/myindex/_doc/2" -H 'Content-Type: application/json' -d'
{
  "my_vector" : [2, 2],
  "status": 1
}
'
{"_index":"myindex","_type":"_doc","_id":"2","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}

curl -X PUT "${HOST_NAME}/myindex/_doc/4?refresh=true" -H 'Content-Type: application/json' -d'
{
  "my_vector" : [2, 3],
  "status": 1
}
'
{"_index":"myindex","_type":"_doc","_id":"4","_version":1,"result":"created","forced_refresh":true,"_shards":{"total":1,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1}

curl -X POST "${HOST_NAME}/myindex/_search" -H 'Content-Type: application/json' -d'
{
  "size": 1,
  "query": {
    "knn": {
      "my_vector": { // attribute my_vector is of type knn_vector
        "vector": [1, 1],
        "k": 1
      }
    }
  },
  "post_filter": {
    "term": {"status": 1}
  }
}
'
{"took":101,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"myindex","_type":"_doc","_id":"2","_score":1.0,"_source":
{
"my_vector" : [2, 2],
"status":1
}
...
{"took":8,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"myindex","_type":"_doc","_id":"2","_score":1.0,"_source":
{
"my_vector" : [2, 2],
"status":1
}
...
{"took":8,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"myindex","_type":"_doc","_id":"2","_score":1.0,"_source":
{
"my_vector" : [2, 2],
"status":1
}
...
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"myindex","_type":"_doc","_id":"2","_score":1.0,"_source":
{
"my_vector" : [2, 2],
"status":1
}

Could you provide the following information?

  1. Index mapping
  2. Index settings
  3. ODFE artifact type (rpm, deb, docker, etc.)
  4. Number of nodes in the cluster

Additionally, does it fail with the same error for ODFE v1.10.1.0?

Jack

@jmazane Please find the information here:

Index mapping (showing only the part that contains embedding):

{
  "ads_knn_id0" : {
    "mappings" : {
      "_routing" : {
        "required" : true
      },
      .......
        "dailybudget" : {
          "type" : "long"
        },
        "embedding" : {
          "type" : "knn_vector",
          "doc_values" : false,
          "dimension" : 2
        },
        "endtime" : {
          "type" : "long"
        },
       ........
      }
    }
  }
}

Index settings:

{
  "ads_knn_id0" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "30s",
        "number_of_shards" : "1",
        "provided_name" : "ads_knn_id0",
        "knn.space_type" : "cosinesimil",
        "max_result_window" : "50000",
        "knn" : "true",
        "creation_date" : "1601612336217",
        "analysis" : {
          "analyzer" : {
            "shopee_analyzer" : {
              "filter" : [
                "lowercase"
              ],
              "tokenizer" : "whitespace"
            }
          }
        },
        "number_of_replicas" : "1",
        "queries" : {
          "cache" : {
            "enabled" : "true"
          }
        },
        "uuid" : "PVZTn2tpTlGO3oHv2BDedA",
        "version" : {
          "created" : "7080099"
        }
      }
    }
  }
}

ODFE artifact type:

{
  "name" : "node-10-130-239-193",
  "cluster_name" : "ads-semantic-recall",
  "cluster_uuid" : "dAXZDHWDRAewTAALD5CBQA",
  "version" : {
    "number" : "7.8.0",
    "build_flavor" : "oss",
    "build_type" : "deb",
    "build_hash" : "757314695644ea9a1dc2fecd26d1a43856725e65",
    "build_date" : "2020-06-14T19:35:50.234439Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Number of nodes: 5

Thanks @utpal,

This may be related to this bug we are working on here: https://github.com/opendistro-for-elasticsearch/k-NN/issues/239.

In your cluster are you only creating “cosinesimil” indices?

Additionally, as a somewhat hacky workaround, could you try the following before running your index and query load: create a dummy cosinesimil index with 5 shards, delete it, then create your index, ingest your documents, and run your queries? Please let this thread know whether the issue is still present.
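For reference, the order of operations in that workaround could be sketched roughly as below. This is only an illustration: the index names, the my_vector field, and the helper function are hypothetical, the request bodies are modeled on the reproduction commands earlier in this thread, and nothing is actually sent to a cluster.

```python
import json


def workaround_steps(dummy_index="dummy_cosinesimil", real_index="myindex", dimension=2):
    """Return the REST calls for the workaround, in order, as (method, path, body)."""

    def knn_index_body(shards):
        # Settings and mapping mirror the reproduction commands in this thread.
        return {
            "settings": {
                "number_of_shards": shards,
                "index": {"knn": True, "knn.space_type": "cosinesimil"},
            },
            "mappings": {
                "properties": {
                    "my_vector": {"type": "knn_vector", "dimension": dimension}
                }
            },
        }

    return [
        ("PUT", f"/{dummy_index}", knn_index_body(5)),   # 1. dummy index, 5 shards
        ("DELETE", f"/{dummy_index}", None),             # 2. delete the dummy index
        ("PUT", f"/{real_index}", knn_index_body(1)),    # 3. create the real index
        # 4. then ingest documents and run the queries as usual
    ]


for method, path, body in workaround_steps():
    print(method, path, json.dumps(body) if body is not None else "")
```

Each printed line can be turned into a curl call against the cluster in the same style as the commands above.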

We are working on a fix and patch for https://github.com/opendistro-for-elasticsearch/k-NN/issues/239 now and will update this thread once it's complete.