Combining KNN score with keyword query

Previously I’ve used the default Elasticsearch release that includes cosineSimilarity functionality as part of the x-pack. I was able to run a normal keyword query and then multiply the keyword _score by a similarity score using a painless script (_score * cosineSimilarity(v1, v2)).
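For reference, here is a minimal sketch of the kind of request described above, built as a Python dict. The field names (`title`, `title_vector`), the keywords, and the helper name are hypothetical. The `doc['...']` form of `cosineSimilarity` is an assumption based on the early 7.x releases; later releases take the field name as a plain string. It also assumes the similarity is non-negative, since Elasticsearch rejects negative scores.

```python
import json


def keyword_times_cosine(text_field, keywords, vector_field, query_vector):
    """Build a script_score request body (hypothetical helper).

    The inner match query produces the keyword _score; the Painless
    script multiplies that _score by the cosine similarity between the
    query vector and the document's dense vector field.
    """
    source = (
        "_score * cosineSimilarity(params.query_vector, "
        f"doc['{vector_field}'])"
    )
    return {
        "query": {
            "script_score": {
                "query": {"match": {text_field: keywords}},
                "script": {
                    "source": source,
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }


# Hypothetical field names and vector, for illustration only.
body = keyword_times_cosine("title", "vector search", "title_vector", [0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `GET /<index>/_search` request.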

I have not figured out how to do the same thing with the KNN implementation. I see that additional functions can be provided to re-weight the KNN score, but I don’t see how to multiply the score from a basic keyword query with a KNN similarity score.

Is this kind of usage intended? If not, is there a workaround?

Thanks

Hi @timforr, sorry for the delayed response. We are working on custom scoring support at the moment: https://github.com/opendistro-for-elasticsearch/k-NN/pull/196.

In the first version of this, we will not support _score * cosineSimilarity(v1, v2).

Basically, a query will look like this:

GET /knn_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            # apply some kind of filter
          }
        }
      },
      "script": {
        "lang": "knn",
        "source": "knn_score",
        "params": {
          "field": "test_knn_vector",
          "vector": [x, y, z],
          "space": "cosinesimil"  
        }
      }
    }
  }
}

Out of curiosity, why do you want to multiply the cosine similarity with the bm25 score?

Ok, thanks. My general use case is wanting to do a keyword search but then re-weighting those results by a similarity score (specifically, by a function of the similarity score). It actually seems like I can use function score to sort of do what I want with the current implementation, but in a kind of roundabout way compared to the simplicity of using a Painless script.

Actually, there is no way to do what I want (afaict). I want to take the score of a normal keyword query, e.g. A = 145, and multiply it by a KNN similarity score (between 0 and 1), e.g. B = 0.33.

My idea for a workaround was to take log(A) and log(B) in two function scores and combine them with a bool must query (which would add them, equivalent to multiplying A and B, since log(A) + log(B) = log(A * B)). However, this does not work: because B is between 0 and 1, log(B) is never positive, and a negative score raises an error.
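To make the arithmetic concrete, here is a small Python sketch using the example values A = 145 and B = 0.33 from above (the variable names are just for illustration). It checks that adding the logarithms matches the logarithm of the product, and that log(B) is negative for a similarity strictly between 0 and 1, which is the part that fails as a score.

```python
import math

keyword_score = 145.0  # A: score of the keyword query
similarity = 0.33      # B: KNN similarity, between 0 and 1

log_a = math.log(keyword_score)
log_b = math.log(similarity)

# Adding the logs is equivalent to taking the log of the product.
sum_of_logs = log_a + log_b
log_of_product = math.log(keyword_score * similarity)
logs_match = math.isclose(sum_of_logs, log_of_product)

# log(B) is negative for any B strictly between 0 and 1, so it
# cannot be used directly as a score.
log_b_is_negative = log_b < 0

print("log(A) + log(B) == log(A * B):", logs_match)
print("log(B) < 0:", log_b_is_negative)
```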

@timforr,

I created a feature request to expose the similarity score functions to address use cases like this: https://github.com/opendistro-for-elasticsearch/k-NN/issues/213. We will prioritize it for our next releases and will update this thread once the feature is deployed.

Thanks @vamshin that’s great news! Do you have a rough release schedule? Any estimate of when this functionality would be available on AWS would help a lot with my planning.

Hi @timforr,

We are targeting our next release. We will update the thread with more concrete information once we have a timeline for that release.

@timforr Out of curiosity, how are you planning to combine the two queries? I had a similar idea and was thinking of using the dis_max query: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-dis-max-query.html