Reranking results with multiple vectors

navmarri · June 22, 2020, 5:53pm

I’ve created an index with two vector fields, using the following schema:

ID, feature_vector1, feature_vector2

Here feature_vector1 is a list of 500 floats and feature_vector2 is a list of 100 floats.
I want to do the following:

1. Perform a search on the entire index using feature_vector1.
2. Using the results of that search, rerank them based on their similarity with feature_vector2 for the corresponding IDs.

I was able to perform k-NN on feature_vector1,
but I'm not sure how to apply a post_filter on top of the results obtained from feature_vector1. Is there a way to achieve this?

vamshin · June 22, 2020, 6:13pm

Hi @navmarri,

Could you help me understand what you mean by reranking of results with feature_vector2?

A post filter would basically trim away (filter out) documents obtained from the original query; it does not rescore them.

navmarri · June 22, 2020, 6:16pm

@vamshin

First we obtain the k-NN results from feature_vector1. Then, just for the IDs that are returned, I want to apply k-NN using feature_vector2. We can think of this as a chained query: performing k-NN on top of the results of the first k-NN search.
Does that make sense?
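
The chained search described above can be sketched in plain Python, outside Elasticsearch. This is only an illustration of the intended behaviour, not something the k-NN plugin provides: the function name `two_stage_search`, the tiny 2-dimensional vectors, and the use of Euclidean distance are all assumptions made for the example.

```python
import math

def two_stage_search(docs, query1, query2, k):
    """Take the k nearest docs by feature_vector1, then rerank them by feature_vector2."""
    # Stage 1: nearest neighbours over the whole collection using feature_vector1.
    candidates = sorted(
        docs, key=lambda d: math.dist(d["feature_vector1"], query1)
    )[:k]
    # Stage 2: reorder only those candidates by distance on feature_vector2.
    return sorted(candidates, key=lambda d: math.dist(d["feature_vector2"], query2))

# Toy data: real vectors would have 500 and 100 dimensions.
docs = [
    {"ID": "a", "feature_vector1": [1.0, 0.0], "feature_vector2": [0.0, 1.0]},
    {"ID": "b", "feature_vector1": [0.9, 0.1], "feature_vector2": [1.0, 0.0]},
    {"ID": "c", "feature_vector1": [0.0, 1.0], "feature_vector2": [1.0, 0.0]},
]

reranked = two_stage_search(docs, query1=[1.0, 0.0], query2=[1.0, 0.0], k=2)
ids = [d["ID"] for d in reranked]
print(ids)
```

Note that document "c" is never considered in stage 2, even though its feature_vector2 matches the second query exactly, because it was not among the stage 1 candidates.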

vamshin · June 22, 2020, 6:31pm

@navmarri I see what you mean. This is more like prefilter support for k-NN, which is currently not available. We are working on this feature: Support custom scoring function for vectors. Using k-NN scores in a script_score query · Issue #50 · opendistro-for-elasticsearch/k-NN · GitHub.

As a workaround, you could probably do a boolean AND operation (the intersection of the results from feature_vector1 and feature_vector2). It is not a complete solution, but it should work. You might need to provide a large k to get any results from the intersection.

vamshin · June 22, 2020, 6:44pm

Here is an example query for the workaround. You can also choose the weight for each query to control how the matched documents are scored.

curl -X POST "localhost:9200/myindex/_search" -H 'Content-Type: application/json' -d'
{
  "size": 2,
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "knn": {
                "my_vector": {
                  "vector": [7, 8],
                  "k": 2
                }
              }
            },
            "weight": 0.5
          }
        },
        {
          "function_score": {
            "query": {
              "knn": {
                "my_vector": {
                  "vector": [3, 4],
                  "k": 2
                }
              }
            },
            "weight": 0.5
          }
        }
      ]
    }
  }
}
'
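
For the two-field case from the original question, the same request body could be assembled programmatically. The following is a minimal sketch: the helper names `knn_clause` and `build_query` are hypothetical, the field names come from the question's schema, and `k=100` is an arbitrary placeholder for the "large k" suggested above. Real query vectors would need 500 and 100 dimensions respectively.

```python
import json

def knn_clause(field, vector, k, weight):
    """One weighted k-NN clause, mirroring the function_score wrapper in the curl example."""
    return {
        "function_score": {
            "query": {"knn": {field: {"vector": vector, "k": k}}},
            "weight": weight,
        }
    }

def build_query(clauses, size):
    """Combine the clauses under bool/must so only docs matching every clause are returned."""
    return {"size": size, "query": {"bool": {"must": clauses}}}

body = build_query(
    [
        knn_clause("feature_vector1", [7, 8], k=100, weight=0.5),
        knn_clause("feature_vector2", [3, 4], k=100, weight=0.5),
    ],
    size=2,
)

# The resulting JSON can be sent as the body of a POST to /myindex/_search.
print(json.dumps(body, indent=2))
```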

navmarri · June 22, 2020, 7:14pm

@vamshin Thanks for the suggestion.
What is the default score_mode?
Is it a summation of the weights from the two functions, picking the max?

vamshin · June 22, 2020, 8:25pm

Yes. The weighted scores from the two clauses are summed, and the documents with the highest totals are returned. This lets you give more weight to the docs from the 1st k-NN query or the 2nd k-NN query. In the example I mentioned, we are giving them equal weight. Note: choose a very large k (you might want to experiment) to get better results.
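
To make the arithmetic concrete, here is a small sketch of the combination described in this reply: only documents returned by both k-NN clauses survive, and each gets the weighted sum of its two scores. The `combine` function and the score values are invented for illustration and are not output from a real cluster.

```python
def combine(hits1, hits2, w1=0.5, w2=0.5):
    """Intersect two result sets (ID -> score) and rank by weighted sum of scores."""
    common = hits1.keys() & hits2.keys()  # bool/must keeps only docs matching both clauses
    combined = {
        doc_id: w1 * hits1[doc_id] + w2 * hits2[doc_id] for doc_id in common
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# Made-up scores from the first and second k-NN clause.
hits1 = {"a": 0.9, "b": 0.8, "c": 0.4}
hits2 = {"b": 0.4, "c": 1.0, "d": 0.7}

equal = [doc_id for doc_id, _ in combine(hits1, hits2)]
favour_first = [doc_id for doc_id, _ in combine(hits1, hits2, w1=0.9, w2=0.1)]
print(equal, favour_first)
```

Documents "a" and "d" drop out because each appears in only one result set, which is why a larger k makes a non-empty intersection more likely. Shifting the weights toward the first clause flips the order of "b" and "c".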
