Reranking results with multiple vectors

I’ve created an index with two vectors, using the following schema:

ID, feature_vector1, feature_vector2

Here feature_vector1 is a list of 500 floats and feature_vector2 is a list of 100 floats.
I want to do the following:

  • perform a search on the entire index using feature_vector1
  • using the results of that search, rerank them based on their similarity with feature_vector2 for the corresponding IDs

I was able to perform k-NN search on feature_vector1,
but I'm not sure how to apply post_filter on top of the results obtained from feature_vector1. Is there a way to achieve this?

Hi @navmarri,

Could you help me understand what you mean by reranking the results with feature_vector2?

A post filter would only trim away (filter out) documents returned by the original query; it does not rescore them.

@vamshin

First we obtain the k-NN results from feature_vector1. Then, for just the IDs that are returned, I want to apply k-NN using feature_vector2. You can think of this as a chained query: performing k-NN on top of the results of the first k-NN search.
Does that make sense?
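
The chaining described above can also be done on the client side once the first-stage hits are in hand. The following is only a minimal sketch: it assumes each hit carries its stored feature_vector2, and it assumes cosine similarity as the second-stage measure (the thread does not say which similarity is wanted). The names rerank and first_stage_hits are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(candidates, query_vector2):
    """Reorder first-stage hits by similarity of their feature_vector2.

    candidates: list of dicts with "id" and "feature_vector2" keys,
    e.g. built from the hits of the k-NN search on feature_vector1.
    """
    return sorted(
        candidates,
        key=lambda doc: cosine_similarity(doc["feature_vector2"], query_vector2),
        reverse=True,
    )


# Toy stand-in for the hits returned by the first k-NN search.
first_stage_hits = [
    {"id": "a", "feature_vector2": [1.0, 0.0]},
    {"id": "b", "feature_vector2": [0.0, 1.0]},
    {"id": "c", "feature_vector2": [1.0, 1.0]},
]

reranked = rerank(first_stage_hits, [0.0, 2.0])
reranked_ids = [doc["id"] for doc in reranked]
print(reranked_ids)
```

This only reorders the candidates that the first search returned, so the first-stage k has to be large enough to contain the documents you care about.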

@navmarri I see what you mean. This is more like prefilter support for k-NN, which is currently not available. We are working on this feature; see the GitHub issue "Support custom scoring function for vectors. Using k-NN scores in a script_score query" (Issue #50, opendistro-for-elasticsearch/k-NN).

As a workaround, you could do a boolean AND operation (the intersection of the results from feature_vector1 and feature_vector2). It's not a complete solution, but it should work. You might need to provide a large k to get results from the intersection.

Example query for the workaround. You can also choose the weight of each query to control the scoring among the matched documents:

curl -X POST "localhost:9200/myindex/_search" -H 'Content-Type: application/json' -d'
{
  "size": 2,
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "knn": {
                "my_vector": {
                  "vector": [7, 8],
                  "k": 2
                }
              }
            },
            "weight": 0.5
          }
        },
        {
          "function_score": {
            "query": {
              "knn": {
                "my_vector": {
                  "vector": [3, 4],
                  "k": 2
                }
              }
            },
            "weight": 0.5
          }
        }
      ]
    }
  }
}
'
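
For the two-field case from the original question, the same request body could be built programmatically. This is a hedged sketch only: the helper names knn_clause and build_intersection_query are hypothetical, the field names are taken from the question's schema, and the short vectors, k and weights are placeholders.

```python
import json


def knn_clause(field, vector, k, weight):
    """One weighted k-NN clause, mirroring the function_score shape in the query above."""
    return {
        "function_score": {
            "query": {"knn": {field: {"vector": vector, "k": k}}},
            "weight": weight,
        }
    }


def build_intersection_query(clauses, size):
    """Combine clauses under bool/must so only documents matching all of them are returned."""
    return {"size": size, "query": {"bool": {"must": clauses}}}


# Placeholder vectors: the real ones would have 500 and 100 floats respectively.
body = build_intersection_query(
    [
        knn_clause("feature_vector1", [0.1, 0.2, 0.3], k=100, weight=0.5),
        knn_clause("feature_vector2", [0.4, 0.5], k=100, weight=0.5),
    ],
    size=10,
)

# The serialized body can be sent to the _search endpoint as in the curl example.
payload = json.dumps(body, indent=2)
print(payload)
```

Keeping k much larger than size reflects the advice in the thread: the intersection of two small result sets may otherwise be empty.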

@vamshin Thanks for the suggestion.
What is the default score_mode?
Is it the sum of the weights from the two functions, picking the max?

Yes. The scores are summed and the highest-scoring documents are picked. This gives you the ability to give more weight to the docs from the 1st k-NN query or the 2nd k-NN query. In the example I mentioned, we give both equal weight. Note: choose a very large k (you might want to experiment) to get better results.
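
To make the arithmetic concrete, here is a small sketch of how the combined ranking could be reproduced by hand. It assumes each clause contributes its k-NN score multiplied by its weight and that the clause scores are then summed for documents present in both result sets; the function name combine_scores and the toy scores are hypothetical.

```python
def combine_scores(scores1, scores2, weight1=0.5, weight2=0.5):
    """Weighted sum of scores for documents that appear in both result sets.

    scores1, scores2: dicts mapping document ID to the score from each k-NN query.
    Returns (doc_id, combined_score) pairs, highest combined score first.
    """
    shared_ids = scores1.keys() & scores2.keys()  # intersection, like bool/must
    combined = {
        doc_id: weight1 * scores1[doc_id] + weight2 * scores2[doc_id]
        for doc_id in shared_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy scores: "a" and "d" each appear in only one result set and drop out.
scores1 = {"a": 0.9, "b": 0.4, "c": 0.7}
scores2 = {"b": 0.8, "c": 0.3, "d": 0.95}

ranking = combine_scores(scores1, scores2)
ranked_ids = [doc_id for doc_id, _ in ranking]
print(ranked_ids)
```

Documents that appear in only one result set are dropped, which is why a larger k tends to help: it increases the overlap between the two sets.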