Searching for Data Using a Vector Index

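A vector index can be searched in several ways: with a standard query, a compound query, a script_score query, a re-score query, or a custom Painless script.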

Standard Query

Standard vector query syntax is provided for vector fields that have a vector index. The following command returns the n records (n is specified by size/topk) that are closest to the query vector.

POST my_index/_search
{
  "size":2,
  "_source": false, 
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk":2
      }
    }
  }
}

Table 1 Parameters for standard query
Parameter	Description
vector (the first one)	Indicates that the query type is VectorQuery.
my_vector	Indicates the name of the vector field you want to query.
vector (the second one)	Indicates the query vector, given either as an array or as a Base64-encoded string.
topk	Number of nearest results to return. It is generally set to the same value as size.
Other optional parameters	Indicates optional query parameters. You can adjust the vector index parameters to achieve higher query performance or precision. For more information, see Table 2.

Table 2 Optional query parameters
Type	Parameter	Description
Graph index configuration parameters	ef	Size of the neighbor-node queue used during the query. A larger value gives higher query precision but slower queries. Default value: 200. Value range: (0, 100000]
Graph index configuration parameters	max_scan_num	Maximum number of nodes scanned during the query. A larger value gives higher query precision but slower queries. Default value: 10000. Value range: (0, 1000000]
IVF index configuration parameters	nprobe	Number of center points searched during the query. A larger value gives higher query precision but slower queries. Default value: 100. Value range: (0, 100000]
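
As a rough illustration of how the optional parameters in Table 2 might be combined with a standard query, the Python sketch below builds the request body and checks each value against the documented range. The helper name build_vector_query is hypothetical, and placing the optional parameters next to vector and topk is an assumption based on the layout of Table 1, not something this section states explicitly.

```python
import json

# Value ranges copied from Table 2: (exclusive lower bound, inclusive upper bound).
OPTIONAL_RANGES = {
    "ef": (0, 100000),
    "max_scan_num": (0, 1000000),
    "nprobe": (0, 100000),
}


def build_vector_query(field, vector, topk, **optional):
    """Return a standard vector query body as a dict (hypothetical helper).

    Assumption: optional parameters sit next to "vector" and "topk"
    inside the field object.
    """
    clause = {"vector": list(vector), "topk": topk}
    for name, value in optional.items():
        if name not in OPTIONAL_RANGES:
            raise ValueError(f"unknown optional parameter: {name}")
        low, high = OPTIONAL_RANGES[name]
        if not (low < value <= high):
            raise ValueError(f"{name}={value} is outside ({low}, {high}]")
        clause[name] = value
    # size is set to topk, following the note on topk in Table 1.
    return {"size": topk, "_source": False, "query": {"vector": {field: clause}}}


if __name__ == "__main__":
    body = build_vector_query("my_vector", [1, 1], topk=2, ef=500)
    print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST my_index/_search request. Rejecting out-of-range values on the client side is only a convenience; the cluster remains the authority on what it accepts.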

Compound Query

Vector search can be combined with other Elasticsearch subqueries, such as a Boolean query or post-filtering, to form a compound query.

In both of the following examples, the 10 (topk) results closest to the query vector are retrieved first. The filter then retains only the results whose my_label field is red, so fewer than 10 results may be returned.

Example of a Boolean query

POST my_index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "vector": {
          "my_vector": {
            "vector": [1, 2],
            "topk": 10
          }
        }
      },
      "filter": {
        "term": { "my_label": "red" }
      }
    }
  }
}

Example of post-filtering

GET my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 2],
        "topk": 10
      }
    }
  },
  "post_filter": {
    "term": { "my_label": "red" }
  }
}
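
The order of operations described for these two examples (nearest neighbors first, label filter second) can be imitated offline. The following Python sketch is not the CSS implementation; it is a brute-force stand-in over a small in-memory list, with hypothetical helper names, meant only to show why the filter can leave fewer hits than topk.

```python
import math


def top_k(docs, query, k):
    """Return the k documents closest to query by Euclidean distance."""
    return sorted(docs, key=lambda d: math.dist(d["my_vector"], query))[:k]


def search_then_filter(docs, query, k, label):
    """Pick the k nearest documents first, then drop those with another label."""
    return [d for d in top_k(docs, query, k) if d["my_label"] == label]


docs = [
    {"id": 1, "my_vector": [1, 2], "my_label": "red"},
    {"id": 2, "my_vector": [1, 3], "my_label": "blue"},
    {"id": 3, "my_vector": [2, 2], "my_label": "red"},
    {"id": 4, "my_vector": [9, 9], "my_label": "red"},
]

hits = search_then_filter(docs, query=[1, 2], k=3, label="red")
print([d["id"] for d in hits])  # [1, 3]
```

Document 4 is red but is not among the three nearest neighbors, so it never reaches the filter. Document 2 is among them but is blue, so it is dropped. Two hits remain even though three were requested.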

ScriptScore Query

You can use script_score to perform nearest neighbor search (NNS) on vectors. The query syntax is provided below.

The pre-filtering condition can be any query. script_score traverses only the pre-filtered results, calculates the vector similarity, and sorts and returns the results. The performance of this query depends on the size of the intermediate result set after the pre-filtering. If the pre-filtering condition is set to match_all, brute-force search is performed on all data.

POST my_index/_search
{
  "size": 2,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "vector_score",
        "lang": "vector",
        "params": {
          "field": "my_vector",
          "vector": [1.0, 2.0],
          "metric": "euclidean"
        }
      }
    }
  }
}

Table 3 script_score parameters
Parameter	Description
source	Script description. Set it to vector_score to score documents by vector similarity.
lang	Script syntax description. Set it to vector.
field	Name of the vector field.
vector	Query vector.
metric	Measurement method. The value can be euclidean, inner_product, cosine, or hamming. Default value: euclidean
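
To make the cost model of the script_score query concrete, here is a small Python sketch of the same two steps: pre-filter, then score every remaining document. It is an in-memory imitation with hypothetical names, it ranks by plain Euclidean distance, and it does not claim to reproduce how vector_score converts a distance into a score.

```python
import math


def script_score_search(docs, prefilter, query, size):
    """Score only the documents accepted by prefilter and return the closest ones.

    Returns (hits, scored_count). scored_count is the number of distance
    computations, which is what drives the cost of this kind of query.
    """
    candidates = [d for d in docs if prefilter(d)]
    ranked = sorted(candidates, key=lambda d: math.dist(d["my_vector"], query))
    return ranked[:size], len(candidates)


docs = [
    {"id": 1, "my_vector": [1.0, 2.0], "my_label": "red"},
    {"id": 2, "my_vector": [1.0, 3.0], "my_label": "blue"},
    {"id": 3, "my_vector": [4.0, 6.0], "my_label": "red"},
    {"id": 4, "my_vector": [0.0, 0.0], "my_label": "blue"},
]

# Equivalent of match_all: every document is scored (brute-force search).
all_hits, all_scored = script_score_search(docs, lambda d: True, [1.0, 2.0], size=2)

# A selective pre-filter shrinks the set that has to be scored.
red_hits, red_scored = script_score_search(
    docs, lambda d: d["my_label"] == "red", [1.0, 2.0], size=2
)

print([d["id"] for d in all_hits], all_scored)  # [1, 2] 4
print([d["id"] for d in red_hits], red_scored)  # [1, 3] 2
```

With the match_all stand-in, all four documents are scored. With the label pre-filter, only two are, which is why a selective pre-filter keeps this query type affordable on large indexes.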

Re-Score Query

If a GRAPH_PQ or IVF_GRAPH_PQ index is used, the query results are sorted by the asymmetric distance calculated by PQ, which is an approximation. CSS supports re-scoring and re-ranking of the query results to improve the recall rate.

Assuming that my_index is a PQ index, an example of re-scoring the query results is as follows:

GET my_index/_search
{
  "size": 10,
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 2.0],
        "topk": 100
      }
    }
  },
  "rescore": {
    "window_size": 100,
    "vector_rescore": {
      "field": "my_vector",
      "vector": [1.0, 2.0],
      "metric": "euclidean"
    }
  }
}

Table 4 rescore parameters
Parameter	Description
window_size	Number of results to re-rank. The vector search returns topk results, and the first window_size of them are re-scored and re-ranked.
field	Name of the vector field.
vector	Query vector.
metric	Measurement method. The value can be euclidean, inner_product, cosine, or hamming. Default value: euclidean
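
The effect of re-scoring can be shown with a toy two-pass search in Python. This is a sketch under strong assumptions: rounding each component stands in for PQ compression (real PQ uses learned codebooks), and all function names are hypothetical. The first pass compares the exact query with the compressed document vectors, and the second pass re-ranks the first window_size hits with exact distances.

```python
import math


def coarse(vector):
    """Stand-in for PQ compression: keep only the rounded components."""
    return [round(x) for x in vector]


def approx_search(docs, query, topk):
    """First pass: rank by distance between the query and the compressed vectors."""
    return sorted(docs, key=lambda d: math.dist(query, coarse(d["my_vector"])))[:topk]


def rescore(hits, query, window_size):
    """Second pass: re-rank the first window_size hits by exact distance."""
    window = sorted(hits[:window_size], key=lambda d: math.dist(query, d["my_vector"]))
    return window + hits[window_size:]


docs = [
    {"id": "a", "my_vector": [1.4, 2.4]},
    {"id": "b", "my_vector": [1.1, 2.1]},
    {"id": "c", "my_vector": [0.6, 2.0]},
    {"id": "d", "my_vector": [3.0, 4.0]},
]
query = [1.0, 2.0]

first_pass = approx_search(docs, query, topk=3)
final = rescore(first_pass, query, window_size=3)[:2]

print([d["id"] for d in first_pass[:2]])  # ['a', 'b']
print([d["id"] for d in final])           # ['b', 'c']
```

After compression, documents a, b, and c all look identical to the query, so the first pass cannot order them correctly. Re-scoring the window with exact distances moves b and c ahead of a. This is also why topk in the example request (100) is larger than size (10): the window needs enough candidates for the exact ranking to draw from.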

Painless Syntax Extension

The CSS extension provides several vector distance functions that can be called directly in custom Painless scripts to build flexible scoring formulas.

The following is an example:

POST my_index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "1 / (1 + euclidean(params.vector, doc[params.field]))",
        "params": {
          "field": "my_vector",
          "vector": [1, 2]
        }
      }
    }
  }
}

Table 5 Supported vector distance calculation functions
Function Signature	Description
euclidean(Float[], DocValues)	Euclidean distance function.
cosine(Float[], DocValues)	Cosine similarity function.
innerproduct(Float[], DocValues)	Inner product function.
hamming(String, DocValues)	Hamming distance function. Only vectors whose dim_type is binary are supported. The input query vector must be a Base64-encoded character string.
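
For intuition about the formula used in the Painless example above, 1 / (1 + euclidean(...)), the following Python sketch evaluates it for two document vectors. It assumes that euclidean returns the plain (non-squared) Euclidean distance, which this section does not state; the function name score is hypothetical.

```python
import math


def score(query, doc_vector):
    """Mirror of the script source: 1 / (1 + euclidean(query, doc_vector)).

    Assumption: euclidean is the plain, non-squared Euclidean distance.
    """
    return 1 / (1 + math.dist(query, doc_vector))


print(score([1, 2], [1, 2]))  # 1.0 (identical vectors, distance 0)
print(score([1, 2], [4, 6]))  # distance 5, so the score is 1/6
```

The formula maps a distance in [0, ∞) to a score in (0, 1], with closer vectors scoring higher, which suits Elasticsearch scoring, where higher scores rank first.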

Parent topic: Configuring Vector Search for Elasticsearch Clusters
