Conduct a Hybrid Search

Milvus Docs 需要你的帮助

本文档暂时没有中文版本,欢迎你成为社区贡献者,协助中文技术文档的翻译。
你可以通过页面右边的 编辑 按钮直接贡献你的翻译。更多详情,参考 贡献指南。如需帮助,你可以 提交 GitHub Issue

This topic describes how to conduct a hybrid search.

A hybrid search is essentially a vector search with attribute filtering. By specifying boolean expressions that filter the scalar fields or the primary key field, you can limit your search with certain conditions.

The following example shows how to perform a hybrid search on the basis of a regular vector search. Suppose you want to search for certain books based on their vectorized introductions, but you only want those within a specific range of word count. You can then specify the boolean expression to filter the word_count field in the search parameters. Milvus will search for similar vectors only among those entities that match the expression.

Preparations

The following example code demonstrates the steps prior to a search.

If you work with your own dataset in an existing Milvus instance, you can move forward to the next step.

  1. Connect to the Milvus server. See Manage Connection for more instruction.
from pymilvus import connections
connections.connect("default", host='localhost', port='19530')
const { MilvusClient } =require("@zilliz/milvus2-sdk-node");
const milvusClient = new MilvusClient("localhost:19530");
connect -h localhost -p 19530 -a default
  1. Create a collection. See Create a Collection for more instruction.
schema = CollectionSchema([
    		FieldSchema("book_id", DataType.INT64, is_primary=True),
			FieldSchema("word_count", DataType.INT64),
    		FieldSchema("book_intro", dtype=DataType.FLOAT_VECTOR, dim=2)
		])
collection = Collection("book", schema, using='default', shards_num=2)
const params = {
  collection_name: "book",
  fields: [
    {
      name: "book_intro",
      description: "",
      data_type: 101,  // DataType.FloatVector
      type_params: {
        dim: "2",
      },
    },
	{
      name: "book_id",
      data_type: 5,   //DataType.Int64
      is_primary_key: true,
      description: "",
    },
    {
      name: "word_count",
      data_type: 5,    //DataType.Int64
      description: "",
    },
  ],
};
await milvusClient.collectionManager.createCollection(params);
create collection -c book -f book_intro:FLOAT_VECTOR:2 -f book_id:INT64 book_id -f word_count:INT64 word_count -p book_id
  1. Insert data into the collection (Milvus CLI example uses a pre-built, remote CSV file containing similar data). See Insert Data for more instruction.
import random
data = [
    		[i for i in range(2000)],
			[i for i in range(10000, 12000)],
    		[[random.random() for _ in range(2)] for _ in range(2000)],
		]
collection.insert(data)
const data = Array.from({ length: 2000 }, (v,k) => ({
  "book_intro": Array.from({ length: 2 }, () => Math.random()),
  "book_id": k,
  "word_count": k+10000,
}));
await milvusClient.dataManager.insert({
  collection_name: "book",
  fields_data: entities,
});
import -c book 'https://raw.githubusercontent.com/milvus-io/milvus_cli/main/examples/user_guide/search.csv'
  1. Create an index for the vector field. See Build Index for more instruction.
index_params = {
        "metric_type":"L2",
        "index_type":"IVF_FLAT",
        "params":{"nlist":1024}
    }
collection.create_index("book_intro", index_params=index_params)
const index_params = {
  metric_type: "L2",
  index_type: "IVF_FLAT",
  params: JSON.stringify({ nlist: 1024 }),
};
await milvusClient.indexManager.createIndex({
  collection_name: "book",
  field_name: "book_intro",
  extra_params: index_params,
});
create index

Collection name (book): book

The name of the field to create an index for (book_intro): book_intro

Index type (FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, RNSG, HNSW, ANNOY): IVF_FLAT

Index metric type (L2, IP, HAMMING, TANIMOTO): L2

Index params nlist: 1024

Timeout []:

Load collection

All CRUD operations within Milvus are executed in memory. Load the collection to memory before conducting a vector search.

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.load()
await milvusClient.collectionManager.loadCollection({
  collection_name: "book",
});
load -c book
In current release, volume of the data to load must be under 70% of the total memory resources of all query nodes to reserve memory resources for execution engine.

By specifying the boolean expression, you can filter the scalar field of the entities during the vector search. The following example limits the scale of search to the vectors within a specified word_count value range.

search_param = {
    "data": [[0.1, 0.2]],
    "anns_field": "book_intro",
    "param": {"metric_type": "L2", "params": {"nprobe": 10}},
    "limit": 2,
    "expr": "word_count <= 11000",
}
res = collection.search(**search_param)
const results = await milvusClient.dataManager.search({
  collection_name: "book",
  expr: "word_count <= 11000",
  vectors: [[0.1, 0.2]],
  search_params: {
    anns_field: "book_intro",
    topk: "2",
    metric_type: "L2",
    params: JSON.stringify({ nprobe: 10 }),
  },
  vector_type: 101,    // DataType.FloatVector,
});
search

Collection name (book): book

The vectors of search data(the length of data is number of query (nq), the dim of every vector in data must be equal to vector field’s of collection. You can also import a csv file without headers): [[0.1, 0.2]]

The vector field used to search of collection (book_intro): book_intro

Metric type: L2

Search parameter nprobe's value: 10

The max number of returned record, also known as topk: 2

The boolean expression used to filter attribute []: word_count <= 11000

The names of partitions to search (split by "," if multiple) ['_default'] []: 

timeout []:

Guarantee Timestamp(It instructs Milvus to see all operations performed before a provided timestamp. If no such timestamp is provided, then Milvus will search all operations performed to date) [0]: 

Travel Timestamp(Specify a timestamp in a search to get results based on a data view) [0]:
Parameter Description
data Vectors to search with.
anns_field Name of the field to search on.
params Search parameter(s) specific to the index. See Index Selection for more information.
limit Number of the most similar results to return.
expr Boolean expression used to filter attribute. See Boolean Expression Rules for more information.
partition_names (optional) List of names of the partition to search in.
output_fields (optional) Name of the field to return. Vector field is not supported in current release.
timeout (optional) A duration of time in seconds to allow for RPC. Clients wait until server responds or error occurs when it is set to None.
round_decimal (optional) Number of decimal places of returned distance.
Parameter Description
collection_name Name of the collection to search in.
search_params Parameters (as an object) used for search.
vectors Vectors to search with.
vector_type Pre-check of binary or float vectors. 100 for binary vectors and 101 for float vectors.
partition_names (optional) List of names of the partition to search in.
expr (optional) Boolean expression used to filter attribute. See Boolean Expression Rules for more information.
output_fields (optional) Name of the field to return. Vector field not support in current release.
Option Full name Description
--help n/a Displays help for using the command.

Check the returned results:

assert len(res) == 1
hits = res[0]
assert len(hits) == 2
print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
console.log(results.results)
# Milvus CLI automatically returns the primary key values of the most similar vectors and their distances.

What's next

该页面是否对你有帮助?
评价成功!