Hybrid Search

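In addition to vectors, Milvus supports data types such as Boolean, integer, and floating-point values. A collection in Milvus can contain multiple fields that represent features or attributes of the data. As a flexible vector database, Milvus also supports filtering on scalar fields during a vector similarity search.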

A hybrid search is a vector similarity search in which scalar fields are filtered with a boolean expression.
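
To make the idea concrete, here is a minimal pure-Python sketch of what a filtered similarity search computes conceptually: apply the scalar filter, then rank the remaining vectors by distance. It is not how Milvus implements the search internally; the in-memory `entities` list, the helper names `l2_squared` and `filtered_search`, and the sample vectors are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for a collection: (film_id, vector) pairs.
entities = [
    (1, [0.9, 0.9]),
    (2, [0.1, 0.2]),
    (4, [1.0, 0.8]),
    (6, [0.5, 0.5]),
    (7, [1.0, 1.0]),
]


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(query, allowed_ids, limit):
    """Keep entities whose film_id passes the scalar filter, then rank by distance."""
    candidates = [(fid, vec) for fid, vec in entities if fid in allowed_ids]
    ranked = sorted(candidates, key=lambda e: l2_squared(query, e[1]))
    return [fid for fid, _ in ranked[:limit]]


# Mirrors the shape of the search used later: query [1.0, 1.0], filter on film_id, top 2.
top_ids = filtered_search([1.0, 1.0], allowed_ids={2, 4, 6, 8}, limit=2)
print(top_ids)
```

Note that film 7 sits exactly on the query vector but is never returned, because it does not satisfy the scalar filter. That is the behavior the boolean expression provides in a hybrid search.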

  1. Connect to the Milvus server:
from pymilvus import connections
connections.connect("default", host='localhost', port='19530')
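// DataType is used when defining the fields below; older SDK releases may export it from a different path.
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";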
const milvusClient = new MilvusClient("localhost:19530");
  2. Prepare the collection parameters and create the collection:
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType
>>> collection_name = "test_collection_search"
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection(collection_name, schema)
const COLLECTION_NAME = "test_collection_search";
// Wait for the collection to be created before inserting data.
await milvusClient.collectionManager.createCollection({
  collection_name: COLLECTION_NAME,
  fields: [
    {
      name: "films",
      description: "vector field",
      data_type: DataType.FloatVector,
      type_params: {
        dim: "2",
      }
    },
    {
      name: "film_id",
      data_type: DataType.Int64,
      autoID: false,
      is_primary_key: true,
      description: "",
    },
  ],
});
  3. Generate random vectors and insert them into the new collection:
>>> import random
>>> data = [
...     [i for i in range(10)],
...     [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> collection.insert(data)
>>> collection.num_entities
10
let id = 1;
const entities = Array.from({ length: 10 }, () => ({
  films: Array.from({ length: 2 }, () => Math.random() * 10),
  film_id: id++,
}));

await milvusClient.collectionManager.insert({
  collection_name: COLLECTION_NAME,
  fields_data: entities,
});
  4. Load the collection into memory and run a vector similarity search:
>>> collection.load()
>>> search_param = {
...     "data": [[1.0, 1.0]],
...     "anns_field": "films",
...     "param": {"metric_type": "L2"},
...     "limit": 2,
...     "expr": "film_id in [2,4,6,8]",
... }
>>> res = collection.search(**search_param)
await milvusClient.collectionManager.loadCollection({
  collection_name: COLLECTION_NAME,
});
await milvusClient.dataManager.search({
  collection_name: COLLECTION_NAME,
  // partition_names: [],
  expr: "film_id in [1,4,6,8]",
  vectors: [entities[0].films],
  search_params: {
    anns_field: "films",
    topk: "4",
    metric_type: "L2",
    params: JSON.stringify({ nprobe: 10 }),
  },
  vector_type: 100, // float vector -> 100
});
  5. Check the returned results:
>>> assert len(res) == 1
>>> hits = res[0]
>>> assert len(hits) == 2
>>> print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
- Total hits: 2, hits ids: [2, 4]
>>> print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
- Top1 hit id: 2, distance: 0.10143111646175385, score: 0.101431116461
// search result will be like:
{
  status: { error_code: 'Success', reason: '' },
  results: [
    { score: 0, id: '1' },
    { score: 9.266796112060547, id: '4' },
    { score: 28.263811111450195, id: '8' },
    { score: 41.055686950683594, id: '6' }
  ]
}
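
The filter in step 4 is written as a literal string. When the IDs come from application code, it can be convenient to build that string programmatically. The helper below is a small sketch for doing so; `build_in_expr` is a hypothetical name and not part of pymilvus, and it only handles integer values.

```python
def build_in_expr(field, values):
    """Return a boolean expression string such as 'film_id in [2,4,6,8]'."""
    if not values:
        raise ValueError("values must not be empty")
    # Reject non-integers (and bools, which are a subclass of int in Python).
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise TypeError("this sketch only handles integer values")
    return f"{field} in [{','.join(str(v) for v in values)}]"


expr = build_in_expr("film_id", [2, 4, 6, 8])
print(expr)
```

The resulting string can be passed as the `"expr"` entry of `search_param` in place of the hard-coded value.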