🚀 免費嘗試 Zilliz Cloud,完全托管的 Milvus,體驗速度提升 10 倍!立即嘗試

milvus-logo
LFAI
主頁
  • 使用者指南
  • Home
  • Docs
  • 使用者指南

  • 搜尋與重新排名

  • 篩選搜尋

篩選搜尋

ANN 搜尋會找出與指定向量嵌入最相似的向量嵌入。然而,搜尋結果不一定總是正確的。您可以在搜尋請求中加入篩選條件,讓 Milvus 在進行 ANN 搜尋之前先進行元資料篩選,將搜尋範圍從整個集合縮小到只有符合指定篩選條件的實體。

概述

在 Milvus 中,篩選搜尋分為兩種類型 -標準篩選迭代篩選- 取決於應用篩選的階段。

標準篩選

如果一個集合包含向量嵌入及其元資料,您可以在 ANN 搜尋之前篩選元資料,以改善搜尋結果的相關性。一旦 Milvus 接收到帶有篩選條件的搜尋請求,它就會將搜尋範圍限制在符合指定篩選條件的實體內。

Filtered search 過濾搜尋

如上圖所示,搜尋請求包含chunk like % red % 作為篩選條件,表示 Milvus 應在chunk 欄位中有red 字元的所有實體中進行 ANN 搜尋。具體而言,Milvus 會執行下列工作。

  • 篩選符合搜尋請求中篩選條件的實體。

  • 在篩選過的實體中進行 ANN 搜尋。

  • 返回前 K 個實體。

迭代篩選

標準篩選程序可有效地將搜尋範圍縮小到一個很小的範圍。但是,過於複雜的篩選表達式可能會導致非常高的搜尋延遲。在這種情況下,迭代篩選可作為另一種選擇,有助於減少標量篩選的工作量。

Iterative filtering 迭代篩選

如上圖所示,使用迭代篩選的搜尋會以迭代方式執行向量搜尋。迭代器返回的每個實體都會經過標量篩選,此過程會持續進行,直到達到指定的 topK 結果為止。

此方法可大幅減少接受標量篩選的實體數量,因此特別有利於處理高度複雜的篩選表達式。

然而,值得注意的是,迭代器一次處理一個實體。這種循序處理的方式可能會導致較長的處理時間或潛在的效能問題,尤其是當大量的實體需要進行標量篩選時。

範例

本節示範如何進行篩選搜尋。本節中的程式碼片段假設您的集合中已有下列實體。每個實體都有四個欄位,分別是id向量顏色喜好

[
    {"id": 0, "vector": [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592], "color": "pink_8682", "likes": 165},
    {"id": 1, "vector": [0.19886812562848388, 0.06023560599112088, 0.6976963061752597, 0.2614474506242501, 0.838729485096104], "color": "red_7025", "likes": 25},
    {"id": 2, "vector": [0.43742130801983836, -0.5597502546264526, 0.6457887650909682, 0.7894058910881185, 0.20785793220625592], "color": "orange_6781", "likes": 764},
    {"id": 3, "vector": [0.3172005263489739, 0.9719044792798428, -0.36981146090600725, -0.4860894583077995, 0.95791889146345], "color": "pink_9298", "likes": 234},
    {"id": 4, "vector": [0.4452349528804562, -0.8757026943054742, 0.8220779437047674, 0.46406290649483184, 0.30337481143159106], "color": "red_4794", "likes": 122},
    {"id": 5, "vector": [0.985825131989184, -0.8144651566660419, 0.6299267002202009, 0.1206906911183383, -0.1446277761879955], "color": "yellow_4222", "likes": 12},
    {"id": 6, "vector": [0.8371977790571115, -0.015764369584852833, -0.31062937026679327, -0.562666951622192, -0.8984947637863987], "color": "red_9392", "likes": 58},
    {"id": 7, "vector": [-0.33445148015177995, -0.2567135004164067, 0.8987539745369246, 0.9402995886420709, 0.5378064918413052], "color": "grey_8510", "likes": 775},
    {"id": 8, "vector": [0.39524717779832685, 0.4000257286739164, -0.5890507376891594, -0.8650502298996872, -0.6140360785406336], "color": "white_9381", "likes": 876},
    {"id": 9, "vector": [0.5718280481994695, 0.24070317428066512, -0.3737913482606834, -0.06726932177492717, -0.6980531615588608], "color": "purple_4976", "likes": 765}
]

使用標準篩選搜尋

以下程式碼片段示範使用標準篩選的搜尋,以下程式碼片段中的請求帶有篩選條件和數個輸出欄位。

from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

query_vector = [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]

res = client.search(
    collection_name="my_collection",
    data=[query_vector],
    limit=5,
    # highlight-start
    filter='color like "red%" and likes > 50',
    output_fields=["color", "likes"]
    # highlight-end
)

for hits in res:
    print("TopK results:")
    for hit in hits:
        print(hit)

import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.service.vector.request.SearchReq
import io.milvus.v2.service.vector.request.data.FloatVec;
import io.milvus.v2.service.vector.response.SearchResp

MilvusClientV2 client = new MilvusClientV2(ConnectConfig.builder()
        .uri("http://localhost:19530")
        .token("root:Milvus")
        .build());

FloatVec queryVector = new FloatVec(new float[]{0.3580376395471989f, -0.6023495712049978f, 0.18414012509913835f, -0.26286205330961354f, 0.9029438446296592f});
SearchReq searchReq = SearchReq.builder()
        .collectionName("filtered_search_collection")
        .data(Collections.singletonList(queryVector))
        .topK(5)
        .filter("color like \"red%\" and likes > 50")
        .outputFields(Arrays.asList("color", "likes"))
        .build();

SearchResp searchResp = client.search(searchReq);

List<List<SearchResp.SearchResult>> searchResults = searchResp.getSearchResults();
for (List<SearchResp.SearchResult> results : searchResults) {
    System.out.println("TopK results:");
    for (SearchResp.SearchResult result : results) {
        System.out.println(result);
    }
}

// Output
// TopK results:
// SearchResp.SearchResult(entity={color=red_4794, likes=122}, score=0.5975797, id=4)
// SearchResp.SearchResult(entity={color=red_9392, likes=58}, score=-0.24996188, id=6)

import (
    "context"
    "log"

    "github.com/milvus-io/milvus/client/v2"
    "github.com/milvus-io/milvus/client/v2/entity"
)

func ExampleClient_Search_filter() {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()

        milvusAddr := "127.0.0.1:19530"
        token := "root:Milvus"

        cli, err := client.New(ctx, &client.ClientConfig{
                Address: milvusAddr,
                APIKey:  token,
        })
        if err != nil {
                log.Fatal("failed to connect to milvus server: ", err.Error())
        }

        defer cli.Close(ctx)

        queryVector := []float32{0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592}

        resultSets, err := cli.Search(ctx, client.NewSearchOption(
                "filtered_search_collection", // collectionName
                3,             // limit
                []entity.Vector{entity.FloatVector(queryVector)},
        ).WithFilter(`color like "red%" and likes > 50`).WithOutputFields("color", "likes"))
        if err != nil {
                log.Fatal("failed to perform basic ANN search collection: ", err.Error())
        }

        for _, resultSet := range resultSets {
                log.Println("IDs: ", resultSet.IDs)
                log.Println("Scores: ", resultSet.Scores)
        }
        // Output:
        // IDs:
        // Scores:
}


import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

const address = "http://localhost:19530";
const token = "root:Milvus";
const client = new MilvusClient({address, token});

const query_vector = [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]

const res = await client.search({
    collection_name: "filtered_search_collection",
    data: [query_vector],
    limit: 5,
    // highlight-start
    filters: 'color like "red%" and likes > 50',
    output_fields: ["color", "likes"]
    // highlight-end
})

export CLUSTER_ENDPOINT="http://localhost:19530"
export TOKEN="root:Milvus"

curl --request POST \
--url "${CLUSTER_ENDPOINT}/v2/vectordb/entities/search" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Content-Type: application/json" \
-d '{
    "collectionName": "quick_setup",
    "data": [
        [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]
    ],
    "annsField": "vector",
    "filter": "color like \"red%\" and likes > 50",
    "limit": 3,
    "outputFields": ["color", "likes"]
}'
# {"code":0,"cost":0,"data":[]}

搜尋請求中攜帶的篩選條件讀取color like "red%" and likes > 50 。它使用 and 運算符包含兩個條件:第一個條件要求在color 欄位中尋找值以red 開頭的實體,另一個條件要求在likes 欄位中尋找值大於50 的實體。只有兩個實體符合這些要求。當 top-K 設定為3 時,Milvus 會計算這兩個實體與查詢向量之間的距離,並將其作為搜尋結果傳回。

[
    {
        "id": 4, 
        "distance": 0.3345786594834839,
        "entity": {
            "vector": [0.4452349528804562, -0.8757026943054742, 0.8220779437047674, 0.46406290649483184, 0.30337481143159106], 
            "color": "red_4794", 
            "likes": 122
        }
    },
    {
        "id": 6, 
        "distance": 0.6638239834383389"entity": {
            "vector": [0.8371977790571115, -0.015764369584852833, -0.31062937026679327, -0.562666951622192, -0.8984947637863987], 
            "color": "red_9392", 
            "likes": 58
        }
    },
]

有關在元資料篩選中可以使用的運算符號的更多資訊,請參閱元資料篩選。

使用迭代篩選進行搜尋

若要使用迭代篩選進行篩選搜尋,您可以如下進行:

from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

query_vector = [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]

res = client.search(
    collection_name="my_collection",
    data=[query_vector],
    limit=5,
    # highlight-start
    filter='color like "red%" and likes > 50',
    output_fields=["color", "likes"],
    search_params={
        "hints": "iterative_filter"
    }    
    # highlight-end
)

for hits in res:
    print("TopK results:")
    for hit in hits:
        print(hit)

import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.service.vector.request.SearchReq;
import io.milvus.v2.service.vector.request.data.FloatVec;
import io.milvus.v2.service.vector.response.SearchResp;

MilvusClientV2 client = new MilvusClientV2(ConnectConfig.builder()
        .uri("http://localhost:19530")
        .token("root:Milvus")
        .build());

FloatVec queryVector = new FloatVec(new float[]{0.3580376395471989f, -0.6023495712049978f, 0.18414012509913835f, -0.26286205330961354f, 0.9029438446296592f});
SearchReq searchReq = SearchReq.builder()
        .collectionName("filtered_search_collection")
        .data(Collections.singletonList(queryVector))
        .topK(5)
        .filter("color like \"red%\" and likes > 50")
        .outputFields(Arrays.asList("color", "likes"))
        .searchParams(new HashMap<>("hints", "iterative_filter"))
        .build();

SearchResp searchResp = client.search(searchReq);

List<List<SearchResp.SearchResult>> searchResults = searchResp.getSearchResults();
for (List<SearchResp.SearchResult> results : searchResults) {
    System.out.println("TopK results:");
    for (SearchResp.SearchResult result : results) {
        System.out.println(result);
    }
}

// Output
// TopK results:
// SearchResp.SearchResult(entity={color=red_4794, likes=122}, score=0.5975797, id=4)
// SearchResp.SearchResult(entity={color=red_9392, likes=58}, score=-0.24996188, id=6)

import (
    "context"
    "log"

    "github.com/milvus-io/milvus/client/v2"
    "github.com/milvus-io/milvus/client/v2/entity"
)

func ExampleClient_Search_filter() {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()

        milvusAddr := "127.0.0.1:19530"
        token := "root:Milvus"

        cli, err := client.New(ctx, &client.ClientConfig{
                Address: milvusAddr,
                APIKey:  token,
        })
        if err != nil {
                log.Fatal("failed to connect to milvus server: ", err.Error())
        }

        defer cli.Close(ctx)

        queryVector := []float32{0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592}

        resultSets, err := cli.Search(ctx, client.NewSearchOption(
                "filtered_search_collection", // collectionName
                3,             // limit
                []entity.Vector{entity.FloatVector(queryVector)},
        ).WithFilter(`color like "red%" and likes > 50`).WithHints("iterative_filter").WithOutputFields("color", "likes"))
        if err != nil {
                log.Fatal("failed to perform basic ANN search collection: ", err.Error())
        }

        for _, resultSet := range resultSets {
                log.Println("IDs: ", resultSet.IDs)
                log.Println("Scores: ", resultSet.Scores)
        }
        // Output:
        // IDs:
        // Scores:
}


import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

const address = "http://localhost:19530";
const token = "root:Milvus";
const client = new MilvusClient({address, token});

const query_vector = [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]

const res = await client.search({
    collection_name: "filtered_search_collection",
    data: [query_vector],
    limit: 5,
    // highlight-start
    filters: 'color like "red%" and likes > 50',
    hints: "iterative_filter",
    output_fields: ["color", "likes"]
    // highlight-end
})

export CLUSTER_ENDPOINT="http://localhost:19530"
export TOKEN="root:Milvus"

curl --request POST \
--url "${CLUSTER_ENDPOINT}/v2/vectordb/entities/search" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Content-Type: application/json" \
-d '{
    "collectionName": "quick_setup",
    "data": [
        [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]
    ],
    "annsField": "vector",
    "filter": "color like \"red%\" and likes > 50",
    "searchParams": {"hints": "iterative_filter"},
    "limit": 3,
    "outputFields": ["color", "likes"]
}'
# {"code":0,"cost":0,"data":[]}

免費嘗試托管的 Milvus

Zilliz Cloud 無縫接入,由 Milvus 提供動力,速度提升 10 倍。

開始使用
反饋

這個頁面有幫助嗎?