
Iterators

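Milvus provides search and query iterators for iterating through a large volume of entities. Because Milvus limits TopK to 16384, a single search cannot return more than that many results. With iterators, you can retrieve a large number of entities, or even every entity in a collection, in batches.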

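Overview

An iterator is an efficient tool for scanning an entire collection or iterating through a large volume of entities by specifying primary key values or a filter expression. Compared with search or query calls that page through results using the offset and limit parameters, iterators are more efficient and scalable.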
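To give a rough intuition for why offset/limit paging scales poorly, the sketch below uses a toy cost model. It is not Milvus's actual implementation: `offset_pages`, `cursor_pages`, and the "rows touched" measure are illustrative assumptions. The model assumes that an offset-based call has to skip `offset` rows before reading its page, while a cursor-based call resumes right after the last primary key it returned.

```python
from bisect import bisect_right


def offset_pages(ids, page_size):
    """Offset/limit paging: each call skips `offset` rows, then reads one page."""
    rows_touched = 0
    offset = 0
    while offset < len(ids):
        page = ids[offset:offset + page_size]
        rows_touched += offset + len(page)  # rows skipped + rows returned
        offset += page_size
    return rows_touched


def cursor_pages(ids, page_size):
    """Cursor paging: each call resumes right after the last primary key seen."""
    rows_touched = 0
    last_id = None
    while True:
        start = 0 if last_id is None else bisect_right(ids, last_id)
        page = ids[start:start + page_size]
        if not page:
            break
        rows_touched += len(page)
        last_id = page[-1]
    return rows_touched


ids = list(range(10_000))         # sorted primary keys, as in the examples on this page
print(offset_pages(ids, 1_000))   # 55000
print(cursor_pages(ids, 1_000))   # 10000
```

Under this model the offset-based cost grows quadratically with the number of pages, while the cursor-based cost stays linear in the number of entities.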

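Benefits of using iterators

- Simplicity: Removes the need for complex offset and limit settings.
- Efficiency: Enables scalable data retrieval by fetching only the data that is needed.
- Consistency: Boolean filters ensure a consistent dataset size.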

Note

This feature is available in Milvus 2.3.x and later.

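Preparation

The following preparation steps connect to Milvus and insert randomly generated entities into a collection.

Step 1: Create a collection

In Python, use MilvusClient to connect to the Milvus server and create_collection() to create a collection.

In Java, use MilvusClientV2 to connect to the Milvus server and createCollection() to create a collection.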

Python:

import random  # used in Step 2 to generate random entities

from pymilvus import MilvusClient

# 1. Set up a Milvus client
client = MilvusClient(
    uri="http://localhost:19530"
)

# 2. Create a collection
client.create_collection(
    collection_name="quick_setup",
    dimension=5,
)

Java:

import com.google.gson.Gson;
import com.google.gson.JsonObject;
import io.milvus.orm.iterator.QueryIterator;
import io.milvus.orm.iterator.SearchIterator;
import io.milvus.response.QueryResultsWrapper;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.common.ConsistencyLevel;
import io.milvus.v2.common.IndexParam;
import io.milvus.v2.service.collection.request.CreateCollectionReq;
import io.milvus.v2.service.collection.request.DropCollectionReq;
import io.milvus.v2.service.vector.request.*;
import io.milvus.v2.service.vector.request.data.FloatVec;
import io.milvus.v2.service.vector.response.InsertResp;
import io.milvus.v2.service.vector.response.QueryResp;

import java.util.*;

String CLUSTER_ENDPOINT = "http://localhost:19530";

// 1. Connect to Milvus server (v2 client, matching the imports above)
ConnectConfig connectConfig = ConnectConfig.builder()
        .uri(CLUSTER_ENDPOINT)
        .build();

MilvusClientV2 client = new MilvusClientV2(connectConfig);

// 2. Create a collection
CreateCollectionReq quickSetupReq = CreateCollectionReq.builder()
        .collectionName("quick_setup")
        .dimension(5)
        .build();
client.createCollection(quickSetupReq);

Step 2: Insert randomly generated entities

Use insert() to insert entities into the collection. The method has the same name in the Python and Java SDKs.

Python:

# 3. Insert randomly generated vectors
colors = ["green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey"]
data = []

for i in range(10000):
    current_color = random.choice(colors)
    current_tag = random.randint(1000, 9999)
    data.append({
        "id": i,
        "vector": [ random.uniform(-1, 1) for _ in range(5) ],
        "color": current_color,
        "tag": current_tag,
        "color_tag": f"{current_color}_{str(current_tag)}"
    })

print(data[0])

# Output
#
# {
#     "id": 0,
#     "vector": [
#         -0.5705990742218152,
#         0.39844925120642083,
#         -0.8791287928610869,
#         0.024163154953680932,
#         0.6837669917169638
#     ],
#     "color": "purple",
#     "tag": 7774,
#     "color_tag": "purple_7774"
# }

res = client.insert(
    collection_name="quick_setup",
    data=data,
)

print(res)

# Output
#
# {
#     "insert_count": 10000,
#     "ids": [
#         0,
#         1,
#         2,
#         3,
#         4,
#         5,
#         6,
#         7,
#         8,
#         9,
#         "(9990 more items hidden)"
#     ]
# }

Java:

// 3. Insert randomly generated vectors into the collection
List<String> colors = Arrays.asList("green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey");
List<JsonObject> data = new ArrayList<>();
Gson gson = new Gson();
Random rand = new Random();
for (int i = 0; i < 10000; i++) {
    // nextInt(bound) excludes bound, so pass the full list size to allow every color
    String current_color = colors.get(rand.nextInt(colors.size()));
    JsonObject row = new JsonObject();
    row.addProperty("id", (long) i);
    row.add("vector", gson.toJsonTree(Arrays.asList(rand.nextFloat(), rand.nextFloat(), rand.nextFloat(), rand.nextFloat(), rand.nextFloat())));
    // Tag in the range 1000..9999, matching the Python example
    row.addProperty("color_tag", current_color + "_" + (rand.nextInt(9000) + 1000));
    data.add(row);
}

InsertResp insertR = client.insert(InsertReq.builder()
        .collectionName("quick_setup")
        .data(data)
        .build());
System.out.println(insertR.getInsertCnt());

// Output
// 10000

Search with an iterator

Iterators make similarity searches more scalable.

In Python, call the search_iterator() method to search with an iterator.

In Java, call the searchIterator() method to search with an iterator.

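1. Initialize the search iterator, defining its search parameters and output fields.
2. Call the next() method in a loop to page through the search results.
   - If the method returns an empty array, the loop ends and no more pages are available.
   - Every result carries the specified output fields.
3. Once all the data has been retrieved, call the close() method manually to close the iterator.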
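The steps above can be wrapped in a small generator so that close() is always called, even if processing a batch raises an exception. This is a minimal sketch, not part of the Milvus SDK: `iter_batches` and `FakeIterator` are hypothetical names, and `FakeIterator` only stands in for a real iterator so that the example runs without a server. It assumes nothing about the iterator beyond the next() and close() methods described above.

```python
def iter_batches(iterator):
    """Yield non-empty batches from any object exposing next() and close()."""
    try:
        while True:
            batch = iterator.next()
            if not batch:        # an empty batch means there are no more pages
                break
            yield batch
    finally:
        iterator.close()         # always release the iterator, even on error


class FakeIterator:
    """Hypothetical stand-in for a Milvus iterator (no server required)."""

    def __init__(self, rows, batch_size):
        self._rows = rows
        self._batch_size = batch_size
        self._pos = 0
        self.closed = False

    def next(self):
        batch = self._rows[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch

    def close(self):
        self.closed = True


fake = FakeIterator(list(range(25)), batch_size=10)
sizes = [len(batch) for batch in iter_batches(fake)]
print(sizes, fake.closed)   # [10, 10, 5] True
```

With a real iterator, the same loop would read `for batch in iter_batches(iterator): ...`.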

Python:

from pymilvus import Collection, connections

# 4. Search with iterator
connections.connect(host="127.0.0.1", port=19530)
collection = Collection("quick_setup")

query_vectors = [[0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]]
# metric_type must match the metric used to index the vector field
search_params = {
    "metric_type": "IP",
    "params": {"nprobe": 10}
}

iterator = collection.search_iterator(
    data=query_vectors,
    anns_field="vector",
    batch_size=10,
    param=search_params,
    output_fields=["color_tag"],
    limit=300
)
# Retrieve 300 entities in total, 10 entities per page

results = []

while True:
    result = iterator.next()
    if not result:
        iterator.close()
        break

    # Convert each hit to a plain dict and collect it exactly once
    for hit in result:
        results.append(hit.to_dict())

print(results)

# Output
#
# [
#     {
#         "id": 1756,
#         "distance": 2.0642056465148926,
#         "entity": {
#             "color_tag": "black_9109"
#         }
#     },
#     {
#         "id": 6488,
#         "distance": 1.9437453746795654,
#         "entity": {
#             "color_tag": "purple_8164"
#         }
#     },
#     {
#         "id": 3338,
#         "distance": 1.9107104539871216,
#         "entity": {
#             "color_tag": "brown_8121"
#         }
#     }
# ]

Java:

// 4. Search with iterators
SearchIteratorReq iteratorReq = SearchIteratorReq.builder()
        .collectionName("quick_setup")
        .vectorFieldName("vector")
        .batchSize(10L)
        .vectors(Collections.singletonList(new FloatVec(Arrays.asList(0.3580376395471989f, -0.6023495712049978f, 0.18414012509913835f, -0.26286205330961354f, 0.9029438446296592f))))
        .params("{\"level\": 1}")
        .metricType(IndexParam.MetricType.COSINE)
        .outputFields(Collections.singletonList("color_tag"))
        .topK(300)
        .build();

SearchIterator searchIterator = client.searchIterator(iteratorReq);

List<QueryResultsWrapper.RowRecord> results = new ArrayList<>();
while (true) {
    List<QueryResultsWrapper.RowRecord> batchResults = searchIterator.next();
    if (batchResults.isEmpty()) {
        searchIterator.close();
        break;
    }

    results.addAll(batchResults);
}
System.out.println(results.size());

// Output
// 300
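As a quick sanity check on the loop above, the number of next() calls needed to drain an iterator can be estimated from the total number of entities and the batch size. The sketch below is a back-of-the-envelope calculation, not SDK behavior: `next_calls` is a hypothetical helper, and it assumes every batch except possibly the last one is full.

```python
import math


def next_calls(total_entities, batch_size):
    """Estimate the next() calls needed to drain an iterator.

    Assumes every batch but the last is full. The extra call accounts for
    the final next() that returns an empty batch and ends the loop.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_entities / batch_size) + 1


print(next_calls(300, 10))        # 31
print(next_calls(10_000, 1_000))  # 11
```

For the search above (300 entities, 10 per page), that is 30 calls that return data plus one call that returns the empty batch.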

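Parameter	Description
`data`	A list of vector embeddings. Milvus searches for the vector embeddings most similar to the specified ones.
`anns_field`	The name of a vector field in the current collection.
`batch_size`	The number of entities to return each time next() is called on the iterator. The default value is 1000. Set it to a suitable value to control the number of entities returned per iteration.
`param`	The parameter settings specific to this operation. `metric_type`: The metric type applied to this operation. It must be the same as the one used when indexing the vector field specified above. Possible values are L2, IP, COSINE, JACCARD, and HAMMING. `params`: Additional parameters. For details, refer to search_iterator().
`output_fields`	A list of field names to include in each returned entity. The default value is None. If left unspecified, only the primary field is included.
`limit`	The total number of entities to return. The default value is -1, which means all matching entities are returned.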
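The parameters in the table above can be assembled and sanity-checked before they are passed to search_iterator(). The helper below is a hypothetical sketch, not part of pymilvus: `build_search_iterator_kwargs` and `ALLOWED_METRICS` are illustrative names, the checks only mirror the defaults and allowed values listed in the table, and the query vector is a placeholder.

```python
# Metric types listed in the parameter table above
ALLOWED_METRICS = {"L2", "IP", "COSINE", "JACCARD", "HAMMING"}


def build_search_iterator_kwargs(query_vectors, anns_field, metric_type,
                                 batch_size=1000, limit=-1,
                                 output_fields=None, extra_params=None):
    """Validate the documented parameters and return them as keyword arguments."""
    if metric_type not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric_type: {metric_type!r}")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if limit != -1 and limit <= 0:
        raise ValueError("limit must be -1 (all matches) or a positive number")
    return {
        "data": query_vectors,
        "anns_field": anns_field,
        "batch_size": batch_size,
        "param": {"metric_type": metric_type, "params": extra_params or {}},
        "output_fields": output_fields,
        "limit": limit,
    }


kwargs = build_search_iterator_kwargs(
    [[0.1, 0.2, 0.3, 0.4, 0.5]],   # placeholder query vector
    anns_field="vector",
    metric_type="COSINE",
    batch_size=10,
    limit=300,
    output_fields=["color_tag"],
)
print(kwargs["param"])   # {'metric_type': 'COSINE', 'params': {}}
```

The resulting dictionary uses the same keyword names as the Python example on this page, so it could be passed as `collection.search_iterator(**kwargs)`.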

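Parameter	Description
`withCollectionName`	Sets the collection name. The collection name cannot be empty or null.
`withVectorFieldName`	Sets the target vector field by name. The field name cannot be empty or null.
`withVectors`	Sets the target vectors. Up to 16384 vectors are allowed.
`withBatchSize`	The number of entities to return each time next() is called on the iterator. The default value is 1000. Set it to a suitable value to control the number of entities returned per iteration.
`withParams`	Specifies the search parameters in JSON format. For details, refer to searchIterator().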

Query with an iterator

In Python, call the query_iterator() method to query with an iterator.

In Java, call the queryIterator() method to query with an iterator.

Python:

# 5. Query with iterator
iterator = collection.query_iterator(
    batch_size=10, # Controls the number of entities returned by each next() call
    expr="color_tag like \"brown_8%\"", # The trailing % makes this a prefix match
    output_fields=["color_tag"]
)

results = []

while True:
    result = iterator.next()
    if not result:
        iterator.close()
        break

    results.extend(result)

# 6. Check the query results
print(len(results))

print(results[:3])

# Output
#
# [
#     {
#         "color_tag": "brown_8785",
#         "id": 94
#     },
#     {
#         "color_tag": "brown_8568",
#         "id": 176
#     },
#     {
#         "color_tag": "brown_8721",
#         "id": 289
#     }
# ]

Java:

// 5. Query with iterators
QueryIterator queryIterator = client.queryIterator(QueryIteratorReq.builder()
        .collectionName("quick_setup")
        .expr("color_tag like \"brown_8%\"")
        .batchSize(50L)
        .outputFields(Arrays.asList("vector", "color_tag"))
        .build());

results.clear();
while (true) {
    List<QueryResultsWrapper.RowRecord> batchResults = queryIterator.next();
    if (batchResults.isEmpty()) {
        queryIterator.close();
        break;
    }

    results.addAll(batchResults);
}

System.out.println(results.subList(0, 3));

// Output
// [
//  [color_tag:brown_8975, vector:[0.93425006, 0.42161798, 0.1603949, 0.86406225, 0.30063087], id:104],
//  [color_tag:brown_8292, vector:[0.075261295, 0.51725155, 0.13842249, 0.13178307, 0.90713704], id:793],
//  [color_tag:brown_8763, vector:[0.80366623, 0.6534371, 0.6446101, 0.094082, 0.1318503], id:1157]
// ]

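Parameter	Description
`batch_size`	The number of entities to return each time next() is called on the iterator. The default value is 1000. Set it to a suitable value to control the number of entities returned per iteration.
`expr`	A scalar filtering condition. The default value is None, which means scalar filtering is skipped. To build a scalar filtering condition, refer to "Boolean Expression Rules".
`output_fields`	A list of field names to include in each returned entity. The default value is None. If left unspecified, only the primary field is included.
`limit`	The total number of entities to return. The default value is -1, which means all matching entities are returned.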
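The `expr` used in the query example is a prefix match: the `%` wildcard after `brown_8` is what lets values such as `brown_8785` match. The sketch below builds such an expression and checks the intended semantics locally with plain string operations. `prefix_filter` is a hypothetical helper and the sample tags are made up for illustration; neither comes from the Milvus SDK.

```python
def prefix_filter(field, prefix):
    """Build a `like` expression matching values that start with `prefix`."""
    if '"' in prefix or "%" in prefix:
        raise ValueError("prefix must not contain double quotes or % wildcards")
    return f'{field} like "{prefix}%"'


expr = prefix_filter("color_tag", "brown_8")
print(expr)   # color_tag like "brown_8%"

# Local illustration of what the prefix match is meant to select
sample_tags = ["brown_8785", "brown_1234", "red_8000", "brown_8568"]
matches = [tag for tag in sample_tags if tag.startswith("brown_8")]
print(matches)   # ['brown_8785', 'brown_8568']
```

The resulting string can be passed as the `expr` argument of query_iterator().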

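Parameter	Description
`withCollectionName`	Sets the collection name. The collection name cannot be empty or null.
`withExpr`	Sets the expression used to query entities. To build a scalar filtering condition, refer to "Boolean Expression Rules".
`withBatchSize`	The number of entities to return each time next() is called on the iterator. The default value is 1000. Set it to a suitable value to control the number of entities returned per iteration.
`addOutField`	Specifies an output scalar field (optional).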
