Array of StructsCompatible with Milvus 2.6.4+
Bidang Array of Structs dalam sebuah entitas menyimpan sekumpulan elemen Struct yang terurut. Setiap Struct dalam Array memiliki skema yang telah ditentukan sebelumnya, yang terdiri dari beberapa vektor dan bidang skalar.
Berikut ini contoh entitas dari koleksi yang berisi bidang Array of Structs.
{
'id': 0,
'title': 'Walden',
'title_vector': [0.1, 0.2, 0.3, 0.4, 0.5],
'author': 'Henry David Thoreau',
'year_of_publication': 1845,
'chunks': [
{
'text': 'When I wrote the following pages, or rather the bulk of them...',
'text_vector': [0.3, 0.2, 0.3, 0.2, 0.5],
'chapter': 'Economy',
},
{
'text': 'I would fain say something, not so much concerning the Chinese and...',
'text_vector': [0.7, 0.4, 0.2, 0.7, 0.8],
'chapter': 'Economy'
}
]
// hightlight-end
}
Pada contoh di atas, bidang chunks adalah bidang Array of Structs, dan setiap elemen Struct berisi bidangnya sendiri, yaitu text, text_vector, dan chapter.
Batasan
Tipe data
Saat Anda membuat koleksi, Anda dapat menggunakan tipe Struct sebagai tipe data untuk elemen dalam bidang Array. Namun, Anda tidak dapat menambahkan Array of Structs ke koleksi yang sudah ada, dan Milvus tidak mendukung penggunaan tipe Struct sebagai tipe data untuk bidang koleksi.
Struktur dalam sebuah field Array memiliki skema yang sama, yang harus didefinisikan saat Anda membuat field Array.
Skema Struct berisi vektor dan bidang skalar, seperti yang tercantum dalam tabel berikut:
Tipe Bidang
Tipe Data
Vektor
FLOAT_VECTORSkalar
VARCHARINT8/16/32/64FLOATDOUBLEBOOLEANJaga agar jumlah bidang vektor baik di tingkat koleksi maupun di dalam gabungan Struktur tidak lebih dari atau sama dengan 10.
Dapat dinolkan & nilai default
Bidang Array of Structs tidak dapat dinolkan dan tidak menerima nilai default apa pun.
Fungsi
Anda tidak dapat menggunakan fungsi untuk mendapatkan bidang vektor dari bidang skalar dalam Struct.
Jenis indeks & jenis metrik
Semua bidang vektor dalam koleksi harus diindeks. Untuk mengindeks bidang vektor dalam larik bidang Struct, Milvus menggunakan daftar penyematan untuk mengatur penyematan vektor dalam setiap elemen Struct dan mengindeks seluruh daftar penyematan secara keseluruhan.
Anda dapat menggunakan
AUTOINDEXatauHNSWsebagai jenis indeks dan jenis metrik apa pun yang tercantum di bawah ini untuk membuat indeks untuk daftar penyematan dalam bidang Array of Structs.Jenis indeks
Jenis metrik
Keterangan
AUTOINDEX(atauHNSW)MAX_SIM_COSINEUntuk daftar penyematan dengan tipe berikut ini:
- FLOAT_VECTOR
MAX_SIM_IPMAX_SIM_L2Bidang skalar dalam bidang Array of Structs tidak mendukung indeks.
Memasukkan data
Structs tidak mendukung upsert dalam mode penggabungan. Namun, Anda masih bisa melakukan upsert dalam mode timpa untuk memperbarui data di Structs. Untuk detail mengenai perbedaan antara upsert dalam mode penggabungan dan mode timpa, lihat Upsert Entitas.
Pemfilteran skalar
Anda tidak dapat menggunakan Array of Structs atau bidang apa pun di dalam elemen Struct dalam memfilter ekspresi di dalam pencarian dan kueri.
Menambahkan Larik Struktur
Untuk menggunakan Array of Structs di Milvus, Anda perlu mendefinisikan bidang array ketika membuat koleksi, dan mengatur tipe data untuk elemen-elemennya ke Struct. Prosesnya adalah sebagai berikut:
Tetapkan tipe data sebuah field ke
DataType.ARRAYketika menambahkan field sebagai field Array ke skema koleksi.Tetapkan atribut
element_typefield keDataType.STRUCTuntuk menjadikan field sebagai Array dari Struktur.Buat skema Struktur dan sertakan bidang yang diperlukan. Lalu, rujuk skema Struktur di atribut
struct_schemabidang.Atur atribut
max_capacitybidang ke nilai yang sesuai untuk menentukan jumlah maksimum Struktur yang dapat ditampung oleh setiap entitas dalam bidang ini.(Opsional) Anda bisa menetapkan
mmap.enableduntuk bidang apa pun dalam elemen Struct untuk menyeimbangkan data panas dan data dingin dalam Struct.
Berikut ini cara mendefinisikan skema koleksi yang menyertakan Array of Structs:
from pymilvus import MilvusClient, DataType
client = MilvusClient(
uri="http://localhost:19530",
token="root:Milvus"
)
schema = client.create_schema()
# add the primary field to the collection
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True, auto_id=True)
# add some scalar fields to the collection
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="author", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="year_of_publication", datatype=DataType.INT64)
# add a vector field to the collection
schema.add_field(field_name="title_vector", datatype=DataType.FLOAT_VECTOR, dim=5)
# Create a struct schema
struct_schema = client.create_struct_field_schema()
# add a scalar field to the struct
struct_schema.add_field("text", DataType.VARCHAR, max_length=65535)
struct_schema.add_field("chapter", DataType.VARCHAR, max_length=512)
# add a vector field to the struct with mmap enabled
struct_schema.add_field("text_vector", DataType.FLOAT_VECTOR, mmap_enabled=True, dim=5)
# reference the struct schema in an Array field with its
# element type set to `DataType.STRUCT`
schema.add_field("chunks", datatype=DataType.ARRAY, element_type=DataType.STRUCT,
struct_schema=struct_schema, max_capacity=1000)
import io.milvus.v2.common.DataType;
import io.milvus.v2.service.collection.request.AddFieldReq;
import io.milvus.v2.service.collection.request.CreateCollectionReq;
CreateCollectionReq.CollectionSchema collectionSchema = CreateCollectionReq.CollectionSchema.builder()
.build();
collectionSchema.addField(AddFieldReq.builder()
.fieldName("id")
.dataType(DataType.Int64)
.isPrimaryKey(true)
.autoID(true)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("title")
.dataType(DataType.VarChar)
.maxLength(512)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("author")
.dataType(DataType.VarChar)
.maxLength(512)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("year_of_publication")
.dataType(DataType.Int64)
.build());
collectionSchema.addField(AddFieldReq.builder()
.fieldName("title_vector")
.dataType(DataType.FloatVector)
.dimension(5)
.build());
Map<String, String> params = new HashMap<>();
params.put("mmap_enabled", "true");
collectionSchema.addField(AddFieldReq.builder()
.fieldName("chunks")
.dataType(DataType.Array)
.elementType(DataType.Struct)
.maxCapacity(1000)
.addStructField(AddFieldReq.builder()
.fieldName("text")
.dataType(DataType.VarChar)
.maxLength(65535)
.build())
.addStructField(AddFieldReq.builder()
.fieldName("chapter")
.dataType(DataType.VarChar)
.maxLength(512)
.build())
.addStructField(AddFieldReq.builder()
.fieldName("text_vector")
.dataType(DataType.FloatVector)
.dimension(VECTOR_DIM)
.typeParams(params)
.build())
.build());
// go
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";
const milvusClient = new MilvusClient("http://localhost:19530");
const schema = [
{
name: "id",
data_type: DataType.INT64,
is_primary_key: true,
auto_id: true,
},
{
name: "title",
data_type: DataType.VARCHAR,
max_length: 512,
},
{
name: "author",
data_type: DataType.VARCHAR,
max_length: 512,
},
{
name: "year_of_publication",
data_type: DataType.INT64,
},
{
name: "title_vector",
data_type: DataType.FLOAT_VECTOR,
dim: 5,
},
{
name: "chunks",
data_type: DataType.ARRAY,
element_type: DataType.STRUCT,
fields: [
{
name: "text",
data_type: DataType.VARCHAR,
max_length: 65535,
},
{
name: "chapter",
data_type: DataType.VARCHAR,
max_length: 512,
},
{
name: "text_vector",
data_type: DataType.FLOAT_VECTOR,
dim: 5,
mmap_enabled: true,
},
],
max_capacity: 1000,
},
];
# restful
SCHEMA='{
"autoID": true,
"fields": [
{
"fieldName": "id",
"dataType": "Int64",
"isPrimary": true
},
{
"fieldName": "title",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "512" }
},
{
"fieldName": "author",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "512" }
},
{
"fieldName": "year_of_publication",
"dataType": "Int64"
},
{
"fieldName": "title_vector",
"dataType": "FloatVector",
"elementTypeParams": { "dim": "5" }
}
],
"structArrayFields": [
{
"name": "chunks",
"description": "Array of document chunks with text and vectors",
"elementTypeParams":{
"max_capacity": 1000
},
"fields": [
{
"fieldName": "text",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "65535" }
},
{
"fieldName": "chapter",
"dataType": "VarChar",
"elementTypeParams": { "max_length": "512" }
},
{
"fieldName": "text_vector",
"dataType": "FloatVector",
"elementTypeParams": {
"dim": "5",
"mmap_enabled": "true"
}
}
]
}
]
}'
Baris yang disorot pada contoh kode di atas mengilustrasikan prosedur untuk menyertakan Array of Structs dalam skema koleksi.
Tetapkan parameter indeks
Pengindeksan wajib dilakukan untuk semua bidang vektor, termasuk bidang vektor di dalam koleksi dan bidang vektor yang didefinisikan di dalam elemen Struct.
Parameter indeks yang berlaku bervariasi, tergantung jenis indeks yang digunakan. Untuk detail tentang parameter indeks yang berlaku, lihat Penjelasan Indeks dan halaman dokumentasi khusus untuk jenis indeks yang Anda pilih.
Untuk mengindeks daftar sematan, Anda perlu menyetel jenis indeksnya ke AUTOINDEX atau HNSW, dan menggunakan MAX_SIM_COSINE sebagai jenis metrik untuk Milvus untuk mengukur kemiripan di antara daftar sematan.
# Create index parameters
index_params = client.prepare_index_params()
# Create an index for the vector field in the collection
index_params.add_index(
field_name="title_vector",
index_type="AUTOINDEX",
metric_type="L2",
)
# Create an index for the vector field in the element Struct
index_params.add_index(
field_name="chunks[text_vector]",
index_type="AUTOINDEX",
metric_type="MAX_SIM_COSINE",
)
import io.milvus.v2.common.IndexParam;
List<IndexParam> indexParams = new ArrayList<>();
indexParams.add(IndexParam.builder()
.fieldName("title_vector")
.indexType(IndexParam.IndexType.AUTOINDEX)
.metricType(IndexParam.MetricType.L2)
.build());
indexParams.add(IndexParam.builder()
.fieldName("chunks[text_vector]")
.indexType(IndexParam.IndexType.AUTOINDEX)
.metricType(IndexParam.MetricType.MAX_SIM_COSINE)
.build());
// go
await milvusClient.createCollection({
collection_name: "books",
fields: schema,
});
const indexParams = [
{
field_name: "title_vector",
index_type: "AUTOINDEX",
metric_type: "L2",
},
{
field_name: "chunks[text_vector]",
index_type: "AUTOINDEX",
metric_type: "MAX_SIM_COSINE",
},
];
# restful
INDEX_PARAMS='[
{
"fieldName": "title_vector",
"indexName": "title_vector_index",
"indexType": "AUTOINDEX",
"metricType": "L2"
},
{
"fieldName": "chunks[text_vector]",
"indexName": "chunks_text_vector_index",
"indexType": "AUTOINDEX",
"metricType": "MAX_SIM_COSINE"
}
]'
Membuat koleksi
Setelah skema dan indeks siap, Anda dapat membuat koleksi yang menyertakan bidang Array of Structs.
client.create_collection(
collection_name="my_collection",
schema=schema,
index_params=index_params
)
import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.service.collection.request.CreateCollectionReq;
MilvusClientV2 client = new MilvusClientV2(ConnectConfig.builder()
.uri("http://localhost:19530")
.token("root:Milvus")
.build());
CreateCollectionReq requestCreate = CreateCollectionReq.builder()
.collectionName("my_collection")
.collectionSchema(collectionSchema)
.indexParams(indexParams)
.build();
client.createCollection(requestCreate);
// go
await milvusClient.createCollection({
collection_name: "books",
fields: schema,
indexes: indexParams,
});
# restful
curl -X POST "http://localhost:19530/v2/vectordb/collections/create" \
-H "Content-Type: application/json" \
-d "{
\"collectionName\": \"my_collection\",
\"description\": \"A collection for storing book information with struct array chunks\",
\"schema\": $SCHEMA,
\"indexParams\": $INDEX_PARAMS
}"
Menyisipkan data
Setelah membuat koleksi, Anda dapat menyisipkan data yang menyertakan Array of Structs sebagai berikut.
# Sample data
data = {
'title': 'Walden',
'title_vector': [0.1, 0.2, 0.3, 0.4, 0.5],
'author': 'Henry David Thoreau',
'year_of_publication': 1845,
'chunks': [
{
'text': 'When I wrote the following pages, or rather the bulk of them...',
'text_vector': [0.3, 0.2, 0.3, 0.2, 0.5],
'chapter': 'Economy',
},
{
'text': 'I would fain say something, not so much concerning the Chinese and...',
'text_vector': [0.7, 0.4, 0.2, 0.7, 0.8],
'chapter': 'Economy'
}
]
}
# insert data
client.insert(
collection_name="my_collection",
data=[data]
)
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import io.milvus.v2.service.vector.request.InsertReq;
import io.milvus.v2.service.vector.response.InsertResp;
Gson gson = new Gson();
JsonObject row = new JsonObject();
row.addProperty("title", "Walden");
row.add("title_vector", gson.toJsonTree(Arrays.asList(0.1f, 0.2f, 0.3f, 0.4f, 0.5f)));
row.addProperty("author", "Henry David Thoreau");
row.addProperty("year_of_publication", 1845);
JsonArray structArr = new JsonArray();
JsonObject struct1 = new JsonObject();
struct1.addProperty("text", "When I wrote the following pages, or rather the bulk of them...");
struct1.add("text_vector", gson.toJsonTree(Arrays.asList(0.3f, 0.2f, 0.3f, 0.2f, 0.5f)));
struct1.addProperty("chapter", "Economy");
structArr.add(struct1);
JsonObject struct2 = new JsonObject();
struct2.addProperty("text", "I would fain say something, not so much concerning the Chinese and...");
struct2.add("text_vector", gson.toJsonTree(Arrays.asList(0.7f, 0.4f, 0.2f, 0.7f, 0.8f)));
struct2.addProperty("chapter", "Economy");
structArr.add(struct2);
row.add("chunks", structArr);
InsertResp insertResp = client.insert(InsertReq.builder()
.collectionName("my_collection")
.data(Collections.singletonList(row))
.build());
// go
{
id: 0,
title: "Walden",
title_vector: [0.1, 0.2, 0.3, 0.4, 0.5],
author: "Henry David Thoreau",
"year-of-publication": 1845,
chunks: [
{
text: "When I wrote the following pages, or rather the bulk of them...",
text_vector: [0.3, 0.2, 0.3, 0.2, 0.5],
chapter: "Economy",
},
{
text: "I would fain say something, not so much concerning the Chinese and...",
text_vector: [0.7, 0.4, 0.2, 0.7, 0.8],
chapter: "Economy",
},
],
},
];
await milvusClient.insert({
collection_name: "books",
data: data,
});
# restful
curl -X POST "http://localhost:19530/v2/vectordb/entities/insert" \
-H "Content-Type: application/json" \
-d '{
"collectionName": "my_collection",
"data": [
{
"title": "Walden",
"title_vector": [0.1, 0.2, 0.3, 0.4, 0.5],
"author": "Henry David Thoreau",
"year_of_publication": 1845,
"chunks": [
{
"text": "When I wrote the following pages, or rather the bulk of them...",
"text_vector": [0.3, 0.2, 0.3, 0.2, 0.5],
"chapter": "Economy"
},
{
"text": "I would fain say something, not so much concerning the Chinese and...",
"text_vector": [0.7, 0.4, 0.2, 0.7, 0.8],
"chapter": "Economy"
}
]
}
]
}'
import json
import random
from typing import List, Dict, Any
# Real classic books (title, author, year)
BOOKS = [
("Pride and Prejudice", "Jane Austen", 1813),
("Moby Dick", "Herman Melville", 1851),
("Frankenstein", "Mary Shelley", 1818),
("The Picture of Dorian Gray", "Oscar Wilde", 1890),
("Dracula", "Bram Stoker", 1897),
("The Adventures of Sherlock Holmes", "Arthur Conan Doyle", 1892),
("Alice's Adventures in Wonderland", "Lewis Carroll", 1865),
("The Time Machine", "H.G. Wells", 1895),
("The Scarlet Letter", "Nathaniel Hawthorne", 1850),
("Leaves of Grass", "Walt Whitman", 1855),
("The Brothers Karamazov", "Fyodor Dostoevsky", 1880),
("Crime and Punishment", "Fyodor Dostoevsky", 1866),
("Anna Karenina", "Leo Tolstoy", 1877),
("War and Peace", "Leo Tolstoy", 1869),
("Great Expectations", "Charles Dickens", 1861),
("Oliver Twist", "Charles Dickens", 1837),
("Wuthering Heights", "Emily Brontë", 1847),
("Jane Eyre", "Charlotte Brontë", 1847),
("The Call of the Wild", "Jack London", 1903),
("The Jungle Book", "Rudyard Kipling", 1894),
]
# Common chapter names for classics
CHAPTERS = [
"Introduction", "Prologue", "Chapter I", "Chapter II", "Chapter III",
"Chapter IV", "Chapter V", "Chapter VI", "Chapter VII", "Chapter VIII",
"Chapter IX", "Chapter X", "Epilogue", "Conclusion", "Afterword",
"Economy", "Where I Lived", "Reading", "Sounds", "Solitude",
"Visitors", "The Bean-Field", "The Village", "The Ponds", "Baker Farm"
]
# Placeholder text snippets (mimicking 19th-century prose)
TEXT_SNIPPETS = [
"When I wrote the following pages, or rather the bulk of them...",
"I would fain say something, not so much concerning the Chinese and...",
"It is a truth universally acknowledged, that a single man in possession...",
"Call me Ishmael. Some years ago—never mind how long precisely...",
"It was the best of times, it was the worst of times...",
"All happy families are alike; each unhappy family is unhappy in its own way.",
"Whether I shall turn out to be the hero of my own life, or whether that station...",
"You will rejoice to hear that no disaster has accompanied the commencement...",
"The world is too much with us; late and soon, getting and spending...",
"He was an old man who fished alone in a skiff in the Gulf Stream..."
]
def random_vector() -> List[float]:
return [round(random.random(), 1) for _ in range(5)]
def generate_chunk() -> Dict[str, Any]:
return {
"text": random.choice(TEXT_SNIPPETS),
"text_vector": random_vector(),
"chapter": random.choice(CHAPTERS)
}
def generate_record(record_id: int) -> Dict[str, Any]:
title, author, year = random.choice(BOOKS)
num_chunks = random.randint(1, 5) # 1 to 5 chunks per book
chunks = [generate_chunk() for _ in range(num_chunks)]
return {
"title": title,
"title_vector": random_vector(),
"author": author,
"year_of_publication": year,
"chunks": chunks
}
# Generate 1000 records
data = [generate_record(i) for i in range(1000)]
# Insert the generated data
client.insert(collection_name="my_collection", data=data)
Pencarian vektor terhadap bidang Array of Structs
Anda dapat melakukan pencarian vektor pada bidang vektor dari koleksi dan Array of Structs.
Secara khusus, Anda harus menggabungkan nama bidang Array of Structs dan nama bidang vektor target di dalam elemen Struct sebagai nilai untuk parameter anns_field di dalam permintaan pencarian, dan menggunakan EmbeddingList untuk mengatur vektor kueri dengan rapi.
Milvus menyediakan EmbeddingList untuk membantu Anda mengatur vektor kueri untuk pencarian terhadap daftar sematan dalam Array of Structs dengan lebih rapi. Setiap EmbeddingList berisi setidaknya sebuah vektor embedding dan mengharapkan sejumlah entitas topK sebagai balasannya.
Namun, EmbeddingList hanya dapat digunakan dalam permintaan search() tanpa pencarian rentang atau pengelompokan parameter pencarian, apalagi permintaan search_iterator().
from pymilvus.client.embedding_list import EmbeddingList
# each query embedding list triggers a single search
embeddingList1 = EmbeddingList()
embeddingList1.add([0.2, 0.9, 0.4, -0.3, 0.2])
embeddingList2 = EmbeddingList()
embeddingList2.add([-0.2, -0.2, 0.5, 0.6, 0.9])
embeddingList2.add([-0.4, 0.3, 0.5, 0.8, 0.2])
# a search with a single embedding list
results = client.search(
collection_name="my_collection",
data=[ embeddingList1 ],
anns_field="chunks[text_vector]",
search_params={"metric_type": "MAX_SIM_COSINE"},
limit=3,
output_fields=["chunks[text]"]
)
import io.milvus.v2.service.vector.request.data.EmbeddingList;
import io.milvus.v2.service.vector.request.data.FloatVec;
EmbeddingList embeddingList1 = new EmbeddingList();
embeddingList1.add(new FloatVec(new float[]{0.2f, 0.9f, 0.4f, -0.3f, 0.2f}));
EmbeddingList embeddingList2 = new EmbeddingList();
embeddingList2.add(new FloatVec(new float[]{-0.2f, -0.2f, 0.5f, 0.6f, 0.9f}));
embeddingList2.add(new FloatVec(new float[]{-0.4f, 0.3f, 0.5f, 0.8f, 0.2f}));
Map<String, Object> params = new HashMap<>();
params.put("metric_type", "MAX_SIM_COSINE");
SearchResp searchResp = client.search(SearchReq.builder()
.collectionName("my_collection")
.annsField("chunks[text_vector]")
.data(Collections.singletonList(embeddingList1))
.searchParams(params)
.limit(3)
.outputFields(Collections.singletonList("chunks[text]"))
.build());
// go
const embeddingList1 = [[0.2, 0.9, 0.4, -0.3, 0.2]];
const embeddingList2 = [
[-0.2, -0.2, 0.5, 0.6, 0.9],
[-0.4, 0.3, 0.5, 0.8, 0.2],
];
const results = await milvusClient.search({
collection_name: "books",
data: embeddingList1,
anns_field: "chunks[text_vector]",
search_params: { metric_type: "MAX_SIM_COSINE" },
limit: 3,
output_fields: ["chunks[text]"],
});
# restful
embeddingList1='[[0.2,0.9,0.4,-0.3,0.2]]'
embeddingList2='[[-0.2,-0.2,0.5,0.6,0.9],[-0.4,0.3,0.5,0.8,0.2]]'
curl -X POST "http://localhost:19530/v2/vectordb/entities/search" \
-H "Content-Type: application/json" \
-d "{
\"collectionName\": \"my_collection\",
\"data\": [$embeddingList1],
\"annsField\": \"chunks[text_vector]\",
\"searchParams\": {\"metric_type\": \"MAX_SIM_COSINE\"},
\"limit\": 3,
\"outputFields\": [\"chunks[text]\"]
}"
Permintaan pencarian di atas menggunakan chunks[text_vector] untuk merujuk ke bidang text_vector dalam elemen Struct. Anda dapat menggunakan sintaks ini untuk mengatur parameter anns_field dan output_fields.
Keluarannya adalah daftar tiga entitas yang paling mirip.
# [
# [
# {
# 'id': 461417939772144945,
# 'distance': 0.9675756096839905,
# 'entity': {
# 'chunks': [
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'All happy families are alike; each unhappy family is unhappy in its own way.'}
# ]
# }
# },
# {
# 'id': 461417939772144965,
# 'distance': 0.9555778503417969,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'When I wrote the following pages, or rather the bulk of them...'},
# {'text': 'It was the best of times, it was the worst of times...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# },
# {
# 'id': 461417939772144962,
# 'distance': 0.9469035863876343,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# }
# ]
# ]
Anda juga dapat menyertakan beberapa daftar penyematan dalam parameter data untuk mengambil hasil pencarian untuk setiap daftar penyematan.
# a search with multiple embedding lists
results = client.search(
collection_name="my_collection",
data=[ embeddingList1, embeddingList2 ],
anns_field="chunks[text_vector]",
search_params={"metric_type": "MAX_SIM_COSINE"},
limit=3,
output_fields=["chunks[text]"]
)
print(results)
Map<String, Object> params = new HashMap<>();
params.put("metric_type", "MAX_SIM_COSINE");
SearchResp searchResp = client.search(SearchReq.builder()
.collectionName("my_collection")
.annsField("chunks[text_vector]")
.data(Arrays.asList(embeddingList1, embeddingList2))
.searchParams(params)
.limit(3)
.outputFields(Collections.singletonList("chunks[text]"))
.build());
List<List<SearchResp.SearchResult>> searchResults = searchResp.getSearchResults();
for (int i = 0; i < searchResults.size(); i++) {
System.out.println("Results of No." + i + " embedding list");
List<SearchResp.SearchResult> results = searchResults.get(i);
for (SearchResp.SearchResult result : results) {
System.out.println(result);
}
}
// go
const results2 = await milvusClient.search({
collection_name: "books",
data: [embeddingList1, embeddingList2],
anns_field: "chunks[text_vector]",
search_params: { metric_type: "MAX_SIM_COSINE" },
limit: 3,
output_fields: ["chunks[text]"],
});
# restful
curl -X POST "http://localhost:19530/v2/vectordb/entities/search" \
-H "Content-Type: application/json" \
-d "{
\"collectionName\": \"my_collection\",
\"data\": [$embeddingList1, $embeddingList2],
\"annsField\": \"chunks[text_vector]\",
\"searchParams\": {\"metric_type\": \"MAX_SIM_COSINE\"},
\"limit\": 3,
\"outputFields\": [\"chunks[text]\"]
}"
Keluarannya adalah daftar tiga entitas yang paling mirip untuk setiap daftar sematan.
# [
# [
# {
# 'id': 461417939772144945,
# 'distance': 0.9675756096839905,
# 'entity': {
# 'chunks': [
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'All happy families are alike; each unhappy family is unhappy in its own way.'}
# ]
# }
# },
# {
# 'id': 461417939772144965,
# 'distance': 0.9555778503417969,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'When I wrote the following pages, or rather the bulk of them...'},
# {'text': 'It was the best of times, it was the worst of times...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# },
# {
# 'id': 461417939772144962,
# 'distance': 0.9469035863876343,
# 'entity': {
# 'chunks': [
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'},
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'},
# {'text': 'The world is too much with us; late and soon, getting and spending...'}
# ]
# }
# }
# ],
# [
# {
# 'id': 461417939772144663,
# 'distance': 1.9761409759521484,
# 'entity': {
# 'chunks': [
# {'text': 'It was the best of times, it was the worst of times...'},
# {'text': 'It is a truth universally acknowledged, that a single man in possession...'},
# {'text': 'Whether I shall turn out to be the hero of my own life, or whether that station...'},
# {'text': 'He was an old man who fished alone in a skiff in the Gulf Stream...'}
# ]
# }
# },
# {
# 'id': 461417939772144692,
# 'distance': 1.974656581878662,
# 'entity': {
# 'chunks': [
# {'text': 'It is a truth universally acknowledged, that a single man in possession...'},
# {'text': 'Call me Ishmael. Some years ago—never mind how long precisely...'}
# ]
# }
# },
# {
# 'id': 461417939772144662,
# 'distance': 1.9406685829162598,
# 'entity': {
# 'chunks': [
# {'text': 'It is a truth universally acknowledged, that a single man in possession...'}
# ]
# }
# }
# ]
# ]
Pada contoh kode di atas, embeddingList1 adalah daftar sematan satu vektor, sedangkan embeddingList2 berisi dua vektor. Masing-masing memicu permintaan pencarian terpisah dan mengharapkan daftar entitas yang paling mirip.
Langkah selanjutnya
Pengembangan tipe data Array of Structs merupakan kemajuan besar dalam kemampuan Milvus untuk menangani struktur data yang kompleks. Untuk lebih memahami kasus penggunaannya dan memaksimalkan fitur baru ini, Anda dianjurkan untuk membaca Desain Skema Menggunakan Array of Structs.