Conduct a Vector Query
This topic describes how to conduct a vector query.
Unlike a vector similarity search, a vector query retrieves vectors by scalar filtering based on a boolean expression. Milvus supports many data types in scalar fields and a variety of boolean expressions. The boolean expression filters on scalar fields or the primary key field, and the query retrieves all results that match the filters.
The following example shows how to perform a vector query on a 2000-row dataset of book ID (primary key), word count (scalar field), and book introduction (vector field), simulating the situation where you query for certain books based on their IDs.
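To make the difference from a similarity search concrete, here is a minimal pure-Python model of what a query does: it evaluates a boolean predicate against every entity and returns all matches, with no distance ranking and no top-k limit. This is a sketch under the assumption that an in-memory list of dicts stands in for the "book" collection; it is not how Milvus evaluates expressions internally, and the helper `query_in_memory` is hypothetical.

```python
import random

# Hypothetical stand-in for the "book" collection: 2000 entities with the
# same three fields as the example (primary key, scalar field, 2-dim vector).
books = [
    {
        "book_id": i,
        "word_count": 10000 + i,
        "book_intro": [random.random() for _ in range(2)],
    }
    for i in range(2000)
]


def query_in_memory(entities, predicate):
    """Return every entity for which predicate(entity) is True.

    Unlike a similarity search, nothing is ranked by distance and there
    is no top-k cut-off: all matching entities are returned.
    """
    return [e for e in entities if predicate(e)]


# Rough Python equivalent of the boolean expression "book_id in [2,4,6,8]".
matches = query_in_memory(books, lambda e: e["book_id"] in [2, 4, 6, 8])
print([e["book_id"] for e in matches])  # [2, 4, 6, 8]

# A filter on the scalar field works the same way.
long_books = query_in_memory(books, lambda e: e["word_count"] >= 11990)
print(len(long_books))  # 10
```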
Preparations
The following example code demonstrates the steps prior to a query.
If you work with your own dataset in an existing Milvus server, you can skip these preparations and move on to the next step.
- Connect to the Milvus server. See Manage Connection for more instructions.
from pymilvus import connections
connections.connect("default", host='localhost', port='19530')
const { MilvusClient } = require("@zilliz/milvus2-sdk-node");
const milvusClient = new MilvusClient("localhost:19530");
connect -h localhost -p 19530 -a default
- Create a collection. See Create a Collection for more instructions.
from pymilvus import CollectionSchema, FieldSchema, DataType, Collection
# Field order matters: the insert step below passes data column by column in this order.
schema = CollectionSchema([
    FieldSchema("book_id", DataType.INT64, is_primary=True),
    FieldSchema("word_count", DataType.INT64),
    FieldSchema("book_intro", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection("book", schema, using='default', shards_num=2)
const params = {
  collection_name: "book",
  fields: [
    {
      name: "book_intro",
      description: "",
      data_type: 101, // DataType.FloatVector
      type_params: {
        dim: "2",
      },
    },
    {
      name: "book_id",
      data_type: 5, // DataType.Int64
      is_primary_key: true,
      description: "",
    },
    {
      name: "word_count",
      data_type: 5, // DataType.Int64
      description: "",
    },
  ],
};
await milvusClient.collectionManager.createCollection(params);
create collection -c book -f book_intro:FLOAT_VECTOR:2 -f book_id:INT64:book_id -f word_count:INT64:word_count -p book_id
- Insert data into the collection (the Milvus CLI example uses a pre-built, remote CSV file containing similar data). See Insert Data for more instructions.
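import random
# Column-based data, in schema order: book_id, word_count, book_intro.
data = [
    [i for i in range(2000)],                                    # book_id
    [i for i in range(10000, 12000)],                            # word_count
    [[random.random() for _ in range(2)] for _ in range(2000)],  # book_intro (2-dim vectors)
]
collection.insert(data)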
const data = Array.from({ length: 2000 }, (v, k) => ({
  "book_intro": Array.from({ length: 2 }, () => Math.random()),
  "book_id": k,
  "word_count": k + 10000,
}));
await milvusClient.dataManager.insert({
  collection_name: "book",
  fields_data: data,
});
import -c book 'https://raw.githubusercontent.com/milvus-io/milvus_cli/main/examples/user_guide/search.csv'
- Create an index for the vector field. See Build Index for more instructions.
index_params = {
"metric_type":"L2",
"index_type":"IVF_FLAT",
"params":{"nlist":1024}
}
collection.create_index("book_intro", index_params=index_params)
const index_params = {
  metric_type: "L2",
  index_type: "IVF_FLAT",
  params: JSON.stringify({ nlist: 1024 }),
};
await milvusClient.indexManager.createIndex({
  collection_name: "book",
  field_name: "book_intro",
  extra_params: index_params,
});
create index
Collection name (book): book
The name of the field to create an index for (book_intro): book_intro
Index type (FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, RNSG, HNSW, ANNOY): IVF_FLAT
Index metric type (L2, IP, HAMMING, TANIMOTO): L2
Index params nlist: 1024
Timeout []:
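The Python insert step in the preparations passes data column by column, so the outer list has to follow the schema's field order (book_id, word_count, book_intro). The sketch below is our own illustration, not a pymilvus utility: it builds the same columns and runs a simple sanity check on column lengths and vector dimension before they would be handed to `collection.insert()`. The `check_columns` helper and the `NUM_ENTITIES`/`DIM` constants are assumptions for illustration.

```python
import random

NUM_ENTITIES = 2000  # assumed size, matching the example dataset
DIM = 2              # must equal the dim declared for book_intro

# Columns in schema order: book_id, word_count, book_intro.
data = [
    [i for i in range(NUM_ENTITIES)],
    [i for i in range(10000, 10000 + NUM_ENTITIES)],
    [[random.random() for _ in range(DIM)] for _ in range(NUM_ENTITIES)],
]


def check_columns(columns, num_entities, dim):
    """Hypothetical pre-insert check: every column holds one value per
    entity, and every vector in the last column has `dim` floats."""
    book_ids, word_counts, intros = columns
    if not (len(book_ids) == len(word_counts) == len(intros) == num_entities):
        raise ValueError("all columns must have the same number of entities")
    if any(len(v) != dim for v in intros):
        raise ValueError(f"every book_intro vector must have {dim} dimensions")
    return True


print(check_columns(data, NUM_ENTITIES, DIM))  # True

# Reading across the columns at one index gives a single entity (row).
first_row = [col[0] for col in data]
print(first_row[:2])  # [0, 10000]
```

Checking on the client side like this only catches shape mistakes early; the server still performs its own validation when the data is inserted.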
Load collection
Search and query operations in Milvus are executed in memory. Load the collection into memory before conducting a vector query.
from pymilvus import Collection
collection = Collection("book") # Get an existing collection.
collection.load()
await milvusClient.collectionManager.loadCollection({
  collection_name: "book",
});
load -c book
Conduct a vector query
The following example filters the vectors by specific book_id values and returns the book_id and book_intro fields of the results.
res = collection.query(expr="book_id in [2,4,6,8]", output_fields=["book_id", "book_intro"])
const results = await milvusClient.dataManager.query({
  collection_name: "book",
  expr: "book_id in [2,4,6,8]",
  output_fields: ["book_id", "book_intro"],
});
query
collection_name: book
The query expression: book_id in [2,4,6,8]
Name of partitions that contain entities(split by "," if multiple) []:
A list of fields to return(split by "," if multiple) []: book_id, book_intro
timeout []:
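In practice the IDs to look up often come from application code rather than a hard-coded string. The following is a minimal sketch of building the `expr` string from a Python list; `build_in_expr` is a hypothetical helper, not part of pymilvus. Casting each value with `int()` keeps arbitrary text out of the expression, and rejecting an empty list is a design choice of this sketch rather than documented Milvus behavior.

```python
def build_in_expr(field_name, values):
    """Build an `in` expression such as "book_id in [2,4,6,8]"
    from a list of integer IDs (hypothetical helper)."""
    if not values:
        raise ValueError("values must not be empty")
    # int() ensures only integer literals end up inside the brackets.
    joined = ",".join(str(int(v)) for v in values)
    return f"{field_name} in [{joined}]"


expr = build_in_expr("book_id", [2, 4, 6, 8])
print(expr)  # book_id in [2,4,6,8]
# The string would then be passed on, e.g. collection.query(expr=expr, ...).
```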
Parameter | Description |
---|---|
expr | Boolean expression used to filter attributes. Find more expression details in Boolean Expression Rules. |
output_fields (optional) | List of names of the fields to return. |
partition_names (optional) | List of names of the partitions to query on. |
Parameter | Description |
---|---|
collection_name | Name of the collection to query. |
expr | Boolean expression used to filter attributes. Find more expression details in Boolean Expression Rules. |
output_fields (optional) | List of names of the fields to return. |
partition_names (optional) | List of names of the partitions to query on. |
Option | Full name | Description |
---|---|---|
--help | n/a | Displays help for using the command. |
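The `output_fields` parameter controls which fields are returned for each matching entity. The pure-Python sketch below models that projection on result rows; the `project` helper is hypothetical, the rows hold made-up placeholder values, and it assumes, as with pymilvus 2.x `collection.query()`, that results are a list of dicts keyed by field name.

```python
def project(rows, output_fields):
    """Return a copy of each row containing only the requested fields.

    Hypothetical helper modelling the effect of output_fields; indexing
    row[field] raises KeyError for a field name that does not exist.
    """
    return [{field: row[field] for field in output_fields} for row in rows]


# Placeholder rows standing in for matched entities (made-up values).
rows = [
    {"book_id": 2, "word_count": 10002, "book_intro": [0.1, 0.2]},
    {"book_id": 4, "word_count": 10004, "book_intro": [0.3, 0.4]},
]

projected = project(rows, ["book_id", "book_intro"])
print(projected)
# [{'book_id': 2, 'book_intro': [0.1, 0.2]}, {'book_id': 4, 'book_intro': [0.3, 0.4]}]
```

Requesting only the fields you need keeps the response small, which matters most when the vector field is large.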
Check the returned results.
# Sort by primary key so the output is easy to inspect.
sorted_res = sorted(res, key=lambda k: k['book_id'])
print(sorted_res)
console.log(results.data)
# Milvus CLI automatically returns the entities with the pre-defined output fields.
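The Python example sorts the results by book_id before inspecting them, which is a sensible precaution because the example code does not rely on the returned order. Below is a minimal sketch of that check run against mock results; the list assumes the pymilvus 2.x result shape (a list of dicts keyed by field name), and the vector values are made-up placeholders, not real Milvus output.

```python
# Mock results, assumed to have the shape collection.query() returns in
# pymilvus 2.x: a list of dicts keyed by field name (placeholder values).
res = [
    {"book_id": 6, "book_intro": [0.51, 0.02]},
    {"book_id": 2, "book_intro": [0.13, 0.77]},
    {"book_id": 8, "book_intro": [0.90, 0.45]},
    {"book_id": 4, "book_intro": [0.38, 0.64]},
]

# Same sort as in the example above: order the entities by primary key.
sorted_res = sorted(res, key=lambda k: k["book_id"])
for row in sorted_res:
    print(row["book_id"], row["book_intro"])

# Confirm every requested ID came back exactly once.
returned_ids = [row["book_id"] for row in sorted_res]
print(returned_ids == [2, 4, 6, 8])  # True
```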
What's next
- Learn more basic operations of Milvus:
- Explore API references for Milvus SDKs: