Conduct a Vector Similarity Search
This topic describes how to search entities with Milvus.
A vector similarity search in Milvus calculates the distance between query vector(s) and vectors in the collection with specified similarity metrics, and returns the most similar results. By specifying a boolean expression that filters the scalar field or the primary key field, you can perform a hybrid search or even a search with Time Travel.
The following example shows how to perform a vector similarity search on a 2000-row dataset of book ID (primary key), word count (scalar field), and book introduction (vector field). It simulates searching for books based on their vectorized introductions. Milvus returns the most similar results according to the query vector and the search parameters you define.
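To make the idea of "calculating the distance and returning the most similar results" concrete, here is a minimal brute-force sketch in plain Python. It is not how Milvus is implemented; the function names `l2_distance` and `brute_force_search` are hypothetical, and the distance values Milvus reports may be scaled differently. Only the ranking idea is illustrated.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query, vectors, limit):
    """Return (index, distance) pairs for the `limit` nearest vectors."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:limit]


# Random 2-dimensional vectors standing in for the book introductions.
random.seed(0)
book_intros = [[random.random() for _ in range(2)] for _ in range(2000)]

top = brute_force_search([0.1, 0.2], book_intros, limit=10)
for book_id, distance in top:
    print(book_id, round(distance, 4))
```

A brute-force scan compares the query with every stored vector. An index such as IVF_FLAT, used later in this topic, avoids that full scan at the cost of approximate results.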
Preparations
The following example code demonstrates the steps prior to a search.
If you are working with your own dataset in an existing Milvus instance, skip to the next step.
- Connect to the Milvus server. See Manage Connection for more information.
from pymilvus import connections
connections.connect("default", host='localhost', port='19530')
const { MilvusClient } = require("@zilliz/milvus2-sdk-node");
const milvusClient = new MilvusClient("localhost:19530");
connect -h localhost -p 19530 -a default
- Create a collection. See Create a Collection for more information.
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema
schema = CollectionSchema([
    FieldSchema("book_id", DataType.INT64, is_primary=True),
    FieldSchema("word_count", DataType.INT64),
    FieldSchema("book_intro", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection("book", schema, using='default', shards_num=2)
const params = {
collection_name: "book",
fields: [
{
name: "book_intro",
description: "",
data_type: 101, // DataType.FloatVector
type_params: {
dim: "2",
},
},
{
name: "book_id",
data_type: 5, //DataType.Int64
is_primary_key: true,
description: "",
},
{
name: "word_count",
data_type: 5, //DataType.Int64
description: "",
},
],
};
await milvusClient.collectionManager.createCollection(params);
create collection -c book -f book_intro:FLOAT_VECTOR:2 -f book_id:INT64 book_id -f word_count:INT64 word_count -p book_id
- Insert data into the collection (the Milvus CLI example uses a pre-built, remote CSV file containing similar data). See Insert Data for more information.
import random
data = [
[i for i in range(2000)],
[i for i in range(10000, 12000)],
[[random.random() for _ in range(2)] for _ in range(2000)],
]
collection.insert(data)
const data = Array.from({ length: 2000 }, (v,k) => ({
"book_intro": Array.from({ length: 2 }, () => Math.random()),
"book_id": k,
"word_count": k+10000,
}));
await milvusClient.dataManager.insert({
collection_name: "book",
fields_data: data,
});
import -c book 'https://raw.githubusercontent.com/milvus-io/milvus_cli/main/examples/user_guide/search.csv'
- Create an index for the vector field. See Build Index for more information.
index_params = {
"metric_type":"L2",
"index_type":"IVF_FLAT",
"params":{"nlist":1024}
}
collection.create_index("book_intro", index_params=index_params)
const index_params = {
metric_type: "L2",
index_type: "IVF_FLAT",
params: JSON.stringify({ nlist: 1024 }),
};
await milvusClient.indexManager.createIndex({
collection_name: "book",
field_name: "book_intro",
extra_params: index_params,
});
create index
Collection name (book): book
The name of the field to create an index for (book_intro): book_intro
Index type (FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, RNSG, HNSW, ANNOY): IVF_FLAT
Index metric type (L2, IP, HAMMING, TANIMOTO): L2
Index params nlist: 1024
Timeout []:
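In the insert step above, the Python example passes one list per field (column-based), while the Node.js example passes one object per entity (row-based). If your source data is row-based, a small helper like the following can reshape it for the Python call. This is a sketch only: `rows_to_columns` is a hypothetical helper, and it assumes the column order must follow the field order of the schema.

```python
import random

# Row-based records, one dict per book (generated values, as in the example above).
rows = [
    {
        "book_id": i,
        "word_count": i + 10000,
        "book_intro": [random.random(), random.random()],
    }
    for i in range(2000)
]


def rows_to_columns(rows, field_order):
    """Turn a list of dicts into one list per field, in `field_order`."""
    return [[row[name] for row in rows] for name in field_order]


# Assumed to match the schema order: book_id, word_count, book_intro.
data = rows_to_columns(rows, ["book_id", "word_count", "book_intro"])
print(len(data), len(data[0]))
```

The resulting `data` has the same shape as the list passed to `collection.insert(data)` in the Python example.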
Load collection
All CRUD operations within Milvus are executed in memory. Load the collection to memory before conducting a vector similarity search.
from pymilvus import Collection
collection = Collection("book") # Get an existing collection.
collection.load()
await milvusClient.collectionManager.loadCollection({
collection_name: "book",
});
load -c book
Prepare search parameters
Prepare the parameters that suit your search scenario. The following example specifies Euclidean distance (L2) as the similarity metric and retrieves vectors from the ten closest clusters built by the IVF_FLAT index.
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
const searchParams = {
anns_field: "book_intro",
topk: "10",
metric_type: "L2",
params: JSON.stringify({ nprobe: 10 }),
};
search
Collection name (book): book
The vectors of search data(the length of data is number of query (nq), the dim of every vector in data must be equal to vector field’s of collection. You can also import a csv file without headers): [[0.1, 0.2]]
The vector field used to search of collection (book_intro): book_intro
Metric type: L2
Search parameter nprobe's value: 10
The max number of returned record, also known as topk: 10
The boolean expression used to filter attribute []:
The names of partitions to search (split by "," if multiple) ['_default'] []:
timeout []:
Guarantee Timestamp(It instructs Milvus to see all operations performed before a provided timestamp. If no such timestamp is provided, then Milvus will search all operations performed to date) [0]:
Travel Timestamp(Specify a timestamp in a search to get results based on a data view) [0]:
Parameter | Description |
---|---|
metric_type | Metric used to measure the similarity of vectors. See Similarity Metrics for more information. |
params | Search parameter(s) specific to the index. See Index Selection for more information. |
Parameter | Description |
---|---|
anns_field | Name of the field to search on. |
topk | Number of the most similar results to return. |
metric_type | Metric used to measure the similarity of vectors. See Similarity Metrics for more information. |
params | Search parameter(s) specific to the index. See Index Selection for more information. |
Option | Full name | Description |
---|---|---|
--help | n/a | Displays help for using the command. |
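The role of nprobe is easier to see in a toy model of a cluster-based index. The sketch below is a simplified illustration in plain Python, not Milvus's IVF_FLAT implementation: centroids are picked at random instead of being trained, the cluster count of 16 is arbitrary, and `build_clusters` and `ivf_search` are hypothetical names.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_clusters(vectors, centroids):
    """Assign each vector index to its nearest centroid."""
    clusters = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        clusters[nearest].append(i)
    return clusters


def ivf_search(query, vectors, centroids, clusters, nprobe, limit):
    """Scan only the `nprobe` clusters whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in ranked[:nprobe] for i in clusters[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:limit]


random.seed(1)
vectors = [[random.random(), random.random()] for _ in range(2000)]
centroids = random.sample(vectors, 16)  # stand-in for trained centroids
clusters = build_clusters(vectors, centroids)

query = [0.1, 0.2]
approx = ivf_search(query, vectors, centroids, clusters, nprobe=4, limit=10)
exact = ivf_search(query, vectors, centroids, clusters, nprobe=16, limit=10)
print(len(set(approx) & set(exact)), "of", len(exact), "exact neighbors found")
```

Probing more clusters scans more candidates, which tends to improve recall and increase search time. Probing every cluster is equivalent to a full scan.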
Conduct a vector search
Search vectors with Milvus. To search in a specific partition, specify the list of partition names.
results = collection.search(data=[[0.1, 0.2]], anns_field="book_intro", param=search_params, limit=10, expr=None)
const results = await milvusClient.dataManager.search({
collection_name: "book",
expr: "",
vectors: [[0.1, 0.2]],
search_params: searchParams,
vector_type: 101, // DataType.FloatVector
});
Parameter | Description |
---|---|
data | Vectors to search with. |
anns_field | Name of the field to search on. |
param | Search parameter(s) specific to the index. See Index Selection for more information. |
limit | Number of the most similar results to return. |
expr | Boolean expression used to filter attributes. See Boolean Expression Rules for more information. |
partition_names (optional) | List of names of the partitions to search in. |
output_fields (optional) | Names of the fields to return. The vector field is not supported in the current release. |
timeout (optional) | Timeout in seconds for the RPC. When set to None, the client waits until the server responds or an error occurs. |
round_decimal (optional) | Number of decimal places of the returned distance. |
Parameter | Description |
---|---|
collection_name | Name of the collection to search in. |
search_params | Parameters (as an object) used for search. |
vectors | Vectors to search with. |
vector_type | Pre-check of binary or float vectors. 100 for binary vectors and 101 for float vectors. |
partition_names (optional) | List of names of the partitions to search in. |
expr (optional) | Boolean expression used to filter attributes. See Boolean Expression Rules for more information. |
output_fields (optional) | Names of the fields to return. The vector field is not supported in the current release. |
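The expr parameter is what turns a plain vector search into a hybrid search: only entities that satisfy the boolean expression are eligible to be returned. The following plain-Python sketch illustrates that idea with a predicate on word_count. It is a conceptual model under assumed names (`hybrid_search` is hypothetical) and makes no claim about the order in which Milvus applies filtering and ranking internally.

```python
import math
import random

random.seed(2)
books = [
    {
        "book_id": i,
        "word_count": 10000 + i,
        "book_intro": [random.random(), random.random()],
    }
    for i in range(2000)
]


def hybrid_search(query, books, predicate, limit):
    """Keep entities that pass `predicate`, then rank them by L2 distance."""
    candidates = [b for b in books if predicate(b)]
    candidates.sort(key=lambda b: math.dist(query, b["book_intro"]))
    return candidates[:limit]


# Plays the role of a boolean expression such as "word_count <= 11000".
hits = hybrid_search(
    [0.1, 0.2], books, lambda b: b["word_count"] <= 11000, limit=10
)
for hit in hits:
    print(hit["book_id"], hit["word_count"])
```

With pymilvus, the equivalent filter would be passed as a string in the expr argument of the search call. See Boolean Expression Rules for the supported syntax.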
Check the primary key values of the most similar vectors and their distances.
results[0].ids
results[0].distances
console.log(results.results)
# Milvus CLI automatically returns the primary key values of the most similar vectors and their distances.
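Because the primary keys and the distances come back as two parallel sequences, it is often convenient to pair them up before printing or post-processing. The sketch below uses a hypothetical helper, `pair_hits`, and placeholder values standing in for `results[0].ids` and `results[0].distances`. The numbers are made up for illustration and are not real search output.

```python
def pair_hits(ids, distances):
    """Pair each primary key with its distance, preserving result order."""
    if len(ids) != len(distances):
        raise ValueError("ids and distances must have the same length")
    return list(zip(ids, distances))


# Placeholder values, not real Milvus output.
sample_ids = [1207, 44, 980]
sample_distances = [0.0003, 0.0011, 0.0025]

for book_id, distance in pair_hits(sample_ids, sample_distances):
    print(f"book_id={book_id} distance={distance}")
```

In a real session you would call the helper with `results[0].ids` and `results[0].distances` from the Python example above.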
Release the collection loaded in Milvus to reduce memory consumption when the search is completed.
collection.release()
await milvusClient.collectionManager.releaseCollection({
collection_name: "book",
});
release -c book
Limits
Feature | Maximum limit |
---|---|
Length of a collection name | 255 characters |
Number of partitions in a collection | 4,096 |
Number of fields in a collection | 256 |
Number of shards in a collection | 256 |
Dimensions of a vector | 32,768 |
Top K | 16,384 |
Target input vectors | 16,384 |
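The search-related limits in the table above can be checked on the client side before a request is sent. The following is a minimal sketch, assuming that "Target input vectors" refers to the number of query vectors in one search call. `check_search_request` is a hypothetical helper, not part of any Milvus SDK.

```python
# Values taken from the limits table above.
MAX_TOP_K = 16384
MAX_QUERY_VECTORS = 16384  # assumed meaning of "Target input vectors"
MAX_DIMENSION = 32768


def check_search_request(vectors, limit):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    if len(vectors) > MAX_QUERY_VECTORS:
        problems.append(f"too many query vectors: {len(vectors)}")
    if limit > MAX_TOP_K:
        problems.append(f"limit {limit} exceeds Top K maximum {MAX_TOP_K}")
    if any(len(v) > MAX_DIMENSION for v in vectors):
        problems.append(f"a query vector exceeds {MAX_DIMENSION} dimensions")
    return problems


print(check_search_request([[0.1, 0.2]], limit=10))
print(check_search_request([[0.1, 0.2]], limit=20000))
```

Returning a list of problems instead of raising on the first one lets the caller report every violation at once.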
What's next
- Learn more basic operations of Milvus:
- Explore API references for Milvus SDKs: