
Conduct a Vector Query

This topic describes how to conduct a vector query.

Unlike a vector similarity search, a vector query retrieves vectors through scalar filtering based on a boolean expression. Milvus supports many data types in scalar fields and a variety of boolean expressions. The boolean expression filters on scalar fields or the primary key field, and the query returns all results that match the filter.
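
To make the distinction concrete, the following minimal sketch mimics what a query does, using plain Python rather than Milvus: it keeps every row whose scalar fields satisfy a predicate and ignores vector distance entirely. The rows and the `matches` helper are hypothetical stand-ins for illustration, not part of pymilvus.

```python
# Hypothetical in-memory rows standing in for entities in a collection.
rows = [
    {"book_id": i, "word_count": 10000 + i, "book_intro": [0.0, 0.0]}
    for i in range(10)
]

def matches(row):
    # Plain-Python equivalent of the boolean expression "book_id in [2,4,6,8]".
    return row["book_id"] in [2, 4, 6, 8]

# A query returns every matching row; no similarity ranking is involved.
result = [row for row in rows if matches(row)]
print([row["book_id"] for row in result])
```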

The following example shows how to perform a vector query on a 2000-row dataset of book ID (primary key), word count (scalar field), and book introduction (vector field), simulating the situation where you query for certain books based on their IDs.

Preparations

The following example code demonstrates the steps prior to a query.

If you are working with your own dataset on an existing Milvus server, you can skip to the next step.

  1. Connect to the Milvus server. See Manage Connection for more information.
from pymilvus import connections
connections.connect("default", host='localhost', port='19530')
const { MilvusClient } = require("@zilliz/milvus2-sdk-node");
const milvusClient = new MilvusClient("localhost:19530");
connect -h localhost -p 19530 -a default
  2. Create a collection. See Create a Collection for more information.
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema
# Schema: primary key, one scalar field, and a 2-dimensional float vector field.
schema = CollectionSchema([
    FieldSchema("book_id", DataType.INT64, is_primary=True),
    FieldSchema("word_count", DataType.INT64),
    FieldSchema("book_intro", dtype=DataType.FLOAT_VECTOR, dim=2),
])
collection = Collection("book", schema, using='default', shards_num=2)
const params = {
  collection_name: "book",
  fields: [
    {
      name: "book_intro",
      description: "",
      data_type: 101,  // DataType.FloatVector
      type_params: {
        dim: "2",
      },
    },
    {
      name: "book_id",
      data_type: 5,   //DataType.Int64
      is_primary_key: true,
      description: "",
    },
    {
      name: "word_count",
      data_type: 5,    //DataType.Int64
      description: "",
    },
  ],
};
await milvusClient.collectionManager.createCollection(params);
create collection -c book -f book_intro:FLOAT_VECTOR:2 -f book_id:INT64 book_id -f word_count:INT64 word_count -p book_id
  3. Insert data into the collection (the Milvus CLI example uses a pre-built, remote CSV file containing similar data). See Insert Data for more information.
import random
# Column-based data: one list per field, in schema order.
data = [
    [i for i in range(2000)],                                    # book_id
    [i for i in range(10000, 12000)],                            # word_count
    [[random.random() for _ in range(2)] for _ in range(2000)],  # book_intro
]
collection.insert(data)
const data = Array.from({ length: 2000 }, (v,k) => ({
  "book_intro": Array.from({ length: 2 }, () => Math.random()),
  "book_id": k,
  "word_count": k+10000,
}));
await milvusClient.dataManager.insert({
  collection_name: "book",
  fields_data: data,
});
import -c book 'https://raw.githubusercontent.com/milvus-io/milvus_cli/main/examples/user_guide/search.csv'
  4. Create an index for the vector field. See Build Index for more information.
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024},
}
collection.create_index("book_intro", index_params=index_params)
const index_params = {
  metric_type: "L2",
  index_type: "IVF_FLAT",
  params: JSON.stringify({ nlist: 1024 }),
};
await milvusClient.indexManager.createIndex({
  collection_name: "book",
  field_name: "book_intro",
  extra_params: index_params,
});
create index

Collection name (book): book

The name of the field to create an index for (book_intro): book_intro

Index type (FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, RNSG, HNSW, ANNOY): IVF_FLAT

Index metric type (L2, IP, HAMMING, TANIMOTO): L2

Index params nlist: 1024

Timeout []:
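
Before inserting, it can help to confirm that the column-based data lines up with the schema. The sketch below uses only the standard library and rebuilds the same three columns as the Python insert example; the `check_columns` helper is a hypothetical name introduced here, not a pymilvus function.

```python
import random

NUM_ROWS = 2000
DIM = 2

# Same layout as the insert example: one list per field, in schema order.
data = [
    list(range(NUM_ROWS)),                                             # book_id
    list(range(10000, 10000 + NUM_ROWS)),                              # word_count
    [[random.random() for _ in range(DIM)] for _ in range(NUM_ROWS)],  # book_intro
]

def check_columns(columns, num_rows, dim):
    """Raise ValueError if a column has the wrong length or vector dimension."""
    for index, column in enumerate(columns):
        if len(column) != num_rows:
            raise ValueError(
                f"column {index} has {len(column)} rows, expected {num_rows}"
            )
    # By convention in this sketch, the last column holds the vectors.
    if any(len(vector) != dim for vector in columns[-1]):
        raise ValueError(f"every vector must have {dim} dimensions")
    return True

ok = check_columns(data, NUM_ROWS, DIM)
```

If a column is too short or a vector has the wrong dimension, the helper raises before any call to the server is made.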

Load collection

All CRUD operations within Milvus are executed in memory. Load the collection to memory before conducting a vector query.

from pymilvus import Collection
collection = Collection("book")      # Get an existing collection.
collection.load()
await milvusClient.collectionManager.loadCollection({
  collection_name: "book",
});
load -c book
In the current release, the volume of data to load must be under 70% of the total memory resources of all query nodes, so that memory remains reserved for the execution engine.
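
As a rough, back-of-the-envelope illustration of the 70% rule, the sketch below estimates the raw size of the example dataset and compares it with a memory budget. The per-field sizes (8 bytes per INT64 value, 4 bytes per float in a FLOAT_VECTOR) cover raw data only; index and runtime overhead are ignored, and the 8 GiB query-node memory figure is an assumption chosen purely for illustration.

```python
NUM_ROWS = 2000
DIM = 2
INT64_BYTES = 8
FLOAT32_BYTES = 4

# Raw bytes per entity: book_id + word_count + book_intro vector.
bytes_per_row = INT64_BYTES + INT64_BYTES + DIM * FLOAT32_BYTES
raw_data_bytes = NUM_ROWS * bytes_per_row

# Assumed total memory across all query nodes (hypothetical value).
total_query_node_memory = 8 * 1024**3
load_limit_bytes = 0.7 * total_query_node_memory

# True when the raw data stays under 70% of the assumed memory.
fits = raw_data_bytes < load_limit_bytes
print(raw_data_bytes, fits)
```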

Conduct a vector query

The following example filters the vectors by certain book_id values and returns the book_id and book_intro fields of the results.

res = collection.query(expr="book_id in [2,4,6,8]", output_fields=["book_id", "book_intro"])
const results = await milvusClient.dataManager.query({
  collection_name: "book",
  expr: "book_id in [2,4,6,8]",
  output_fields: ["book_id", "book_intro"],
});
query

collection_name: book

The query expression: book_id in [2,4,6,8]

Name of partitions that contain entities(split by "," if multiple) []:

A list of fields to return(split by "," if multiple) []: book_id, book_intro

timeout []:
Parameter                   Description
expr                        Boolean expression used to filter attributes. Find more expression details in Boolean Expression Rules.
output_fields (optional)    List of names of the fields to return.
partition_names (optional)  List of names of the partitions to query on.

Parameter                   Description
collection_name             Name of the collection to query.
expr                        Boolean expression used to filter attributes. Find more expression details in Boolean Expression Rules.
output_fields (optional)    List of names of the fields to return.
partition_names (optional)  List of names of the partitions to query on.

Option   Full name   Description
--help   n/a         Displays help for using the command.
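
When the IDs come from application code, the expression string can be assembled programmatically. The helper below is a hypothetical convenience, not part of pymilvus; it only builds the string that would be passed as `expr`, and it accepts integers only so that nothing else is interpolated into the expression.

```python
def build_in_expr(field_name, ids):
    """Return an expression such as 'book_id in [2,4,6,8]'."""
    if not ids:
        raise ValueError("ids must not be empty")
    if not all(isinstance(i, int) and not isinstance(i, bool) for i in ids):
        raise TypeError("ids must be integers")
    return f"{field_name} in [{','.join(str(i) for i in ids)}]"

expr = build_in_expr("book_id", [2, 4, 6, 8])
print(expr)  # book_id in [2,4,6,8]
```

The resulting string can then be passed as the `expr` argument of the query call shown above.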

Check the returned results.

sorted_res = sorted(res, key=lambda k: k['book_id'])
sorted_res
console.log(results.data)
# Milvus CLI automatically returns the entities with the pre-defined output fields.
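
The Python example above sorts the results with a key function on `book_id`, which works because each result is a dictionary keyed by field name. The sketch below uses placeholder values in place of a real response (the vectors are invented, since the inserted data is random) to show one way of indexing the results by primary key.

```python
# Placeholder response with the same shape as the query result;
# the vector values are invented for illustration.
res = [
    {"book_id": 8, "book_intro": [0.1, 0.2]},
    {"book_id": 2, "book_intro": [0.3, 0.4]},
    {"book_id": 6, "book_intro": [0.5, 0.6]},
    {"book_id": 4, "book_intro": [0.7, 0.8]},
]

# Sort by primary key, then build a lookup table keyed by book_id.
sorted_res = sorted(res, key=lambda k: k["book_id"])
by_id = {entity["book_id"]: entity["book_intro"] for entity in sorted_res}

print([entity["book_id"] for entity in sorted_res])
print(by_id[6])
```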
