Conduct a Vector Search

Milvus supports searching vectors in a collection or partition.

Search for Vectors in a Collection

  1. Create search parameters. The search parameters are stored in a JSON string, which is represented by a dictionary in the Python SDK.

    >>> search_param = {'nprobe': 16}
    Different index types require different search parameters. You must assign values to all search parameters. See Vector Indexes for more information.
  2. Create random vectors as query_records to search:

    # Create 5 vectors of 256 dimensions.
    >>> import random
    >>> q_records = [[random.random() for _ in range(256)] for _ in range(5)]
    >>> milvus.search(collection_name='test01', query_records=q_records, top_k=2, params=search_param)
    • top_k is the number of most similar vectors to return for each target vector. You set it at search time.
    • The range of top_k is [1, 16384].
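To make the meaning of top_k concrete, here is a minimal brute-force sketch in plain Python. It is not how Milvus searches internally. It assumes squared Euclidean (L2) distance as the similarity metric, and the helper names (squared_l2, brute_force_top_k) are hypothetical.

```python
import heapq
import random

MAX_TOP_K = 16384  # upper bound of top_k stated above


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(vectors, query, top_k):
    """Return the top_k (id, distance) pairs closest to query, nearest first."""
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be in [1, {MAX_TOP_K}], got {top_k}")
    scored = ((i, squared_l2(v, query)) for i, v in enumerate(vectors))
    return heapq.nsmallest(top_k, scored, key=lambda pair: pair[1])


random.seed(0)
# Small dimensions keep the toy example fast; the page uses 256.
stored = [[random.random() for _ in range(8)] for _ in range(100)]
q_records = [[random.random() for _ in range(8)] for _ in range(5)]

# One result list per query vector, each holding top_k hits.
results = [brute_force_top_k(stored, q, top_k=2) for q in q_records]
```

Each of the 5 query vectors gets its own list of 2 hits, which mirrors the shape of a search with 5 query records and top_k=2.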

Search for Vectors in a Partition

# Create 5 vectors of 256 dimensions.
>>> q_records = [[random.random() for _ in range(256)] for _ in range(5)]
>>> milvus.search(collection_name='test01', query_records=q_records, top_k=1, partition_tags=['tag01'], params=search_param)
If you do not specify partition_tags, Milvus searches for similar vectors in the entire collection.
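The two search variants differ only in whether partition_tags is passed. The sketch below wraps that choice in one helper. It assumes a client object whose search method accepts the keyword arguments used on this page (query_records, top_k, partition_tags, params) plus an assumed collection_name keyword. search_vectors and RecordingClient are hypothetical names, and RecordingClient is only a stand-in so that the example runs without a server.

```python
import random


def search_vectors(client, collection_name, query_records, top_k, nprobe,
                   partition_tags=None):
    """Search a whole collection, or only the given partitions if tags are set."""
    kwargs = {
        "collection_name": collection_name,  # assumed keyword name
        "query_records": query_records,
        "top_k": top_k,
        "params": {"nprobe": nprobe},
    }
    if partition_tags:
        # Only restrict the search when tags are actually supplied.
        kwargs["partition_tags"] = list(partition_tags)
    return client.search(**kwargs)


class RecordingClient:
    """Hypothetical stand-in for a connected client; it records each call."""

    def __init__(self):
        self.calls = []

    def search(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


client = RecordingClient()
q_records = [[random.random() for _ in range(256)] for _ in range(5)]

# Entire collection: partition_tags is omitted from the request.
search_vectors(client, "test01", q_records, top_k=2, nprobe=16)
# Single partition: partition_tags is passed through.
search_vectors(client, "test01", q_records, top_k=1, nprobe=16,
               partition_tags=["tag01"])
```

With a real connected client in place of RecordingClient, the same helper would issue the two searches shown above.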


Why is my recall rate unsatisfactory?

Increase the value of nprobe when searching from a client. The greater the value, the more accurate the results and the longer the search takes. See Performance Tuning > Index for more information.
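The trade-off behind nprobe can be illustrated with a toy inverted-file (IVF) index in plain Python. This is a simplified sketch, not Milvus's implementation: centroids are sampled at random instead of being trained, the metric is assumed to be squared L2 distance, and all function names are hypothetical.

```python
import random


def dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, nlist, rng):
    """Toy IVF index: bucket every vector under its nearest centroid."""
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def ivf_search(vectors, index, query, top_k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    centroids, buckets = index
    order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    return sorted(candidates, key=lambda i: dist(vectors[i], query))[:top_k]


def recall(vectors, index, queries, top_k, nprobe):
    """Fraction of the exact top_k neighbors that the IVF search also returns."""
    hits = 0
    for q in queries:
        ranked = sorted(range(len(vectors)), key=lambda i: dist(vectors[i], q))
        exact = set(ranked[:top_k])
        hits += len(exact & set(ivf_search(vectors, index, q, top_k, nprobe)))
    return hits / (top_k * len(queries))


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(1000)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
index = build_ivf(data, nlist=32, rng=rng)

# Recall for increasing nprobe; nprobe == nlist scans every bucket.
recalls = {n: recall(data, index, queries, top_k=10, nprobe=n) for n in (1, 4, 16, 32)}
for n, r in recalls.items():
    print(f"nprobe={n:2d} recall={r:.2f}")
```

A larger nprobe scans more buckets, so it can only find more of the true neighbors, at the cost of comparing the query against more vectors.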
Does Milvus support inserting while searching?

Yes.

Does the size of a collection affect vector searches in one of its partitions, especially when it holds up to 100 million vectors?

No. If you have specified partitions when conducting a vector search, Milvus searches the specified partitions only.

Does Milvus load the whole collection to the memory if I search only certain partitions in that collection?

No, Milvus only loads the partitions to search.

Are queries in segments processed in parallel?

Yes, but the parallel processing mechanism varies with Milvus versions.

Suppose a collection has multiple segments. When a query request comes in:

  • CPU-only Milvus processes the segment reading tasks and the segment searching tasks in a pipeline.
  • On top of this pipeline mechanism, GPU-enabled Milvus distributes the segments among the available GPUs.

See How Does Milvus Schedule Query Tasks for more information.
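The pipeline idea can be sketched with two threads and a bounded queue: one stage reads segments while the other searches the segments that are already loaded. This is only an illustration of pipelining in general, under the assumption of squared L2 distance. It does not reproduce the Milvus scheduler or the GPU distribution step, and all names are hypothetical.

```python
import heapq
import queue
import random
import threading


def dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def read_segments(segments, loaded):
    """Stage 1: 'read' each segment and hand it to the search stage."""
    for segment in segments:
        loaded.put(segment)  # stands in for loading a segment from disk
    loaded.put(None)         # sentinel: no more segments


def search_segments(loaded, query, top_k, partial_results):
    """Stage 2: search each segment as soon as it has been read."""
    while True:
        segment = loaded.get()
        if segment is None:
            break
        scored = ((dist(vec, query), vec_id) for vec_id, vec in segment)
        partial_results.append(heapq.nsmallest(top_k, scored))


def pipelined_search(segments, query, top_k):
    """Run both stages concurrently, then merge the per-segment results."""
    loaded = queue.Queue(maxsize=2)  # bounded: reading stays just ahead of searching
    partial_results = []
    reader = threading.Thread(target=read_segments, args=(segments, loaded))
    searcher = threading.Thread(
        target=search_segments, args=(loaded, query, top_k, partial_results)
    )
    reader.start()
    searcher.start()
    reader.join()
    searcher.join()
    merged = (hit for part in partial_results for hit in part)
    return heapq.nsmallest(top_k, merged)


rng = random.Random(1)
vectors = [(i, [rng.random() for _ in range(8)]) for i in range(300)]
segments = [vectors[i:i + 100] for i in range(0, 300, 100)]  # 3 segments
query = [rng.random() for _ in range(8)]
hits = pipelined_search(segments, query, top_k=3)  # list of (distance, id)
```

Because each segment contributes its own top_k candidates, merging the partial results gives the same answer as searching all vectors at once.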

Will a batch query benefit from multi-threading?

If your batch query is on a small scale (nq < 64), Milvus combines the query requests, in which case multi-threading helps.

Otherwise, the resources are already exhausted, so multi-threading does not help much.

Why is the search very slow?

Check whether the value of cache.cache_size in server_config.yaml is greater than the size of the collection.
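A rough way to run this check is to estimate the raw size of the collection and compare it with the configured cache size. The sketch below assumes float vectors stored as 4-byte float32 values and a cache size expressed in GB. It ignores index files, whose size depends on the index type, so treat the result as a lower-bound estimate. The function names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each dimension is stored as a float32


def raw_vector_size_gb(num_vectors, dimension):
    """Approximate size of the raw vectors in GB (1 GB = 1024**3 bytes)."""
    return num_vectors * dimension * BYTES_PER_FLOAT32 / 1024 ** 3


def fits_in_cache(num_vectors, dimension, cache_size_gb):
    """True if the raw vectors are smaller than the configured cache size."""
    return raw_vector_size_gb(num_vectors, dimension) < cache_size_gb


# Example: 10 million 256-dimensional vectors.
size_gb = raw_vector_size_gb(10_000_000, 256)
print(f"raw vectors: {size_gb:.2f} GB")
print("fits in a 4 GB cache: ", fits_in_cache(10_000_000, 256, cache_size_gb=4))
print("fits in a 16 GB cache:", fits_in_cache(10_000_000, 256, cache_size_gb=16))
```

If the estimate exceeds cache.cache_size, the collection cannot stay fully cached, which is one plausible cause of slow searches.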
Why do I see a surge in memory usage when conducting a vector search immediately after an index is created?

This is because:

  • Milvus loads the newly created index file to the memory for the vector search.
  • The original vector files used to create the index are not yet released from the memory, because the combined size of the original vector files and the index file has not exceeded the upper limit specified by cache.cache_size.

Why does the first search take a long time after Milvus restarts?

This is because, after restarting, Milvus needs to load data from the disk to the memory for the first vector search. You can set preload_collection in server_config.yaml and load as many collections as the memory permits. Milvus loads collections to the memory each time it restarts.

Alternatively, you can call load_collection() to load collections to the memory.
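A warm-up step after a restart could look like the following sketch. It assumes a client object that exposes load_collection() taking a collection name, as mentioned above. warm_up and FakeClient are hypothetical names, and FakeClient only stands in for a connected client so that the example runs without a server.

```python
def warm_up(client, collection_names):
    """Load each collection into memory before the first search is issued."""
    for name in collection_names:
        client.load_collection(name)


class FakeClient:
    """Hypothetical stand-in for a connected client; it records loaded names."""

    def __init__(self):
        self.loaded = []

    def load_collection(self, collection_name):
        self.loaded.append(collection_name)


client = FakeClient()
warm_up(client, ["test01"])
```

Calling this once after a restart moves the loading cost out of the first user-facing search.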
