In-memory Index
This topic lists the types of in-memory indexes Milvus supports, the scenarios each of them best suits, and the parameters users can configure to achieve better search performance. For on-disk indexes, see On-disk Index.
Indexing is the process of efficiently organizing data, and it plays a major role in making similarity search useful by dramatically accelerating time-consuming queries on large datasets.
To improve query performance, you can specify an index type for each vector field.
ANNS vector indexes
Most of the vector index types supported by Milvus use approximate nearest neighbors search (ANNS) algorithms. Exact retrieval is usually very time-consuming. The core idea of ANNS is to stop insisting on the most accurate result and instead search only the neighbors of the target. ANNS improves retrieval efficiency by sacrificing accuracy within an acceptable range.
By implementation method, ANNS vector indexes can be divided into four categories:
 Tree-based index
 Graph-based index
 Hash-based index
 Quantization-based index
Indexes supported in Milvus
According to the suited data type, the supported indexes in Milvus can be divided into two categories:

Indexes for floating-point embeddings:

For 128-dimensional floating-point embeddings, the storage they take up is 128 * the size of float = 512 bytes. The distance metrics used for floating-point embeddings are Euclidean distance (L2) and Inner product.

These types of indexes include FLAT, IVF_FLAT, IVF_PQ, IVF_SQ8, HNSW, and SCANN (beta) for CPU-based ANN searches, and GPU_IVF_FLAT and GPU_IVF_PQ for GPU-based ANN searches.


Indexes for binary embeddings

For 128-dimensional binary embeddings, the storage they take up is 128 / 8 = 16 bytes. The distance metrics used for binary embeddings are Jaccard and Hamming.

These types of indexes include BIN_FLAT and BIN_IVF_FLAT.
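
The storage arithmetic above can be checked with a short sketch. It assumes each floating-point component is a 4-byte (32-bit) float and each binary component is one bit; the function names are illustrative and not part of Milvus.

```python
FLOAT32_BYTES = 4  # assumption: each floating-point component is a 32-bit float


def float_embedding_bytes(dim: int) -> int:
    """Raw size of one floating-point embedding with `dim` components."""
    return dim * FLOAT32_BYTES


def binary_embedding_bytes(dim: int) -> int:
    """Raw size of one binary embedding with `dim` bits (8 bits per byte)."""
    if dim % 8 != 0:
        raise ValueError("this sketch assumes dim is a multiple of 8")
    return dim // 8


print(float_embedding_bytes(128))   # 512
print(binary_embedding_bytes(128))  # 16
```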

The following table classifies the indexes that Milvus supports:
Supported index  Classification  Scenario
FLAT  N/A
IVF_FLAT  Quantization-based index
GPU_IVF_FLAT  Quantization-based index
IVF_SQ8  Quantization-based index
IVF_PQ  Quantization-based index
GPU_IVF_PQ  Quantization-based index
HNSW  Graph-based index
SCANN  Quantization-based index

Supported index  Classification  Scenario
BIN_FLAT  Quantization-based index
BIN_IVF_FLAT  Quantization-based index

FLAT
For vector similarity search applications that require perfect accuracy and depend on relatively small (million-scale) datasets, the FLAT index is a good choice. FLAT does not compress vectors, and is the only index that can guarantee exact search results. Results from FLAT can also be used as a point of comparison for results produced by other indexes that have less than 100% recall.
FLAT is accurate because it takes an exhaustive approach to search: for each query, the target input is compared to every vector in the dataset. This makes FLAT the slowest index on our list, and poorly suited for querying massive vector data. The FLAT index requires no parameters in Milvus, and using it does not require data training.
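
As an illustration of the exhaustive approach, here is a minimal sketch of a brute-force search using Euclidean (L2) distance. It is not the Milvus implementation; `l2_distance` and `flat_search` are hypothetical names chosen for this example.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(vectors, query, top_k):
    """Compare the query to every vector; return (distance, index), closest first."""
    scored = ((l2_distance(v, query), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(top_k, scored)


dataset = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
hits = flat_search(dataset, [1.0, 1.0], top_k=2)
print([i for _, i in hits])  # [1, 3]
```

Because every vector is scored, the cost of one query grows linearly with the size of the dataset, which is why this approach suits small collections.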

Search parameters
Parameter  Description  Range
metric_type  [Optional] The chosen distance metric.  See Supported Metrics.
IVF_FLAT
IVF_FLAT divides vector data into nlist cluster units, and then compares distances between the target input vector and the center of each cluster. Depending on the number of clusters the system is set to query (nprobe), similarity search results are returned based on comparisons between the target input and the vectors in the most similar cluster(s) only, which drastically reduces query time.
By adjusting nprobe, an ideal balance between accuracy and speed can be found for a given scenario. Results from the IVF_FLAT performance test demonstrate that query time increases sharply as both the number of target input vectors (nq) and the number of clusters to search (nprobe) increase.
IVF_FLAT is the most basic IVF index, and the encoded data stored in each unit is consistent with the original data.
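
The clustering and probing steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the Milvus implementation: the centroids come from a tiny k-means seeded with the first nlist vectors, and all function names are hypothetical.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: l2(centroids[c], v))


def train_centroids(vectors, nlist, iterations=10):
    """Tiny k-means: seed with the first nlist vectors, then refine."""
    centroids = [list(v) for v in vectors[:nlist]]
    for _ in range(iterations):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors, nlist):
    """Assign every vector id to the list of its nearest centroid."""
    centroids = train_centroids(vectors, nlist)
    lists = [[] for _ in range(nlist)]
    for i, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(i)  # vectors themselves stay uncompressed
    return centroids, lists


def ivf_search(vectors, centroids, lists, query, nprobe, top_k):
    # Rank clusters by centroid distance and scan only the nprobe closest ones.
    order = sorted(range(len(centroids)), key=lambda c: l2(centroids[c], query))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(vectors[i], query))[:top_k]


data = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2], [9.5, 10.1], [0.1, 0.4], [10.2, 9.8]]
centroids, lists = build_ivf(data, nlist=2)
print(ivf_search(data, centroids, lists, [9.9, 9.9], nprobe=1, top_k=2))  # [1, 5]
```

With nprobe=1 only three of the six vectors are compared to the query. Raising nprobe scans more lists, which costs time but reduces the chance of missing a true neighbor that sits in an adjacent cluster.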

Index building parameters
Parameter  Description  Range  Default Value
nlist  Number of cluster units  [1, 65536]  128
Search parameters
Parameter  Description  Range  Default Value
nprobe  Number of units to query  [1, nlist]  8
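
As a hedged example of how these parameters might be assembled, the sketch below builds two plain dictionaries and checks the documented ranges first. The dictionary layout (index_type, metric_type, params) follows the shape commonly passed to the Python SDK, but treat it as an assumption and confirm it against the SDK version you use; `ivf_flat_params` is a hypothetical helper, not a Milvus function.

```python
NLIST_RANGE = (1, 65536)  # range documented for nlist


def ivf_flat_params(nlist=128, nprobe=8, metric_type="L2"):
    """Build index and search parameter dicts after checking the documented ranges."""
    if not NLIST_RANGE[0] <= nlist <= NLIST_RANGE[1]:
        raise ValueError(f"nlist must be in [{NLIST_RANGE[0]}, {NLIST_RANGE[1]}]")
    if not 1 <= nprobe <= nlist:
        raise ValueError("nprobe must be in [1, nlist]")
    index_params = {
        "index_type": "IVF_FLAT",
        "metric_type": metric_type,
        "params": {"nlist": nlist},
    }
    search_params = {"metric_type": metric_type, "params": {"nprobe": nprobe}}
    return index_params, search_params


index_params, search_params = ivf_flat_params()
print(index_params["params"], search_params["params"])
```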
GPU_IVF_FLAT
Similar to IVF_FLAT, GPU_IVF_FLAT also divides vector data into nlist cluster units, and then compares distances between the target input vector and the center of each cluster. Depending on the number of clusters the system is set to query (nprobe), similarity search results are returned based on comparisons between the target input and the vectors in the most similar cluster(s) only, which drastically reduces query time.
By adjusting nprobe, an ideal balance between accuracy and speed can be found for a given scenario. Results from the IVF_FLAT performance test demonstrate that query time increases sharply as both the number of target input vectors (nq) and the number of clusters to search (nprobe) increase.
GPU_IVF_FLAT is the most basic IVF index, and the encoded data stored in each unit is consistent with the original data.
When conducting searches, note that you can set the topK up to 256 for any search against a GPU_IVF_FLAT-indexed collection.

Index building parameters
Parameter  Description  Range  Default Value
nlist  Number of cluster units  [1, 65536]  128
Search parameters
Parameter  Description  Range  Default Value
nprobe  Number of units to query  [1, nlist]  8
Limits on search
Parameter  Range
topK  <= 256
IVF_SQ8
IVF_FLAT does not perform any compression, so the index files it produces are roughly the same size as the original, raw non-indexed vector data. For example, if the original 1B SIFT dataset is 476 GB, its IVF_FLAT index files will be slightly smaller (~470 GB). Loading all the index files into memory will consume 470 GB of memory.
When disk, CPU, or GPU memory resources are limited, IVF_SQ8 is a better option than IVF_FLAT. This index type converts each FLOAT (4 bytes) to UINT8 (1 byte) by performing Scalar Quantization (SQ). This reduces disk, CPU, and GPU memory consumption by 70–75%. For the 1B SIFT dataset, the IVF_SQ8 index files require just 140 GB of storage.
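
The FLOAT-to-UINT8 conversion can be sketched as below. This is one simple form of scalar quantization, assuming a per-dimension minimum and maximum learned from the data; the exact scheme Milvus uses may differ, and the `sq8_*` names are hypothetical.

```python
def sq8_train(vectors):
    """Per-dimension minimum and maximum, used as the quantization range."""
    columns = list(zip(*vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def sq8_encode(vector, lo, hi):
    """Map each float to an integer in [0, 255], stored as one byte."""
    codes = []
    for x, a, b in zip(vector, lo, hi):
        scale = (b - a) or 1.0  # avoid dividing by zero on constant dimensions
        q = round((x - a) / scale * 255)
        codes.append(min(255, max(0, q)))
    return bytes(codes)


def sq8_decode(codes, lo, hi):
    """Approximate reconstruction of the original floats."""
    return [a + (c / 255) * (b - a) for c, a, b in zip(codes, lo, hi)]


vectors = [[0.0, -1.0, 2.0], [1.0, 1.0, 4.0], [0.5, 0.0, 3.0]]
lo, hi = sq8_train(vectors)
code = sq8_encode(vectors[2], lo, hi)
print(len(code), "bytes instead of", 4 * len(vectors[2]))  # 3 bytes instead of 12
```

Storing one byte per dimension instead of four accounts for the roughly 75% saving on vector data; the decoded values are close to, but not exactly, the originals.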

Index building parameters
Parameter  Description  Range
nlist  Number of cluster units  [1, 65536]
Search parameters
Parameter  Description  Range
nprobe  Number of units to query  [1, nlist]
IVF_PQ
PQ (Product Quantization) uniformly decomposes the original high-dimensional vector space into Cartesian products of m low-dimensional vector spaces, and then quantizes the decomposed low-dimensional vector spaces. Instead of calculating the distances between the target vector and the center of all the units, product quantization enables the calculation of distances between the target vector and the clustering center of each low-dimensional space, which greatly reduces the time complexity and space complexity of the algorithm.
IVF_PQ performs IVF index clustering before quantizing the product of vectors. Its index file is even smaller than IVF_SQ8, but it also loses some accuracy during vector search.
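
To make the decomposition concrete, here is a minimal sketch of the encoding step of product quantization. It assumes the per-subspace codebooks are already available (in practice each would hold 2^nbits centroids learned by clustering); the toy codebooks and the names `split`, `pq_encode`, and `pq_code_bytes` are hypothetical.

```python
import math


def split(vector, m):
    """Cut a vector into m equal sub-vectors; dim must be divisible by m."""
    dim = len(vector)
    if dim % m != 0:
        raise ValueError("dim mod m must be 0")
    step = dim // m
    return [vector[i:i + step] for i in range(0, dim, step)]


def pq_encode(vector, codebooks):
    """Replace each sub-vector with the id of its nearest sub-space centroid."""
    code = []
    for sub, book in zip(split(vector, len(codebooks)), codebooks):
        code.append(min(range(len(book)), key=lambda k: math.dist(sub, book[k])))
    return code


def pq_code_bytes(m, nbits=8):
    """Size of one PQ code: m centroid ids of nbits each."""
    return math.ceil(m * nbits / 8)


# Hand-written toy codebooks: m = 2 sub-spaces, 2 centroids each (nbits = 1).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]
print(pq_encode([0.9, 1.1, 4.0, 6.0], codebooks))  # [1, 1]
print(pq_code_bytes(m=16, nbits=8))                # 16
```

For a 128-dimensional float vector (512 bytes raw), m = 16 and nbits = 8 give a 16-byte code, which illustrates why a PQ index can be much smaller than an SQ8 one, at the cost of a coarser approximation.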

Index building parameters
Parameter  Description  Range
nlist  Number of cluster units  [1, 65536]
m  Number of factors of product quantization  dim mod m == 0
nbits  [Optional] Number of bits in which each low-dimensional vector is stored.  [1, 16] (8 by default)
Search parameters
Parameter  Description  Range
nprobe  Number of units to query  [1, nlist]
SCANN
SCANN (Score-aware quantization loss) is similar to IVF_PQ in terms of vector clustering and product quantization. The two differ in the implementation details of product quantization and in the use of SIMD (Single-Instruction / Multi-data) for efficient calculation.

Index building parameters
Parameter  Description  Range
nlist  Number of cluster units  [1, 65536]
with_raw_data  Whether to include the raw data in the index  True or False. Defaults to True.
Unlike IVF_PQ, default values apply to m and nbits for optimized performance.
Search parameters
Parameter  Description  Range
nprobe  Number of units to query  [1, nlist]
reorder_k  Number of candidate units to query  [top_k, ∞]
Range search parameters
Parameter  Description  Range
radius  Number of units to query  [1, nlist]
range_filter  Number of candidate units to query  [top_k, ∞]
GPU_IVF_PQ
PQ (Product Quantization) uniformly decomposes the original high-dimensional vector space into Cartesian products of m low-dimensional vector spaces, and then quantizes the decomposed low-dimensional vector spaces. Instead of calculating the distances between the target vector and the center of all the units, product quantization enables the calculation of distances between the target vector and the clustering center of each low-dimensional space, which greatly reduces the time complexity and space complexity of the algorithm.
IVF_PQ performs IVF index clustering before quantizing the product of vectors. Its index file is even smaller than IVF_SQ8, but it also loses some accuracy during vector search.
When conducting searches, note that you can set the topK up to 8192 for any search against a GPU_IVF_PQ-indexed collection.

Index building parameters
Parameter  Description  Range  Default Value
nlist  Number of cluster units  [1, 65536]  128
m  Number of factors of product quantization  dim mod m == 0  4
nbits  [Optional] Number of bits in which each low-dimensional vector is stored.  [1, 16]  8
Search parameters
Parameter  Description  Range  Default Value
nprobe  Number of units to query  [1, nlist]  8
Limits on search
Parameter  Range
topK  <= 8192
HNSW
HNSW (Hierarchical Navigable Small World Graph) is a graph-based indexing algorithm. It builds a multi-layer navigation structure for a graph according to certain rules. In this structure, the upper layers are sparser and the distances between nodes are farther; the lower layers are denser and the distances between nodes are closer. The search starts from the uppermost layer, finds the node closest to the target in this layer, and then enters the next layer to begin another search. After multiple iterations, it can quickly approach the target position.
To improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to M. In addition, you can use efConstruction (when building the index) or ef (when searching targets) to specify a search range.
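
The layer-by-layer descent can be sketched as below. This is a deliberately simplified illustration, not the HNSW algorithm as implemented in Milvus: it keeps only one candidate per layer, whereas ef widens the candidate set in a real search, and the hand-built `layers` structure and function names are hypothetical.

```python
import math


def greedy_step(graph, vectors, query, entry):
    """Move to a strictly closer neighbour until no neighbour improves."""
    current = entry
    while True:
        # `current` comes first so that ties keep the walk where it is.
        candidates = [current] + graph.get(current, [])
        best = min(candidates, key=lambda n: math.dist(vectors[n], query))
        if best == current:
            return current
        current = best


def layered_search(layers, vectors, query, entry):
    """layers[0] is the sparse top layer; the last layer contains every node."""
    node = entry
    for graph in layers:
        node = greedy_step(graph, vectors, query, node)
    return node


vectors = [[0.0, 0.0], [5.0, 5.0], [9.0, 9.0], [10.0, 10.0], [4.0, 6.0]]
layers = [
    {0: [2], 2: [0]},                                        # sparse, long links
    {0: [1, 4], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [0, 1]}  # dense, short links
]
print(layered_search(layers, vectors, [9.8, 9.9], entry=0))  # 3
```

In the example the top layer jumps from node 0 straight to node 2, and the bottom layer then refines the result to node 3. In this sketch each adjacency list plays the role that the M limit bounds in the real index.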

Index building parameters
Parameter  Description  Range
M  Maximum degree of the node  (1, 2048)
efConstruction  Search scope  (1, int32_max)
Search parameters
Parameter  Description  Range
ef  Search scope  [top_k, 32768]
BIN_FLAT
This index is exactly the same as FLAT except that this can only be used for binary embeddings.
For vector similarity search applications that require perfect accuracy and depend on relatively small (million-scale) datasets, the BIN_FLAT index is a good choice. BIN_FLAT does not compress vectors, and is the only index that can guarantee exact search results. Results from BIN_FLAT can also be used as a point of comparison for results produced by other indexes that have less than 100% recall.
BIN_FLAT is accurate because it takes an exhaustive approach to search: for each query, the target input is compared to every vector in the dataset. This makes BIN_FLAT the slowest index on our list, and poorly suited for querying massive vector data. The BIN_FLAT index has no parameters in Milvus, and using it does not require data training or additional storage.
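
A minimal sketch of exhaustive search over binary embeddings follows, using the Hamming and Jaccard distances mentioned earlier. It assumes each embedding is packed into a Python bytes object; the function names are illustrative and not part of Milvus.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of bit positions in which the two embeddings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def jaccard(a: bytes, b: bytes) -> float:
    """1 - |intersection| / |union|, computed over the set bits."""
    inter = sum(bin(x & y).count("1") for x, y in zip(a, b))
    union = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return 0.0 if union == 0 else 1.0 - inter / union


def bin_flat_search(vectors, query, top_k, metric=hamming):
    """Score every stored embedding and return the ids of the top_k closest."""
    order = sorted(range(len(vectors)), key=lambda i: metric(vectors[i], query))
    return order[:top_k]


data = [bytes([0b11110000]), bytes([0b11111111]), bytes([0b00001111])]
query = bytes([0b11110001])
print(bin_flat_search(data, query, top_k=2))                  # [0, 1]
print(bin_flat_search(data, query, top_k=2, metric=jaccard))  # [0, 1]
```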

Search parameters
Parameter  Description  Range
metric_type  [Optional] The chosen distance metric.  See Supported Metrics.
BIN_IVF_FLAT
This index is exactly the same as IVF_FLAT except that this can only be used for binary embeddings.
BIN_IVF_FLAT divides vector data into nlist cluster units, and then compares distances between the target input vector and the center of each cluster. Depending on the number of clusters the system is set to query (nprobe), similarity search results are returned based on comparisons between the target input and the vectors in the most similar cluster(s) only, which drastically reduces query time.
By adjusting nprobe, an ideal balance between accuracy and speed can be found for a given scenario. Query time increases sharply as both the number of target input vectors (nq) and the number of clusters to search (nprobe) increase.
BIN_IVF_FLAT is the most basic BIN_IVF index, and the encoded data stored in each unit is consistent with the original data.

Index building parameters
Parameter  Description  Range
nlist  Number of cluster units  [1, 65536]
Search parameters
Parameter  Description  Range
nprobe  Number of units to query  [1, nlist]