Vector index
Vector index is a timeefficient and spaceefficient data structure built on vectors through a certain mathematical model. Through the vector index, we can efficiently query several vectors similar to the target vector.
Since accurate retrieval is usually very timeconsuming, most of the vector index types of Milvus use ANNS (Approximate Nearest Neighbors Search). Compared with accurate retrieval, the core idea of ANNS is no longer limited to returning the most accurate result, but only searching for neighbors of the target. ANNS improves retrieval efficiency by sacrificing accuracy within an acceptable range.
According to the implementation methods, the ANNS vector index can be divided into four categories:
 Treebased index
 Graphbased index
 Hashbased index
 Quantizationbased index
The following table classifies the indexes that Milvus supports:
Supported index  Classification  Scenario 

FLAT  N/A 

IVF_FLAT  Quantizationbased index 

IVF_SQ8  Quantizationbased index 

IVF_SQ8H  Quantizationbased index 

IVF_PQ  Quantizationbased index  
RNSG  Graphbased index  
HNSW  Graphbased index  
Annoy  Treebased index 
Vector field and index
To improve query performance, you can specify an index type for each vector field. Currently, a vector field only supports one index type, Milvus will automatically delete the old index when switching the index type.
Create indexes
When the create_index()
method is called, Milvus synchronously indexes the existing data on this field. Whenever the size of the inserted data reaches the index_file_size
, Milvus automatically creates an index for it in the background.
Index by segment
Milvus stores massive data in sections. When indexing, Milvus creates an index for each data segment separately.
Build indexes during free time
It is known that indexing is a resourceconsuming and timeconsuming task. When the query task and indexing task are concurrent, Milvus preferentially allocates computing resources to the query task, that is, any query command will interrupt the indexing task being executed in the background. After that, only when the user does not send the query task for 5 seconds, Milvus resumes the indexing task in the background. Besides, if the data segment specified by the query command has not been built into the specified index, Milvus will do an exhaustive search directly within the segment.
Supported vector indexes
FLAT
If FLAT index is used, the vectors are stored in an array of float/binary data without any compression. during searching vectors, all indexed vectors are decoded sequentially and compared to the query vectors.
FLAT index provides 100% query recall rate. Compared to other indexes, it is the most efficient indexing method when the number of queries is small.

Search parameters
Parameter Description Range metric_type
[Optional] The chosen distance metric. See Supported Metrics.
IVF_FLAT
IVF (Inverted File) is an index type based on quantization. It divides the points in space into nlist
units by clustering method. during searching vectors, it compares the distances between the target vector and the center of all the units, and then select the nprobe
nearest unit. Then, it compares all the vectors in these selected cells to get the final result.
IVF_FLAT is the most basic IVF index, and the encoded data stored in each unit is consistent with the original data.

Index building parameters
Parameter Description Range nlist
Number of cluster units [1, 65536] 
Search parameters
Parameter Description Range nprobe
Number of units to query CPU: [1, nlist]
GPU: [1, min(2048, nlist)]
IVF_SQ8
IVF_SQ8 does scalar quantization for each vector placed in the unit based on IVF. Scalar quantization converts each dimension of the original vector from a 4byte floatingpoint number to a 1byte unsigned integer, so the IVF_SQ8 index file occupies much less space than the IVF_FLAT index file. However, scalar quantization results in a loss of accuracy during searching vectors.
 IVF_SQ8 has the same index building parameters as IVF_FLAT.
 IVF_SQ8 has the same search parameters as IVF_FLAT.
IVF_SQ8H
Optimized version of IVF_SQ8 that requires both CPU and GPU to work. Unlike IVF_SQ8, IVF_SQ8H uses a GPUbased coarse quantizer, which greatly reduces time to quantize.
IVF_SQ8H is an IVF_SQ8 index that optimizes query execution.
The query method is as follows:

If
nq
≥gpu_search_threshold
, GPU handles the entire query task. 
If
nq
<gpu_search_threshold
, GPU handles the task of retrieving thenprobe
nearest unit in the IVF index file, and CPU handles the rest. 
IVF_SQ8H has the same index building parameters as IVF_FLAT.

IVF_SQ8H has the same search parameters as IVF_FLAT.
IVF_PQ
PQ
(Product Quantization) uniformly decomposes the original highdimensional vector space into Cartesian products of m
lowdimensional vector spaces, and then quantizes the decomposed lowdimensional vector spaces. In the end, each vector is stored in m
× nbits
bits. Instead of calculating the distances between the target vector and the center of all the units, product quantization enables the calculation of distances between the target vector and the clustering center of each lowdimensional space and greatly reduces the time complexity and space complexity of the algorithm.
IVF_PQ quantizes the products of vectors before IVF index clustering. Its index file is even smaller than IVF_SQ8, but it also causes a loss of accuracy during searching vectors.

Index building parameters
Parameter Description Range nlist
Number of cluster units [1, 65536] m
Number of factors of product quantization dim ≡ 0 (mod m) nbits
[Optional] Number of bits in which each lowdimensional vector is stored. [1, 16] (8 by default) 
Search parameters
Parameter Description Range nprobe
Number of units to query [1, nlist]

Index building parameters
Parameter Description Range nlist
Number of cluster units [1, 65536] m
Number of factors of product quantization m
∈ {1, 2, 3, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64, 96}, and (dim / m) ∈ {1, 2, 3, 4, 6, 8, 10, 12, 16, 20, 24, 28, 32}.
(m
x 1024) ≤MaxSharedMemPerBlock
of your graphics card.nbits
[Optional] Number of bits in which each lowdimensional vector is stored. 8
 If the value of
m
does not fall into the specified range for GPU indexing but into the range of CPU indexing, Milvus switches to using CPU to build the index (click the button above to view the range supported by CPUenabled Milvus).  If the specified value of
nbits
is between 1 and 16 but not 8, the system switches back to CPUonly Milvus.

Search parameters
Parameter Description Range nprobe
Number of units to query [1, min(2048, nlist)]
nprobe
does not fall into the specified range but into the range for CPU search, Milvus switches to CPU search (click the button above to view the range supported by CPUenabled Milvus).
RNSG
RNSG (Refined Navigating Spreadingout Graph) is a graphbased indexing algorithm. It sets the center position of the whole image as a navigation point, and then uses a specific edge selection strategy to control the outdegree of each point (less than or equal to out_degree
). Therefore, it can reduce memory usage and quickly locate the target position nearby during searching vectors.
The graph construction process of RNSG is as follows:
 Find
knng
nearest neighbors for each point.  Iterate at least
search_length
times based onknng
nearest neighbor nodes to selectcandidate_pool_size
possible nearest neighbor nodes.  Construct the outedge of each point in the selected
candidate_pool_size
nodes according to the edge selection strategy.
The query process is similar to the graph building process. It starts from the navigation point and iterate at least search_length
times to get the final result.

Index building parameters
Parameter Description Range out_degree
Maximum outdegree of the node [5, 300] candidate_pool_size
Candidate pool size of the node [50, 1000] search_length
Number of query iterations [10, 300] knng
Number of nearest neighbors [5, 300] 
Search parameters
Parameter Description Range search_length
Number of query iterations [10, 300]
HNSW
HNSW (Hierarchical Small World Graph) is a graphbased indexing algorithm. It builds a multilayer navigation structure for an image according to certain rules. In this structure, the upper layers are more sparse and the distances between nodes are farther; the lower layers are denser and the distances between nodes are closer. The search starts from the uppermost layer, finds the node closest to the target in this layer, and then enters the next layer to begin another search. After multiple iterations, it can quickly approach the target position.
In order to improve performance, HNSW limits the maximum degree of nodes on each layer of the graph to M
. In addition, you can use efConstruction
(when building index) or ef
(when searching targets) to specify a search range.

Index building parameters
Parameter Description Range M
Maximum degree of the node [4, 64] efConstruction
Search scope [8, 512] 
Search parameters
Parameter Description Range ef
Search scope [ top_k
, 32768]
Annoy
Annoy (Approximate Nearest Neighbors Oh Yeah) is an index that uses a hyperplane to divide a highdimensional space into multiple subspaces, and then stores them in a tree structure.
When searching for vectors, Annoy follows the tree structure to find subspaces closer to the target vector, and then compares all the vectors in these subspaces (The number of vectors being compared should not be less than search_k
) to obtain the final result. Obviously, when the target vector is close to the edge of a certain subspace, sometimes it is necessary to greatly increase the number of searched subspaces to obtain a high recall rate. Therefore, Annoy uses n_trees
different methods to divide the whole space, and searches all the dividing methods simultaneously to reduce the probability that the target vector is always at the edge of the subspace.

Index building parameters
Parameter Description Range n_trees
The number of methods of space division. [1, 1024] 
Search parameters
Parameter Description Range search_k
The number of nodes to search. 1 means 5% of the whole data. {1} ∪ [ top_k
, n ×n_trees
]
How to choose an index
To learn how to choose an appropriate index for your application scenarios, please read How to Select an Index in Milvus.
To learn how to choose an appropriate index for a metric, see Distance Metrics.
FAQ
Does IVF_SQ8 differ from IVF_SQ8H in terms of recall rate?
No, they have the same recall rate for the same dataset.What is the difference between FLAT index and IVF_FLAT index?
IVF_FLAT index divides a vector space into nlist
clusters. If you keep the default value of nlist
as 16384, Milvus compares the distances between the target vector and the centers of all 16384 clusters to get nprobe
nearest clusters. Then Milvus compares the distances between the target vector and the vectors in the selected clusters to get the nearest vectors. Unlike IVF_FLAT, FLAT directly compares the distances between the target vector and each and every vector.
Therefore, when the total number of vectors approximately equals nlist
, IVF_FLAT and FLAT has little difference in the way of calculation required and search performance. But as the number of vectors grows to two times, three times, or n times of nlist
, IVF_FLAT index begins to show increasingly greater advantages.
See How to Choose an Index in Milvus for more information.