HNSW_SQ
HNSW_SQ combines Hierarchical Navigable Small World (HNSW) graphs with Scalar Quantization (SQ), creating an advanced vector indexing method that offers a controllable size-versus-accuracy trade-off. Compared to standard HNSW, this index type maintains high query processing speed while introducing a slight increase in index construction time.
Overview
HNSW_SQ combines two indexing techniques: HNSW for fast graph-based navigation and SQ for efficient vector compression.
HNSW
HNSW constructs a multi-layer graph where each node corresponds to a vector in the dataset. In this graph, nodes are connected based on their similarity, enabling rapid traversal through the data space. The hierarchical structure allows the search algorithm to narrow down the candidate neighbors, significantly accelerating the search process in high-dimensional spaces.
For more information, refer to HNSW.
SQ
SQ is a method for compressing vectors by representing them with fewer bits. For instance:
SQ8 uses 8 bits, mapping values into 256 levels. For more information, refer to IVF_SQ8.
SQ6 uses 6 bits to represent each floating-point value, resulting in 64 discrete levels.
Hnsw Sq
This reduction in precision dramatically decreases the memory footprint and speeds up the computation while retaining the essential structure of the data.
HNSW + SQ
HNSW_SQ combines the strengths of HNSW and SQ to enable efficient approximate nearest neighbor search. Here’s how the process works:
Data Compression: SQ compresses the vectors using the
sq_type
(for example, SQ6 or SQ8), which reduces memory usage. This compression may lower precision, but it allows the system to handle larger datasets.Graph Construction: The compressed vectors are used to build an HNSW graph. Because the data is compressed, the resulting graph is smaller and faster to search.
Candidate Retrieval: When a query vector is provided, the algorithm uses the compressed data to quickly identify a pool of candidate neighbors from the HNSW graph.
(Optional) Result Refinement: The initial candidate results can be refined for better accuracy, based on the following parameters:
refine
: Controls whether this refinement step is activated. When set totrue
, the system recalculates distances using higher-precision or uncompressed representations.refine_type
: Specifies the precision level of data used during refinement (e.g., SQ6, SQ8, BF16). A higher-precision choice such asFP32
can yield more accurate results but requires more memory. This must exceed the precision of the original compressed data set bysq_type
.refine_k
: Acts as a magnification factor. For instance, if your top k is 100 andrefine_k
is 2, the system re-ranks the top 200 candidates and returns the best 100, enhancing overall accuracy.
For a full list of parameters and valid values, refer to Index params.
Build index
To build an HNSW_SQ
index on a vector field in Milvus, use the add_index()
method, specifying the index_type
, metric_type
, and additional parameters for the index.
from pymilvus import MilvusClient
# Prepare index building params
index_params = MilvusClient.prepare_index_params()
index_params.add_index(
field_name="your_vector_field_name", # Name of the vector field to be indexed
index_type="HNSW_SQ", # Type of the index to create
index_name="vector_index", # Name of the index to create
metric_type="L2", # Metric type used to measure similarity
params={
"M": 64, # Maximum number of neighbors each node can connect to in the graph
"efConstruction": 100, # Number of candidate neighbors considered for connection during index construction
"sq_type": "SQ6", # Scalar quantizer type
"refine": true, # Whether to enable the refinement step
"refine_type": "SQ8" # Precision level of data used for refinement
} # Index building params
)
In this configuration:
index_type
: The type of index to be built. In this example, set the value toHNSW_SQ
.metric_type
: The method used to calculate the distance between vectors. Supported values includeCOSINE
,L2
, andIP
. For details, refer to Metric Types.params
: Additional configuration options for building the index. For details, refer to Index building params.
Once the index parameters are configured, you can create the index by using the create_index()
method directly or passing the index params in the create_collection
method. For details, refer to Create Collection.
Search on index
Once the index is built and entities are inserted, you can perform similarity searches on the index.
search_params = {
"params": {
"ef": 10, # Parameter controlling query time/accuracy trade-off
"refine_k": 1 # The magnification factor
}
}
res = MilvusClient.search(
collection_name="your_collection_name", # Collection name
anns_field="vector_field", # Vector field name
data=[[0.1, 0.2, 0.3, 0.4, 0.5]], # Query vector
limit=3, # TopK results to return
search_params=search_params
)
In this configuration:
params
: Additional configuration options for searching on the index. For details, refer to Index-specific search params.
Index params
This section provides an overview of the parameters used for building an index and performing searches on the index.
Index building params
The following table lists the parameters that can be configured in params
when building an index.
Parameter |
Description |
Value Range |
Tuning Suggestion |
|
---|---|---|---|---|
HNSW |
|
Maximum number of connections (or edges) each node can have in the graph, including both outgoing and incoming edges. This parameter directly affects both index construction and search. |
Type: Integer Range: [2, 2048] Default value: |
A larger Consider decreasing In most cases, we recommend you set a value within this range: [5, 100]. |
|
Number of candidate neighbors considered for connection during index construction.
A larger pool of candidates is evaluated for each new element, but the maximum number of connections actually established is still limited by |
Type: Integer Range: [1, int_max] Default value: |
A higher Consider decreasing In most cases, we recommend you set a value within this range: [50, 500]. |
|
SQ |
|
Specifies the scalar quantization method for compressing vectors. Each option offers a different balance between compression and accuracy:
|
Type: String
Range: [ Default value: |
The choice of |
|
A boolean flag that controls whether a refinement step is applied during search. Refinement involves reranking the initial results by computing exact distances between the query vector and candidates. |
Type: Boolean
Range: [ Default value: |
Set to |
|
|
Determines the precision of the data used for refinement.
This precision must be higher than that of the compressed vectors (as set by |
Type: String
Range:[ Default value: None |
Use |
Index-specific search params
The following table lists the parameters that can be configured in search_params.params
when searching on the index.
Parameter |
Description |
Value Range |
Tuning Suggestion |
|
---|---|---|---|---|
HNSW |
|
Controls the breadth of search during nearest neighbor retrieval. It determines how many nodes are visited and evaluated as potential nearest neighbors. This parameter affects only the search process and applies exclusively to the bottom layer of the graph. |
Type: Integer Range: [1, int_max] Default value: limit (TopK nearest neighbors to return) |
A larger Consider decreasing In most cases, we recommend you set a value within this range: [K, 10K]. |
SQ |
|
The magnification factor that controls how many extra candidates are examined during the refinement stage, relative to the requested top K results. |
Type: Float Range: [1, float_max) Default value: 1 |
Higher values of |