🚀 Try Zilliz Cloud, the fully managed Milvus, for free—experience 10x faster performance! Try Now>>

Milvus
Zilliz
Home
  • User Guide
  • Home
  • Docs
  • User Guide

  • Indexes

  • Floating Vector Indexes

  • HNSW_SQ

HNSW_SQ

HNSW_SQ combines Hierarchical Navigable Small World (HNSW) graphs with Scalar Quantization (SQ), creating an advanced vector indexing method that offers a controllable size-versus-accuracy trade-off. Compared to standard HNSW, this index type maintains high query processing speed while introducing a slight increase in index construction time.

Overview

HNSW_SQ combines two indexing techniques: HNSW for fast graph-based navigation and SQ for efficient vector compression.

HNSW

HNSW constructs a multi-layer graph where each node corresponds to a vector in the dataset. In this graph, nodes are connected based on their similarity, enabling rapid traversal through the data space. The hierarchical structure allows the search algorithm to narrow down the candidate neighbors, significantly accelerating the search process in high-dimensional spaces.

For more information, refer to HNSW.

SQ

SQ is a method for compressing vectors by representing them with fewer bits. For instance:

  • SQ8 uses 8 bits, mapping values into 256 levels. For more information, refer to IVF_SQ8.

  • SQ6 uses 6 bits to represent each floating-point value, resulting in 64 discrete levels.

Hnsw Sq Hnsw Sq

This reduction in precision dramatically decreases the memory footprint and speeds up the computation while retaining the essential structure of the data.

HNSW + SQ

HNSW_SQ combines the strengths of HNSW and SQ to enable efficient approximate nearest neighbor search. Here’s how the process works:

  1. Data Compression: SQ compresses the vectors using the sq_type (for example, SQ6 or SQ8), which reduces memory usage. This compression may lower precision, but it allows the system to handle larger datasets.

  2. Graph Construction: The compressed vectors are used to build an HNSW graph. Because the data is compressed, the resulting graph is smaller and faster to search.

  3. Candidate Retrieval: When a query vector is provided, the algorithm uses the compressed data to quickly identify a pool of candidate neighbors from the HNSW graph.

  4. (Optional) Result Refinement: The initial candidate results can be refined for better accuracy, based on the following parameters:

    • refine: Controls whether this refinement step is activated. When set to true, the system recalculates distances using higher-precision or uncompressed representations.

    • refine_type: Specifies the precision level of data used during refinement (e.g., SQ6, SQ8, BF16). A higher-precision choice such as FP32 can yield more accurate results but requires more memory. This must exceed the precision of the original compressed data set by sq_type.

    • refine_k: Acts as a magnification factor. For instance, if your top k is 100 and refine_k is 2, the system re-ranks the top 200 candidates and returns the best 100, enhancing overall accuracy.

For a full list of parameters and valid values, refer to Index params.

Build index

To build an HNSW_SQ index on a vector field in Milvus, use the add_index() method, specifying the index_type, metric_type, and additional parameters for the index.

from pymilvus import MilvusClient

# Prepare index building params
index_params = MilvusClient.prepare_index_params()

index_params.add_index(
    field_name="your_vector_field_name", # Name of the vector field to be indexed
    index_type="HNSW_SQ", # Type of the index to create
    index_name="vector_index", # Name of the index to create
    metric_type="L2", # Metric type used to measure similarity
    params={
        "M": 64, # Maximum number of neighbors each node can connect to in the graph
        "efConstruction": 100, # Number of candidate neighbors considered for connection during index construction
        "sq_type": "SQ6", # Scalar quantizer type
        "refine": true, # Whether to enable the refinement step
        "refine_type": "SQ8" # Precision level of data used for refinement
    } # Index building params
)

In this configuration:

  • index_type: The type of index to be built. In this example, set the value to HNSW_SQ.

  • metric_type: The method used to calculate the distance between vectors. Supported values include COSINE, L2, and IP. For details, refer to Metric Types.

  • params: Additional configuration options for building the index. For details, refer to Index building params.

Once the index parameters are configured, you can create the index by using the create_index() method directly or passing the index params in the create_collection method. For details, refer to Create Collection.

Search on index

Once the index is built and entities are inserted, you can perform similarity searches on the index.

search_params = {
    "params": {
        "ef": 10, # Parameter controlling query time/accuracy trade-off
        "refine_k": 1 # The magnification factor
    }
}

res = MilvusClient.search(
    collection_name="your_collection_name", # Collection name
    anns_field="vector_field",  # Vector field name
    data=[[0.1, 0.2, 0.3, 0.4, 0.5]],  # Query vector
    limit=3,  # TopK results to return
    search_params=search_params
)

In this configuration:

Index params

This section provides an overview of the parameters used for building an index and performing searches on the index.

Index building params

The following table lists the parameters that can be configured in params when building an index.

Parameter

Description

Value Range

Tuning Suggestion

HNSW

M

Maximum number of connections (or edges) each node can have in the graph, including both outgoing and incoming edges. This parameter directly affects both index construction and search.

Type: Integer Range: [2, 2048]

Default value: 30 (up to 30 outgoing and 30 incoming edges per node)

A larger M generally leads to higher accuracy but increases memory overhead and slows down both index building and search. Consider increasing M for datasets with high dimensionality or when high recall is crucial.

Consider decreasing M when memory usage and search speed are primary concerns.

In most cases, we recommend you set a value within this range: [5, 100].

efConstruction

Number of candidate neighbors considered for connection during index construction. A larger pool of candidates is evaluated for each new element, but the maximum number of connections actually established is still limited by M.

Type: Integer Range: [1, int_max]

Default value: 360

A higher efConstruction typically results in a more accurate index, as more potential connections are explored. However, this also leads to longer indexing time and increased memory usage during construction. Consider increasing efConstruction for improved accuracy, especially in scenarios where indexing time is less critical.

Consider decreasing efConstruction to speed up index construction when resource constraints are a concern.

In most cases, we recommend you set a value within this range: [50, 500].

SQ

sq_type

Specifies the scalar quantization method for compressing vectors. Each option offers a different balance between compression and accuracy:

  • SQ6: Encodes vectors using 6-bit integers.

  • SQ8: Encodes vectors using 8-bit integers.

  • BF16: Uses the Bfloat16 format.

  • FP16: Uses the standard 16-bit floating-point format.

Type: String Range: [ SQ6, SQ8, BF16, FP16 ]

Default value: SQ8

The choice of sq_type depends on the specific application's needs. If memory efficiency is a primary concern, SQ6 or SQ8 might be suitable. On the other hand, if accuracy is paramount, BF16 or FP16 could be preferred.

refine

A boolean flag that controls whether a refinement step is applied during search. Refinement involves reranking the initial results by computing exact distances between the query vector and candidates.

Type: Boolean Range: [true, false]

Default value: false

Set to true if high accuracy is essential and you can tolerate slightly slower search times. Use false if speed is a priority and a minor compromise in accuracy is acceptable.

refine_type

Determines the precision of the data used for refinement. This precision must be higher than that of the compressed vectors (as set by sq_type), affecting both the accuracy of the re-ranked vectors and their memory footprint.

Type: String Range:[ SQ6, SQ8, BF16, FP16, FP32 ]

Default value: None

Use FP32 for maximum precision at a higher memory cost, or SQ6/SQ8 for better compression. BF16 and FP16 offer a balanced alternative.

Index-specific search params

The following table lists the parameters that can be configured in search_params.params when searching on the index.

Parameter

Description

Value Range

Tuning Suggestion

HNSW

ef

Controls the breadth of search during nearest neighbor retrieval. It determines how many nodes are visited and evaluated as potential nearest neighbors. This parameter affects only the search process and applies exclusively to the bottom layer of the graph.

Type: Integer Range: [1, int_max]

Default value: limit (TopK nearest neighbors to return)

A larger ef generally leads to higher search accuracy as more potential neighbors are considered. However, this also increases search time. Consider increasing ef when achieving high recall is critical and search speed is less of a concern.

Consider decreasing ef to prioritize faster searches, especially in scenarios where a slight reduction in accuracy is acceptable.

In most cases, we recommend you set a value within this range: [K, 10K].

SQ

refine_k

The magnification factor that controls how many extra candidates are examined during the refinement stage, relative to the requested top K results.

Type: Float Range: [1, float_max)

Default value: 1

Higher values of refine_k can improve recall and accuracy but will also increase search time and resource usage. A value of 1 means the refinement process considers only the initial top K results.

Try Managed Milvus for Free

Zilliz Cloud is hassle-free, powered by Milvus and 10x faster.

Get Started
Feedback

Was this page helpful?