How does indexing work in a vector DB (IVF, HNSW, PQ, etc.)?

How Indexing Works in a Vector Database

Vector databases use indexing techniques to efficiently search high-dimensional data, balancing speed and accuracy. Common methods include Inverted File (IVF), Hierarchical Navigable Small World (HNSW), and Product Quantization (PQ). These approaches organize vectors into structures that reduce the number of comparisons needed during a query, avoiding brute-force searches across the entire dataset. Each method optimizes for specific trade-offs, such as query speed, memory usage, or accuracy.

Inverted File (IVF)

IVF partitions vectors into clusters using algorithms like k-means. Each cluster has a centroid, and during indexing, vectors are grouped by their nearest centroid. When a query arrives, the system identifies the closest centroids and searches only within those clusters. For example, if a dataset of 1 million vectors is divided into 1,000 clusters, a query might check 10 clusters (1% of the data), drastically reducing computation. IVF works well for large datasets but requires a pre-training step to define the clusters. A downside is that accuracy depends on cluster quality: poorly defined clusters may miss relevant vectors. Libraries like FAISS implement IVF with additional optimizations, such as combining it with PQ for compressed storage.
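Here is a minimal sketch of IVF using FAISS; the dataset size and the nlist/nprobe values are illustrative assumptions, not tuned recommendations:

```python
import numpy as np
import faiss

d = 128          # vector dimensionality
nlist = 1_000    # number of k-means clusters (IVF partitions)

# Toy dataset: 100k random vectors standing in for real embeddings.
xb = np.random.random((100_000, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)                 # assigns vectors to their nearest centroid
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                                  # the k-means pre-training step
index.add(xb)

index.nprobe = 10                                # search 10 of the 1,000 clusters (1% of the data)
xq = np.random.random((5, d)).astype("float32")
distances, ids = index.search(xq, 5)             # top-5 neighbors per query
```

Raising nprobe checks more clusters, trading speed for recall; setting nprobe equal to nlist degenerates into an exhaustive search.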

Hierarchical Navigable Small World (HNSW)

HNSW builds a layered graph where each upper layer contains a subset of the nodes in the layer below it, so the top layer is sparse and lower layers grow progressively denser. During a query, the search starts at the top layer, greedily navigating to nearby nodes, then refines the path in lower layers. This hierarchy mimics skip lists in databases, enabling fast approximate searches. For example, in a 10-layer HNSW index, a query might traverse 5 nodes in the top layer, then 20 in the middle layers, and finally 50 in the bottom layer, far fewer than checking all vectors. HNSW excels in query speed and recall but uses more memory because the graph links must be stored alongside the vectors. It is implemented in libraries like hnswlib and FAISS and is ideal for applications needing low-latency responses, such as recommendation systems.
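A minimal sketch with FAISS's HNSW index follows; the M, efConstruction, and efSearch values are illustrative settings, and note that HNSW needs no training step, unlike IVF:

```python
import numpy as np
import faiss

d = 128
M = 32                             # max graph links per node; higher = better recall, more memory
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = 200    # breadth of the build-time neighbor search
index.hnsw.efSearch = 64           # breadth of the query-time search (recall/latency knob)

xb = np.random.random((100_000, d)).astype("float32")
index.add(xb)                      # the graph is built incrementally as vectors are added

xq = np.random.random((5, d)).astype("float32")
distances, ids = index.search(xq, 5)
```

Each node stores its full vector plus up to M links per layer, which is why HNSW's memory footprint exceeds that of IVF on the same data.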

Product Quantization (PQ)

PQ compresses vectors into smaller codes to reduce memory usage and speed up distance calculations. It splits a vector into subvectors (e.g., dividing a 128-dimensional vector into 8 subvectors of 16 dimensions each). Each subvector is mapped to a “codebook” of centroids, and the original vector is represented by a sequence of centroid IDs. During a query, distances are approximated using precomputed lookup tables for subvector-centroid pairs. For example, a 128D vector compressed into 8 bytes (one per subvector) takes 64x less memory than the 512 bytes needed to store the raw 32-bit floats. PQ is often combined with IVF (IVF-PQ) to first narrow the search space and then use compressed vectors for comparisons. This hybrid approach is common in large-scale systems like image retrieval, where storage and speed are critical. However, PQ sacrifices some accuracy due to compression, making it less suitable for high-precision tasks.
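A minimal sketch of the IVF-PQ combination in FAISS; the cluster count and code sizes are illustrative, and here each vector is stored as an 8-byte PQ code:

```python
import numpy as np
import faiss

d = 128
nlist = 1_000   # IVF clusters to narrow the search space
m = 8           # split each vector into 8 subvectors of 16 dims
nbits = 8       # 8 bits per subvector -> a 256-centroid codebook, 1 byte per code

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

xb = np.random.random((100_000, d)).astype("float32")
index.train(xb)   # learns the IVF centroids and the 8 PQ codebooks
index.add(xb)     # each vector is stored as an 8-byte code instead of 512 raw bytes

index.nprobe = 10
xq = np.random.random((5, d)).astype("float32")
distances, ids = index.search(xq, 5)   # distances are PQ approximations
```

Because search compares PQ codes via lookup tables rather than raw floats, results are approximate; increasing m or nbits improves accuracy at the cost of larger codes.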
