What Milvus indexes perform best on Blackwell GPUs?

CAGRA (a GPU-native graph-based index) and IVF-PQ deliver the best performance on Blackwell, with CAGRA providing the fastest index builds and sub-millisecond search latency. Milvus exposes them as the GPU_CAGRA and GPU_IVF_PQ index types.

CAGRA Index Performance

CAGRA (CUDA ANNS GRAph-based) is designed specifically for GPU execution. It builds 40x faster than CPU equivalents and delivers sub-millisecond query latency on Blackwell due to superior cache locality and tensor core utilization. For billion-scale production indexes, CAGRA is the optimal choice.
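To make the configuration concrete, here is a minimal sketch that assembles build and search parameters for the Milvus GPU_CAGRA index type as plain dictionaries. The helper names cagra_index_params and cagra_search_params and the default values are hypothetical, chosen only for illustration. The parameter keys and the two validation rules reflect the Milvus GPU index documentation as understood here, so verify them against your Milvus version.

```python
def cagra_index_params(metric_type="L2",
                       intermediate_graph_degree=64,
                       graph_degree=32,
                       build_algo="IVF_PQ"):
    """Return index parameters for a GPU_CAGRA index (hypothetical helper)."""
    # Assumption: the pruned graph degree may not exceed the degree of the
    # intermediate graph it is pruned from.
    if graph_degree > intermediate_graph_degree:
        raise ValueError("graph_degree must not exceed intermediate_graph_degree")
    # Assumption: these are the two supported graph construction algorithms.
    if build_algo not in ("IVF_PQ", "NN_DESCENT"):
        raise ValueError("build_algo must be 'IVF_PQ' or 'NN_DESCENT'")
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "intermediate_graph_degree": intermediate_graph_degree,
            "graph_degree": graph_degree,
            "build_algo": build_algo,
        },
    }


def cagra_search_params(limit=10, itopk_size=128, metric_type="L2"):
    """Return search parameters for a GPU_CAGRA index (hypothetical helper)."""
    # Assumption: the intermediate result list must hold at least `limit` items.
    if itopk_size < limit:
        raise ValueError("itopk_size must be at least the requested limit")
    return {"metric_type": metric_type, "params": {"itopk_size": itopk_size}}


if __name__ == "__main__":
    print(cagra_index_params())
    print(cagra_search_params(limit=10))
```

With pymilvus, a dictionary like the first one would typically be passed as index_params to Collection.create_index on the vector field, and the second as the param argument of a search call. Larger graph degrees generally trade longer builds and more memory for higher recall.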

IVF-PQ Hybrid Approach

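IVF-PQ (Inverted File with Product Quantization) balances search quality and memory efficiency. Blackwell’s high memory bandwidth speeds up IVF-PQ’s multi-stage search, which first selects the closest clusters and then scans the quantized codes inside them. Because quantized embeddings are far smaller than the original vectors, much more of the dataset stays resident in GPU memory, enabling massively parallel query processing.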
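The memory argument can be checked with simple arithmetic. The sketch below compares the size of a raw float32 vector with its product-quantized code and builds a matching GPU_IVF_PQ parameter dictionary. The function names and example values (768 dimensions, m=96, nlist=1024) are hypothetical. The estimate counts only the PQ codes and ignores codebooks, vector IDs, and inverted-list overhead.

```python
def raw_bytes_per_vector(dim):
    """Size of one uncompressed float32 vector in bytes."""
    return dim * 4


def pq_bytes_per_vector(m, nbits=8):
    """Size of one PQ code: m sub-quantizers, nbits each, rounded up to bytes."""
    return (m * nbits + 7) // 8


def compression_ratio(dim, m, nbits=8):
    """How many times smaller the PQ code is than the raw vector."""
    return raw_bytes_per_vector(dim) / pq_bytes_per_vector(m, nbits)


def ivf_pq_index_params(dim, nlist=1024, m=96, nbits=8, metric_type="L2"):
    """Return index parameters for a GPU_IVF_PQ index (hypothetical helper)."""
    # Product quantization splits each vector into m equal sub-vectors,
    # so the dimension must be divisible by m.
    if dim % m != 0:
        raise ValueError("dim must be divisible by m")
    return {
        "index_type": "GPU_IVF_PQ",
        "metric_type": metric_type,
        "params": {"nlist": nlist, "m": m, "nbits": nbits},
    }


if __name__ == "__main__":
    dim, m, nbits = 768, 96, 8
    print(raw_bytes_per_vector(dim))        # 3072
    print(pq_bytes_per_vector(m, nbits))    # 96
    print(compression_ratio(dim, m, nbits)) # 32.0
    print(ivf_pq_index_params(dim, m=m, nbits=nbits))
```

Under these assumed settings, one billion 768-dimensional vectors shrink from about 3 TB of raw floats to about 96 GB of codes. At search time, the nprobe parameter controls how many of the nlist clusters are scanned per query.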

HNSW GPU Execution

Although HNSW was originally designed for CPUs, Milvus supports GPU-accelerated HNSW for medium-scale datasets (10M to 100M vectors). Graph traversal benefits from the GPU’s parallel cores, delivering speedups over CPU HNSW.

Index Build vs. Search Trade-offs

CAGRA prioritizes index build speed (minutes for billion-element indexes). IVF-PQ optimizes for query throughput and memory efficiency. HNSW balances construction speed with search quality. Milvus operators should choose among them based on how often the data is updated and on serving requirements such as latency, throughput, and memory budget.
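The trade-offs above can be written down as a small lookup. This is only a sketch of the guidance in this section: the Priority enum and the suggest_index function are hypothetical names, and a real decision would also weigh recall targets, hardware, and dataset size.

```python
from enum import Enum


class Priority(Enum):
    """What the operator cares about most (hypothetical categories)."""
    BUILD_SPEED = "build_speed"
    THROUGHPUT_AND_MEMORY = "throughput_and_memory"
    BALANCED = "balanced"


# Mapping taken directly from the trade-offs described in this section.
_SUGGESTIONS = {
    Priority.BUILD_SPEED: "CAGRA",
    Priority.THROUGHPUT_AND_MEMORY: "IVF-PQ",
    Priority.BALANCED: "HNSW",
}


def suggest_index(priority):
    """Return the index family this section associates with a priority."""
    return _SUGGESTIONS[priority]


if __name__ == "__main__":
    for priority in Priority:
        print(priority.value, "->", suggest_index(priority))
```

For example, a collection that is rebuilt frequently would map to Priority.BUILD_SPEED and therefore to CAGRA.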
