CAGRA (a GPU-native graph-based index) and IVF-PQ deliver the best performance on Blackwell, with CAGRA providing the fastest index builds and sub-millisecond search latency.
CAGRA Index Performance
CAGRA (CUDA ANNS GRAph-based) is designed specifically for GPU execution. It builds 40x faster than CPU equivalents and delivers sub-millisecond query latency on Blackwell, thanks to superior cache locality and tensor core utilization. For billion-scale production indexes, CAGRA is the optimal choice.
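As a rough illustration of configuring such an index, the sketch below assembles index and search parameter dictionaries in the shape Milvus uses for its GPU_CAGRA index type. The helper names (cagra_index_params, cagra_search_params) are hypothetical, and the numeric values are illustrative starting points rather than tuned recommendations; the parameter names should be checked against the documentation for the Milvus release in use.

```python
# Minimal sketch: build parameter dictionaries for a GPU_CAGRA index.
# The helper functions are hypothetical; the values are illustrative assumptions.

def cagra_index_params(graph_degree=32, intermediate_graph_degree=64,
                       build_algo="IVF_PQ", metric_type="L2"):
    """Return an index-params dict for a GPU_CAGRA index."""
    # CAGRA prunes an intermediate k-NN graph down to the final graph,
    # so the final degree cannot exceed the intermediate degree.
    if graph_degree > intermediate_graph_degree:
        raise ValueError("graph_degree must not exceed intermediate_graph_degree")
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "intermediate_graph_degree": intermediate_graph_degree,
            "graph_degree": graph_degree,
            "build_algo": build_algo,
        },
    }


def cagra_search_params(top_k, itopk_size=128, search_width=4):
    """Return a search-params dict; itopk_size must cover the requested top_k."""
    if itopk_size < top_k:
        raise ValueError("itopk_size must be at least top_k")
    return {"params": {"itopk_size": itopk_size, "search_width": search_width}}


index_params = cagra_index_params()
search_params = cagra_search_params(top_k=10)

# With a live Milvus deployment, dictionaries like these would typically be
# passed to pymilvus, for example:
#   collection.create_index(field_name="embedding", index_params=index_params)
# ("embedding" is an assumed field name.)
```

A larger graph_degree generally improves recall at the cost of build time and memory, so it is the first value to revisit when tuning.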
IVF-PQ Hybrid Approach
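IVF-PQ (Inverted File + Product Quantization) balances search quality and memory efficiency. Blackwell’s high memory bandwidth speeds up IVF-PQ’s multi-stage search: a coarse probe selects the nearest inverted lists, then distances are computed over the compressed codes in those lists. Because the quantized codes are a small fraction of the raw vectors’ size, far more of the index stays resident in GPU memory and cache, enabling massively parallel query processing.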
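The memory savings from product quantization are straightforward arithmetic. The sketch below compares the size of a raw float32 vector with its PQ code; the dimension (768), number of sub-quantizers (m = 96), and bits per code (nbits = 8) are assumed example values, and the figures ignore codebooks, vector IDs, and inverted-list overhead.

```python
# Minimal sketch: storage per vector for raw float32 versus PQ codes.
# dim, m, and nbits below are assumed example values.

def raw_bytes_per_vector(dim, bytes_per_component=4):
    """Size of an uncompressed vector (float32 by default)."""
    return dim * bytes_per_component


def pq_bytes_per_vector(dim, m, nbits=8):
    """Size of a PQ code: m sub-quantizer indices of nbits each."""
    if dim % m != 0:
        raise ValueError("dim must be divisible by m")
    return (m * nbits + 7) // 8  # round up to whole bytes


dim, m, nbits = 768, 96, 8
raw = raw_bytes_per_vector(dim)         # 768 * 4 = 3072 bytes
pq = pq_bytes_per_vector(dim, m, nbits)  # 96 * 8 / 8 = 96 bytes
ratio = raw / pq                         # 32.0

n_vectors = 1_000_000_000
raw_gb = raw * n_vectors / 1e9           # 3072.0 GB of raw vectors
pq_gb = pq * n_vectors / 1e9             # 96.0 GB of PQ codes

print(raw, pq, ratio)   # 3072 96 32.0
print(raw_gb, pq_gb)    # 3072.0 96.0
```

At these settings a billion vectors shrink from roughly 3 TB to under 100 GB of codes, which is what lets the compressed index sit in GPU memory at all.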
HNSW GPU Execution
Although HNSW was originally designed for CPUs, Milvus supports GPU-accelerated HNSW for medium-scale datasets (10M-100M vectors). Graph traversal benefits from the GPU’s parallel cores, yielding speedups over CPU HNSW.
Index Build vs. Search Trade-offs
CAGRA prioritizes index build speed (minutes for billion-element indexes). IVF-PQ optimizes for query throughput and memory efficiency. HNSW balances construction speed with search quality. Milvus operators choose among them based on update frequency and serving characteristics.
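The trade-offs above can be written down as a small decision helper. This is a hypothetical sketch, not a Milvus API: the Workload fields and the order in which the checks are applied are assumptions made for illustration, and a real deployment would weigh recall targets and benchmark results as well.

```python
# Minimal sketch of the selection logic described above.
# Workload and suggest_index are hypothetical names; the precedence of the
# checks (memory first, then rebuild frequency) is an assumption.
from dataclasses import dataclass


@dataclass
class Workload:
    frequent_rebuilds: bool    # index is rebuilt often because data changes
    memory_constrained: bool   # GPU memory is the limiting resource


def suggest_index(workload: Workload) -> str:
    """Map a coarse workload description to one of the three index families."""
    if workload.memory_constrained:
        return "IVF-PQ"   # compressed codes keep the memory footprint small
    if workload.frequent_rebuilds:
        return "CAGRA"    # fastest builds, so rebuilds are cheap
    return "HNSW"         # balanced build speed and search quality


print(suggest_index(Workload(frequent_rebuilds=True, memory_constrained=False)))
print(suggest_index(Workload(frequent_rebuilds=False, memory_constrained=True)))
```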