On NVIDIA Blackwell GPUs, Milvus's GPU-native index types, particularly GPU_CAGRA and GPU_IVF_FLAT, deliver the largest performance gains over CPU-based alternatives. CPU indexes such as HNSW continue to work but don’t benefit from Blackwell acceleration.
GPU_CAGRA is generally the highest-performing option for Blackwell deployments. It uses NVIDIA’s cuVS library to build a graph-based index directly in GPU memory, which gives both fast index construction and low query latency at scale (sub-millisecond in favorable configurations). On Blackwell hardware, CAGRA benefits from the architecture’s expanded tensor throughput and high memory bandwidth (up to roughly 8 TB/s of HBM3e on B200-class GPUs), making it suitable for collections in the range of hundreds of millions of vectors, provided the index fits in available GPU memory.
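As a rough illustration of what configuring GPU_CAGRA involves, the sketch below builds the index and search parameter dictionaries in plain Python. The parameter names (`intermediate_graph_degree`, `graph_degree`, `itopk_size`, `search_width`) follow the Milvus GPU index documentation as far as I know; the numeric values, the helper names, and the field name `embedding` in the comment are illustrative assumptions, not tuned recommendations.

```python
# Sketch: GPU_CAGRA build and search parameters as plain dicts.
# Values are illustrative assumptions; tune them against your own data.

def cagra_index_params(metric_type="L2", graph_degree=32,
                       intermediate_graph_degree=64):
    """Return an index_params dict for a GPU_CAGRA index."""
    # The graph is built at the intermediate degree and then pruned down,
    # so the intermediate degree should not be smaller than the final one.
    if intermediate_graph_degree < graph_degree:
        raise ValueError("intermediate_graph_degree should be >= graph_degree")
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "intermediate_graph_degree": intermediate_graph_degree,
            "graph_degree": graph_degree,
        },
    }


def cagra_search_params(itopk_size=128, search_width=4):
    """Return search params; larger values trade latency for recall."""
    return {"params": {"itopk_size": itopk_size, "search_width": search_width}}


# Hypothetical usage with the pymilvus ORM API (needs a running GPU-enabled
# Milvus instance and a collection with a vector field named "embedding"):
#   collection.create_index(field_name="embedding",
#                           index_params=cagra_index_params())
```

Keeping the parameters in small helper functions makes it easy to validate them before they reach the server and to reuse the same settings across collections.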
GPU_IVF_FLAT is the best choice when you need a simple inverted file index with full GPU acceleration. It’s faster to build than CAGRA and delivers predictable query performance, though it typically gives up some recall relative to CAGRA at comparable query speed. For many production workloads, especially those where query patterns are predictable, GPU_IVF_FLAT on Blackwell offers an excellent performance-to-simplicity ratio.
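The simplicity of GPU_IVF_FLAT shows in its configuration: one build parameter (`nlist`, the number of clusters) and one search parameter (`nprobe`, the number of clusters scanned per query). The sketch below derives both from the collection size. The `4 * sqrt(n)` starting point for `nlist` is a common rule of thumb rather than a Milvus requirement, and the 2% probe fraction and helper names are assumptions made for illustration.

```python
import math

# Sketch: GPU_IVF_FLAT parameters derived from collection size.
# The heuristics here are starting points, not guarantees of recall.

def ivf_nlist(num_vectors):
    """Rule of thumb: nlist ~ 4 * sqrt(n), clamped to the range [1, 65536]."""
    return max(1, min(65536, int(4 * math.sqrt(num_vectors))))


def gpu_ivf_flat_index_params(num_vectors, metric_type="L2"):
    """Return an index_params dict for a GPU_IVF_FLAT index."""
    return {
        "index_type": "GPU_IVF_FLAT",
        "metric_type": metric_type,
        "params": {"nlist": ivf_nlist(num_vectors)},
    }


def gpu_ivf_flat_search_params(nlist, probe_fraction=0.02):
    """Scan a fixed fraction of clusters; raise it to gain recall."""
    nprobe = max(1, min(nlist, round(nlist * probe_fraction)))
    return {"params": {"nprobe": nprobe}}


index_params = gpu_ivf_flat_index_params(1_000_000)
search_params = gpu_ivf_flat_search_params(index_params["params"]["nlist"])
print(index_params["params"]["nlist"], search_params["params"]["nprobe"])
```

Because recall depends mostly on the ratio of `nprobe` to `nlist`, sweeping `probe_fraction` against a labeled query set is a straightforward way to find the recall and latency balance a workload needs.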
For deployments that mix GPU and CPU resources, Milvus supports heterogeneous configurations in which GPU nodes serve high-priority or latency-sensitive collections while CPU nodes serve archival or lower-traffic collections. This lets you right-size Blackwell GPU capacity for your most demanding workloads without over-provisioning GPUs across the entire deployment.
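One way to reason about such a split is a small planning script that assigns an index type per collection under a GPU memory budget. The sketch below is a hypothetical planning aid, not a Milvus feature: `CollectionSpec`, `plan_indexes`, and the greedy policy are all assumptions, and it sizes collections by raw float32 vector data only, which understates the real GPU footprint of a graph index. How collections are actually placed on GPU or CPU nodes depends on your deployment and is not shown.

```python
from dataclasses import dataclass

# Sketch: decide which collections get a GPU index under a memory budget.
# Hypothetical planning helper; Milvus does not expose this as an API.

@dataclass
class CollectionSpec:
    name: str
    num_vectors: int
    dim: int
    latency_sensitive: bool


def raw_size_gib(spec):
    """Raw float32 vector data in GiB (a lower bound on index memory)."""
    return spec.num_vectors * spec.dim * 4 / 2**30


def plan_indexes(specs, gpu_budget_gib):
    """Greedy plan: smallest latency-sensitive collections go to GPU first."""
    plan = {}
    remaining = gpu_budget_gib
    for spec in sorted(specs, key=raw_size_gib):
        size = raw_size_gib(spec)
        if spec.latency_sensitive and size <= remaining:
            plan[spec.name] = "GPU_CAGRA"
            remaining -= size
        else:
            plan[spec.name] = "HNSW"  # CPU index for everything else
    return plan


specs = [
    CollectionSpec("search_hot", 10_000_000, 768, True),
    CollectionSpec("search_large", 100_000_000, 768, True),
    CollectionSpec("archive", 500_000_000, 768, False),
]
print(plan_indexes(specs, gpu_budget_gib=100))
```

In this example only `search_hot` fits the 100 GiB budget, so the other two collections fall back to a CPU index; a real plan would also account for index overhead and headroom for query buffers.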