NVIDIA cuVS (CUDA Vector Search) is a GPU-accelerated library that provides the underlying algorithms for Milvus's GPU-native index types. Blackwell's Tensor Core architecture makes cuVS operations significantly faster than on prior GPU generations.
cuVS provides implementations of CAGRA (graph-based approximate nearest neighbor), IVF-Flat (inverted file index), and brute-force exact search, all optimized for GPU execution. Milvus’s GPU index types (GPU_CAGRA, GPU_IVF_FLAT, GPU_BRUTE_FORCE) are thin wrappers around cuVS primitives, meaning that Blackwell’s hardware improvements automatically benefit Milvus GPU indexes without any application code changes.
To use cuVS-backed indexes in Milvus, deploy Milvus on a system with a compatible CUDA runtime (Blackwell requires CUDA 12.8 or later) and select a GPU index type when you create the index on your collection's vector field. The cuVS library is included in Milvus's GPU-enabled container images, so you don't need to install or configure it separately.
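As a rough illustration of selecting a GPU index type, the sketch below uses the `pymilvus` `MilvusClient` API to create a collection whose vector field is indexed with GPU_CAGRA. The collection name, field names, vector dimension, server URI, and the two graph-degree values are assumptions chosen for the example, not recommendations from this article. It also assumes a GPU-enabled Milvus instance is already running.

```python
"""Sketch: create a Milvus collection with a cuVS-backed GPU_CAGRA index."""


def gpu_cagra_index_params(metric_type: str = "L2",
                           graph_degree: int = 32,
                           intermediate_graph_degree: int = 64) -> dict:
    """Return index settings for Milvus's GPU_CAGRA index type.

    The degree values are illustrative defaults for this sketch only.
    """
    return {
        "index_type": "GPU_CAGRA",
        "metric_type": metric_type,
        "params": {
            "graph_degree": graph_degree,
            "intermediate_graph_degree": intermediate_graph_degree,
        },
    }


def create_gpu_collection(uri: str, name: str, dim: int) -> None:
    """Create a collection whose vector field uses GPU_CAGRA.

    Requires `pip install pymilvus` and a reachable GPU-enabled Milvus server.
    """
    # Imported lazily so the helper above works without pymilvus installed.
    from pymilvus import DataType, MilvusClient

    client = MilvusClient(uri=uri)

    # Hypothetical schema: an auto-generated primary key plus one vector field.
    schema = MilvusClient.create_schema(auto_id=True, enable_dynamic_field=False)
    schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
    schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=dim)

    # Attach the GPU index to the vector field.
    spec = gpu_cagra_index_params()
    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="embedding",
        index_type=spec["index_type"],
        metric_type=spec["metric_type"],
        params=spec["params"],
    )

    client.create_collection(collection_name=name, schema=schema, index_params=index_params)
```

Calling `create_gpu_collection("http://localhost:19530", "docs_gpu", dim=768)` against a local deployment would create the collection. To use another GPU index, change the index type to `GPU_IVF_FLAT` or `GPU_BRUTE_FORCE` and replace the `params` dictionary with the parameters that index expects.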
The performance difference between cuVS on Blackwell and on Ampere is most pronounced for large batch operations. For example, building a 100M-vector CAGRA index that takes 40 minutes on an A100 completes in under 5 minutes on a B100. This matters for production systems that need to rebuild indexes frequently as the document collection is updated.
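To put those build times in context, the short calculation below works out the implied speedup and the share of a rebuild cycle spent building. The hourly rebuild cadence is a hypothetical assumption for illustration. The 5-minute figure is treated as an upper bound, so the resulting speedup is a lower bound.

```python
def rebuild_share(build_minutes: float, interval_minutes: float) -> float:
    """Fraction of each rebuild interval that is spent building the index."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return build_minutes / interval_minutes


A100_BUILD_MIN = 40.0   # build time quoted for the A100
B100_BUILD_MIN = 5.0    # "under 5 minutes" on the B100, used as an upper bound
REBUILD_INTERVAL_MIN = 60.0  # hypothetical: rebuild the index once per hour

# Lower bound on the speedup, since the B100 figure is an upper bound.
speedup = A100_BUILD_MIN / B100_BUILD_MIN

print(f"Speedup: at least {speedup:.0f}x")
print(f"A100 share of each hour: {rebuild_share(A100_BUILD_MIN, REBUILD_INTERVAL_MIN):.0%}")
print(f"B100 share of each hour: at most {rebuild_share(B100_BUILD_MIN, REBUILD_INTERVAL_MIN):.0%}")
```

Under that assumed cadence, the A100 spends about two thirds of every hour rebuilding, while the B100 spends under a tenth.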
Related Resources
- Milvus Performance Benchmarks — GPU index benchmarks
- Milvus Overview — GPU support architecture
- Enhance RAG Performance — indexing strategies
- Milvus Quickstart — GPU deployment