DiskANN keeps a tiny PQ-compressed copy of every vector in RAM and the full vectors + graph on SSD. Search walks the graph with cheap RAM compares, then re-ranks the top candidates with a handful of expensive disk reads.
A single commodity SSD holds the disk side easily. The RAM side fits in any laptop. Without the PQ trick, you'd need a server with hundreds of GB of RAM to even load the dataset.
PQ compare ≈ 1 µs, SSD read ≈ 100 µs. Real numbers depend on hardware.
The index is a single Vamana graph (maximum degree R) optimized for sequential disk layout. No hierarchy, just one big graph.
The search keeps a candidate list of size L. Pop the best, expand its neighbors, estimate distances using only the in-RAM PQ codes. No disk I/O during the walk.
The mental model: PQ is the cheap "is this neighborhood promising?" test. Disk reads are the expensive "let me actually verify" check. DiskANN gets most of its work done with the cheap test.
DiskANN is a graph-based approximate nearest neighbor index designed for datasets that don't fit in RAM. It builds a single-layer Vamana graph laid out for efficient disk access, keeps a tiny product-quantized (PQ) code for every vector in memory, and stores the full-precision vectors and graph edges on SSD. Searches navigate the graph using cheap in-RAM PQ comparisons and touch the disk only to re-rank a handful of finalists.
In Milvus, DiskANN powers the on-disk index option: billion-scale collections become searchable on machines with tens — not hundreds — of gigabytes of RAM, while the disk re-rank step keeps recall close to in-memory graph indexes.
R (graph degree, build time): the maximum number of edges per node in the Vamana graph. Higher R gives better connectivity and recall, at the cost of a bigger graph on disk and slower builds.
L / search_list (query time): the candidate pool maintained during the beam search, and the main recall/latency knob, equivalent to HNSW's ef. It must be at least your top-K.
Choose DiskANN when the dataset (or the memory bill) outgrows RAM: it serves high-recall queries with a fraction of the memory, in exchange for millisecond-level latency dominated by SSD reads. If everything fits comfortably in memory, HNSW answers faster. If you need fast index builds or moderate memory savings without involving disk, IVF variants sit in between.
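The rule that search_list must be at least top-K can be checked before a query is sent. The sketch below is only an illustration: the helper name `make_search_params`, the `metric_type` default, and the dictionary layout are assumptions, so check the DiskANN documentation for the exact parameter schema in your Milvus version.

```python
def make_search_params(top_k: int, search_list: int, metric_type: str = "L2") -> dict:
    """Build a search-parameter dict, enforcing search_list >= top_k.

    Hypothetical helper: the dict layout is an assumed shape, not a confirmed API.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if search_list < top_k:
        raise ValueError(
            f"search_list ({search_list}) must be at least top_k ({top_k})"
        )
    return {"metric_type": metric_type, "params": {"search_list": search_list}}


# A larger search_list trades latency for recall.
params = make_search_params(top_k=10, search_list=100)
```

Raising search_list widens the candidate pool, which usually improves recall and increases latency; the useful range depends on the dataset and hardware.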
Use DiskANN when your vectors no longer fit in RAM, or when the RAM to hold them costs more than you want to pay. A billion 768-dim float32 vectors need ~3 TB of memory for HNSW (1 billion × 768 × 4 bytes for the vectors alone), but only tens of GB of RAM plus an SSD with DiskANN. If the dataset fits in memory comfortably, HNSW remains the lower-latency choice.
A fast NVMe SSD is the key requirement, since query latency is dominated by a handful of random disk reads (each ~100 µs). RAM needs are modest: roughly the PQ codes (e.g. 32–64 bytes per vector) plus working buffers.
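The memory figures above follow from simple arithmetic. The sketch below reproduces them, assuming 4-byte float32 values and decimal units (1 GB = 10^9 bytes), and ignoring graph edges and working buffers; the function names are illustrative.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold full-precision vectors (float32 assumed)."""
    return num_vectors * dim * bytes_per_value


def pq_code_bytes(num_vectors: int, code_bytes: int) -> int:
    """Bytes needed to hold one PQ code per vector."""
    return num_vectors * code_bytes


N = 1_000_000_000  # one billion vectors
raw = raw_vector_bytes(N, dim=768)
print(f"raw float32 vectors: {raw / 1e12:.2f} TB")  # 3.07 TB

for code_bytes in (32, 64):
    pq = pq_code_bytes(N, code_bytes)
    print(f"PQ codes at {code_bytes} B/vector: {pq / 1e9:.0f} GB")  # 32 GB, 64 GB
```

At 32 to 64 bytes per vector the in-RAM codes are 48 to 96 times smaller than the raw vectors, which is where the "tens of GB" figure comes from.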
Per query, DiskANN is slower than in-memory HNSW: typically low single-digit milliseconds versus sub-millisecond, because each disk read during re-ranking is ~100× slower than a RAM access. The trade is cost: DiskANN serves the same dataset with an order of magnitude less RAM while keeping recall above 90–95%.
PQ codes are only used to steer the graph walk, not to produce final scores. Before returning results, DiskANN reads the full-precision vectors of the top candidates from SSD and re-ranks them exactly, correcting the quantization error that pure PQ-based indexes like IVF_PQ cannot fix.
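The two-stage idea (steer with PQ estimates, then re-rank exactly) can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not DiskANN's implementation: a brute-force k-nearest-neighbor graph stands in for Vamana, the PQ codebooks are sampled data points rather than k-means centroids, the `vectors` array plays the role of the SSD, and all names and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, n_sub, n_centroids, R = 500, 16, 4, 16, 8
sub_dim = dim // n_sub
vectors = rng.normal(size=(n, dim)).astype(np.float32)  # stands in for data "on SSD"

# Toy graph: each node links to its R exact nearest neighbors (not Vamana).
full = ((vectors[:, None, :] - vectors[None, :, :]) ** 2).sum(-1)
graph = np.argsort(full, axis=1)[:, 1:R + 1]

# Crude PQ: per-subspace codebooks sampled from the data (real PQ uses k-means).
codebooks = np.stack([
    vectors[rng.choice(n, n_centroids, replace=False), s * sub_dim:(s + 1) * sub_dim]
    for s in range(n_sub)
])  # shape (n_sub, n_centroids, sub_dim)
codes = np.empty((n, n_sub), dtype=np.uint8)  # the only per-vector data "in RAM"
for s in range(n_sub):
    sub = vectors[:, s * sub_dim:(s + 1) * sub_dim]
    codes[:, s] = ((sub[:, None, :] - codebooks[s][None]) ** 2).sum(-1).argmin(1)


def search(query, k, L, entry=0):
    """Walk the graph with PQ estimates, then re-rank the pool exactly."""
    # Lookup table: squared distance from each query sub-vector to each centroid.
    table = np.stack([
        ((codebooks[s] - query[s * sub_dim:(s + 1) * sub_dim]) ** 2).sum(-1)
        for s in range(n_sub)
    ])

    def approx(i):
        return float(table[np.arange(n_sub), codes[i]].sum())

    visited, expanded = {entry}, set()
    pool = [(approx(entry), entry)]  # candidate list, trimmed to the L best
    while True:
        frontier = [c for c in pool if c[1] not in expanded]
        if not frontier:
            break
        _, node = min(frontier)  # pop the best unexpanded candidate
        expanded.add(node)
        for nb in graph[node]:
            nb = int(nb)
            if nb not in visited:
                visited.add(nb)
                pool.append((approx(nb), nb))
        pool = sorted(pool)[:L]

    # Re-rank: fetch full-precision vectors (the "disk reads") and score exactly.
    ids = [i for _, i in pool]
    exact = ((vectors[ids] - query) ** 2).sum(1)
    order = np.argsort(exact)[:k]
    return [(ids[j], float(exact[j])) for j in order], len(ids)


query = rng.normal(size=dim).astype(np.float32)
results, disk_reads = search(query, k=5, L=20)
```

The returned distances are exact even though the walk only ever saw quantized estimates, and the number of full-vector fetches is capped by L rather than by the number of nodes visited.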
Go deeper: read the DiskANN documentation for setup requirements, or see Index Explained for the full decision guide.