
DiskANN Explorer: Billion-Scale Vector Search From SSD

DiskANN keeps a tiny PQ-compressed copy of every vector in RAM and the full vectors + graph on SSD. Search walks the graph with cheap RAM compares, then re-ranks the top candidates with a handful of expensive disk reads.

Why DiskANN exists: the memory wall

For 100M vectors × 768 dim:
RAM: PQ codes (32 B/vec) = 3.2 GB
Disk: full vectors (3072 B/vec, i.e. 768 float32 values) = 307.2 GB, plus graph edges
Disk / RAM ≈ 96×

A single commodity SSD holds the disk side easily. The RAM side fits in any laptop. Without the PQ trick, you'd need a server with hundreds of GB of RAM to even load the dataset.
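The memory-wall arithmetic is just vector count times bytes per vector. Here is a minimal sketch, assuming float32 vectors (4 bytes per dimension, which is where 3072 B/vec comes from) and decimal gigabytes. The optional `degree` argument adds 4-byte neighbor IDs per edge, which is an assumption about the on-disk graph format. The helper name `footprint_gb` is hypothetical.

```python
def footprint_gb(num_vectors: int, dim: int, pq_bytes: int, degree: int = 0):
    """Return (ram_gb, disk_gb) in decimal gigabytes (1 GB = 1e9 bytes)."""
    ram_bytes = num_vectors * pq_bytes          # PQ codes held in RAM
    vector_bytes = dim * 4                      # float32 = 4 bytes per dimension
    edge_bytes = degree * 4                     # assumption: 4-byte neighbor IDs
    disk_bytes = num_vectors * (vector_bytes + edge_bytes)
    return ram_bytes / 1e9, disk_bytes / 1e9


ram, disk = footprint_gb(100_000_000, 768, 32)
print(f"RAM {ram:.1f} GB, disk {disk:.1f} GB, ratio {disk / ram:.0f}x")
```

Calling `footprint_gb(1_000_000_000, 768, 32)` gives 32 GB of RAM and about 3 TB of disk for a 1B-vector dataset, the same orders of magnitude quoted later on this page.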

Example search: graph degree R = 5, candidate pool L = 8, re-rank budget = 5 disk reads.

Cost of this search

RAM PQ compares: 20 (20 µs)
SSD reads: 5 (500 µs)
Total latency: 0.52 ms
Modeled with PQ ≈ 1 µs per compare and SSD ≈ 100 µs per read. Real numbers depend on hardware.
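The explorer's cost model is additive: latency equals PQ compares × 1 µs plus SSD reads × 100 µs. The minimal sketch below reproduces this run's 0.52 ms (20 compares, 5 reads). The two constants are the explorer's modeling assumptions, not measurements, and the function name is hypothetical.

```python
PQ_COMPARE_US = 1.0   # modeled cost of one in-RAM PQ distance (explorer assumption)
SSD_READ_US = 100.0   # modeled cost of one random SSD read (explorer assumption)


def modeled_latency_us(pq_compares: int, ssd_reads: int) -> float:
    """Query latency in microseconds under the simple additive cost model."""
    return pq_compares * PQ_COMPARE_US + ssd_reads * SSD_READ_US


total = modeled_latency_us(pq_compares=20, ssd_reads=5)
print(f"{total:.0f} us = {total / 1000:.2f} ms")
```

The model charges reads one after another. A system that keeps several reads in flight would see lower wall-clock time for the same I/O count, but the reads would still dominate the 20 µs of PQ work.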

Search Stats

Visited: 20 / 36 nodes
Re-ranked: 5
Top-3: [1, 28, 27]

How DiskANN works

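  1. Build the Vamana graph: a single-layer proximity graph in which every node has at most R edges, pruned so that a greedy walk reaches any region in a few hops. Each node's full vector is stored on disk next to its neighbor list, so one read fetches both. There is no hierarchy, just one big graph.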
  2. Quantize and split storage: every vector gets a tiny PQ code (e.g. 32 bytes). PQ codes live in RAM. Full vectors + graph edges live on SSD.
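  3. Beam-walk with PQ distances: start from a fixed entry point (the medoid) and keep a candidate pool of size L. Repeatedly pick the best unexpanded candidate, expand its neighbors, and estimate their distances using only the in-RAM PQ codes. The explorer treats this walk as pure RAM work; in the full DiskANN design, expanding a node also reads its neighbor list from SSD.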
  4. Disk re-rank the finalists: the candidate pool's top entries had their distances approximated by PQ. Read their full vectors from disk and recompute exact distances to fix the ranking.
  5. Return top-K. Total disk I/O is a small number of random reads, usually fewer than 20.
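Steps 3 to 5 can be sketched on a toy 2-D dataset. This is a minimal Python sketch with several stand-ins. A brute-force R-nearest-neighbor graph replaces the Vamana graph. Coordinates rounded to a 0.25 grid replace real PQ codes. The function names (`build_knn_graph`, `medoid`, `beam_search`, `disk_rerank`) are hypothetical and are not DiskANN or Milvus APIs. As in the explorer, only `disk_rerank` models SSD reads.

```python
import math
import random


def dist(a, b):
    """Exact Euclidean distance between two points (math.dist, Python 3.8+)."""
    return math.dist(a, b)


def build_knn_graph(points, r):
    """Stand-in for Vamana: link each point to its r nearest neighbors (brute force)."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        graph.append(others[:r])
    return graph


def medoid(points):
    """Entry point: the point with the smallest total distance to all others."""
    return min(range(len(points)),
               key=lambda i: sum(dist(points[i], q) for q in points))


def beam_search(query, graph, codes, start, pool_size):
    """Greedy beam walk steered only by approximate distances from compressed codes."""
    def approx(i):
        return dist(query, codes[i])

    pool, visited = [start], set()
    while True:
        frontier = [i for i in pool if i not in visited]
        if not frontier:
            return pool, visited
        best = min(frontier, key=approx)               # closest unexpanded candidate
        visited.add(best)
        for nb in graph[best]:
            if nb not in pool:
                pool.append(nb)
        pool = sorted(pool, key=approx)[:pool_size]    # keep the best L candidates


def disk_rerank(query, pool, full_vectors, budget, k):
    """Fetch full vectors for the top `budget` candidates; sort by exact distance."""
    finalists = pool[:budget]  # each index here models one random SSD read
    return sorted(finalists, key=lambda i: dist(query, full_vectors[i]))[:k]


# Toy run with the explorer's settings: R = 5, L = 8, re-rank budget = 5.
random.seed(1)
points = [(random.random(), random.random()) for _ in range(36)]
codes = [(round(x * 4) / 4, round(y * 4) / 4) for x, y in points]  # crude PQ stand-in
graph = build_knn_graph(points, r=5)
query = (0.5, 0.5)
pool, visited = beam_search(query, graph, codes, medoid(points), pool_size=8)
top3 = disk_rerank(query, pool, points, budget=5, k=3)
print(len(visited), "nodes expanded; top-3:", top3)
```

Raising `pool_size` (L) widens the walk and usually improves recall. Raising `budget` fetches more full vectors, which costs more reads but gives the exact re-rank more chances to fix ordering mistakes made by the compressed distances.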

The trade

  • Why it scales: RAM only holds compressed codes. A 1B-vector dataset with 32-byte PQ needs ~32 GB RAM — fits on a single workstation.
  • Why it's slower than HNSW: a random SSD read (~100 µs) costs roughly 100× as much as an in-RAM PQ compare (~1 µs). Even with only ~10 reads, disk dominates query latency.
  • Why it still beats IVF_PQ for recall: the re-rank step recovers quantization errors that pure PQ-based methods can't fix.

The mental model: PQ is the cheap "is this neighborhood promising?" test. Disk reads are the expensive "let me actually verify" check. DiskANN gets most of its work done with the cheap test.

What is DiskANN?

DiskANN is a graph-based approximate nearest neighbor index designed for datasets that don't fit in RAM. It builds a single-layer Vamana graph laid out for efficient disk access, keeps a tiny product-quantized (PQ) code for every vector in memory, and stores the full-precision vectors and graph edges on SSD. Searches navigate the graph using cheap in-RAM PQ comparisons and touch the disk only to re-rank a handful of finalists.
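The in-RAM comparison is cheap because PQ distances reduce to table lookups. Below is a minimal sketch of plain product quantization with asymmetric distance computation (ADC). Each vector is split into m sub-vectors, and each sub-vector is replaced by the index of its nearest centroid. At query time, a per-query table turns every approximate distance into m lookups. The toy k-means trainer and all function names are hypothetical illustrations, not the DiskANN or Milvus training code.

```python
import random


def split(vec, m):
    """Cut a vector into m equal-length sub-vectors (len(vec) must divide by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebooks(data, m, k, iters=10):
    """Toy k-means per subspace; returns m codebooks of k centroids each."""
    codebooks = []
    for s in range(m):
        subs = [split(v, m)[s] for v in data]
        centroids = [list(c) for c in subs[:k]]  # deterministic init: first k points
        for _ in range(iters):
            buckets = [[] for _ in range(k)]
            for x in subs:
                nearest = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
                buckets[nearest].append(x)
            for j, members in enumerate(buckets):
                if members:  # empty clusters keep their previous centroid
                    centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        codebooks.append(centroids)
    return codebooks


def encode(vec, codebooks):
    """PQ code: nearest-centroid index per subspace, one byte each (k <= 256)."""
    subs = split(vec, len(codebooks))
    return bytes(min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
                 for sub, cb in zip(subs, codebooks))


def adc_table(query, codebooks):
    """Per-query table: squared distance from each query sub-vector to each centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(q, c) for c in cb] for q, cb in zip(subs, codebooks)]


def adc_distance(code, table):
    """Approximate squared distance: m table lookups, no full vector needed."""
    return sum(table[s][j] for s, j in enumerate(code))


random.seed(0)
dim, m, k = 8, 4, 16
data = [[random.random() for _ in range(dim)] for _ in range(200)]
codebooks = train_codebooks(data, m, k)
codes = [encode(v, codebooks) for v in data]  # 4 bytes per vector instead of 8 floats
query = [random.random() for _ in range(dim)]
table = adc_table(query, codebooks)
print(f"approx {adc_distance(codes[0], table):.3f} vs exact {sq_dist(query, data[0]):.3f}")
```

The ADC value is exactly the distance from the query to the vector's reconstruction (its concatenated centroids), which is why it is only an estimate of the true distance. A 32-byte code, as in the memory figures on this page, would correspond to, for example, m = 32 subspaces with up to 256 centroids each.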

In Milvus, DiskANN powers the on-disk index option: billion-scale collections become searchable on machines with tens — not hundreds — of gigabytes of RAM, while the disk re-rank step keeps recall close to in-memory graph indexes.

What the knobs control

  • R (graph degree, build time): the maximum number of edges per node in the Vamana graph. Higher R gives better connectivity and recall, at the cost of a bigger graph on disk and slower builds.
  • L / search_list (query time): the size of the candidate pool maintained during the beam search. It is the main recall/latency knob, analogous to HNSW's ef, and must be at least your top-K.
  • Re-rank budget: how many finalists get their full vectors fetched from SSD for exact scoring. Each one is a random disk read, so this is where query latency lives — and also where PQ approximation errors get corrected.
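The R cap and the graph's long-range connectivity both come from Vamana's pruning step. Below is a minimal sketch of the RobustPrune rule as described in the DiskANN paper. Candidates are taken closest-first, and any remaining candidate that an already-chosen neighbor covers (α · d(chosen, c) ≤ d(p, c)) is dropped. The α parameter is not among the knobs listed here, and 1.2 is only an illustrative value. The function name and the toy points are hypothetical.

```python
import math


def robust_prune(p, candidates, points, alpha, r):
    """Vamana-style RobustPrune: choose at most r out-neighbors for node p.

    Candidates are taken closest-first; after each pick, any remaining
    candidate c with alpha * d(pick, c) <= d(p, c) is dropped, because
    the pick already "covers" it.
    """
    def d(a, b):
        return math.dist(points[a], points[b])

    # Sort by (distance, id) so ties break deterministically.
    pool = sorted(set(candidates) - {p}, key=lambda c: (d(p, c), c))
    neighbors = []
    while pool and len(neighbors) < r:
        pick = pool.pop(0)  # closest remaining candidate
        neighbors.append(pick)
        pool = [c for c in pool if alpha * d(pick, c) > d(p, c)]
    return neighbors


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 1.0)]
print(robust_prune(0, range(1, 6), points, alpha=1.2, r=3))  # [1, 3]
```

Here only two edges survive even though R = 3 allows three: points 2, 4, and 5 are reachable through 1 and 3, so spending edges on them adds little. A larger α keeps more (and longer) edges, at the cost of a bigger graph.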

DiskANN vs HNSW vs IVF

Choose DiskANN when the dataset (or the memory bill) outgrows RAM: it serves high-recall queries with a fraction of the memory, in exchange for millisecond-level latency dominated by SSD reads. If everything fits comfortably in memory, HNSW answers faster. If you need fast index builds or moderate memory savings without involving disk, IVF variants sit in between.

Frequently asked questions

When should I use DiskANN instead of HNSW?

When your vectors no longer fit in RAM — or the RAM to hold them costs more than you want to pay. A billion 768-dim float vectors need ~3 TB in memory for HNSW, but only tens of GB of RAM plus an SSD with DiskANN. If the dataset fits in memory comfortably, HNSW remains the lower-latency choice.

What hardware does DiskANN need?

A fast NVMe SSD is the key requirement, since query latency is dominated by a handful of random disk reads (each ~100 µs). RAM needs are modest: roughly the PQ codes (e.g. 32–64 bytes per vector) plus working buffers.

Is DiskANN slower than in-memory indexes like HNSW?

Per query, yes: typically low single-digit milliseconds versus sub-millisecond for in-memory HNSW, because each disk re-rank is a random SSD read (~100 µs) rather than an in-RAM computation. The trade is cost: DiskANN serves the same dataset with an order of magnitude less RAM while keeping recall above 90–95%.

How does DiskANN keep recall high despite PQ compression?

PQ codes are only used to steer the graph walk, not to produce final scores. Before returning results, DiskANN reads the full-precision vectors of the top candidates from SSD and re-ranks them exactly, correcting the quantization error that pure PQ-based indexes like IVF_PQ cannot fix.

Go deeper: read the DiskANN documentation for setup requirements, or see Index Explained for the full decision guide.
