FLAT vs IVF: How the Inverted File Index Works

Same data, same query — but a different search strategy. Drag the query point, adjust nprobe, and press Run Search to see how IVF skips most of the data.

[Interactive demo. Controls: nprobe = 2 (search 2 of 8 clusters); animation speed = 20 ms (lower is faster).]

FLAT (brute force, scans every point): 200 / 200 comparisons, recall 100%.
IVF (inverted file, searches nprobe clusters): 33 / 33 comparisons, Recall@5 100%.

What's happening?

Build phase (already done): K-means partitioned the 200 points into 8 clusters. Each cluster has a centroid (the × marks).
Step 1 (evaluate centroids): IVF first computes the distance from the query to every centroid. That's only 8 comparisons here: the count equals the number of clusters, not the number of points.
Step 2 (pick the closest nprobe clusters): these light up. All other clusters are skipped entirely.
Step 3 (scan points inside the chosen clusters): with nprobe = 2 of 8 clusters, that is about 50 of the 200 points if the clusters are equal in size. The exact count depends on which clusters are chosen.
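
The build phase and the three query steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Milvus implements IVF: the function names (kmeans, build_ivf, ivf_search, flat_search), the random 2-D data, and the fixed seeds are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, cluster id of each point)."""
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct data points.
    centroids = points[rng.choice(len(points), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = points[assign == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def build_ivf(points, nlist):
    """Build phase: one inverted list of point ids per cluster."""
    centroids, assign = kmeans(points, nlist)
    inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]
    return centroids, inverted_lists

def ivf_search(query, points, centroids, inverted_lists, nprobe, k):
    # Step 1: distance from the query to every centroid (nlist comparisons).
    centroid_dists = np.linalg.norm(centroids - query, axis=1)
    # Step 2: keep the nprobe closest clusters.
    probed = np.argsort(centroid_dists)[:nprobe]
    # Step 3: scan only the points stored in those clusters.
    candidates = np.concatenate([inverted_lists[c] for c in probed])
    dists = np.linalg.norm(points[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]], len(candidates)

def flat_search(query, points, k):
    """Brute force: compare the query against every point."""
    return np.argsort(np.linalg.norm(points - query, axis=1))[:k]

rng = np.random.default_rng(42)
points = rng.random((200, 2))          # hypothetical stand-in for the demo data
query = np.array([0.5, 0.5])
centroids, inverted_lists = build_ivf(points, nlist=8)

exact = flat_search(query, points, k=5)
approx, scanned = ivf_search(query, points, centroids, inverted_lists, nprobe=2, k=5)
recall = len(np.intersect1d(exact, approx)) / 5
print(f"IVF scanned {scanned} of {len(points)} points, recall@5 = {recall:.0%}")
```

Setting nprobe equal to nlist makes ivf_search scan every list, so it returns the same neighbors as flat_search at the cost of the extra centroid comparisons.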

The tradeoff: larger nprobe → more clusters scanned → higher recall but slower. Smaller nprobe → faster but you may miss true neighbors that live in unscanned clusters. Try setting nprobe = 1 and dragging the query near a cluster boundary to see recall drop.
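
To put rough numbers on this tradeoff, the sketch below measures mean recall@5 for every nprobe from 1 to 8 on random 2-D data. It makes two simplifying assumptions that are not taken from the demo: the centroids are 8 randomly chosen data points with no k-means refinement, and the 100 queries are uniform random points. The helper name recall_at_k is also hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((200, 2))
nlist, k = 8, 5

# Simplification: use 8 random data points as centroids (no k-means refinement).
centroids = points[rng.choice(len(points), nlist, replace=False)]
assign = np.linalg.norm(points[:, None] - centroids[None], axis=2).argmin(axis=1)

def recall_at_k(query, nprobe):
    """Fraction of the true k nearest neighbors found when scanning nprobe clusters."""
    true_ids = np.argsort(np.linalg.norm(points - query, axis=1))[:k]
    probed = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.flatnonzero(np.isin(assign, probed))
    found = cand[np.argsort(np.linalg.norm(points[cand] - query, axis=1))[:k]]
    return len(np.intersect1d(true_ids, found)) / k

queries = rng.random((100, 2))
recalls = {
    n: float(np.mean([recall_at_k(q, n) for q in queries]))
    for n in range(1, nlist + 1)
}
for n, r in recalls.items():
    print(f"nprobe={n}: mean recall@{k} = {r:.3f}")
```

Each extra probed cluster only adds candidates, so recall never decreases as nprobe grows, and it reaches 100% once nprobe equals nlist. The latency side of the tradeoff is the growing candidate count.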

What is an IVF index?

IVF (Inverted File) is a clustering-based approximate nearest neighbor index. At build time, k-means partitions all vectors into nlist clusters, each represented by its centroid. At query time, the search first compares the query against the centroids only, picks the closest nprobe clusters, and scans just the vectors inside them — skipping everything else.

Milvus ships several IVF variants that trade accuracy for lower memory use: IVF_FLAT stores raw vectors (most accurate), IVF_SQ8 compresses each dimension from a 4-byte float to one byte (~4× less memory, small recall cost), and IVF_PQ applies product quantization for much higher compression with a larger recall cost.
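
A back-of-the-envelope calculation shows where the memory savings come from. The sketch below counts only the bytes used to store each vector and ignores centroids, PQ codebooks, and ids. The function name bytes_per_vector and the workload (1,000,000 vectors of 768 dimensions, PQ with 96 sub-vectors at 8 bits each) are assumptions chosen for illustration.

```python
def bytes_per_vector(dim, variant, pq_m=None, pq_nbits=8):
    """Approximate storage for one vector, excluding centroids, codebooks, and ids."""
    if variant == "IVF_FLAT":
        return dim * 4                 # one float32 per dimension
    if variant == "IVF_SQ8":
        return dim * 1                 # one byte per dimension
    if variant == "IVF_PQ":
        if pq_m is None or dim % pq_m != 0:
            raise ValueError("pq_m must be set and must divide dim")
        return pq_m * pq_nbits / 8     # one pq_nbits-bit code per sub-vector
    raise ValueError(f"unknown variant: {variant}")

# Hypothetical workload, not taken from the article.
dim, n_vectors, pq_m = 768, 1_000_000, 96

for variant in ("IVF_FLAT", "IVF_SQ8", "IVF_PQ"):
    total = bytes_per_vector(dim, variant, pq_m=pq_m) * n_vectors
    print(f"{variant}: {total / 1e9:.3f} GB")
```

The 4× figure for IVF_SQ8 falls straight out of the 4-byte to 1-byte change. The IVF_PQ ratio depends entirely on the chosen number of sub-vectors and bits per code.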

Tuning nlist and nprobe

nlist (build time): the number of k-means clusters. A common starting point is 4 × sqrt(N) for N vectors. More clusters mean each one is smaller — faster scans, but the centroid sweep costs more and boundary effects grow.
nprobe (query time): how many clusters to scan per query. This is the main recall/latency knob — exactly what the slider above controls. Start around nlist / 16 and increase until recall meets your target.
Variant choice: start with IVF_FLAT. Move to IVF_SQ8 when memory is tight, and to IVF_PQ only when the dataset is too large for SQ8 — and verify recall on your own data after each step down.
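
The two rules of thumb above translate into simple arithmetic. The sketch below computes the starting values and then doubles nprobe until a recall target is met. The names starting_params and tune_nprobe are hypothetical, measure_recall stands for whatever evaluation you run against your own query set, and the doubling schedule is just one reasonable way to "increase until recall meets your target".

```python
import math

def starting_params(n_vectors):
    """Rule-of-thumb starting points: nlist = 4 * sqrt(N), nprobe = nlist / 16."""
    nlist = max(1, round(4 * math.sqrt(n_vectors)))
    nprobe = max(1, nlist // 16)
    return nlist, nprobe

def tune_nprobe(measure_recall, nlist, target_recall):
    """Double nprobe from nlist / 16 until measure_recall(nprobe) hits the target.

    measure_recall is a caller-supplied function (an assumption of this sketch)
    that runs queries with the given nprobe and returns recall in [0, 1].
    """
    nprobe = max(1, nlist // 16)
    while measure_recall(nprobe) < target_recall and nprobe < nlist:
        nprobe = min(nlist, nprobe * 2)
    return nprobe

nlist, nprobe = starting_params(1_000_000)
print(f"N = 1,000,000: start with nlist = {nlist}, nprobe = {nprobe}")
```

For one million vectors this gives nlist = 4000 and a first nprobe of 250, which then moves up only if measured recall falls short.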

IVF vs HNSW vs DiskANN

IVF builds fast, has a small memory overhead on top of the vectors, and is easy to reason about — but at the same recall target, HNSW usually answers queries faster in memory. Choose IVF when you rebuild indexes often (fast build matters), when memory is constrained (IVF_SQ8/IVF_PQ), or when you want predictable, tunable scan behavior. Choose HNSW for the lowest in-memory query latency, and DiskANN when vectors no longer fit in RAM at all.

Frequently asked questions

What is the difference between IVF_FLAT, IVF_SQ8, and IVF_PQ?

All three cluster vectors the same way; they differ in how vectors are stored inside clusters. IVF_FLAT keeps raw float vectors (best recall). IVF_SQ8 quantizes each dimension to 8 bits, cutting memory roughly 4× with a small recall loss. IVF_PQ replaces vectors with short product-quantization codes for the highest compression and the largest recall loss.
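
To make "quantizes each dimension to 8 bits" concrete, here is a minimal scalar-quantization sketch. It assumes a simple per-dimension min/max range learned from the data. Real implementations may choose ranges differently, and the function names (sq8_train, sq8_encode, sq8_decode) are hypothetical.

```python
import numpy as np

def sq8_train(vectors):
    """Learn a per-dimension offset and step size from the data's min/max."""
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    # 255 steps span each dimension's range; constant dimensions get step 1.
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0).astype(np.float32)
    return lo, scale

def sq8_encode(vectors, lo, scale):
    """Map each float32 value to the nearest of 256 levels (one byte)."""
    return np.clip(np.rint((vectors - lo) / scale), 0, 255).astype(np.uint8)

def sq8_decode(codes, lo, scale):
    """Reconstruct approximate float32 values from the byte codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 64)).astype(np.float32)  # hypothetical data

lo, scale = sq8_train(vectors)
codes = sq8_encode(vectors, lo, scale)
restored = sq8_decode(codes, lo, scale)

print(f"raw: {vectors.nbytes} bytes, codes: {codes.nbytes} bytes")
print(f"max reconstruction error: {np.abs(vectors - restored).max():.4f}")
```

The reconstruction error per value is bounded by about half a quantization step, which is why the recall loss is usually small. Product quantization compresses further by encoding groups of dimensions jointly, at the cost of larger errors.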

How should I choose nlist and nprobe?

A common rule of thumb is nlist = 4 × sqrt(N) for N vectors, then tune nprobe at query time. Larger nprobe scans more clusters: higher recall, higher latency. Start around nlist/16 and increase until recall on a held-out query set meets your target.

Why does IVF miss some true nearest neighbors?

Because of cluster boundaries. If the query lands near the edge of a cluster, some of its true neighbors may sit in an adjacent cluster that wasn't among the nprobe clusters scanned. You can see this in the visualization by dragging the query point near a boundary with nprobe = 1.
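
A hand-built example makes the boundary effect easy to verify. The six points, the two clusters, and the query below are invented for illustration. The query sits just on cluster A's side of the boundary, while its true nearest neighbor belongs to cluster B.

```python
import numpy as np

# Two clusters on a line; the boundary between their centroids is at x = 5.
points = np.array([
    [0.0, 0.0], [2.0, 0.0], [4.0, 0.0],     # cluster A (ids 0, 1, 2)
    [5.5, 0.0], [8.0, 0.0], [10.5, 0.0],    # cluster B (ids 3, 4, 5)
])
inverted_lists = [np.array([0, 1, 2]), np.array([3, 4, 5])]
centroids = np.array([points[ids].mean(axis=0) for ids in inverted_lists])

def nearest(query, nprobe):
    """Return the id of the closest point among the nprobe nearest clusters."""
    probed = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.concatenate([inverted_lists[c] for c in probed])
    return int(cand[np.argmin(np.linalg.norm(points[cand] - query, axis=1))])

query = np.array([4.8, 0.0])   # closer to centroid A, but nearest point is in B

true_nn = int(np.argmin(np.linalg.norm(points - query, axis=1)))
print("true nearest neighbor:", true_nn, points[true_nn])
print("IVF, nprobe=1:", nearest(query, 1))
print("IVF, nprobe=2:", nearest(query, 2))
```

With nprobe = 1 only cluster A is scanned, so the search returns the point at x = 4 (distance 0.8) and misses the point at x = 5.5 (distance 0.7). Probing both clusters recovers the true neighbor.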

Should I use IVF or HNSW?

At the same recall, HNSW is usually faster for in-memory search but uses more memory and builds more slowly. IVF builds quickly and pairs well with quantization (SQ8/PQ) when memory is tight. For datasets that exceed RAM entirely, consider DiskANN instead.

Go deeper: read the IVF_FLAT documentation for index params, or see Index Explained for the full decision guide.