Same data, same query — but a different search strategy. Drag the query point, adjust nprobe, and press Run Search to see how IVF skips most of the data.
nprobe clusters: these light up. All other clusters are skipped entirely.
The tradeoff: larger nprobe → more clusters scanned → higher recall but slower. Smaller nprobe → faster, but you may miss true neighbors that live in unscanned clusters. Try setting nprobe = 1 and dragging the query near a cluster boundary to see recall drop.
IVF (Inverted File) is a clustering-based approximate nearest neighbor index. At build time, k-means partitions all vectors into nlist clusters, each represented by its centroid. At query time, the search first compares the query against the centroids only, picks the closest nprobe clusters, and scans just the vectors inside them — skipping everything else.
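The build and query steps described above can be sketched in plain Python. This is a minimal illustration, not how Milvus implements IVF: the function names (kmeans, build_ivf, search_ivf), the fixed iteration count, and the random 2-D data are all assumptions made for the example.

```python
import random


def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, nlist, iters=10, seed=0):
    """Plain Lloyd's k-means; returns nlist centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(vectors, nlist):
    """Build time: cluster, then file each vector id under its nearest centroid."""
    centroids = kmeans(vectors, nlist)
    lists = [[] for _ in range(nlist)]
    for vid, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(vid)
    return centroids, lists


def search_ivf(query, vectors, centroids, lists, nprobe, k):
    """Query time: rank centroids, then scan only the nprobe closest lists."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    candidates.sort(key=lambda vid: l2(query, vectors[vid]))
    return candidates[:k], len(candidates)


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(2000)]
centroids, lists = build_ivf(data, nlist=16)
ids, scanned = search_ivf([0.5, 0.5], data, centroids, lists, nprobe=2, k=5)
print(f"top-5 ids {ids}, scanned {scanned} of {len(data)} vectors")
```

With nprobe equal to nlist the search degenerates into a full scan and returns the exact neighbors; anything smaller trades some recall for fewer distance computations.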
Milvus ships several IVF variants that trade memory for accuracy: IVF_FLAT stores raw vectors (most accurate), IVF_SQ8 compresses each dimension to one byte (~4× less memory, small recall cost), and IVF_PQ applies product quantization for much higher compression with a larger recall cost.
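To make the memory tradeoff concrete, here is a rough back-of-the-envelope estimate of the per-vector payload for each variant. It is only a sketch: payload_bytes is a hypothetical helper, the 768-dimension and one-million-vector figures are illustrative, and the estimate ignores ids, centroids, and codebooks. The pq_m and pq_nbits arguments stand in for the m and nbits parameters of IVF_PQ.

```python
import math


def payload_bytes(dim, index_type, pq_m=None, pq_nbits=8):
    """Approximate bytes stored per vector, excluding ids, centroids, codebooks."""
    if index_type == "IVF_FLAT":
        return dim * 4                      # one float32 per dimension
    if index_type == "IVF_SQ8":
        return dim                          # one byte per dimension
    if index_type == "IVF_PQ":
        if not pq_m or dim % pq_m != 0:
            raise ValueError("pq_m must be set and must divide dim evenly")
        return math.ceil(pq_m * pq_nbits / 8)   # one short code per sub-vector
    raise ValueError(f"unknown index type: {index_type}")


dim, num_vectors = 768, 1_000_000           # illustrative sizes, not from the source
for name, kwargs in [("IVF_FLAT", {}), ("IVF_SQ8", {}), ("IVF_PQ", {"pq_m": 96})]:
    per_vec = payload_bytes(dim, name, **kwargs)
    total_gib = per_vec * num_vectors / 2**30
    print(f"{name}: {per_vec} B/vector, about {total_gib:.2f} GiB total")
```

The 4× gap between IVF_FLAT and IVF_SQ8 falls straight out of float32 (4 bytes) versus one byte per dimension; the IVF_PQ figure depends entirely on the chosen m and nbits.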
nlist (build time): the number of k-means clusters. A common starting point is 4 × sqrt(N) for N vectors. More clusters mean each one is smaller, so scans are faster, but the centroid sweep costs more and boundary effects grow.
nprobe (query time): how many clusters to scan per query. This is the main recall/latency knob, and exactly what the slider above controls. Start around nlist / 16 and increase until recall meets your target.
IVF builds fast, has a small memory overhead on top of the vectors, and is easy to reason about, but at the same recall target HNSW usually answers queries faster in memory. Choose IVF when you rebuild indexes often (fast build matters), when memory is constrained (IVF_SQ8/IVF_PQ), or when you want predictable, tunable scan behavior. Choose HNSW for the lowest in-memory query latency, and DiskANN when vectors no longer fit in RAM at all.
IVF_FLAT, IVF_SQ8, and IVF_PQ all cluster vectors the same way; they differ in how vectors are stored inside clusters. IVF_FLAT keeps raw float vectors (best recall). IVF_SQ8 quantizes each dimension to 8 bits, cutting memory roughly 4× with a small recall loss. IVF_PQ replaces vectors with short product-quantization codes for the highest compression and the largest recall loss.
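As an illustration of what "8 bits per dimension" can mean, the sketch below implements a uniform min/max scalar quantizer. This is one common scheme and is assumed here for demonstration; the actual IVF_SQ8 encoder may differ in detail. The helper names train_sq8, encode_sq8, and decode_sq8 are hypothetical.

```python
import random


def train_sq8(vectors):
    """Record each dimension's min and max over the training vectors."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return lows, highs


def encode_sq8(vector, lows, highs):
    """Map each float to an integer code in 0..255 (one byte per dimension)."""
    codes = []
    for x, lo, hi in zip(vector, lows, highs):
        span = (hi - lo) or 1.0             # avoid dividing by zero on constant dims
        level = round((x - lo) / span * 255)
        codes.append(min(255, max(0, level)))
    return bytes(codes)


def decode_sq8(codes, lows, highs):
    """Reconstruct approximate floats from the byte codes."""
    return [lo + c / 255 * (hi - lo) for c, lo, hi in zip(codes, lows, highs)]


rng = random.Random(7)
sample = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(100)]
lows, highs = train_sq8(sample)
code = encode_sq8(sample[0], lows, highs)
restored = decode_sq8(code, lows, highs)
worst = max(abs(x - r) for x, r in zip(sample[0], restored))
print(f"{len(code)} bytes instead of {len(sample[0]) * 4}, worst error {worst:.4f}")
```

Each reconstructed value is off by at most half a quantization step, which is where the small recall loss comes from: distances are computed on slightly perturbed vectors.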
A common rule of thumb is nlist = 4 × sqrt(N) for N vectors, then tune nprobe at query time. Larger nprobe scans more clusters: higher recall, higher latency. Start around nlist/16 and increase until recall on a held-out query set meets your target.
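The rule of thumb translates into a few lines of arithmetic. In this sketch starting_params is a hypothetical helper, and the L2 metric is an arbitrary choice; the dictionaries follow the general shape Milvus uses for index and search parameters, but check the IVF_FLAT documentation for the exact fields your client version expects.

```python
import math


def starting_params(num_vectors):
    """Rule-of-thumb starting values; tune nprobe against measured recall."""
    nlist = max(1, round(4 * math.sqrt(num_vectors)))
    nprobe = max(1, nlist // 16)
    return nlist, nprobe


nlist, nprobe = starting_params(1_000_000)

# Parameter dictionaries in the shape used for an IVF_FLAT index (metric is an assumption).
index_params = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": nlist}}
search_params = {"metric_type": "L2", "params": {"nprobe": nprobe}}

print(index_params)
print(search_params)
```

For one million vectors this gives nlist = 4000 and nprobe = 250. These are starting values only; the final nprobe should come from measuring recall on held-out queries.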
IVF can miss true neighbors because of cluster boundaries. If the query lands near the edge of a cluster, some of its true neighbors may sit in an adjacent cluster that wasn't among the nprobe clusters scanned. You can see this in the visualization by dragging the query point near a boundary with nprobe = 1.
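The boundary effect can be reproduced with a tiny hand-built example. The two centroids and four points below are made up for illustration (the centroids are not actual k-means means); the query sits just on one side of the boundary while its true nearest neighbor sits just on the other.

```python
def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical centroids and points; the cluster boundary is the line x = 5.
centroids = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (4.0, 0.0), (5.1, 0.0), (9.0, 0.0)]

# Assign every point to its nearest centroid, as IVF does at build time.
clusters = {c: [] for c in range(len(centroids))}
for p in points:
    nearest = min(clusters, key=lambda c: dist(p, centroids[c]))
    clusters[nearest].append(p)


def ivf_nearest(query, nprobe):
    """Return the closest point found in the nprobe clusters nearest the query."""
    order = sorted(clusters, key=lambda c: dist(query, centroids[c]))
    candidates = [p for c in order[:nprobe] for p in clusters[c]]
    return min(candidates, key=lambda p: dist(query, p))


query = (4.9, 0.0)
print(ivf_nearest(query, nprobe=1))   # (4.0, 0.0): the true neighbor is in the other cluster
print(ivf_nearest(query, nprobe=2))   # (5.1, 0.0): found once both clusters are scanned
```

The query at x = 4.9 is closest to the left centroid, so nprobe = 1 scans only the left cluster and returns (4.0, 0.0) at distance 0.9, even though (5.1, 0.0) is only 0.2 away.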
At the same recall, HNSW is usually faster for in-memory search but uses more memory and builds more slowly. IVF builds quickly and pairs well with quantization (SQ8/PQ) when memory is tight. For datasets that exceed RAM entirely, consider DiskANN instead.
Go deeper: read the IVF_FLAT documentation for index params, or see Index Explained for the full decision guide.