Watch a search travel through the layered graph — from the sparse top layer (highways) down to the dense bottom layer (local roads). Drag the query point on Layer 0 and tune M / ef to see how the graph changes.
M nearest neighbors. Higher layers are sparse (few nodes, long edges = "highways"). Layer 0 is dense (all nodes, short edges = "local roads").ef: at the bottom layer, instead of stopping at the first local minimum, HNSW maintains a pool of ef best candidates and expands their neighborhoods. Larger ef → more accurate but slower.Why it's fast: the top layers act like an index over the index — you cover huge distances in just a few hops, then zoom in. For N points, search visits roughly O(log N) nodes instead of N.
HNSW (Hierarchical Navigable Small World) is a graph-based approximate nearest neighbor algorithm. It organizes vectors into a stack of navigable small-world graphs: sparse upper layers with long-range edges for coarse routing, and a dense bottom layer with short edges for the final fine-grained walk. A search greedily hops toward the query on each layer, then drops down — visiting roughly O(log N) nodes instead of scanning all N.
HNSW is the most widely used in-memory vector index, and the default graph index in Milvus. Milvus also offers HNSW_SQ and HNSW_PQ variants that quantize the stored vectors to cut memory at a small recall cost.
M (build time): the maximum number of edges per node, the same knob as the slider above. Typical values are 8–64. Higher M improves recall and graph connectivity but grows memory linearly and slows construction.efConstruction (build time): the candidate pool size used while inserting nodes. Higher values build a better-quality graph (higher achievable recall) at the cost of slower index builds. Typical values are 100–500.ef / ef_search (query time): the candidate pool kept during the Layer-0 walk — the main recall/latency knob at search time. It must be at least your top-K. Increase it until recall plateaus; latency grows roughly linearly with it.HNSW typically delivers the best query latency at high recall for in-memory search — that's why it's the default choice when the dataset (plus graph overhead) fits in RAM. Its costs are memory and build time: the graph adds noticeable overhead per vector, and inserts are slower than IVF's k-means assignment. Pick IVF when you need fast index builds or aggressive quantization, and DiskANN when the dataset is too large for RAM but you still need high recall.
M caps how many edges each node keeps per layer. Larger M makes the graph better connected — higher recall and more robust routing — but memory grows linearly with M and index builds slow down. Values between 8 and 64 cover most workloads; 16–32 is a common sweet spot.
Both are candidate-pool sizes, applied at different times. efConstruction is used while building the index: it determines how thoroughly each inserted node searches for neighbors, so it controls graph quality. ef (ef_search) is used at query time: it controls how wide the Layer-0 search is, trading latency for recall on every query.
Roughly the raw vectors plus the graph: about d × 4 bytes per vector for float32 data, plus on the order of M × 2 × 8 bytes of edges per node. For memory-constrained deployments, Milvus offers HNSW_SQ and HNSW_PQ, which quantize stored vectors, or DiskANN, which moves full vectors to SSD.
Use HNSW when query latency at high recall is the priority and the index fits in memory — it usually wins that race. Prefer IVF when you rebuild indexes frequently (much faster builds), or when combined with SQ8/PQ quantization to fit a tight memory budget.
Go deeper: read the HNSW documentation for exact index params, or see Index Explained for the full decision guide.
Side-by-side comparison of brute-force vs clustered search. Tune nprobe in real time.
See how PQ-in-RAM + full-vectors-on-SSD lets you search billion-scale datasets on a laptop.
Drag a query point and switch between L2, Cosine, and IP to see how "nearest" changes.