A Hierarchical Navigable Small World (HNSW) graph index is a data structure designed to efficiently perform approximate nearest neighbor (ANN) searches in high-dimensional vector spaces. It organizes vectors into a multi-layered graph where each layer represents a simplified version of the data, with higher layers containing fewer nodes and sparser connections. The key idea is to enable fast traversal by starting searches at the top (coarsest) layer and progressively refining results as the search moves down to lower, denser layers. This hierarchical approach reduces the number of comparisons needed compared to a flat graph structure, balancing speed and accuracy.
HNSW builds on the “small world” property, where most nodes can be reached from any other node in a small number of steps. Each layer in the hierarchy is a graph where nodes represent vectors, and edges connect similar vectors. Higher layers have fewer nodes and longer-range connections, allowing the algorithm to quickly skip over large regions of the vector space. For example, in a dataset of 1 million image embeddings, the top layer might contain just 1,000 nodes, while the bottom layer includes all vectors. When inserting a new vector, it is randomly assigned to a layer (with a decreasing probability for higher layers) and connected to its nearest neighbors in that layer and all layers below. This probabilistic layering ensures efficient scaling and avoids overloading upper layers with too many nodes.
During a search, HNSW starts at the top layer, identifies the closest entry points, and uses them to “navigate” downward. At each layer, the algorithm explores neighboring nodes to refine the candidate set, gradually narrowing the focus to the most promising regions. For instance, when searching for similar text embeddings, the top layer might quickly filter out unrelated clusters (e.g., separating medical documents from movie reviews), while lower layers compare finer details. The use of beam search—maintaining a dynamic list of top candidates—ensures that the search explores enough possibilities without exhaustive checks. By leveraging hierarchy and controlled connectivity, HNSW achieves sublinear search times (e.g., O(log N) complexity) while maintaining high recall, making it practical for real-world applications like recommendation systems or semantic search engines.
Zilliz Cloud is a managed vector database built on Milvus perfect for building GenAI applications.
Try FreeLike the article? Spread the word