

What is “recall” in the context of vector search results, and how is recall typically calculated when evaluating an ANN algorithm against ground-truth neighbors?

What is Recall in Vector Search?

In vector search, recall measures how well an approximate nearest neighbor (ANN) algorithm retrieves the true closest matches (ground-truth neighbors) for a query. Ground-truth neighbors are determined using exact search methods, which guarantee correctness but are computationally expensive. Recall answers the question: of all the true matches that exist, how many did the ANN algorithm actually return? For example, if a query has 100 ground-truth neighbors and the ANN retrieves 80 of them, the recall is 80%. This metric is critical for evaluating trade-offs between speed and accuracy in ANN systems, where higher recall indicates fewer missed matches.
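The 80-of-100 example above can be sketched as a set intersection; the IDs here are made up purely for illustration:

```python
# Recall = fraction of the ground-truth neighbors the ANN actually returned.
ground_truth = set(range(100))                        # the 100 true neighbor IDs
ann_results = set(range(80)) | set(range(100, 120))   # ANN found 80 of them, plus 20 misses

recall = len(ann_results & ground_truth) / len(ground_truth)
print(recall)  # 0.8, i.e. 80%
```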

How is Recall Calculated?

Recall is calculated as the ratio of true neighbors retrieved by the ANN to the total number of ground-truth neighbors. Suppose you evaluate an ANN algorithm using a dataset where each query has a known set of exact top-100 neighbors (the ground truth). If the ANN returns 90 of these 100 neighbors in its top-100 results, the recall@100 is 90/100 = 0.9 (or 90%). The formula is: Recall@k = (number of ground-truth top-k neighbors in the ANN's top-k results) / k, where k is the number of results returned. For instance, if you compare the ANN's top-10 results to the ground-truth top-10 and 7 of them match, recall@10 = 7/10 = 70%. This flexibility allows developers to tailor evaluations to specific use cases, such as prioritizing precision at lower k or completeness at higher k.
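A minimal implementation of the Recall@k formula above, using hypothetical ID lists for a single query:

```python
def recall_at_k(ann_ids, ground_truth_ids, k):
    """Fraction of the exact top-k neighbors that appear in the ANN's top-k results."""
    ann_top_k = set(ann_ids[:k])
    gt_top_k = set(ground_truth_ids[:k])
    return len(ann_top_k & gt_top_k) / k

# Hypothetical result lists for one query (IDs are illustrative only).
gt = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0]      # exact top-10 neighbor IDs
ann = [3, 7, 9, 4, 8, 2, 6, 11, 12, 13]  # ANN's top-10: 7 of the true 10 found

print(recall_at_k(ann, gt, 10))  # 0.7, i.e. 70% recall@10
```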

Practical Considerations

When implementing recall calculations, developers must precompute ground-truth neighbors using exact methods (e.g., brute-force search) for a representative set of queries. For large datasets, this step is time-intensive but necessary for reliable evaluation. For example, testing an ANN library like FAISS or Annoy might involve running 1,000 queries, each with 100 ground-truth neighbors, and averaging the recall across all queries. Additionally, recall often trades off with latency or memory usage: algorithms like HNSW or IVF may achieve 95% recall@100 but require more index memory or longer query times. Developers must balance these factors based on application needs—e.g., recommendation systems might prioritize high recall, while real-time apps might accept lower recall for faster results.
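The evaluation workflow above can be sketched end to end with NumPy alone: brute-force search provides the ground truth, and, as a stand-in for a real ANN library such as FAISS or Annoy, the "approximate" search below only scans a random 80% subsample of the data so it can miss true neighbors. The dataset sizes and the subsampling trick are assumptions for illustration, not a real index:

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 32)).astype(np.float32)   # indexed vectors
queries = rng.standard_normal((20, 32)).astype(np.float32)  # evaluation queries
k = 10

def exact_top_k(q, data, k):
    # Brute-force L2 search over all vectors: the ground truth.
    dists = np.linalg.norm(data - q, axis=1)
    return np.argsort(dists)[:k]

# Simulate an "ANN" by searching only a random 80% subsample of the base set,
# so some true neighbors are unreachable (a crude proxy for approximation error).
subset = rng.choice(len(base), size=800, replace=False)

recalls = []
for q in queries:
    gt = set(exact_top_k(q, base, k))
    approx_local = exact_top_k(q, base[subset], k)
    approx = set(subset[approx_local])  # map subsample positions back to original IDs
    recalls.append(len(gt & approx) / k)

print(f"mean recall@{k} over {len(queries)} queries: {np.mean(recalls):.2f}")
```

Averaging per-query recall, as done here, is the standard way to report a single number for an ANN configuration; in a real benchmark the subsampled search would be replaced by queries against the index under test.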
