

In practical benchmark reports, how are recall and QPS (queries per second) reported together to give a full picture of a vector database’s performance?

In practical benchmark reports, recall and QPS (queries per second) are reported together to balance two critical aspects of a vector database’s performance: accuracy and speed. Recall measures how completely the database retrieves the true nearest neighbors (e.g., finding 90 of the 100 correct items yields 90% recall), while QPS quantifies how many queries the system can process per second. The two metrics are inversely related: higher recall usually requires more computational effort per query, which lowers QPS, and vice versa. Reporting both provides a clear trade-off analysis, helping developers choose configurations that match their application’s needs, whether that means prioritizing accuracy, speed, or a balance of the two.
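The two metrics described above are simple to compute once you have a set of queries, the ground-truth neighbors, and the results returned by the database. Here is a minimal, database-agnostic sketch (the function names and toy data are illustrative, not part of any benchmark suite):

```python
import time

def recall_at_k(retrieved, ground_truth):
    """Average fraction of true neighbors found per query.

    retrieved / ground_truth: one set of result IDs per query.
    """
    hits = sum(len(r & g) / len(g) for r, g in zip(retrieved, ground_truth))
    return hits / len(ground_truth)

def measure_qps(search_fn, queries):
    """Wall-clock queries per second for any search callable."""
    start = time.perf_counter()
    results = [search_fn(q) for q in queries]
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed, results

# Toy numbers: query 1 found 2 of its 3 true neighbors, query 2 found all 3.
truth = [{1, 2, 3}, {4, 5, 6}]
found = [{1, 2, 9}, {4, 5, 6}]
print(recall_at_k(found, truth))  # (2/3 + 3/3) / 2 ≈ 0.833
```

In a real report, `measure_qps` would wrap the database client’s search call and be run over thousands of queries to smooth out timing noise.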

Benchmarks typically visualize this relationship with plots or tables. For example, a recall-QPS curve might show how QPS drops as recall rises from 80% to 95% on fixed hardware. Alternatively, a table can list QPS values alongside the corresponding recall percentages for different parameter settings, such as candidate-list sizes or index types. These comparisons highlight how tuning parameters, such as nprobe (the number of clusters probed) in an IVF index or efSearch (the candidate-queue size) in an HNSW graph, affect performance. A configuration optimized for high recall (e.g., efSearch=200) might achieve 98% recall at 200 QPS, while a speed-focused setup (efSearch=50) might reach 500 QPS with 85% recall. This data lets developers see how design choices affect real-world behavior.

The context of the application determines which metric to prioritize. For instance, a recommendation system requiring high accuracy might tolerate lower QPS (e.g., 150 QPS at 95% recall), while a real-time product search tool might prioritize 1000 QPS with 80% recall. Benchmarks often include multiple scenarios to reflect these use cases. Additionally, hardware details (CPU/GPU, memory) and dataset characteristics (vector dimensionality, size) are included to ensure fair comparisons. By presenting recall and QPS together with these variables, benchmarks offer actionable insights, enabling developers to make informed decisions about scalability, cost, and performance trade-offs.
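Turning such a benchmark table into a decision can itself be automated. The helper below is a hypothetical sketch, and the rows echo the illustrative numbers used above: given a recall floor it picks the fastest qualifying configuration, and given a QPS floor it picks the most accurate one.

```python
def pick_config(configs, min_recall=None, min_qps=None):
    """Choose a configuration from benchmark rows of (name, recall, qps).

    With a recall floor, maximize QPS; with a QPS floor, maximize recall.
    Returns None if no row meets the constraint.
    """
    if min_recall is not None:
        ok = [c for c in configs if c[1] >= min_recall]
        return max(ok, key=lambda c: c[2], default=None)
    ok = [c for c in configs if c[2] >= min_qps]
    return max(ok, key=lambda c: c[1], default=None)

# Hypothetical benchmark rows (name, recall, QPS):
rows = [
    ("efSearch=50", 0.85, 500),
    ("efSearch=100", 0.92, 320),
    ("efSearch=200", 0.98, 200),
]
print(pick_config(rows, min_recall=0.95))  # accuracy-first: ('efSearch=200', 0.98, 200)
print(pick_config(rows, min_qps=400))      # speed-first: ('efSearch=50', 0.85, 500)
```

The same filter-then-maximize logic generalizes to extra columns such as memory footprint or cost per query when those appear in the report.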
