Increasing the number of probes (e.g., nprobe in vector databases like FAISS) or the search depth (e.g., efSearch in HNSW graphs) directly impacts query latency by expanding the scope of the search. Higher values improve recall by examining more clusters or traversing deeper into the search space, but they also increase computational overhead and latency. For example, doubling nprobe might require checking twice as many vector clusters, leading to longer processing times[3]. Similarly, increasing efSearch in HNSW forces the algorithm to explore more nodes in the graph, which slows down queries but reduces the risk of missing relevant results[2].
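As a concrete illustration, the sketch below sets both knobs: nprobe on a FAISS IVF index and efSearch (via set_ef) on an hnswlib HNSW index. It assumes the faiss-cpu and hnswlib packages; the dataset sizes and parameter values are illustrative, not recommendations.

```python
import numpy as np
import faiss    # pip install faiss-cpu
import hnswlib  # pip install hnswlib

d, nb, nq, k = 64, 10_000, 100, 10
rng = np.random.default_rng(0)
xb = rng.random((nb, d), dtype=np.float32)   # synthetic database vectors
xq = rng.random((nq, d), dtype=np.float32)   # synthetic query vectors

# FAISS IVF: nprobe = how many of the nlist clusters are scanned per query.
nlist = 100
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 10              # higher -> more clusters scanned -> better recall, more latency
D, I = ivf.search(xq, k)

# hnswlib HNSW: set_ef() sets efSearch, the size of the candidate list explored per query.
hnsw = hnswlib.Index(space="l2", dim=d)
hnsw.init_index(max_elements=nb, ef_construction=200, M=16)
hnsw.add_items(xb)
hnsw.set_ef(64)              # must be >= k; higher -> deeper graph traversal
labels, distances = hnsw.knn_query(xq, k=k)
```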
To find an optimal balance, developers should benchmark their system with representative datasets. A practical approach is to start from a moderate baseline and incrementally increase nprobe or efSearch while tracking recall and latency. For instance, if nprobe=10 yields 80% recall with 50ms latency, try nprobe=20 to see if recall improves to 90% at 80ms latency.
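A minimal benchmarking sketch along these lines, assuming FAISS: it computes exact ground-truth neighbors with a brute-force flat index, then sweeps nprobe on an IVF index while measuring recall@k and per-query latency. The data is synthetic and the nprobe candidates are arbitrary; a real benchmark should use representative vectors and queries.

```python
import time
import numpy as np
import faiss  # pip install faiss-cpu

d, nb, nq, k = 128, 100_000, 1_000, 10
rng = np.random.default_rng(0)
xb = rng.random((nb, d), dtype=np.float32)
xq = rng.random((nq, d), dtype=np.float32)

# Exact neighbors from a brute-force flat index, used as ground truth for recall.
flat = faiss.IndexFlatL2(d)
flat.add(xb)
_, gt = flat.search(xq, k)

# IVF index whose nprobe we want to tune.
nlist = 1024
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)
index.add(xb)

for nprobe in (1, 5, 10, 20, 40, 80):
    index.nprobe = nprobe
    t0 = time.perf_counter()
    _, I = index.search(xq, k)
    ms_per_query = (time.perf_counter() - t0) / nq * 1000
    # recall@k: fraction of the true top-k neighbors that the ANN search returned
    recall = np.mean([len(set(I[i]) & set(gt[i])) / k for i in range(nq)])
    print(f"nprobe={nprobe:3d}  recall@{k}={recall:.3f}  latency={ms_per_query:.2f} ms/query")
```

Plotting recall against latency across these runs makes it easy to spot the knee of the curve, where extra probes stop paying off.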
Real-world applications often prioritize either speed or accuracy. For latency-sensitive systems like real-time recommendations, use lower values (e.g., nprobe=16, efSearch=64). For offline batch processing, higher values (e.g., nprobe=128, efSearch=256) may be acceptable[3][10]. Tools like grid search or Bayesian optimization can automate parameter tuning based on dataset characteristics and hardware constraints[2].
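A simple form of this automation is a grid search that returns the smallest parameter value meeting a recall target, sketched below. The helper name tune_search_param is hypothetical, introduced here for illustration; the commented usage assumes the index, xq, gt, and k variables from the benchmarking sketch above.

```python
import time
import numpy as np

def tune_search_param(set_param, run_search, ground_truth, k, candidates, target_recall):
    """Grid search over candidate values (assumed sorted ascending).

    Returns (value, recall, ms_per_query) for the smallest candidate whose
    measured recall@k reaches target_recall, or None if no candidate does.
    """
    nq = len(ground_truth)
    for value in candidates:
        set_param(value)                 # e.g., set nprobe or efSearch
        t0 = time.perf_counter()
        results = run_search()           # expected: an (nq, k) array of neighbor ids
        ms_per_query = (time.perf_counter() - t0) / nq * 1000
        recall = np.mean([len(set(results[i]) & set(ground_truth[i])) / k
                          for i in range(nq)])
        if recall >= target_recall:
            return value, recall, ms_per_query
    return None

# Example usage, reusing `index`, `xq`, `gt`, and `k` from the benchmarking sketch:
# best = tune_search_param(
#     set_param=lambda v: setattr(index, "nprobe", v),
#     run_search=lambda: index.search(xq, k)[1],
#     ground_truth=gt, k=k,
#     candidates=[1, 2, 4, 8, 16, 32, 64, 128],
#     target_recall=0.95,
# )
```

Bayesian optimization tools can replace the exhaustive sweep when the parameter space grows, but the evaluation loop stays the same: measure recall and latency for a candidate setting and score it.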
References:
[2] Search Depth
[3] Probes
[10] Depth Research