Deciding the number of retrieval rounds in a multi-step system involves balancing accuracy, efficiency, and practical constraints. The optimal depth is typically determined by evaluating how much additional value each step provides versus the cost of latency or computational resources. For example, a search system might start with a broad query, refine it based on initial results, and stop when subsequent rounds no longer yield significant new information. A common approach is to set a fixed limit (e.g., 3-5 steps) based on testing, as most queries plateau in usefulness after a few iterations. Dynamic stopping criteria, such as confidence thresholds or minimal incremental gains, can also be used. For instance, if a retrieval step adds less than 5% new relevant data, the system might halt to avoid wasting time.
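Below is a minimal Python sketch of such a dynamic stopping rule. The `retrieve_fn` and `refine_fn` callables are hypothetical placeholders for whatever retrieval and query-rewriting logic your system uses, and the 5% threshold and 5-step cap simply mirror the figures above.

```python
from typing import Callable, Iterable, Set

def multi_step_retrieve(
    query: str,
    retrieve_fn: Callable[[str], Iterable[str]],  # one retrieval round -> doc IDs (assumed helper)
    refine_fn: Callable[[str, list], str],        # rewrites the query from this round's results (assumed helper)
    max_steps: int = 5,
    min_gain: float = 0.05,
) -> Set[str]:
    """Run retrieval rounds until the marginal gain drops below min_gain
    or the fixed step cap is reached."""
    seen: Set[str] = set()
    for _ in range(max_steps):
        results = list(retrieve_fn(query))
        new = [doc_id for doc_id in results if doc_id not in seen]
        # Share of everything collected so far that this round contributed.
        gain = len(new) / max(len(seen) + len(new), 1)
        seen.update(new)
        if gain < min_gain:  # e.g., less than 5% new relevant data: stop early
            break
        query = refine_fn(query, results)
    return seen
```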
Diminishing returns occur when additional rounds fail to meaningfully improve results. This can be measured by tracking metrics like precision (the share of retrieved results that are relevant), recall (coverage of the relevant data), and redundancy (repeated or irrelevant information). For example, in a document retrieval system, the first round might fetch 50% of the relevant documents, the second another 30%, and the third only 10%. At that point the system is spending roughly 20% more time for only a 10% gain, a clear drop in efficiency. Similarly, in a chatbot answering technical questions, if a fourth round of context retrieval only clarifies minor details (e.g., edge cases affecting 1% of users), the cost of the added delay for most users outweighs the benefit.
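To make those per-round metrics concrete, here is an illustrative Python helper that computes precision, marginal recall gain, and redundancy for each round; the document IDs and the 50%/30%/10% pattern are toy data chosen to match the example above.

```python
def round_metrics(rounds, relevant):
    """Per-round precision, marginal recall gain, and redundancy.

    rounds:   list of result-ID lists, one per retrieval round
    relevant: set of ground-truth relevant IDs for the query
    """
    seen = set()
    report = []
    for step, results in enumerate(rounds, start=1):
        hits = [r for r in results if r in relevant]
        new_hits = [r for r in hits if r not in seen]
        precision = len(hits) / len(results) if results else 0.0
        recall_gain = len(new_hits) / len(relevant) if relevant else 0.0
        redundancy = 1 - len(set(results) - seen) / len(results) if results else 0.0
        seen.update(results)
        report.append((step, precision, recall_gain, redundancy))
    return report

# Toy data mirroring the 50% / 30% / 10% recall-gain pattern described above.
relevant = {f"doc{i}" for i in range(10)}
rounds = [
    ["doc0", "doc1", "doc2", "doc3", "doc4"],  # round 1: 50% of relevant docs
    ["doc4", "doc5", "doc6", "doc7"],          # round 2: +30%, some overlap
    ["doc7", "doc8", "spam1", "spam2"],        # round 3: +10%, more noise
]
for step, p, r, dup in round_metrics(rounds, relevant):
    print(f"round {step}: precision={p:.2f} recall_gain={r:.2f} redundancy={dup:.2f}")
```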
To measure this, developers can use A/B testing or iterative evaluation. For instance, run experiments comparing answer quality and response time for different step limits. If adding a fourth step improves accuracy by 2% but increases latency by 200ms, the trade-off may not be justified. Tools like precision-recall curves or user feedback surveys can quantify when returns diminish. In code search systems, logging the number of unique results per step and calculating marginal gains (e.g., new code snippets per round) can reveal the “sweet spot.” For example, if steps 1-3 yield 8, 4, and 1 new useful snippets respectively, capping at three steps balances completeness and speed.
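One way to run that comparison is sketched below; `run_pipeline` is a hypothetical function standing in for your multi-step retriever, and the 2% accuracy and 200 ms latency thresholds are just the illustrative numbers from the paragraph above.

```python
import time

def evaluate_step_limit(queries, run_pipeline, relevant_sets, max_steps):
    """Average recall and latency for a given step cap.

    run_pipeline(query, max_steps) -> set of retrieved doc IDs (assumed helper)
    relevant_sets[i] is the ground-truth relevant set for queries[i]
    """
    recalls, latencies = [], []
    for query, relevant in zip(queries, relevant_sets):
        start = time.perf_counter()
        retrieved = run_pipeline(query, max_steps=max_steps)
        latencies.append((time.perf_counter() - start) * 1000)  # ms
        recalls.append(len(retrieved & relevant) / len(relevant) if relevant else 0.0)
    return sum(recalls) / len(recalls), sum(latencies) / len(latencies)

# Keep the extra round only if the accuracy gain justifies the added latency
# (thresholds are illustrative, taken from the discussion above).
# recall_3, ms_3 = evaluate_step_limit(queries, run_pipeline, relevant_sets, max_steps=3)
# recall_4, ms_4 = evaluate_step_limit(queries, run_pipeline, relevant_sets, max_steps=4)
# if (recall_4 - recall_3) < 0.02 or (ms_4 - ms_3) > 200:
#     print("Cap at 3 steps: the fourth round isn't worth it.")
```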
Zilliz Cloud is a managed vector database built on Milvus, perfect for building GenAI applications.