Query rewriting in agentic RAG means the LLM agent reformulates the user’s original question into a better vector-search query before sending it to Milvus. This improves retrieval precision when the original phrasing is ambiguous or underspecified.
The simplest implementation adds a single prompt step before the Milvus search call: the agent receives the user query, generates 2-3 rewritten variants that surface different aspects of the information need, and then retrieves results for all variants. Milvus’s batch query support lets you execute multiple vector searches in a single network round-trip, keeping latency manageable even with 3x the retrieval calls.
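The variant-then-batch-search flow can be sketched as follows. This is a minimal illustration, not a definitive implementation: the rewrite step is stubbed where an LLM call would go, and `rewrite_query`, `merge_hits`, and `embed` are hypothetical names, not Milvus APIs.

```python
# Sketch of the variant-then-batch-search flow. The rewrite step is stubbed;
# in practice an LLM prompt would generate the variants.

def rewrite_query(query: str) -> list[str]:
    # Placeholder for an LLM prompt such as:
    # "Rewrite the question below into 3 search queries that surface
    #  different aspects of the information need."
    return [query, f"background on {query}", f"examples of {query}"]

def merge_hits(per_variant_hits: list[list[dict]], top_k: int = 5) -> list[dict]:
    # Deduplicate by document id across variants, keeping the best
    # (highest) similarity score seen for each document.
    best: dict[str, dict] = {}
    for hits in per_variant_hits:
        for hit in hits:
            doc_id = hit["id"]
            if doc_id not in best or hit["score"] > best[doc_id]["score"]:
                best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]

# With pymilvus, the batch search is a single network call:
# MilvusClient.search accepts a list of query vectors and returns one hit
# list per vector (collection name "docs" and embed() are assumptions):
#   results = client.search("docs", data=[embed(v) for v in variants], limit=5)
#   merged = merge_hits(results)
```

Because all variants go out in one `search` call, latency stays close to that of a single query; the merge step then collapses overlapping hits so downstream reading doesn't process the same document three times.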
A more sophisticated pattern uses the agent’s previous context to condition the rewrite. If the agent already retrieved and read document X, it rewrites the next query to explicitly exclude X’s content domain and focus on the gap in its knowledge. This prevents the retrieval loop from repeatedly surfacing the same documents and forces exploration of the broader collection.
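A gap-focused rewrite can be sketched as two pieces: a prompt builder that conditions the LLM on what has already been read, and a hard id filter as a backstop. Both function names (`gap_rewrite_prompt`, `exclude_seen`) and the prompt wording are illustrative assumptions.

```python
def gap_rewrite_prompt(query: str, seen_summaries: list[str]) -> str:
    # Condition the rewrite on what the agent has already read, asking the
    # LLM to steer the next search away from covered material.
    covered = "\n".join(f"- {s}" for s in seen_summaries)
    return (
        f"Original question: {query}\n"
        f"Already covered:\n{covered}\n"
        "Rewrite the question as a single search query that targets "
        "information NOT covered above."
    )

def exclude_seen(hits: list[dict], seen_ids: set[str]) -> list[dict]:
    # Backstop filter: drop documents already retrieved, so the loop
    # cannot re-surface them even if the rewrite drifts back.
    return [h for h in hits if h["id"] not in seen_ids]
```

The client-side filter can also be pushed into Milvus itself via a boolean `filter` expression on the search call (e.g. excluding known primary keys), which avoids wasting `top_k` slots on documents that will be discarded anyway.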
When implementing in Milvus, consider using separate collections with different embedding models for different document types — technical documentation, structured metadata, and conversational records often benefit from different embedding spaces. The agent can route rewritten queries to the appropriate collection based on its assessment of what type of information it needs.
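A minimal routing sketch, assuming one Milvus collection per document type. The collection names, model names, and type labels below are all hypothetical; the one real constraint the code encodes is that a query must be embedded with the same model the target collection was indexed with, so each route carries both.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    collection: str   # Milvus collection name (hypothetical)
    embed_model: str  # embedding model the collection was built with (hypothetical)

# One route per document type; in practice an LLM classification step
# (stubbed out here) would pick the label for each rewritten query.
ROUTES = {
    "technical": Route("tech_docs", "bge-large-en"),
    "metadata": Route("structured_metadata", "all-MiniLM-L6-v2"),
    "conversation": Route("chat_records", "e5-base"),
}

def route(info_type: str) -> Route:
    # Fall back to technical docs when the agent's label is unrecognized,
    # rather than failing the retrieval step.
    return ROUTES.get(info_type, ROUTES["technical"])
```

Keeping the embedding-model name in the route makes the mismatch failure mode (querying a collection with the wrong encoder, which silently returns poor neighbors) harder to introduce by accident.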
Related Resources
- Agentic RAG with Milvus and LangGraph — full agentic pattern
- Enhance RAG Performance — retrieval optimization
- Milvus with LangChain — vector store integration