Query rewriting in agentic RAG means the LLM agent reformulates the user’s original question into a better vector-search query before sending it to Milvus. This improves retrieval precision when the original phrasing is ambiguous or underspecified.
The simplest implementation adds a single prompt step before the Milvus search call: the agent receives the user query, generates 2-3 rewritten variants that surface different aspects of the information need, and then retrieves results for all variants. Milvus’s batch query support lets you execute multiple vector searches in a single network round-trip, keeping latency manageable even with 3x the retrieval calls.
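The variant-then-batch-search flow can be sketched as follows. This is a minimal illustration, not a definitive implementation: the rewrite step is stubbed where an LLM call would go, and `rewrite_query`, `merge_hits`, and `embed` are hypothetical names, not Milvus APIs.

```python
# Sketch of the variant-then-batch-search flow. The rewrite step is stubbed;
# in practice an LLM prompt would generate the variants.

def rewrite_query(query: str) -> list[str]:
    # Placeholder for an LLM prompt such as:
    # "Rewrite the question below into 3 search queries that surface
    #  different aspects of the information need."
    return [query, f"background on {query}", f"examples of {query}"]

def merge_hits(per_variant_hits: list[list[dict]], top_k: int = 5) -> list[dict]:
    # Deduplicate by document id across variants, keeping the best
    # (highest) similarity score seen for each document.
    best: dict[str, dict] = {}
    for hits in per_variant_hits:
        for hit in hits:
            doc_id = hit["id"]
            if doc_id not in best or hit["score"] > best[doc_id]["score"]:
                best[doc_id] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]

# With pymilvus, the batch search is a single network call:
# MilvusClient.search accepts a list of query vectors and returns one hit
# list per vector (collection name "docs" and embed() are assumptions):
#   results = client.search("docs", data=[embed(v) for v in variants], limit=5)
#   merged = merge_hits(results)
```

Because all variants go out in one `search` call, latency stays close to that of a single query; the merge step then collapses overlapping hits so downstream reading doesn't process the same document three times.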
A more sophisticated pattern uses the agent’s previous context to condition the rewrite. If the agent already retrieved and read document X, it rewrites the next query to explicitly exclude X’s content domain and focus on the gap in its knowledge. This prevents the retrieval loop from repeatedly surfacing the same documents and forces exploration of the broader collection.
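A gap-focused rewrite can be sketched as two pieces: a prompt builder that conditions the LLM on what has already been read, and a hard id filter as a backstop. Both function names (`gap_rewrite_prompt`, `exclude_seen`) and the prompt wording are illustrative assumptions.

```python
def gap_rewrite_prompt(query: str, seen_summaries: list[str]) -> str:
    # Condition the rewrite on what the agent has already read, asking the
    # LLM to steer the next search away from covered material.
    covered = "\n".join(f"- {s}" for s in seen_summaries)
    return (
        f"Original question: {query}\n"
        f"Already covered:\n{covered}\n"
        "Rewrite the question as a single search query that targets "
        "information NOT covered above."
    )

def exclude_seen(hits: list[dict], seen_ids: set[str]) -> list[dict]:
    # Backstop filter: drop documents already retrieved, so the loop
    # cannot re-surface them even if the rewrite drifts back.
    return [h for h in hits if h["id"] not in seen_ids]
```

The client-side filter can also be pushed into Milvus itself via a boolean `filter` expression on the search call (e.g. excluding known primary keys), which avoids wasting `top_k` slots on documents that will be discarded anyway.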
When implementing in Milvus, consider using separate collections with different embedding models for different document types — technical documentation, structured metadata, and conversational records often benefit from different embedding spaces. The agent can route rewritten queries to the appropriate collection based on its assessment of what type of information it needs.
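A minimal routing sketch, assuming one Milvus collection per document type. The collection names, model names, and type labels below are all hypothetical; the one real constraint the code encodes is that a query must be embedded with the same model the target collection was indexed with, so each route carries both.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    collection: str   # Milvus collection name (hypothetical)
    embed_model: str  # embedding model the collection was built with (hypothetical)

# One route per document type; in practice an LLM classification step
# (stubbed out here) would pick the label for each rewritten query.
ROUTES = {
    "technical": Route("tech_docs", "bge-large-en"),
    "metadata": Route("structured_metadata", "all-MiniLM-L6-v2"),
    "conversation": Route("chat_records", "e5-base"),
}

def route(info_type: str) -> Route:
    # Fall back to technical docs when the agent's label is unrecognized,
    # rather than failing the retrieval step.
    return ROUTES.get(info_type, ROUTES["technical"])
```

Keeping the embedding-model name in the route makes the mismatch failure mode (querying a collection with the wrong encoder, which silently returns poor neighbors) harder to introduce by accident.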
Related Resources
- Agentic RAG with Milvus and LangGraph — full agentic pattern
- Enhance RAG Performance — retrieval optimization
- Milvus with LangChain — vector store integration