Are RAG systems considered high-risk under AI law?

Retrieval-Augmented Generation (RAG) systems typically fall into the “limited-risk” or “minimal-risk” tier under current AI regulation, but the classification depends on the use case rather than the technology. Under the EU AI Act, a RAG system used for general information retrieval (e.g., a customer support chatbot) is generally limited-risk, and its main obligation is transparency: users must be told they are interacting with an AI system. However, if your RAG system supports decisions in areas the Act lists as high-risk, such as hiring recommendations, credit decisions, or benefits eligibility, it is treated as high-risk and must meet requirements that include human oversight, risk management, data governance (including checks for bias), and detailed technical documentation. The distinction hinges on whether the system’s output materially affects a person’s legal rights or access to essential services.

Washington’s and Oklahoma’s new laws don’t explicitly categorize RAG systems as high-risk; they focus on chatbot behavior (self-harm detection, age verification) rather than the underlying retrieval mechanism. This creates ambiguity for a system such as a RAG-powered mental health chatbot: the laws don’t address the retrieval component directly, but they do cover the user-facing chatbot. As a result, the compliance burden falls on the conversation layer, not the knowledge layer.

In practice, your RAG architecture needs a clear separation of concerns. Use Milvus to store knowledge base embeddings separately from user interaction logs. This separation helps demonstrate that the retrieval layer only finds relevant documents and that risk classification applies at the generation layer (the LLM responding to users), not the retrieval layer. Document this architecture explicitly: “Milvus powers semantic search; the final output is generated by a large language model subject to [regulation X].” For open-source deployments, this architectural clarity serves as compliance evidence. Store metadata in Milvus indicating the source of each embedded document, and record which generation model used each retrieved document. When regulators ask, “How do you ensure your system doesn’t harm users?”, your technical architecture tells the story: retrieval is neutral, generation is monitored.
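
One way to picture this separation is the minimal Python sketch below. It is an illustration under stated assumptions, not a prescribed Milvus schema or a compliance recipe: the collection names, the field names, and the helpers `build_knowledge_row` and `build_audit_entry` are all hypothetical. The code only builds plain dictionaries, so it runs without a Milvus server. Knowledge rows carry document provenance and reject user-level fields, while a separate audit entry records which generation model consumed which retrieved documents.

```python
from datetime import datetime, timezone

# Hypothetical names: knowledge embeddings and interaction audit data
# are kept in two separate collections (or stores).
KNOWLEDGE_COLLECTION = "kb_embeddings"
AUDIT_COLLECTION = "generation_audit_log"

# Fields that describe a user or a conversation; they must never be
# written into the knowledge collection (assumed list, adjust as needed).
USER_LEVEL_FIELDS = {"user_id", "session_id", "prompt", "response"}


def build_knowledge_row(doc_id, vector, source_uri, extra_metadata=None):
    """Build one row for the knowledge collection with provenance metadata."""
    extra_metadata = dict(extra_metadata or {})
    leaked = USER_LEVEL_FIELDS.intersection(extra_metadata)
    if leaked:
        # Refuse to mix user interaction data into the retrieval layer.
        raise ValueError(f"user-level fields not allowed here: {sorted(leaked)}")
    row = {
        "doc_id": doc_id,
        "vector": list(vector),
        "source_uri": source_uri,  # where the embedded document came from
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    row.update(extra_metadata)
    return row


def build_audit_entry(request_id, retrieved_rows, generation_model):
    """Record which model generated an answer from which retrieved documents."""
    return {
        "request_id": request_id,
        "retrieved_doc_ids": [r["doc_id"] for r in retrieved_rows],
        "retrieved_sources": [r["source_uri"] for r in retrieved_rows],
        "generation_model": generation_model,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    row = build_knowledge_row(
        "doc-1", [0.1, 0.2, 0.3], "https://example.com/policy.pdf",
        {"license": "CC-BY"},
    )
    entry = build_audit_entry("req-1", [row], "example-llm-v1")
    print(KNOWLEDGE_COLLECTION, sorted(row))
    print(AUDIT_COLLECTION, entry["retrieved_doc_ids"], entry["generation_model"])
```

Rows shaped like this could then be written to the knowledge collection, for example with pymilvus's `MilvusClient.insert(collection_name=..., data=[...])`, while audit entries go to a separate collection or an external log store. Keeping the user-field check in the ingestion path means the separation is enforced by code rather than by convention alone.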
