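Building a Dual-Source RAG Agent with Exa and Milvus
This tutorial shows how to build an agent that searches both the public web (via Exa) and a private knowledge base (via Milvus), then synthesizes a unified answer. The agent uses OpenAI function calling to decide automatically which source to query, based on the user's question.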
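Exa is a search API designed for AI applications. Unlike traditional keyword-based search engines, Exa supports semantic (neural) search. It also provides content extraction, highlights, and category-based filtering. Milvus is an open-source vector database built for scalable similarity search. Combining the two with an LLM agent yields a system that searches both proprietary internal data and up-to-date web information in a single workflow.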
Prerequisites
Before running this notebook, make sure the following dependencies are installed:
$ pip install exa_py pymilvus openai
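If you are using Google Colab, you may need to restart the runtime to enable the newly installed dependencies (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).
You will need API keys for Exa and OpenAI. Set them as environment variables: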
import os
os.environ["EXA_API_KEY"] = "***********"
os.environ["OPENAI_API_KEY"] = "sk-***********"
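Before making any API calls, it can help to confirm that both keys are actually set. The following is a minimal sketch; the helper name `missing_api_keys` is hypothetical and not part of any library.

```python
import os

REQUIRED_KEYS = ("EXA_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]


missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.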
Initialize the Clients
Set up the Exa, OpenAI, and Milvus clients. We use OpenAI's text-embedding-3-small for embeddings and Milvus Lite for local vector storage.
import json
from openai import OpenAI
from pymilvus import MilvusClient, DataType
from exa_py import Exa
llm = OpenAI()
exa = Exa(api_key=os.environ["EXA_API_KEY"])
milvus = MilvusClient(uri="./milvus_exa_demo.db")
EMBED_MODEL = "text-embedding-3-small"
EMBED_DIM = 1536
COLLECTION = "private_kb"
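About the arguments of MilvusClient:
- Setting uri to a local file, e.g. ./milvus.db, is the most convenient method, since it uses Milvus Lite to store all data in that file.
- If you have large-scale data, say more than a million vectors, you can set up a more performant Milvus server on Docker or Kubernetes. In that setup, use the server address and port as the uri, e.g. http://localhost:19530. If authentication is enabled on Milvus, use "<your_username>:<your_password>" as the token; otherwise, do not set the token.
- If you use Zilliz Cloud, the fully managed cloud service for Milvus, set uri and token to the Public Endpoint and API key of your Zilliz Cloud instance.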
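The three deployment options above differ only in the uri and token passed to the client, so the choice can be moved into configuration. The sketch below assumes two environment variables, MILVUS_URI and MILVUS_TOKEN; both names are hypothetical and are not read by pymilvus itself.

```python
import os


def milvus_client_kwargs(env=None, default_uri="./milvus_exa_demo.db"):
    """Build keyword arguments for MilvusClient from environment variables.

    MILVUS_URI and MILVUS_TOKEN are hypothetical variable names chosen for
    this sketch; they are not read by pymilvus itself.
    """
    env = os.environ if env is None else env
    kwargs = {"uri": env.get("MILVUS_URI") or default_uri}
    token = env.get("MILVUS_TOKEN")
    if token:  # only pass a token when authentication is actually in use
        kwargs["token"] = token
    return kwargs


print(milvus_client_kwargs({}))
print(milvus_client_kwargs({"MILVUS_URI": "http://localhost:19530", "MILVUS_TOKEN": "user:pass"}))
```

The result can then be unpacked into the constructor, as in `MilvusClient(**milvus_client_kwargs())`.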
Define a helper function that generates embeddings. It is reused throughout the notebook for both indexing and querying:
def embed_text(text: str | list[str]) -> list:
    """Generate embedding vector(s) using OpenAI."""
    resp = llm.embeddings.create(
        input=text if isinstance(text, list) else [text],
        model=EMBED_MODEL,
    )
    # A list input returns a list of vectors; a single string returns one vector
    if isinstance(text, list):
        return [item.embedding for item in resp.data]
    return resp.data[0].embedding
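Build the Private Knowledge Base (Milvus)
We simulate a set of internal documents that are not publicly available: product specs, policies, earnings reports, and API docs. In a real scenario, these might come from an internal wiki, a database, or a document management system.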
private_docs = [
    {
        "id": 1,
        "text": (
            "Acme Widget Pro supports up to 10,000 concurrent connections. "
            "It uses a proprietary compression algorithm (AcmeZip v3) that "
            "reduces payload size by 72% compared to gzip."
        ),
        "source": "product-spec.pdf",
    },
    {
        "id": 2,
        "text": (
            "Our return policy allows customers to return any product within "
            "30 days of purchase for a full refund. After 30 days, only store "
            "credit is offered. Damaged items must be reported within 48 hours."
        ),
        "source": "return-policy.md",
    },
    {
        "id": 3,
        "text": (
            "Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was "
            "primarily driven by enterprise customers adopting Widget Pro. "
            "Churn rate dropped to 3.1%."
        ),
        "source": "q3-earnings.pdf",
    },
    {
        "id": 4,
        "text": (
            "Internal API rate limits: free tier 100 req/min, pro tier "
            "5,000 req/min, enterprise tier 50,000 req/min. Rate limit "
            "headers are X-RateLimit-Remaining and X-RateLimit-Reset."
        ),
        "source": "api-docs.md",
    },
    {
        "id": 5,
        "text": (
            "Employee onboarding checklist: 1) Sign NDA, 2) Set up VPN access, "
            "3) Enroll in mandatory security training, 4) Request Jira and "
            "Confluence access from IT, 5) Schedule 1:1 with manager."
        ),
        "source": "onboarding-guide.md",
    },
]
Create a Milvus collection with an explicit schema, then embed and insert the documents:
# Start from a clean state if the collection already exists
if milvus.has_collection(COLLECTION):
    milvus.drop_collection(COLLECTION)
schema = milvus.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=EMBED_DIM)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535)
schema.add_field(field_name="source", datatype=DataType.VARCHAR, max_length=512)
index_params = milvus.prepare_index_params()
index_params.add_index(
    field_name="vector", index_type="AUTOINDEX", metric_type="COSINE"
)
milvus.create_collection(
    collection_name=COLLECTION,
    schema=schema,
    index_params=index_params,
    # consistency_level="Strong",
)
# Embed all documents in one batch call
embeddings = embed_text([doc["text"] for doc in private_docs])
milvus.insert(
    collection_name=COLLECTION,
    data=[
        {
            "id": doc["id"],
            "vector": emb,
            "text": doc["text"],
            "source": doc["source"],
        }
        for doc, emb in zip(private_docs, embeddings)
    ],
)
print(f"Inserted {len(private_docs)} documents into Milvus.")
Inserted 5 documents into Milvus.
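Embedding all five documents in a single call works for this demo, but a larger corpus would probably need to be split across several requests. The sketch below shows one way to do that; `batched` and `embed_in_batches` are hypothetical helpers, the default batch size of 100 is an arbitrary assumption, and `fake_embed` is a stand-in so the example runs without an API key. In the notebook, `embed_text` would take its place.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(texts, embed_fn, batch_size=100):
    """Embed texts batch by batch, preserving the input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


# Stand-in embedder for demonstration only: one 1-dimensional "vector" per text.
fake_embed = lambda batch: [[float(len(t))] for t in batch]
print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
```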
Let's run a quick test query to confirm that retrieval works:
query = "What is the return policy?"
results = milvus.search(
    collection_name=COLLECTION,
    data=[embed_text(query)],
    limit=2,
    output_fields=["text", "source"],
)
# results[0] holds the hits for the first (and only) query vector
for hit in results[0]:
    print(f"[score={hit['distance']:.3f}] ({hit['entity']['source']})")
    print(f" {hit['entity']['text'][:120]}...")
    print()
[score=0.665] (return-policy.md)
Our return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, on...
[score=0.119] (q3-earnings.pdf)
Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was primarily driven by enterprise customers adopting Widget Pro. ...
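Because the index was created with metric_type="COSINE", the score printed above is, as far as we understand, the cosine similarity between the query embedding and each document embedding, so a higher value means a closer match. The pure-Python sketch below illustrates the metric on toy vectors; it is not how Milvus computes it internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch; the cosine is undefined for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```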
Explore Exa's Search Capabilities
Before building the agent, let's explore what Exa search can do. Exa supports several search modes that are useful in different scenarios.
Semantic search with content extraction: Exa can return not only links but also the article body text, key highlights, and AI-generated summaries in a single request:
web_results = exa.search_and_contents(
    query="latest trends in AI agents 2026",
    type="auto",
    num_results=3,
    text={"max_characters": 3000},
    highlights={"num_sentences": 3},
)
for r in web_results.results:
    print(f"[{r.title}]")
    print(f" URL: {r.url}")
    if r.highlights:
        print(f" Highlight: {r.highlights[0][:150]}...")
    print()
[The AI Trends Shaping 2026. A month into the new year is as good a… | by ODSC - Open Data Science | Mar, 2026 | Medium]
URL: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49
Highlight: ahead. January brought Claude CoWork, Anthropic’s “AI coworker” that turns agents into desktop collaborators; OpenClaw (formerly Moltbot, formerly Cl...
[AI agent trends 2026 report]
URL: https://cloud.google.com/resources/content/ai-agent-trends-2026
Highlight: >. The era of simple prompts is over. We're witnessing the agent leap—where AI orchestrates complex, end-to-end workflows semi-autonomously. For enter...
[The Rise of Agentic AI: Why 2026 is the Year AI Started 'Doing']
URL: https://www.marketdrafts.com/2026/02/rise-of-agentic-ai-2026-trends.html?m=1
Highlight: The era of "Generative AI" (which creates content) is being superseded by "Agentic AI" (which executes actions). We are witnessing a fundamental arch...
Category-based filtering: you can restrict results to a specific content type such as "research paper", "news", "company", or "tweet". This is useful when you need high-quality sources and want to avoid noise:
filtered_results = exa.search_and_contents(
    query="retrieval augmented generation real world applications",
    category="research paper",
    num_results=3,
    highlights={"num_sentences": 2},
)
for r in filtered_results.results:
    print(f"- {r.title}")
    print(f" {r.url}\n")
- 10 RAG examples and use cases from real companies
https://www.evidentlyai.com/blog/rag-examples
- Implementing Retrieval-Augmented Generation (RAG) with Real-World Constraints
https://dev.to/dextralabs/implementing-retrieval-augmented-generation-rag-with-real-world-constraints-3ajm
-
https://www.arxiv.org/pdf/2502.14930
Find similar articles: given a URL, Exa can find other articles with similar content. This helps you broaden your research from a good starting point:
if web_results.results:
    source_url = web_results.results[0].url
    similar = exa.find_similar_and_contents(
        url=source_url,
        num_results=3,
        highlights={"num_sentences": 2},
    )
    print(f"Articles similar to: {source_url}\n")
    for r in similar.results:
        print(f"- {r.title}")
        print(f" {r.url}\n")
Articles similar to: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49
- AI Trends 2026: From Agent Demos to Production Reality
https://opendatascience.com/the-ai-trends-shaping-2026/
- The Most Important AI Trends to Watch in 2026
https://medium.com/the-ai-studio/the-most-important-ai-trends-to-watch-in-2026-54af64d45021
Define the Agent Tools
Now we define the two tool functions the agent will use. The private KB tool searches Milvus by vector similarity, and the web tool searches the public internet via Exa:
def search_private_kb(query: str) -> str:
    """Search the internal knowledge base using Milvus vector search."""
    results = milvus.search(
        collection_name=COLLECTION,
        data=[embed_text(query)],
        limit=3,
        output_fields=["text", "source"],
    )
    chunks = []
    for hit in results[0]:
        # Prefix each chunk with its source file so the LLM can cite it
        chunks.append(f"[{hit['entity']['source']}] {hit['entity']['text']}")
    return "\n\n".join(chunks) if chunks else "No relevant internal documents found."
def search_web(query: str) -> str:
    """Search the public web using Exa for up-to-date information."""
    results = exa.search_and_contents(
        query=query,
        type="auto",
        num_results=3,
        highlights={"num_sentences": 3},
    )
    items = []
    for r in results.results:
        highlight = r.highlights[0] if r.highlights else "No snippet available."
        items.append(f"[{r.title}]({r.url})\n{highlight}")
    return "\n\n".join(items) if items else "No web results found."
# Map tool names (as exposed to the LLM) to their Python implementations
TOOL_FNS = {
    "search_private_kb": search_private_kb,
    "search_web": search_web,
}
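Tool names and arguments come from the model as plain strings, so a name lookup or JSON parse can fail at runtime. One defensive option, sketched below, is to route every call through a small dispatcher that turns failures into readable error strings instead of exceptions. `dispatch_tool_call` is a hypothetical helper and `demo_tools` is a stub standing in for TOOL_FNS.

```python
import json


def dispatch_tool_call(name, arguments_json, tool_fns):
    """Run one tool call and always return a string the model can read."""
    fn = tool_fns.get(name)
    if fn is None:
        return f"Error: unknown tool {name!r}."
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"Error: arguments were not valid JSON ({exc.msg})."
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object."
    try:
        return str(fn(**args))
    except Exception as exc:  # report tool failures instead of aborting the agent
        return f"Error: {name} failed with {type(exc).__name__}: {exc}"


# Stub tool used only to demonstrate the dispatcher.
demo_tools = {"echo": lambda query: f"echo: {query}"}
print(dispatch_tool_call("echo", '{"query": "hello"}', demo_tools))
print(dispatch_tool_call("missing", "{}", demo_tools))
```

Returning the error as the tool result gives the model a chance to retry or to explain the failure to the user.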
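Build the Agent
The agent uses OpenAI function calling to decide which tools to invoke. The LLM receives the user's query, decides which tools to call (if any), and we execute them. The LLM then synthesizes a final answer from the retrieved context.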
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_private_kb",
            "description": (
                "Search the company's internal knowledge base (product docs, "
                "policies, earnings, API docs, HR guides). Use this for any "
                "question about internal/proprietary information."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": (
                "Search the public web for up-to-date external information - "
                "news, trends, competitor analysis, open-source projects, etc. "
                "Use this when the question is about the outside world."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"}
                },
                "required": ["query"],
            },
        },
    },
]
SYSTEM_PROMPT = """You are a helpful assistant with access to two search tools:
1. **search_private_kb** - searches the company's internal knowledge base.
2. **search_web** - searches the public internet via Exa.
Routing rules:
- Questions about internal products, policies, metrics, or processes: use search_private_kb.
- Questions about external trends, news, competitors, or general knowledge: use search_web.
- Questions that need both internal and external context: call BOTH tools, then synthesize.
Always cite your sources. For internal docs, mention the filename. For web results, include the URL."""
def run_agent(user_query: str) -> str:
    """Run the agent loop: LLM -> tool calls -> LLM -> final answer."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]
    print(f"User: {user_query}\n")
    # First LLM call - may request tool calls
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=TOOLS,
    )
    msg = response.choices[0].message
    messages.append(msg)
    # If no tool calls, return directly
    if not msg.tool_calls:
        print(f"Agent (no tools used): {msg.content}")
        return msg.content
    # Execute each tool call
    for tc in msg.tool_calls:
        fn_name = tc.function.name
        fn_args = json.loads(tc.function.arguments)
        print(f" -> Calling {fn_name}(query={fn_args['query']!r})")
        result = TOOL_FNS[fn_name](**fn_args)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            }
        )
    # Second LLM call - synthesize final answer
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=TOOLS,
    )
    answer = response.choices[0].message.content
    print(f"\nAgent:\n{answer}")
    return answer
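run_agent performs a single round of tool calls. Because the second request also passes tools=TOOLS, the model could in principle ask for another tool instead of answering, in which case the message content would be None. A bounded loop is one way to handle that. The sketch below uses a simplified message shape (plain dicts) and a scripted stand-in for the model so it runs offline. `run_tool_loop`, its parameters, and the dict layout are assumptions for illustration, not the OpenAI response object.

```python
def run_tool_loop(call_model, run_tool, messages, max_rounds=3):
    """Alternate between the model and tools until the model answers in text.

    call_model(messages) must return a dict with "content" and "tool_calls";
    each tool call is a dict with "id", "name" and "arguments". This shape is
    a simplification chosen for the sketch, not the OpenAI response object.
    """
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append({"role": "assistant", **reply})
        if not reply["tool_calls"]:
            return reply["content"]
        for call in reply["tool_calls"]:
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": call["id"],
                    "content": run_tool(call["name"], call["arguments"]),
                }
            )
    return "Stopped: tool-call limit reached without a final answer."


# Scripted stand-in for the model: one tool request, then a final answer.
scripted = iter(
    [
        {"content": None, "tool_calls": [{"id": "1", "name": "kb", "arguments": "q"}]},
        {"content": "final answer", "tool_calls": []},
    ]
)
history = [{"role": "user", "content": "question"}]
print(run_tool_loop(lambda msgs: next(scripted), lambda n, a: f"{n}:{a}", history))
```

The max_rounds cap keeps a misbehaving model from looping indefinitely; the value 3 is arbitrary.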
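Demo
Now let's test the agent with three scenarios that demonstrate different routing behaviors.
Scenario A: Internal Question (Routes to Milvus)
Ask about an internal policy. The agent should call search_private_kb and retrieve the answer from our private documents: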
run_agent("What is the return policy for Acme products?")
User: What is the return policy for Acme products?
-> Calling search_private_kb(query='return policy Acme products')
Agent:
The Acme products return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, only store credit is offered. It's important to note that damaged items must be reported within 48 hours of receipt ([source: return-policy.md]).
Scenario B: External Question (Routes to Exa)
Ask about an external trend. The agent should call search_web and retrieve up-to-date information from the public internet:
run_agent("What are the latest AI agent frameworks trending in 2026?")
User: What are the latest AI agent frameworks trending in 2026?
-> Calling search_web(query='latest AI agent frameworks 2026')
Agent:
In 2026, several AI agent frameworks are trending, each offering unique features and capabilities that cater to various needs. Here are some of the most prominent ones:
1. **LangChain and LangGraph**: These frameworks remain highly popular for building large language model (LLM)-powered applications. LangGraph, in particular, models agents as state graphs, which is useful for action-oriented workflows. LangChain continues to dominate due to its comprehensive feature set for production-grade control and orchestration.
2. **LangSmith Agent Builder**: Released into general availability in 2026, this tool allows teams to create AI agents using natural language, simplifying the process of agent development.
3. **Semantic Kernel and AutoGen**: These have been integrated into Azure AI Foundry, creating a unified framework. Semantic Kernel uses a plugin-based middleware pattern, enhancing existing applications with AI capabilities efficiently.
4. **OpenClaw**: An open-source framework that operates locally, OpenClaw transforms your computer into an autonomous agent host, differing from cloud-based solutions by keeping data and operations localized. This framework supports a large community and includes extensive skills for customization.
These frameworks cater to various requirements, whether it's production-grade solutions, open-source options, or frameworks focused on local deployment. Each framework has its strengths, depending on the use case and the existing ecosystem it fits into.
Sources:
- [Agentic AI Frameworks: The Complete Guide (2026)](https://aiagentskit.com/blog/agentic-ai-frameworks/)
- [OpenClaw: The Open-Source AI Agent Framework That Runs Your Life Locally](https://www.clawbot.blog/blog/openclaw-the-open-source-ai-agent-framework-that-runs-your-life-locally)
- [The Best AI Agent Frameworks for 2026](https://medium.com/data-science-collective/the-best-ai-agent-frameworks-for-2026-tier-list-b3a4362fac0d)
Scenario C: Hybrid Question (Routes to Both)
Ask a question that needs both internal specs and external benchmarks. The agent should call both tools and synthesize a comparison:
run_agent(
    "How does our Widget Pro's throughput compare to "
    "open-source alternatives on the market?"
)
User: How does our Widget Pro's throughput compare to open-source alternatives on the market?
-> Calling search_private_kb(query='Widget Pro throughput comparison')
-> Calling search_web(query='open-source widget throughput comparison')
Agent:
The throughput of our Widget Pro is quite competitive when compared to open-source alternatives on the market. Here's a detailed comparison:
### Widget Pro
- **Concurrent Connections**: Supports up to 10,000 concurrent connections.
- **Compression**: Utilizes AcmeZip v3, a proprietary compression algorithm that reduces payload size by 72% compared to gzip (source: [product-spec.pdf]).
- **API Rate Limits**: Offers different tiers:
- Free tier: 100 requests/minute.
- Pro tier: 5,000 requests/minute.
- Enterprise tier: 50,000 requests/minute (source: [api-docs.md]).
### Open-Source Alternatives
From the available resources, open-source widget solutions such as Chatwoot and Tiledesk are popular in handling customer engagement with a flexible and customizable approach (source: [ChatMaxima article](https://chatmaxima.com/blog/15-open-source-free-live-chat-widget-solutions-to-boost-your-customer-engagement-in-2024/)). However, specific throughput metrics such as maximum concurrent connections or API limits are generally not highlighted in open-source product descriptions unless directly benchmarked.
These alternatives often emphasize customization, control, and integration with AI-driven capabilities but do not always specify throughput in terms comparable with Widget Pro. They might be more suited for organizations looking to tailor solutions to specific needs rather than focusing solely on throughput efficiency.
In conclusion, Widget Pro appears to offer high throughput suitable for enterprises with robust API support, while open-source options offer flexibility and customization with varying degrees of performance metrics.
Cleanup
When you are done, drop the collection to free resources.
milvus.drop_collection(COLLECTION)
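Summary
In this tutorial, we built a dual-source RAG agent that combines Milvus for private knowledge retrieval with Exa for public web search. The main components are:
- Milvus stores and retrieves internal documents via vector similarity search.
- Exa provides semantic web search with features such as category filtering, content extraction, and similar-article discovery.
- OpenAI function calling lets the LLM route each query to the appropriate source (or both) based on the intent of the question.
This pattern applies to enterprise use cases where an AI assistant needs access to both confidential internal documents and real-time external information.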