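Building a Dual-Source RAG Agent with Exa and Milvus
This tutorial shows how to build an agent that searches both the public web (via Exa) and a private knowledge base (via Milvus), then synthesizes a unified answer. The agent uses OpenAI function calling to decide automatically which source to query based on the user's question.
Exa is a search API designed for AI applications, and it is proudly powered by Zilliz Cloud (fully managed Milvus). Unlike traditional keyword-based search engines, Exa supports semantic (neural) search: you describe what you want in natural language and it interprets the intent. It also provides content extraction, highlights, and category-based filtering. Milvus is an open-source vector database built for scalable similarity search. Combined with an LLM agent, the two let you build a system that retrieves both proprietary internal data and up-to-date web information in a single workflow.
Prerequisites
Before running this notebook, make sure the following dependencies are installed: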
$ pip install exa_py pymilvus openai
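If you are using Google Colab, you may need to restart the runtime to use the newly installed dependencies (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown).
You will need API keys for Exa and OpenAI. Set them as environment variables: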
import os
os.environ["EXA_API_KEY"] = "***********"
os.environ["OPENAI_API_KEY"] = "sk-***********"
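Hardcoded keys are easy to leak when a notebook is shared. As a minimal sketch, the helper below reads a key from the environment and fails early with a clear message if it is missing. The name `require_env` is hypothetical and not part of any library used in this tutorial:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, a call such as `require_env("EXA_API_KEY")` could replace a direct `os.environ[...]` lookup wherever a key is needed.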
Initialize Clients
Set up the Exa, OpenAI, and Milvus clients. We use OpenAI's text-embedding-3-small model to generate vector embeddings, and Milvus Lite for local vector storage that requires no infrastructure setup.
import json
from openai import OpenAI
from pymilvus import MilvusClient, DataType
from exa_py import Exa
llm = OpenAI()
exa = Exa(api_key=os.environ["EXA_API_KEY"])
milvus = MilvusClient(uri="./milvus_exa_demo.db")
EMBED_MODEL = "text-embedding-3-small"
EMBED_DIM = 1536
COLLECTION = "private_kb"
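The arguments of MilvusClient are as follows:
- Setting uri to a local file (for example, ./milvus.db) is the most convenient option, because it automatically uses Milvus Lite to store all data in that file.
- If you have a large amount of data, such as more than a million vectors, you can set up a more performant Milvus server on Docker or Kubernetes. In that setup, use the server address and port as the uri (for example, http://localhost:19530). If authentication is enabled on Milvus, use "<your_username>:<your_password>" as the token; otherwise, do not set the token.
- To use Zilliz Cloud, the fully managed cloud service for Milvus, set uri and token to the Public Endpoint and API key of your Zilliz Cloud instance.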
Define a helper function that generates embeddings. We will reuse it throughout the notebook for both indexing and querying:
def embed_text(text: str | list[str]) -> list:
    """Generate embedding vector(s) using OpenAI."""
    resp = llm.embeddings.create(
        input=text if isinstance(text, list) else [text],
        model=EMBED_MODEL,
    )
    if isinstance(text, list):
        return [item.embedding for item in resp.data]
    return resp.data[0].embedding
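Embedding APIs generally cap how many inputs a single request may carry, so a larger corpus is usually embedded in batches. The sketch below is one way to split a list of texts; the helper name `batched` and the default batch size of 100 are assumptions for illustration, not values from this tutorial or from OpenAI:

```python
from typing import Iterator


def batched(items: list[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch can then be passed to the embedding helper, for example `[vec for batch in batched(texts) for vec in embed_text(batch)]`.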
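Build a Private Knowledge Base (Milvus)
We simulate a set of internal company documents that do not appear on the public web: product specifications, policies, earnings reports, and API documentation. In a real scenario, these documents might come from an internal wiki, a database, or a document management system.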
private_docs = [
    {
        "id": 1,
        "text": (
            "Acme Widget Pro supports up to 10,000 concurrent connections. "
            "It uses a proprietary compression algorithm (AcmeZip v3) that "
            "reduces payload size by 72% compared to gzip."
        ),
        "source": "product-spec.pdf",
    },
    {
        "id": 2,
        "text": (
            "Our return policy allows customers to return any product within "
            "30 days of purchase for a full refund. After 30 days, only store "
            "credit is offered. Damaged items must be reported within 48 hours."
        ),
        "source": "return-policy.md",
    },
    {
        "id": 3,
        "text": (
            "Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was "
            "primarily driven by enterprise customers adopting Widget Pro. "
            "Churn rate dropped to 3.1%."
        ),
        "source": "q3-earnings.pdf",
    },
    {
        "id": 4,
        "text": (
            "Internal API rate limits: free tier 100 req/min, pro tier "
            "5,000 req/min, enterprise tier 50,000 req/min. Rate limit "
            "headers are X-RateLimit-Remaining and X-RateLimit-Reset."
        ),
        "source": "api-docs.md",
    },
    {
        "id": 5,
        "text": (
            "Employee onboarding checklist: 1) Sign NDA, 2) Set up VPN access, "
            "3) Enroll in mandatory security training, 4) Request Jira and "
            "Confluence access from IT, 5) Schedule 1:1 with manager."
        ),
        "source": "onboarding-guide.md",
    },
]
Create a Milvus collection with an explicit schema, embed the documents, and insert them:
if milvus.has_collection(COLLECTION):
    milvus.drop_collection(COLLECTION)
schema = milvus.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=EMBED_DIM)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535)
schema.add_field(field_name="source", datatype=DataType.VARCHAR, max_length=512)
index_params = milvus.prepare_index_params()
index_params.add_index(
    field_name="vector", index_type="AUTOINDEX", metric_type="COSINE"
)
milvus.create_collection(
    collection_name=COLLECTION,
    schema=schema,
    index_params=index_params,
    # consistency_level="Strong",
)
# Embed all documents in one batch call
embeddings = embed_text([doc["text"] for doc in private_docs])
milvus.insert(
    collection_name=COLLECTION,
    data=[
        {
            "id": doc["id"],
            "vector": emb,
            "text": doc["text"],
            "source": doc["source"],
        }
        for doc, emb in zip(private_docs, embeddings)
    ],
)
print(f"Inserted {len(private_docs)} documents into Milvus.")
Inserted 5 documents into Milvus.
Run a quick test query to confirm that retrieval works:
query = "What is the return policy?"
results = milvus.search(
    collection_name=COLLECTION,
    data=[embed_text(query)],
    limit=2,
    output_fields=["text", "source"],
)
for hit in results[0]:
    print(f"[score={hit['distance']:.3f}] ({hit['entity']['source']})")
    print(f" {hit['entity']['text'][:120]}...")
    print()
[score=0.665] (return-policy.md)
Our return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, on...
[score=0.119] (q3-earnings.pdf)
Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was primarily driven by enterprise customers adopting Widget Pro. ...
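Because the index was created with metric_type="COSINE", the scores above are cosine similarities: values closer to 1 mean the document vector points in nearly the same direction as the query vector. The plain-Python sketch below is for intuition only. Milvus computes the metric internally, and the toy vectors are made up:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
```

Parallel vectors score close to 1 and perpendicular vectors score 0, which matches the gap between the relevant and irrelevant hits in the test query.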
Explore Exa Search Capabilities
Before building the agent, let's look at what Exa search can do. Exa supports several search modes that are useful in different scenarios.
Semantic search with content extraction - In a single request, Exa can return not only links but also document text, key highlights, and AI-generated summaries:
web_results = exa.search_and_contents(
    query="latest trends in AI agents 2026",
    type="auto",
    num_results=3,
    text={"max_characters": 3000},
    highlights={"num_sentences": 3},
)
for r in web_results.results:
    print(f"[{r.title}]")
    print(f" URL: {r.url}")
    if r.highlights:
        print(f" Highlight: {r.highlights[0][:150]}...")
    print()
[The AI Trends Shaping 2026. A month into the new year is as good a… | by ODSC - Open Data Science | Mar, 2026 | Medium]
URL: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49
Highlight: ahead. January brought Claude CoWork, Anthropic’s “AI coworker” that turns agents into desktop collaborators; OpenClaw (formerly Moltbot, formerly Cl...
[AI agent trends 2026 report]
URL: https://cloud.google.com/resources/content/ai-agent-trends-2026
Highlight: >. The era of simple prompts is over. We're witnessing the agent leap—where AI orchestrates complex, end-to-end workflows semi-autonomously. For enter...
[The Rise of Agentic AI: Why 2026 is the Year AI Started 'Doing']
URL: https://www.marketdrafts.com/2026/02/rise-of-agentic-ai-2026-trends.html?m=1
Highlight: The era of "Generative AI" (which creates content) is being superseded by "Agentic AI" (which executes actions). We are witnessing a fundamental arch...
Category-based filtering - You can restrict results to a specific content type, such as "research paper", "news", "company", or "tweet". This is useful when you want high-quality sources and less noise:
filtered_results = exa.search_and_contents(
    query="retrieval augmented generation real world applications",
    category="research paper",
    num_results=3,
    highlights={"num_sentences": 2},
)
for r in filtered_results.results:
    print(f"- {r.title}")
    print(f" {r.url}\n")
- 10 RAG examples and use cases from real companies
https://www.evidentlyai.com/blog/rag-examples
- Implementing Retrieval-Augmented Generation (RAG) with Real-World Constraints
https://dev.to/dextralabs/implementing-retrieval-augmented-generation-rag-with-real-world-constraints-3ajm
-
https://www.arxiv.org/pdf/2502.14930
Find similar articles - Given a URL, Exa can find other articles with similar content. This is useful for expanding research from a good starting point:
if web_results.results:
    source_url = web_results.results[0].url
    similar = exa.find_similar_and_contents(
        url=source_url,
        num_results=3,
        highlights={"num_sentences": 2},
    )
    print(f"Articles similar to: {source_url}\n")
    for r in similar.results:
        print(f"- {r.title}")
        print(f" {r.url}\n")
Articles similar to: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49
- AI Trends 2026: From Agent Demos to Production Reality
https://opendatascience.com/the-ai-trends-shaping-2026/
- The Most Important AI Trends to Watch in 2026
https://medium.com/the-ai-studio/the-most-important-ai-trends-to-watch-in-2026-54af64d45021
Define Agent Tools
Now define the two tool functions the agent will use. The private KB tool searches Milvus by vector similarity, and the web tool searches the public internet through Exa:
def search_private_kb(query: str) -> str:
    """Search the internal knowledge base using Milvus vector search."""
    results = milvus.search(
        collection_name=COLLECTION,
        data=[embed_text(query)],
        limit=3,
        output_fields=["text", "source"],
    )
    chunks = []
    for hit in results[0]:
        chunks.append(f"[{hit['entity']['source']}] {hit['entity']['text']}")
    return "\n\n".join(chunks) if chunks else "No relevant internal documents found."
def search_web(query: str) -> str:
    """Search the public web using Exa for up-to-date information."""
    results = exa.search_and_contents(
        query=query,
        type="auto",
        num_results=3,
        highlights={"num_sentences": 3},
    )
    items = []
    for r in results.results:
        highlight = r.highlights[0] if r.highlights else "No snippet available."
        items.append(f"[{r.title}]({r.url})\n{highlight}")
    return "\n\n".join(items) if items else "No web results found."
TOOL_FNS = {
    "search_private_kb": search_private_kb,
    "search_web": search_web,
}
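A model can occasionally request a tool name that is not registered, or emit arguments that are not valid JSON. As a hedged sketch, the hypothetical `call_tool` helper below (not part of the original tutorial) looks a tool up in a registry shaped like TOOL_FNS and returns an error string instead of raising, so the message can be passed back to the model. A stand-in `echo` tool keeps the example runnable without Milvus or Exa:

```python
import json
from typing import Callable


def call_tool(registry: dict[str, Callable[..., str]], name: str, raw_args: str) -> str:
    """Run a registered tool, returning an error message instead of raising."""
    fn = registry.get(name)
    if fn is None:
        return f"Error: unknown tool {name!r}."
    try:
        args = json.loads(raw_args)
        return fn(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return f"Error: invalid arguments for {name!r}: {exc}"


# Stand-in tool so the sketch runs without Milvus or Exa.
demo_registry = {"echo": lambda query: f"echo: {query}"}
ok = call_tool(demo_registry, "echo", '{"query": "hi"}')
missing = call_tool(demo_registry, "nope", "{}")
bad = call_tool(demo_registry, "echo", "not json")
```

Note that the `TypeError` branch would also catch type errors raised inside the tool itself, which may or may not be what you want in production.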
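Build the Agent
The agent uses OpenAI function calling to decide which tool to call. It follows a simple loop: the LLM receives the user query, decides which tools to call (if any), the tools are executed, and the LLM then synthesizes a final answer from the retrieved context.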
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_private_kb",
            "description": (
                "Search the company's internal knowledge base (product docs, "
                "policies, earnings, API docs, HR guides). Use this for any "
                "question about internal/proprietary information."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": (
                "Search the public web for up-to-date external information - "
                "news, trends, competitor analysis, open-source projects, etc. "
                "Use this when the question is about the outside world."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"}
                },
                "required": ["query"],
            },
        },
    },
]
SYSTEM_PROMPT = """You are a helpful assistant with access to two search tools:
1. **search_private_kb** - searches the company's internal knowledge base.
2. **search_web** - searches the public internet via Exa.
Routing rules:
- Questions about internal products, policies, metrics, or processes: use search_private_kb.
- Questions about external trends, news, competitors, or general knowledge: use search_web.
- Questions that need both internal and external context: call BOTH tools, then synthesize.
Always cite your sources. For internal docs, mention the filename. For web results, include the URL."""
def run_agent(user_query: str) -> str:
    """Run the agent loop: LLM -> tool calls -> LLM -> final answer."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]
    print(f"User: {user_query}\n")
    # First LLM call - may request tool calls
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=TOOLS,
    )
    msg = response.choices[0].message
    messages.append(msg)
    # If no tool calls, return directly
    if not msg.tool_calls:
        print(f"Agent (no tools used): {msg.content}")
        return msg.content
    # Execute each tool call
    for tc in msg.tool_calls:
        fn_name = tc.function.name
        fn_args = json.loads(tc.function.arguments)
        print(f" -> Calling {fn_name}(query={fn_args['query']!r})")
        result = TOOL_FNS[fn_name](**fn_args)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            }
        )
    # Second LLM call - synthesize final answer
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=TOOLS,
    )
    answer = response.choices[0].message.content
    print(f"\nAgent:\n{answer}")
    return answer
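run_agent performs at most one round of tool calls. If the model asks for more tools in its second response, the final content may be empty. One possible extension is a bounded loop that keeps going until the model answers without requesting tools. In the sketch below the model call is injected as a `complete` callable and messages are plain dicts, so it runs without an API key. These shapes, and the names `agent_loop` and `fake_complete`, are simplifying assumptions and do not mirror the OpenAI SDK objects:

```python
import json
from typing import Callable


def agent_loop(
    complete: Callable[[list[dict]], dict],
    messages: list[dict],
    tool_fns: dict[str, Callable[..., str]],
    max_rounds: int = 3,
) -> str:
    """Call the model repeatedly until it answers without requesting tools."""
    for _ in range(max_rounds):
        reply = complete(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""
        for call in tool_calls:
            args = json.loads(call["arguments"])
            result = tool_fns[call["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return "Stopped: tool-call round limit reached."


def fake_complete(messages: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM: ask for one tool call, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        call = {"id": "call_1", "name": "lookup", "arguments": '{"query": "refunds"}'}
        return {"role": "assistant", "content": None, "tool_calls": [call]}
    return {"role": "assistant", "content": "Answer based on: " + messages[-1]["content"]}


history = [{"role": "user", "content": "What is the refund policy?"}]
answer = agent_loop(fake_complete, history, {"lookup": lambda query: f"docs about {query}"})
```

The `max_rounds` cap is a design choice that stops the loop from running indefinitely if the model keeps requesting tools.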
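Demo
Now test the agent with three scenarios that show different routing behavior.
Scenario A: Internal Question (Routed to Milvus)
A question about internal policy - the agent should call search_private_kb and retrieve the answer from the private documents: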
run_agent("What is the return policy for Acme products?")
User: What is the return policy for Acme products?
-> Calling search_private_kb(query='return policy Acme products')
Agent:
The Acme products return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, only store credit is offered. It's important to note that damaged items must be reported within 48 hours of receipt ([source: return-policy.md]).
Scenario B: External Question (Routed to Exa)
A question about external trends - the agent should call search_web to fetch up-to-date information from the public internet:
run_agent("What are the latest AI agent frameworks trending in 2026?")
User: What are the latest AI agent frameworks trending in 2026?
-> Calling search_web(query='latest AI agent frameworks 2026')
Agent:
In 2026, several AI agent frameworks are trending, each offering unique features and capabilities that cater to various needs. Here are some of the most prominent ones:
1. **LangChain and LangGraph**: These frameworks remain highly popular for building large language model (LLM)-powered applications. LangGraph, in particular, models agents as state graphs, which is useful for action-oriented workflows. LangChain continues to dominate due to its comprehensive feature set for production-grade control and orchestration.
2. **LangSmith Agent Builder**: Released into general availability in 2026, this tool allows teams to create AI agents using natural language, simplifying the process of agent development.
3. **Semantic Kernel and AutoGen**: These have been integrated into Azure AI Foundry, creating a unified framework. Semantic Kernel uses a plugin-based middleware pattern, enhancing existing applications with AI capabilities efficiently.
4. **OpenClaw**: An open-source framework that operates locally, OpenClaw transforms your computer into an autonomous agent host, differing from cloud-based solutions by keeping data and operations localized. This framework supports a large community and includes extensive skills for customization.
These frameworks cater to various requirements, whether it's production-grade solutions, open-source options, or frameworks focused on local deployment. Each framework has its strengths, depending on the use case and the existing ecosystem it fits into.
Sources:
- [Agentic AI Frameworks: The Complete Guide (2026)](https://aiagentskit.com/blog/agentic-ai-frameworks/)
- [OpenClaw: The Open-Source AI Agent Framework That Runs Your Life Locally](https://www.clawbot.blog/blog/openclaw-the-open-source-ai-agent-framework-that-runs-your-life-locally)
- [The Best AI Agent Frameworks for 2026](https://medium.com/data-science-collective/the-best-ai-agent-frameworks-for-2026-tier-list-b3a4362fac0d)
Scenario C: Hybrid Question (Routed to Both)
A question that needs both internal specs and external benchmarks - the agent should call both tools and synthesize a comparison:
run_agent(
    "How does our Widget Pro's throughput compare to "
    "open-source alternatives on the market?"
)
User: How does our Widget Pro's throughput compare to open-source alternatives on the market?
-> Calling search_private_kb(query='Widget Pro throughput comparison')
-> Calling search_web(query='open-source widget throughput comparison')
Agent:
The throughput of our Widget Pro is quite competitive when compared to open-source alternatives on the market. Here's a detailed comparison:
### Widget Pro
- **Concurrent Connections**: Supports up to 10,000 concurrent connections.
- **Compression**: Utilizes AcmeZip v3, a proprietary compression algorithm that reduces payload size by 72% compared to gzip (source: [product-spec.pdf]).
- **API Rate Limits**: Offers different tiers:
- Free tier: 100 requests/minute.
- Pro tier: 5,000 requests/minute.
- Enterprise tier: 50,000 requests/minute (source: [api-docs.md]).
### Open-Source Alternatives
From the available resources, open-source widget solutions such as Chatwoot and Tiledesk are popular in handling customer engagement with a flexible and customizable approach (source: [ChatMaxima article](https://chatmaxima.com/blog/15-open-source-free-live-chat-widget-solutions-to-boost-your-customer-engagement-in-2024/)). However, specific throughput metrics such as maximum concurrent connections or API limits are generally not highlighted in open-source product descriptions unless directly benchmarked.
These alternatives often emphasize customization, control, and integration with AI-driven capabilities but do not always specify throughput in terms comparable with Widget Pro. They might be more suited for organizations looking to tailor solutions to specific needs rather than focusing solely on throughput efficiency.
In conclusion, Widget Pro appears to offer high throughput suitable for enterprises with robust API support, while open-source options offer flexibility and customization with varying degrees of performance metrics.
Cleanup
When you are done, drop the collection to free resources.
milvus.drop_collection(COLLECTION)
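Conclusion
In this tutorial, we built a dual-source RAG agent that combines Milvus for private knowledge retrieval with Exa for public web search. The key components are:
- Milvus stores and retrieves internal documents through vector similarity search, keeping proprietary data private while still making it searchable.
- Exa provides semantic web search with features such as category filtering, content extraction, and similar-document discovery.
- OpenAI function calling lets the LLM route each query to the appropriate source, or to both, based on the intent of the question.
This pattern applies to enterprise use cases where an AI assistant needs access to both confidential internal documents and real-time external information.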