
Building a Dual-Source RAG Agent with Exa and Milvus

This tutorial shows how to build an agent that searches both the public web (via Exa) and a private knowledge base (via Milvus), then synthesizes a unified answer. The agent uses OpenAI function calling to decide automatically which source to query based on the user's question.

Exa is a search API designed for AI applications, proudly supported by Zilliz Cloud (fully managed Milvus). Unlike traditional keyword-based search engines, Exa supports semantic (neural) search: you describe what you want in natural language, and Exa interprets your intent. Exa also provides content extraction, highlights, and category-based filtering. Milvus is an open-source vector database built for scalable similarity search. By combining the two with an LLM agent, you can build a system that retrieves proprietary internal data and up-to-date web information in a single workflow.

Prerequisites

Before running this notebook, make sure the following dependencies are installed:

$ pip install exa_py pymilvus openai

If you are using Google Colab, you may need to restart the runtime to enable the newly installed dependencies (click the "Runtime" menu at the top of the screen and select "Restart session" from the drop-down menu).

You will need API keys from Exa and OpenAI. Set them as environment variables:

import os

os.environ["EXA_API_KEY"] = "***********"
os.environ["OPENAI_API_KEY"] = "sk-***********"

Initialize the Clients

Set up the Exa, OpenAI, and Milvus clients. We use OpenAI's text-embedding-3-small model, which produces 1536-dimensional vectors by default (matching EMBED_DIM), to generate vector embeddings, and Milvus Lite for local vector storage with no infrastructure setup.

import json
from openai import OpenAI
from pymilvus import MilvusClient, DataType
from exa_py import Exa

llm = OpenAI()
exa = Exa(api_key=os.environ["EXA_API_KEY"])
milvus = MilvusClient(uri="./milvus_exa_demo.db")

EMBED_MODEL = "text-embedding-3-small"
EMBED_DIM = 1536
COLLECTION = "private_kb"

The uri and token arguments of MilvusClient work as follows:

  • Setting uri to a local file, e.g. ./milvus.db, is the most convenient method, because it automatically uses Milvus Lite to store all data in that file.
  • If you have a large amount of data, e.g. more than a million vectors, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, use the server address and port as your uri, e.g. http://localhost:19530. If you enable authentication on Milvus, use "<your_username>:<your_password>" as the token; otherwise, do not set the token.
  • If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, set uri and token to the Public Endpoint and API key of your Zilliz Cloud instance.

Define a helper function to generate embeddings. We will use it throughout the notebook for both indexing and querying:

def embed_text(text: str | list[str]) -> list:
    """Generate embedding vector(s) using OpenAI."""
    resp = llm.embeddings.create(
        input=text if isinstance(text, list) else [text],
        model=EMBED_MODEL,
    )
    if isinstance(text, list):
        return [item.embedding for item in resp.data]
    return resp.data[0].embedding
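embed_text sends every input string in a single request. For a knowledge base much larger than five documents you may prefer several smaller requests. The sketch below assumes a hypothetical embed_in_batches helper with an arbitrary default batch size of 100; check your provider's documented request limits before relying on any particular value.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[list[float]]],
    batch_size: int = 100,  # assumed value; tune to your provider's request limits
) -> list[list[float]]:
    """Embed texts in several smaller requests, preserving input order."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

In this notebook, embed_in_batches([doc["text"] for doc in private_docs], embed_text) would return the same vectors in the same order as one embed_text call, because each batch is passed to embed_text as a list.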

Build the Private Knowledge Base (Milvus)

We simulate a set of internal company documents (a product spec, a return policy, an earnings report, API docs, and an onboarding guide) that would not appear on the public web. In a real scenario, these documents could come from your wiki, databases, or internal document management system.

private_docs = [
    {
        "id": 1,
        "text": (
            "Acme Widget Pro supports up to 10,000 concurrent connections. "
            "It uses a proprietary compression algorithm (AcmeZip v3) that "
            "reduces payload size by 72% compared to gzip."
        ),
        "source": "product-spec.pdf",
    },
    {
        "id": 2,
        "text": (
            "Our return policy allows customers to return any product within "
            "30 days of purchase for a full refund. After 30 days, only store "
            "credit is offered. Damaged items must be reported within 48 hours."
        ),
        "source": "return-policy.md",
    },
    {
        "id": 3,
        "text": (
            "Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was "
            "primarily driven by enterprise customers adopting Widget Pro. "
            "Churn rate dropped to 3.1%."
        ),
        "source": "q3-earnings.pdf",
    },
    {
        "id": 4,
        "text": (
            "Internal API rate limits: free tier 100 req/min, pro tier "
            "5,000 req/min, enterprise tier 50,000 req/min. Rate limit "
            "headers are X-RateLimit-Remaining and X-RateLimit-Reset."
        ),
        "source": "api-docs.md",
    },
    {
        "id": 5,
        "text": (
            "Employee onboarding checklist: 1) Sign NDA, 2) Set up VPN access, "
            "3) Enroll in mandatory security training, 4) Request Jira and "
            "Confluence access from IT, 5) Schedule 1:1 with manager."
        ),
        "source": "onboarding-guide.md",
    },
]
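Each sample document here is a single short paragraph. Real wiki pages or PDFs are usually longer, and a common approach is to split them into overlapping chunks and store each chunk as its own record. The following is a minimal sketch using fixed-size character windows; chunk_text, to_records, and the default sizes (500 characters with 50 characters of overlap) are illustrative assumptions, not values from this tutorial.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def to_records(documents: list[dict], start_id: int = 1, **chunk_kwargs) -> list[dict]:
    """Turn {"text", "source"} documents into records shaped like private_docs."""
    records = []
    next_id = start_id  # INT64 primary keys must stay unique (auto_id=False)
    for doc in documents:
        for piece in chunk_text(doc["text"], **chunk_kwargs):
            records.append({"id": next_id, "text": piece, "source": doc["source"]})
            next_id += 1
    return records
```

The resulting records have the same id/text/source shape as private_docs, so they can go through the same embedding and insert code. Character windows can cut sentences in half; splitting on paragraph or sentence boundaries is a common refinement.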

Create a Milvus collection with an explicit schema, embed the documents, and insert them:

if milvus.has_collection(COLLECTION):
    milvus.drop_collection(COLLECTION)

schema = milvus.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=EMBED_DIM)
schema.add_field(field_name="text", datatype=DataType.VARCHAR, max_length=65535)
schema.add_field(field_name="source", datatype=DataType.VARCHAR, max_length=512)

index_params = milvus.prepare_index_params()
index_params.add_index(
    field_name="vector", index_type="AUTOINDEX", metric_type="COSINE"
)

milvus.create_collection(
    collection_name=COLLECTION,
    schema=schema,
    index_params=index_params,
    # consistency_level="Strong",
)

# Embed all documents in one batch call
embeddings = embed_text([doc["text"] for doc in private_docs])

milvus.insert(
    collection_name=COLLECTION,
    data=[
        {
            "id": doc["id"],
            "vector": emb,
            "text": doc["text"],
            "source": doc["source"],
        }
        for doc, emb in zip(private_docs, embeddings)
    ],
)

print(f"Inserted {len(private_docs)} documents into Milvus.")
Inserted 5 documents into Milvus.
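Because the collection uses an explicit schema, an insert fails if a record has the wrong vector dimension or a string longer than its VARCHAR max_length. A small pre-flight check can report such problems per record before calling milvus.insert. This is a sketch: validate_record is a hypothetical helper, its limits are copied from the schema above, and it measures strings in UTF-8 bytes as a conservative assumption about how the length limit is counted.

```python
def validate_record(
    record: dict, dim: int = 1536, text_max: int = 65535, source_max: int = 512
) -> list[str]:
    """Return a list of problems with one record; an empty list means it looks insertable."""
    problems = []
    if not isinstance(record.get("id"), int):
        problems.append("id must be an int (INT64 primary key, auto_id=False)")
    vector = record.get("vector")
    if not isinstance(vector, list) or len(vector) != dim:
        problems.append(f"vector must be a list of {dim} floats")
    for field, limit in (("text", text_max), ("source", source_max)):
        value = record.get(field)
        if not isinstance(value, str):
            problems.append(f"{field} must be a string")
        elif len(value.encode("utf-8")) > limit:
            # Byte length >= character length, so this check errs on the strict side
            problems.append(f"{field} exceeds {limit} UTF-8 bytes")
    return problems
```

For example, you could compute validate_record(row) for every row you are about to insert and stop if any of the returned lists is non-empty.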

Let's verify that search works with a quick test query:

query = "What is the return policy?"
results = milvus.search(
    collection_name=COLLECTION,
    data=[embed_text(query)],
    limit=2,
    output_fields=["text", "source"],
)

for hit in results[0]:
    print(f"[score={hit['distance']:.3f}] ({hit['entity']['source']})")
    print(f"  {hit['entity']['text'][:120]}...")
    print()
[score=0.665] (return-policy.md)
  Our return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, on...

[score=0.119] (q3-earnings.pdf)
  Q3 2025 revenue was $4.2M, up 18% from Q2. The growth was primarily driven by enterprise customers adopting Widget Pro. ...
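The index was created with metric_type="COSINE", so the distance field in each hit is a cosine similarity: larger values mean a closer match, which is why return-policy.md (0.665) ranks above q3-earnings.pdf (0.119). The formula is cos(a, b) = (a · b) / (‖a‖ ‖b‖). A plain-Python illustration of that formula, not how Milvus computes it internally:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))  # 0.0 (orthogonal)
print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))  # ≈ 0.7071 (45 degrees apart)
```

Only the direction of the vectors matters, not their length, which is why [0.0, 2.0] scores the same as [0.0, 1.0] would.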

Exploring Exa's Search Capabilities

Before building the agent, let's explore Exa's search features. Exa supports several search modes that are useful in different scenarios.

Semantic search with content extraction: Exa can return not only links but also the article text, key highlights, and AI-generated summaries in a single request:

web_results = exa.search_and_contents(
    query="latest trends in AI agents 2026",
    type="auto",
    num_results=3,
    text={"max_characters": 3000},
    highlights={"num_sentences": 3},
)

for r in web_results.results:
    print(f"[{r.title}]")
    print(f"  URL: {r.url}")
    if r.highlights:
        print(f"  Highlight: {r.highlights[0][:150]}...")
    print()
[The AI Trends Shaping 2026. A month into the new year is as good a… | by ODSC - Open Data Science | Mar, 2026 | Medium]
  URL: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49
  Highlight:  ahead. January brought Claude CoWork, Anthropic’s “AI coworker” that turns agents into desktop collaborators; OpenClaw (formerly Moltbot, formerly Cl...

[AI agent trends 2026 report]
  URL: https://cloud.google.com/resources/content/ai-agent-trends-2026
  Highlight: >. The era of simple prompts is over. We're witnessing the agent leap—where AI orchestrates complex, end-to-end workflows semi-autonomously. For enter...

[The Rise of Agentic AI: Why 2026 is the Year AI Started 'Doing']
  URL: https://www.marketdrafts.com/2026/02/rise-of-agentic-ai-2026-trends.html?m=1
  Highlight:  The era of "Generative AI" (which creates content) is being superseded by "Agentic AI" (which executes actions). We are witnessing a fundamental arch...

Category-based filtering: you can restrict results to specific content types such as "research paper", "news", "company", or "tweet". This is useful when you want higher-quality sources and less noise:

filtered_results = exa.search_and_contents(
    query="retrieval augmented generation real world applications",
    category="research paper",
    num_results=3,
    highlights={"num_sentences": 2},
)

for r in filtered_results.results:
    print(f"- {r.title}")
    print(f"  {r.url}\n")
- 10 RAG examples and use cases from real companies
  https://www.evidentlyai.com/blog/rag-examples

- Implementing Retrieval-Augmented Generation (RAG) with Real-World Constraints
  https://dev.to/dextralabs/implementing-retrieval-augmented-generation-rag-with-real-world-constraints-3ajm

- 
  https://www.arxiv.org/pdf/2502.14930
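The third result above came back with an empty title, so only its URL was printed. When you display results or pass them to an LLM, falling back to the URL keeps every entry identifiable. A minimal sketch; display_title is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic the .title and .url attributes used in the loops above.

```python
from types import SimpleNamespace


def display_title(result, fallback: str = "(untitled)") -> str:
    """Return a printable title, falling back to the URL when the title is missing."""
    title = (getattr(result, "title", None) or "").strip()
    if title:
        return title
    return getattr(result, "url", None) or fallback


# Stand-in objects with the same .title/.url attributes (not real Exa results)
results = [
    SimpleNamespace(
        title="10 RAG examples and use cases from real companies",
        url="https://www.evidentlyai.com/blog/rag-examples",
    ),
    SimpleNamespace(title="", url="https://www.arxiv.org/pdf/2502.14930"),
]
for r in results:
    print(f"- {display_title(r)}")
```

Inside the earlier loop you would print display_title(r) instead of r.title.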

Find similar articles: given a URL, Exa can find other articles with similar content. This is useful for expanding your research from a good starting point:

if web_results.results:
    source_url = web_results.results[0].url
    similar = exa.find_similar_and_contents(
        url=source_url,
        num_results=3,
        highlights={"num_sentences": 2},
    )
    print(f"Articles similar to: {source_url}\n")
    for r in similar.results:
        print(f"- {r.title}")
        print(f"  {r.url}\n")
Articles similar to: https://odsc.medium.com/the-ai-trends-shaping-2026-34078dad4d49

- AI Trends 2026: From Agent Demos to Production Reality
  https://opendatascience.com/the-ai-trends-shaping-2026/

- The Most Important AI Trends to Watch in 2026
  https://medium.com/the-ai-studio/the-most-important-ai-trends-to-watch-in-2026-54af64d45021
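When you combine the original search results with find_similar_and_contents results (for example, to build a larger research context), the same page can show up more than once under slightly different URLs. A minimal sketch that keeps the first result per normalized URL; normalize_url and merge_results are hypothetical helpers, and the normalization rules (lowercase scheme and host, drop the fragment and any trailing slash, keep the query string) are assumptions you may want to adjust.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme/host, drop the fragment and a trailing slash in the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(*result_lists):
    """Concatenate result lists, keeping the first result seen for each normalized URL."""
    seen = set()
    merged = []
    for results in result_lists:
        for r in results:
            key = normalize_url(r.url)
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged


# Stand-in results (not real Exa objects); only the .url attribute is used
first = [SimpleNamespace(url="https://Example.com/post/"), SimpleNamespace(url="https://example.com/other")]
second = [SimpleNamespace(url="https://example.com/post#top"), SimpleNamespace(url="https://example.com/new")]
print([r.url for r in merge_results(first, second)])
```

With the objects from this section, merge_results(web_results.results, similar.results) would give one combined, de-duplicated list.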

Define the Agent Tools

Now we define the two tool functions the agent will use. The private KB tool searches Milvus using vector similarity, while the web tool searches the public internet through Exa:

def search_private_kb(query: str) -> str:
    """Search the internal knowledge base using Milvus vector search."""
    results = milvus.search(
        collection_name=COLLECTION,
        data=[embed_text(query)],
        limit=3,
        output_fields=["text", "source"],
    )
    chunks = []
    for hit in results[0]:
        chunks.append(f"[{hit['entity']['source']}] {hit['entity']['text']}")
    return "\n\n".join(chunks) if chunks else "No relevant internal documents found."


def search_web(query: str) -> str:
    """Search the public web using Exa for up-to-date information."""
    results = exa.search_and_contents(
        query=query,
        type="auto",
        num_results=3,
        highlights={"num_sentences": 3},
    )
    items = []
    for r in results.results:
        highlight = r.highlights[0] if r.highlights else "No snippet available."
        items.append(f"[{r.title}]({r.url})\n{highlight}")
    return "\n\n".join(items) if items else "No web results found."


TOOL_FNS = {
    "search_private_kb": search_private_kb,
    "search_web": search_web,
}

Build the Agent

The agent uses OpenAI function calling to decide which tool to call. The flow is simple: the LLM receives the user query and decides which tools to call (if any); the code executes those calls and appends the results to the conversation; a second LLM call then synthesizes the final answer from the retrieved context.

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_private_kb",
            "description": (
                "Search the company's internal knowledge base (product docs, "
                "policies, earnings, API docs, HR guides). Use this for any "
                "question about internal/proprietary information."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"}
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": (
                "Search the public web for up-to-date external information - "
                "news, trends, competitor analysis, open-source projects, etc. "
                "Use this when the question is about the outside world."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The search query"}
                },
                "required": ["query"],
            },
        },
    },
]

SYSTEM_PROMPT = """You are a helpful assistant with access to two search tools:

1. **search_private_kb** - searches the company's internal knowledge base.
2. **search_web** - searches the public internet via Exa.

Routing rules:
- Questions about internal products, policies, metrics, or processes: use search_private_kb.
- Questions about external trends, news, competitors, or general knowledge: use search_web.
- Questions that need both internal and external context: call BOTH tools, then synthesize.

Always cite your sources. For internal docs, mention the filename. For web results, include the URL."""


def run_agent(user_query: str) -> str:
    """Run the agent loop: LLM -> tool calls -> LLM -> final answer."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

    print(f"User: {user_query}\n")

    # First LLM call - may request tool calls
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=TOOLS,
    )
    msg = response.choices[0].message
    messages.append(msg)

    # If no tool calls, return directly
    if not msg.tool_calls:
        print(f"Agent (no tools used): {msg.content}")
        return msg.content

    # Execute each tool call
    for tc in msg.tool_calls:
        fn_name = tc.function.name
        fn_args = json.loads(tc.function.arguments)
        print(f"  -> Calling {fn_name}(query={fn_args['query']!r})")

        result = TOOL_FNS[fn_name](**fn_args)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            }
        )

    # Second LLM call - synthesize final answer
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=TOOLS,
    )
    answer = response.choices[0].message.content
    print(f"\nAgent:\n{answer}")
    return answer
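run_agent performs exactly one round of tool calls. Because the second request still passes tools=TOOLS, the model may ask for another tool instead of answering, in which case the message content is None and run_agent returns None. A common generalization is to loop until the model replies without tool calls, with a step limit as a safeguard. The sketch below is one way to do that; run_agent_loop, scripted_chat, fake_tool_call, and the max_steps=4 default are hypothetical, and the LLM call is injected as a function so the loop can be exercised offline.

```python
import json
from types import SimpleNamespace
from typing import Callable


def run_agent_loop(
    user_query: str,
    chat: Callable[[list], object],
    tool_fns: dict[str, Callable[..., str]],
    system_prompt: str = "You are a helpful assistant.",
    max_steps: int = 4,
) -> str:
    """Call the model repeatedly until it answers without requesting tools.

    `chat(messages)` must return a message with `.content` and `.tool_calls`,
    e.g. response.choices[0].message from the OpenAI SDK.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]
    for _ in range(max_steps):
        msg = chat(messages)
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content or ""
        for tc in msg.tool_calls:
            fn = tool_fns.get(tc.function.name)
            if fn is None:
                # Report unknown tool names to the model instead of raising KeyError
                result = f"Unknown tool: {tc.function.name}"
            else:
                result = fn(**json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
    return "Stopped: max_steps reached without a final answer."


def scripted_chat(replies: list) -> Callable[[list], object]:
    """Offline stand-in for the LLM that returns pre-written messages in order."""
    it = iter(replies)
    return lambda messages: next(it)


def fake_tool_call(call_id: str, name: str, **args) -> SimpleNamespace:
    """Build an object shaped like an OpenAI tool call (for offline tests only)."""
    return SimpleNamespace(
        id=call_id, function=SimpleNamespace(name=name, arguments=json.dumps(args))
    )
```

With the real client, chat could be lambda msgs: llm.chat.completions.create(model="gpt-4o", messages=msgs, tools=TOOLS).choices[0].message, and you would call run_agent_loop(query, chat, TOOL_FNS, SYSTEM_PROMPT). Unknown tool names are fed back to the model as tool output rather than raising a KeyError as TOOL_FNS[fn_name] would.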

Demo

Now let's test the agent with three scenarios that demonstrate different routing behavior.
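The scenarios below state which tool the agent should call, but routing is decided by the model and can vary between runs. To check what actually happened, you can wrap each tool so its calls are recorded. A minimal sketch with a hypothetical with_call_log helper:

```python
from typing import Callable

ToolFn = Callable[..., str]


def with_call_log(
    tool_fns: dict[str, ToolFn],
) -> tuple[dict[str, ToolFn], list[tuple[str, dict]]]:
    """Wrap each tool so every call is recorded as (tool_name, kwargs) before it runs."""
    log: list[tuple[str, dict]] = []

    def wrap(name: str, fn: ToolFn) -> ToolFn:
        def wrapped(**kwargs):
            log.append((name, kwargs))
            return fn(**kwargs)

        return wrapped

    return {name: wrap(name, fn) for name, fn in tool_fns.items()}, log
```

Running TOOL_FNS, call_log = with_call_log(TOOL_FNS) once before the demos works because run_agent looks up the global TOOL_FNS each time it is called; afterwards [name for name, _ in call_log] lists the tools that were used. Re-running that line wraps the tools a second time, so do it once per session.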

Scenario A: Internal question (routes to Milvus)

Ask about an internal policy. The agent should call search_private_kb and answer from our private documents:

run_agent("What is the return policy for Acme products?")
User: What is the return policy for Acme products?

  -> Calling search_private_kb(query='return policy Acme products')

Agent:
The Acme products return policy allows customers to return any product within 30 days of purchase for a full refund. After 30 days, only store credit is offered. It's important to note that damaged items must be reported within 48 hours of receipt ([source: return-policy.md]).

Scenario B: External question (routes to Exa)

Ask about external trends. The agent should call search_web to get up-to-date information from the public internet:

run_agent("What are the latest AI agent frameworks trending in 2026?")
User: What are the latest AI agent frameworks trending in 2026?

  -> Calling search_web(query='latest AI agent frameworks 2026')

Agent:
In 2026, several AI agent frameworks are trending, each offering unique features and capabilities that cater to various needs. Here are some of the most prominent ones:

1. **LangChain and LangGraph**: These frameworks remain highly popular for building large language model (LLM)-powered applications. LangGraph, in particular, models agents as state graphs, which is useful for action-oriented workflows. LangChain continues to dominate due to its comprehensive feature set for production-grade control and orchestration.

2. **LangSmith Agent Builder**: Released into general availability in 2026, this tool allows teams to create AI agents using natural language, simplifying the process of agent development.

3. **Semantic Kernel and AutoGen**: These have been integrated into Azure AI Foundry, creating a unified framework. Semantic Kernel uses a plugin-based middleware pattern, enhancing existing applications with AI capabilities efficiently.

4. **OpenClaw**: An open-source framework that operates locally, OpenClaw transforms your computer into an autonomous agent host, differing from cloud-based solutions by keeping data and operations localized. This framework supports a large community and includes extensive skills for customization.

These frameworks cater to various requirements, whether it's production-grade solutions, open-source options, or frameworks focused on local deployment. Each framework has its strengths, depending on the use case and the existing ecosystem it fits into.

Sources:
- [Agentic AI Frameworks: The Complete Guide (2026)](https://aiagentskit.com/blog/agentic-ai-frameworks/)
- [OpenClaw: The Open-Source AI Agent Framework That Runs Your Life Locally](https://www.clawbot.blog/blog/openclaw-the-open-source-ai-agent-framework-that-runs-your-life-locally)
- [The Best AI Agent Frameworks for 2026](https://medium.com/data-science-collective/the-best-ai-agent-frameworks-for-2026-tier-list-b3a4362fac0d)

Scenario C: Hybrid question (routes to both)

Ask a question that needs both internal specs and external benchmarks. The agent should call both tools and compare the results:

run_agent(
    "How does our Widget Pro's throughput compare to "
    "open-source alternatives on the market?"
)
User: How does our Widget Pro's throughput compare to open-source alternatives on the market?

  -> Calling search_private_kb(query='Widget Pro throughput comparison')
  -> Calling search_web(query='open-source widget throughput comparison')

Agent:
The throughput of our Widget Pro is quite competitive when compared to open-source alternatives on the market. Here's a detailed comparison:

### Widget Pro

- **Concurrent Connections**: Supports up to 10,000 concurrent connections.
- **Compression**: Utilizes AcmeZip v3, a proprietary compression algorithm that reduces payload size by 72% compared to gzip (source: [product-spec.pdf]).
- **API Rate Limits**: Offers different tiers:
  - Free tier: 100 requests/minute.
  - Pro tier: 5,000 requests/minute.
  - Enterprise tier: 50,000 requests/minute (source: [api-docs.md]).

### Open-Source Alternatives

From the available resources, open-source widget solutions such as Chatwoot and Tiledesk are popular in handling customer engagement with a flexible and customizable approach (source: [ChatMaxima article](https://chatmaxima.com/blog/15-open-source-free-live-chat-widget-solutions-to-boost-your-customer-engagement-in-2024/)). However, specific throughput metrics such as maximum concurrent connections or API limits are generally not highlighted in open-source product descriptions unless directly benchmarked.

These alternatives often emphasize customization, control, and integration with AI-driven capabilities but do not always specify throughput in terms comparable with Widget Pro. They might be more suited for organizations looking to tailor solutions to specific needs rather than focusing solely on throughput efficiency.

In conclusion, Widget Pro appears to offer high throughput suitable for enterprises with robust API support, while open-source options offer flexibility and customization with varying degrees of performance metrics.
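In Scenario C the model requested both tools in a single response, and run_agent executes them one after another. Since the Milvus search and the Exa request are independent I/O calls, they can run concurrently. A minimal sketch using concurrent.futures.ThreadPoolExecutor; execute_tool_calls is a hypothetical helper, and it assumes the OpenAI, Exa, and Milvus clients are safe to share across threads, which you should confirm for your versions.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace
from typing import Callable


def execute_tool_calls(
    tool_calls: list, tool_fns: dict[str, Callable[..., str]], max_workers: int = 4
) -> list[dict]:
    """Run tool calls concurrently and return tool messages in the original order."""

    def run_one(tc) -> dict:
        fn = tool_fns.get(tc.function.name)
        if fn is None:
            content = f"Unknown tool: {tc.function.name}"
        else:
            content = fn(**json.loads(tc.function.arguments))
        return {"role": "tool", "tool_call_id": tc.id, "content": content}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if calls finish out of order
        return list(pool.map(run_one, tool_calls))


# Offline demo with stand-in tool calls (not real OpenAI objects)
def slow_tool(query: str) -> str:
    time.sleep(0.05)  # simulate a slower backend
    return f"slow:{query}"


def fast_tool(query: str) -> str:
    return f"fast:{query}"


calls = [
    SimpleNamespace(id="a", function=SimpleNamespace(name="slow", arguments='{"query": "kb"}')),
    SimpleNamespace(id="b", function=SimpleNamespace(name="fast", arguments='{"query": "web"}')),
]
messages = execute_tool_calls(calls, {"slow": slow_tool, "fast": fast_tool})
print([m["tool_call_id"] for m in messages])  # ['a', 'b']
```

Because the results come back in input order, the tool messages line up with msg.tool_calls. In run_agent, the per-call loop could become messages.extend(execute_tool_calls(msg.tool_calls, TOOL_FNS)), at the cost of the per-call progress prints.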

Cleanup

When you are done, drop the collection to free up resources:

milvus.drop_collection(COLLECTION)

Conclusion

In this tutorial, we built a dual-source RAG agent that combines Milvus for private knowledge retrieval with Exa for public web search. The key components are:

  • Milvus stores and retrieves internal documents through vector similarity search, keeping proprietary data private and searchable.
  • Exa provides semantic web search with features such as category filtering, content extraction, and similar-article discovery.
  • OpenAI function calling lets the LLM automatically route each query to the right source, or to both, based on the intent of the question.

This pattern applies to enterprise use cases where an AI assistant needs access to both confidential internal documents and real-time external information.