Build RAG with Milvus and Firecrawl


Firecrawl empowers developers to build AI applications with clean data scraped from any website. With advanced scraping, crawling, and data extraction capabilities, Firecrawl simplifies the process of converting website content into clean markdown or structured data for downstream AI workflows.

In this tutorial, we will show you how to build a Retrieval-Augmented Generation (RAG) pipeline using Milvus and Firecrawl. The pipeline integrates Firecrawl for web data scraping, Milvus for vector storage, and OpenAI for generating insightful, context-aware responses.

Preparation

Dependencies and Environment

To get started, install the required dependencies by running the following command:

$ pip install firecrawl-py pymilvus openai requests tqdm

If you are using Google Colab, you may need to restart the runtime to enable the newly installed dependencies (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).

Setting Up API Keys

To use Firecrawl to scrape data from the specified URL, you need to obtain a FIRECRAWL_API_KEY and set it as an environment variable. In addition, we will use OpenAI as the LLM in this example, so you should also prepare the OPENAI_API_KEY as an environment variable.

import os

os.environ["FIRECRAWL_API_KEY"] = "fc-***********"
os.environ["OPENAI_API_KEY"] = "sk-***********"

Prepare the LLM and Embedding Model

We initialize the OpenAI client to prepare the embedding model.

from openai import OpenAI

openai_client = OpenAI()

Define a function to generate text embeddings using the OpenAI client. We use the text-embedding-3-small model as an example.

def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )
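For larger documents, the embeddings endpoint also accepts a list of strings, so you can embed several sections per API call. A batched variant of the helper above (an optional optimization, not part of the original tutorial):

def emb_texts(texts):
    # Single API call for a batch of inputs; the returned embeddings
    # are in the same order as the inputs
    resp = openai_client.embeddings.create(
        input=texts, model="text-embedding-3-small"
    )
    return [d.embedding for d in resp.data]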

Generate a test embedding and print its dimension and first few elements.

test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])
1536
[0.009889289736747742, -0.005578675772994757, 0.00683477520942688, -0.03805781528353691, -0.01824733428657055, -0.04121600463986397, -0.007636285852640867, 0.03225184231996536, 0.018949154764413834, 9.352207416668534e-05]

Scrape Data Using Firecrawl

Initialize the Firecrawl Application

We will use the firecrawl library to scrape data from the specified URL in markdown format. Start by initializing the Firecrawl application:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])

Scrape the Target Website

Scrape the content from the target URL. The website LLM-powered Autonomous Agents provides an in-depth exploration of autonomous agent systems built with large language models (LLMs). We will use this content to build a RAG system.

# Scrape a website:
scrape_status = app.scrape_url(
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    params={"formats": ["markdown"]},
)

markdown_content = scrape_status["markdown"]
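Note that the return type of scrape_url has changed across firecrawl-py releases: older clients return a dict, while newer ones return a response object. If the dict-style access above fails on your installed version, a defensive variant (an assumption about the newer client shape, not part of the original tutorial) is:

# Handle both return shapes of scrape_url across firecrawl-py versions
markdown_content = (
    scrape_status["markdown"]
    if isinstance(scrape_status, dict)
    else getattr(scrape_status, "markdown", "")
)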

Process the Scraped Content

To make the scraped content manageable for insertion into Milvus, we simply use "# " to separate the content, which roughly splits out each main section of the scraped markdown file.

def split_markdown_content(content):
    return [section.strip() for section in content.split("# ") if section.strip()]


# Process the scraped markdown content
sections = split_markdown_content(markdown_content)

# Print the first few sections to understand the structure
for i, section in enumerate(sections[:3]):
    print(f"Section {i+1}:")
    print(section[:300] + "...")
    print("-" * 50)
Section 1:
Table of Contents

- [Agent System Overview](#agent-system-overview)
- [Component One: Planning](#component-one-planning)  - [Task Decomposition](#task-decomposition)
  - [Self-Reflection](#self-reflection)
- [Component Two: Memory](#component-two-memory)  - [Types of Memory](#types-of-memory)
  - [...
--------------------------------------------------
Section 2:
Agent System Overview [\#](\#agent-system-overview)

In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

- **Planning**
  - Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling effi...
--------------------------------------------------
Section 3:
Component One: Planning [\#](\#component-one-planning)

A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.

#...
--------------------------------------------------

Load Data into Milvus

Create the Collection

from pymilvus import MilvusClient

milvus_client = MilvusClient(uri="./milvus_demo.db")
collection_name = "my_rag_collection"

As for the argument of MilvusClient:

  • Setting the uri as a local file, e.g., ./milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file.

  • If you have a large scale of data, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, use the server uri, e.g., http://localhost:19530, as your uri.

  • If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token, which correspond to the Public Endpoint and API key in Zilliz Cloud. (Both alternatives are sketched just after this list.)
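For reference, here is a minimal sketch of the two alternative connection setups; the server address, endpoint, and token below are placeholders you would replace with your own values.

from pymilvus import MilvusClient

# Pick one of the following, replacing the placeholder values.

# A self-hosted Milvus server on Docker or Kubernetes (placeholder address):
milvus_client = MilvusClient(uri="http://localhost:19530")

# Zilliz Cloud (placeholder Public Endpoint and API key):
milvus_client = MilvusClient(
    uri="https://<your-cluster-endpoint>.zillizcloud.com",
    token="<your-api-key>",
)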

Check if the collection already exists and drop it if it does.

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Create a new collection with the specified parameters.

If we don't specify any field information, Milvus will automatically create a default id field for the primary key, and a vector field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)
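To confirm what was auto-created, you can optionally inspect the collection's schema with describe_collection (an extra check, not part of the original tutorial):

# Optional: inspect the auto-created schema (primary key id, vector field)
print(milvus_client.describe_collection(collection_name))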

Insert Data

from tqdm import tqdm

data = []

for i, section in enumerate(tqdm(sections, desc="Processing sections")):
    embedding = emb_text(section)
    data.append({"id": i, "vector": embedding, "text": section})

# Insert data into Milvus
milvus_client.insert(collection_name=collection_name, data=data)
Processing sections: 100%|██████████| 17/17 [00:08<00:00,  2.09it/s]

{'insert_count': 17, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], 'cost': 0}
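As an optional sanity check (not part of the original tutorial), you can verify the row count against the insert result above:

# Optional: confirm that all 17 sections were inserted
print(milvus_client.get_collection_stats(collection_name))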

Build RAG

Retrieve Data for a Query

Let's specify a question about the website we just scraped.

question = "What are the main components of autonomous agents?"

Search for the question in the collection and retrieve the top 3 semantic matches.

search_res = milvus_client.search(
    collection_name=collection_name,
    data=[emb_text(question)],
    limit=3,
    search_params={"metric_type": "IP", "params": {}},
    output_fields=["text"],
)

Let's take a look at the search results of the query.

import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))
[
    [
        "Agent System Overview [\\#](\\#agent-system-overview)\n\nIn a LLM-powered autonomous agent system, LLM functions as the agent\u2019s brain, complemented by several key components:\n\n- **Planning**\n  - Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\n  - Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n- **Memory**\n  - Short-term memory: I would consider all the in-context learning (See [Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)) as utilizing short-term memory of the model to learn.\n  - Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n- **Tool use**\n  - The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\n\n![](agent-overview.png)Fig. 1. Overview of a LLM-powered autonomous agent system.",
        0.6343474388122559
    ],
    [
        "Table of Contents\n\n- [Agent System Overview](#agent-system-overview)\n- [Component One: Planning](#component-one-planning)  - [Task Decomposition](#task-decomposition)\n  - [Self-Reflection](#self-reflection)\n- [Component Two: Memory](#component-two-memory)  - [Types of Memory](#types-of-memory)\n  - [Maximum Inner Product Search (MIPS)](#maximum-inner-product-search-mips)\n- [Component Three: Tool Use](#component-three-tool-use)\n- [Case Studies](#case-studies)  - [Scientific Discovery Agent](#scientific-discovery-agent)\n  - [Generative Agents Simulation](#generative-agents-simulation)\n  - [Proof-of-Concept Examples](#proof-of-concept-examples)\n- [Challenges](#challenges)\n- [Citation](#citation)\n- [References](#references)\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT), [GPT-Engineer](https://github.com/AntonOsika/gpt-engineer) and [BabyAGI](https://github.com/yoheinakajima/babyagi), serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.",
        0.5715497732162476
    ],
    [
        "Challenges [\\#](\\#challenges)\n\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:\n\n- **Finite context length**: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n- **Challenges in long-term planning and task decomposition**: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\n\n- **Reliability of natural language interface**: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.",
        0.5009307265281677
    ]
]

Use LLM to Get a RAG Response

Convert the retrieved documents into a string format.

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)

Define the system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus.

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

Use OpenAI ChatGPT to generate a response based on the prompts.

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)
The main components of a LLM-powered autonomous agent system are the Planning, Memory, and Tool use. 

1. Planning: The agent breaks down large tasks into smaller, manageable subgoals, and can self-reflect and learn from past mistakes, refining its actions for future steps.

2. Memory: This includes short-term memory, which the model uses for in-context learning, and long-term memory, which allows the agent to retain and recall information over extended periods. 

3. Tool use: This component allows the agent to call external APIs for additional information that is not available in the model weights, like current information, code execution capacity, and access to proprietary information sources.
