使用 Milvus 和 LangChain 的檢索-增強世代 (RAG)
本指南展示了如何使用 LangChain 和 Milvus 建立一個檢索-增強生成 (RAG) 系統。
RAG 系統結合了檢索系統與生成模型,可根據給定的提示產生新的文字。該系統首先使用 Milvus 從語料庫中檢索相關文件,然後根據檢索到的文件使用生成模型生成新文本。
LangChain是一個由大型語言模型 (LLM) 驅動的應用程式開發框架。Milvus是世界上最先進的開放源碼向量資料庫,用於嵌入相似性搜尋和人工智能應用程式。
先決條件
執行本筆記本之前,請確定您已安裝下列相依性:
pip install --upgrade --quiet langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-openai bs4
如果您使用的是 Google Colab,為了啟用剛安裝的相依性,您可能需要重新啟動執行時(點選畫面頂端的「Runtime」功能表,並從下拉式功能表中選擇「Restart session」)。
我們將使用 OpenAI 的模型。您應該準備api key OPENAI_API_KEY
作為環境變數。
import os
os.environ["OPENAI_API_KEY"] = "sk-***********"
準備資料
我們使用 Langchain WebBaseLoader 從網頁來源載入文件,並使用 RecursiveCharacterTextSplitter 將文件分割成小塊。
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Create a WebBaseLoader instance to load documents from web sources
loader = WebBaseLoader(
web_paths=(
"https://lilianweng.github.io/posts/2023-06-23-agent/",
"https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
),
bs_kwargs=dict(
parse_only=bs4.SoupStrainer(
class_=("post-content", "post-title", "post-header")
)
),
)
# Load documents from web sources using the loader
documents = loader.load()
# Initialize a RecursiveCharacterTextSplitter for splitting text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
# Split the documents into chunks using the text_splitter
docs = text_splitter.split_documents(documents)
# Let's take a look at the first document
docs[1]
Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'})
我們可以看到,文件已經被分割成幾個區塊。而資料的內容是關於 AI 代理的。
使用 Milvus 向量儲存建立 RAG 鏈
我們將以文件初始化一個 Milvus 向量儲存庫,將文件載入 Milvus 向量儲存庫,並在引擎蓋下建立索引。
from langchain_milvus import Milvus, Zilliz
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
vectorstore = Milvus.from_documents( # or Zilliz.from_documents
documents=docs,
embedding=embeddings,
connection_args={
"uri": "./milvus_demo.db",
},
drop_old=True, # Drop the old Milvus collection if it exists
)
對於connection_args
:
- 將
uri
設定為本機檔案,例如./milvus.db
,是最方便的方法,因為它會自動利用Milvus Lite將所有資料儲存在這個檔案中。 - 如果您有大規模的資料,您可以在docker 或 kubernetes 上架設效能更高的 Milvus 伺服器。在此設定中,請使用伺服器的 uri,例如
http://localhost:19530
,作為您的uri
。 - 如果您想使用Zilliz Cloud,Milvus 的完全管理雲端服務,請將
Milvus.from_documents
改為Zilliz.from_documents
,並調整uri
和token
,分別對應 Zilliz Cloud 的Public Endpoint 和 Api key。
使用測試查詢問題搜尋 Milvus 向量儲存庫中的文件。讓我們來看看最頂端的 1 個文件。
query = "What is self-reflection of an AI Agent?"
vectorstore.similarity_search(query, k=1)
[Document(page_content='Self-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555859})]
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
# Initialize the OpenAI language model for response generation
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Define the prompt template for generating AI responses
PROMPT_TEMPLATE = """
Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>
<question>
{question}
</question>
The response should be specific and use statistics or numbers when possible.
Assistant:"""
# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
# Convert the vector store to a retriever
retriever = vectorstore.as_retriever()
# Define a function to format the retrieved documents
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
使用 LCEL(LangChain Expression Language) 建立 RAG 鏈。
# Define the RAG (Retrieval-Augmented Generation) chain for AI response generation
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# rag_chain.get_graph().print_ascii()
# Invoke the RAG chain with a specific question and retrieve the response
res = rag_chain.invoke(query)
res
"Self-reflection of an AI agent involves the process of synthesizing memories into higher-level inferences over time to guide the agent's future behavior. It serves as a mechanism to create higher-level summaries of past events. One approach to self-reflection involves prompting the language model with the 100 most recent observations and asking it to generate the 3 most salient high-level questions based on those observations. This process helps the AI agent optimize believability in the current moment and over time."
恭喜您!您已經建立了一個由 Milvus 和 LangChain 驅動的基本 RAG 鏈。
元資料過濾
我們可以使用Milvus Scalar 過濾規則來根據 metadata 過濾文件。我們從兩個不同的來源載入了文件,我們可以根據元資料來過濾文件source
。
vectorstore.similarity_search(
"What is CoT?",
k=1,
expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
)
# The same as:
# vectorstore.as_retriever(search_kwargs=dict(
# k=1,
# expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
# )).invoke("What is CoT?")
[Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555858})]
如果我們想要動態地變更搜尋參數,而不需要重建鏈,我們可以設定 runtime chain internals。讓我們用這個動態配置定義一個新的 retriever,然後用它來建立一個新的 RAG 鏈。
from langchain_core.runnables import ConfigurableField
# Define a new retriever with a configurable field for search_kwargs
retriever2 = vectorstore.as_retriever().configurable_fields(
search_kwargs=ConfigurableField(
id="retriever_search_kwargs",
)
)
# Invoke the retriever with a specific search_kwargs which filter the documents by source
retriever2.with_config(
configurable={
"retriever_search_kwargs": dict(
expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
k=1,
)
}
).invoke(query)
[Document(page_content='Self-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555859})]
# Define a new RAG chain with this dynamically configurable retriever
rag_chain2 = (
{"context": retriever2 | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
讓我們用不同的篩選條件試試這個動態配置的 RAG 鏈。
# Invoke this RAG chain with a specific question and config
rag_chain2.with_config(
configurable={
"retriever_search_kwargs": dict(
expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
)
}
).invoke(query)
"Self-reflection of an AI agent involves the process of synthesizing memories into higher-level inferences over time to guide the agent's future behavior. It serves as a mechanism to create higher-level summaries of past events. One approach to self-reflection involves prompting the language model with the 100 most recent observations and asking it to generate the 3 most salient high-level questions based on those observations. This process helps the AI agent optimize believability in the current moment and over time."
當我們改變搜尋條件,以第二個來源過濾文件時,由於這個部落格來源的內容與查詢問題毫無關係,因此我們得到一個沒有相關資訊的答案。
rag_chain2.with_config(
configurable={
"retriever_search_kwargs": dict(
expr="source == 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'",
)
}
).invoke(query)
"I'm sorry, but based on the provided context, there is no specific information or statistical data available regarding the self-reflection of an AI agent."
本教學著重於 Milvus LangChain 整合的基本用法和簡單的 RAG 方法。如需更進階的 RAG 技術,請參考進階 RAG Bootcamp。