Build RAG with Milvus and Ollama
Ollama is an open-source platform that simplifies running and customizing large language models (LLMs) locally. It provides a user-friendly, cloud-free experience, making model downloads, installation, and interaction straightforward without requiring advanced technical skills. With a growing library of pre-trained LLMs, from general-purpose to domain-specific, Ollama makes it easy to manage and customize models for a variety of applications. It keeps data private and workflows flexible, letting users fine-tune, optimize, and deploy AI-powered solutions entirely on their own machines.
In this guide, we will show you how to build a RAG (Retrieval-Augmented Generation) pipeline with Ollama and Milvus, efficiently and securely.
Preparation
Dependencies and Environment
$ pip install pymilvus ollama
If you are using Google Colab, you may need to restart the runtime to enable the newly installed dependencies (click the "Runtime" menu at the top of the screen, and select "Restart session" from the dropdown menu).
Prepare the Data
We use the FAQ pages from the Milvus Documentation 2.4.x as the private knowledge in our RAG, which is a good data source for a simple RAG pipeline.
Download the zip file and extract the documents to the folder milvus_docs.
$ wget https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
$ unzip -q milvus_docs_2.4.x_en.zip -d milvus_docs
--2024-11-26 21:47:19--  https://github.com/milvus-io/milvus-docs/releases/download/v2.4.6-preview/milvus_docs_2.4.x_en.zip
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 613094 (599K) [application/octet-stream]
Saving to: ‘milvus_docs_2.4.x_en.zip’
milvus_docs_2.4.x_e 100%[===================>] 598.72K  1.20MB/s    in 0.5s
2024-11-26 21:47:20 (1.20 MB/s) - ‘milvus_docs_2.4.x_en.zip’ saved [613094/613094]
We load all markdown files from the folder milvus_docs/en/faq. For each document, we simply split the content by "# ", which roughly separates the content of each major section of the markdown file.
from glob import glob

text_lines = []

for file_path in glob("milvus_docs/en/faq/*.md", recursive=True):
    with open(file_path, "r") as file:
        file_text = file.read()
    text_lines += file_text.split("# ")
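To see what this splitting produces, here is a toy run on a small, made-up markdown string. Note that because "## " also contains "# ", subsection headers trigger splits too, which is why this only roughly separates the sections:

```python
# Toy illustration of the "# " splitting strategy used above.
# The sample markdown below is made up for illustration only.
sample_md = "# Install\n\nRun pip install.\n\n# FAQ\n\n## Storage\n\nData lives on disk.\n"

chunks = sample_md.split("# ")
# Each top-level section becomes one chunk; the "## Storage" subsection
# also splits, leaving a stray "#" at the end of the preceding chunk.
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

For a quick FAQ-style knowledge base this rough chunking is good enough; for production pipelines a markdown-aware splitter would give cleaner boundaries.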
Prepare the LLM and Embedding Model
Ollama supports a range of models for both LLM-based tasks and embedding generation, making it easy to develop retrieval-augmented generation (RAG) applications. For this setup:
- We will use Llama 3.2 (3B) as our LLM for text-generation tasks.
- For embedding generation, we will use mxbai-embed-large, a 334M-parameter model optimized for semantic similarity.
Before starting, make sure both models are pulled locally:
! ollama pull mxbai-embed-large
pulling manifest
pulling 819c2adf5ce6... 100% ▕████████████████▏ 669 MB
pulling c71d239df917... 100% ▕████████████████▏ 11 KB
pulling b837481ff855... 100% ▕████████████████▏ 16 B
pulling 38badd946f91... 100% ▕████████████████▏ 408 B
verifying sha256 digest
writing manifest
success
! ollama pull llama3.2
pulling manifest
pulling dde5aa3fc5ff... 100% ▕████████████████▏ 2.0 GB
pulling 966de95ca8a6... 100% ▕████████████████▏ 1.4 KB
pulling fcc5a6bec9da... 100% ▕████████████████▏ 7.7 KB
pulling a70ff7e570d9... 100% ▕████████████████▏ 6.0 KB
pulling 56bb8bd477a5... 100% ▕████████████████▏ 96 B
pulling 34bb5ab01051... 100% ▕████████████████▏ 561 B
verifying sha256 digest
writing manifest
success
With these models ready, we can proceed to implement the LLM-driven generation and embedding-based retrieval workflows.
import ollama


def emb_text(text):
    response = ollama.embeddings(model="mxbai-embed-large", prompt=text)
    return response["embedding"]
Generate a test embedding and print its dimension and first few elements.
test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])
1024
[0.23276396095752716, 0.4257211685180664, 0.19724100828170776, 0.46120673418045044, -0.46039995551109314, -0.1413791924715042, -0.18261606991291046, -0.07602324336767197, 0.39991313219070435, 0.8337644338607788]
Load Data into Milvus
Create the Collection
from pymilvus import MilvusClient
milvus_client = MilvusClient(uri="./milvus_demo.db")
collection_name = "my_rag_collection"
As for the argument of MilvusClient:

- Setting the uri as a local file, e.g. ./milvus.db, is the most convenient method, as it automatically utilizes Milvus Lite to store all data in this file.
- If you have a large scale of data, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, use the server uri, e.g. http://localhost:19530, as your uri.
- If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, adjust the uri and token to match the Public Endpoint and API key of your Zilliz Cloud instance.
Check if the collection already exists and drop it if it does.
if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)
Create a new collection with the specified parameters.

If we don't specify any field information, Milvus will automatically create a default id field as the primary key, and a vector field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.
milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)
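The collection is configured with inner product (IP) as its distance metric: a larger inner product means a closer semantic match, and with unit-normalized embeddings it coincides with cosine similarity. A minimal sketch in plain Python, using made-up toy vectors:

```python
import math


def inner_product(a, b):
    # Inner product: sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale a vector to unit length, so IP equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = normalize([1.0, 2.0, 3.0])
b = normalize([1.0, 2.0, 2.9])   # points in nearly the same direction as a
c = normalize([-3.0, 0.5, -1.0])  # points in a very different direction

print(inner_product(a, b))  # close to 1.0: highly similar
print(inner_product(a, c))  # negative: dissimilar
```

This is why the search step later passes `metric_type="IP"`: results are ranked by descending inner product against the query embedding.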
Insert Data
Iterate through the text lines, create embeddings, and then insert the data into Milvus.

Here is a new field text, which is a non-defined field in the collection schema. It will be automatically added to the reserved JSON dynamic field, which can be treated as a normal field at a high level.
from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)
Creating embeddings: 100%|██████████| 72/72 [00:03<00:00, 22.56it/s]
{'insert_count': 72, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71], 'cost': 0}
Build RAG
Retrieve Data for a Query
Let's specify a frequently asked question about Milvus.
question = "How is data stored in milvus?"
Search for the question in the collection and retrieve the semantic top-3 matches.
search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)
    ],  # Use the `emb_text` function to convert the question to an embedding vector
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)
Let's take a look at the search results of the query.
import json

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]

print(json.dumps(retrieved_lines_with_distances, indent=4))
[
    [
        " Where does Milvus store data?\n\nMilvus deals with two types of data, inserted data and metadata. \n\nInserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).\n\nMetadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.\n\n###",
        231.9398193359375
    ],
    [
        "How does Milvus flush data?\n\nMilvus returns success when inserted data are loaded to the message queue. However, the data are not yet flushed to the disk. Then Milvus' data node writes the data in the message queue to persistent storage as incremental logs. If `flush()` is called, the data node is forced to write all data in the message queue to persistent storage immediately.\n\n###",
        226.48316955566406
    ],
    [
        "What is the maximum dataset size Milvus can handle?\n\n  \nTheoretically, the maximum dataset size Milvus can handle is determined by the hardware it is run on, specifically system memory and storage:\n\n- Milvus loads all specified collections and partitions into memory before running queries. Therefore, memory size determines the maximum amount of data Milvus can query.\n- When new entities and and collection-related schema (currently only MinIO is supported for data persistence) are added to Milvus, system storage determines the maximum allowable size of inserted data.\n\n###",
        210.60745239257812
    ]
]
Use LLM to Get a RAG Response
Convert the retrieved documents into a string format.
context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)
Define the system and user prompts for the language model. This prompt is assembled with the retrieved documents from Milvus.
SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""
Use the llama3.2 model provided by Ollama to generate a response based on the prompts.
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(
    model="llama3.2",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)

print(response["message"]["content"])
According to the provided context, data in Milvus is stored in two types:
1. **Inserted data**: Storing data in persistent storage as incremental log. It supports multiple object storage backends such as MinIO, AWS S3, Google Cloud Storage (GCS), Azure Blob Storage, Alibaba Cloud OSS, and Tencent Cloud Object Storage.
2. **Metadata**: Generated within Milvus and stored in etcd.
Great! We have successfully built a RAG pipeline with Milvus and Ollama.