🚀 免費嘗試 Zilliz Cloud,完全托管的 Milvus,體驗速度提升 10 倍!立即嘗試

milvus-logo
LFAI
主頁
  • 整合
  • Home
  • Docs
  • 整合

  • 評估與可觀察性

  • 拉格舞曲

使用 Ragas 進行評估

Open In Colab GitHub Repository

本指南展示了如何使用 Ragas 來評估建立在Milvus 之上的檢索-增強生成 (RAG) 管道。

RAG 系統結合了檢索系統與生成模型,可根據給定的提示生成新的文字。該系統首先使用 Milvus 從語料庫中檢索相關文件,然後根據檢索到的文件使用生成模型生成新文本。

Ragas是一個可以幫助您評估 RAG 管道的框架。現有的工具和框架可以幫助您建立這些管道,但是評估它和量化您的管道效能可能很困難。這就是 Ragas (RAG 評估) 的用武之地。

先決條件

在執行本筆記本之前,請確定您已安裝下列依賴項目:

$ pip install --upgrade pymilvus openai requests tqdm pandas ragas

如果您使用的是 Google Colab,為了啟用剛安裝的相依性,您可能需要重新啟動執行時(點選畫面上方的「Runtime」功能表,並從下拉式功能表中選擇「Restart session」)。

在本範例中,我們將使用 OpenAI 作為 LLM。您應該準備api key OPENAI_API_KEY 作為環境變數。

import os

os.environ["OPENAI_API_KEY"] = "sk-***********"

定義 RAG 管道

我們將定義以 Milvus 作為向量儲存、OpenAI 作為 LLM 的 RAG 類。該類包含load 方法 (將文字資料載入 Milvus)、retrieve 方法 (擷取與給定問題最相似的文字資料),以及answer 方法 (使用擷取的知識回答給定問題)。

from typing import List
from tqdm import tqdm
from openai import OpenAI
from pymilvus import MilvusClient


class RAG:
    """
    RAG (Retrieval-Augmented Generation) class built upon OpenAI and Milvus.
    """

    def __init__(self, openai_client: OpenAI, milvus_client: MilvusClient):
        self._prepare_openai(openai_client)
        self._prepare_milvus(milvus_client)

    def _emb_text(self, text: str) -> List[float]:
        return (
            self.openai_client.embeddings.create(input=text, model=self.embedding_model)
            .data[0]
            .embedding
        )

    def _prepare_openai(
        self,
        openai_client: OpenAI,
        embedding_model: str = "text-embedding-3-small",
        llm_model: str = "gpt-3.5-turbo",
    ):
        self.openai_client = openai_client
        self.embedding_model = embedding_model
        self.llm_model = llm_model
        self.SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
        self.USER_PROMPT = """
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

    def _prepare_milvus(
        self, milvus_client: MilvusClient, collection_name: str = "rag_collection"
    ):
        self.milvus_client = milvus_client
        self.collection_name = collection_name
        if self.milvus_client.has_collection(self.collection_name):
            self.milvus_client.drop_collection(self.collection_name)
        embedding_dim = len(self._emb_text("foo"))
        self.milvus_client.create_collection(
            collection_name=self.collection_name,
            dimension=embedding_dim,
            metric_type="IP",  # Inner product distance
            consistency_level="Strong",  # Strong consistency level
        )

    def load(self, texts: List[str]):
        """
        Load the text data into Milvus.
        """
        data = []
        for i, line in enumerate(tqdm(texts, desc="Creating embeddings")):
            data.append({"id": i, "vector": self._emb_text(line), "text": line})

        self.milvus_client.insert(collection_name=self.collection_name, data=data)

    def retrieve(self, question: str, top_k: int = 3) -> List[str]:
        """
        Retrieve the most similar text data to the given question.
        """
        search_res = self.milvus_client.search(
            collection_name=self.collection_name,
            data=[self._emb_text(question)],
            limit=top_k,
            search_params={"metric_type": "IP", "params": {}},  # Inner product distance
            output_fields=["text"],  # Return the text field
        )
        retrieved_texts = [res["entity"]["text"] for res in search_res[0]]
        return retrieved_texts[:top_k]

    def answer(
        self,
        question: str,
        retrieval_top_k: int = 3,
        return_retrieved_text: bool = False,
    ):
        """
        Answer the given question with the retrieved knowledge.
        """
        retrieved_texts = self.retrieve(question, top_k=retrieval_top_k)
        user_prompt = self.USER_PROMPT.format(
            context="\n".join(retrieved_texts), question=question
        )
        response = self.openai_client.chat.completions.create(
            model=self.llm_model,
            messages=[
                {"role": "system", "content": self.SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt},
            ],
        )
        if not return_retrieved_text:
            return response.choices[0].message.content
        else:
            return response.choices[0].message.content, retrieved_texts

讓我們用 OpenAI 和 Milvus 客戶端初始化 RAG 類別。

openai_client = OpenAI()
milvus_client = MilvusClient(uri="./milvus_demo.db")

my_rag = RAG(openai_client=openai_client, milvus_client=milvus_client)

至於MilvusClient 的參數 :

  • uri 設定為本機檔案,例如./milvus.db ,是最方便的方法,因為它會自動利用Milvus Lite將所有資料儲存在這個檔案中。
  • 如果您有大規模的資料,您可以在docker 或 kubernetes 上架設效能更高的 Milvus 伺服器。在此設定中,請使用伺服器的 uri,例如http://localhost:19530 ,作為您的uri
  • 如果您想使用Zilliz Cloud(Milvus 的完全管理雲端服務),請調整uritoken ,與 Zilliz Cloud 中的Public Endpoint 和 Api key對應。

執行 RAG 管道並獲得結果

我們使用Milvus 開發指南作為 RAG 中的私有知識,這是簡單 RAG 管道的良好資料來源。

下載並載入 RAG 管道。

import os
import urllib.request

url = "https://raw.githubusercontent.com/milvus-io/milvus/master/DEVELOPMENT.md"
file_path = "./Milvus_DEVELOPMENT.md"

if not os.path.exists(file_path):
    urllib.request.urlretrieve(url, file_path)
with open(file_path, "r") as file:
    file_text = file.read()

# We simply use "# " to separate the content in the file, which can roughly separate the content of each main part of the markdown file.
text_lines = file_text.split("# ")
my_rag.load(text_lines)  # Load the text data into RAG pipeline
Creating embeddings: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:16<00:00,  2.80it/s]

讓我們定義一個關於開發指南文件內容的查詢問題。然後使用answer 方法取得答案和擷取的上下文文字。

question = "what is the hardware requirements specification if I want to build Milvus and run from source code?"
my_rag.answer(question, return_retrieved_text=True)
('The hardware requirements specification to build and run Milvus from source code is 8GB of RAM and 50GB of free disk space.',
 ['Hardware Requirements\n\nThe following specification (either physical or virtual machine resources) is recommended for Milvus to build and run from source code.\n\n```\n- 8GB of RAM\n- 50GB of free disk space\n```\n\n##',
  'Building Milvus on a local OS/shell environment\n\nThe details below outline the hardware and software requirements for building on Linux and MacOS.\n\n##',
  "Software Requirements\n\nAll Linux distributions are available for Milvus development. However a majority of our contributor worked with Ubuntu or CentOS systems, with a small portion of Mac (both x86_64 and Apple Silicon) contributors. If you would like Milvus to build and run on other distributions, you are more than welcome to file an issue and contribute!\n\nHere's a list of verified OS types where Milvus can successfully build and run:\n\n- Debian/Ubuntu\n- Amazon Linux\n- MacOS (x86_64)\n- MacOS (Apple Silicon)\n\n##"])

現在讓我們準備一些問題與其相對應的地面真實答案。我們從 RAG 管道取得答案和上下文。

from datasets import Dataset
import pandas as pd

question_list = [
    "what is the hardware requirements specification if I want to build Milvus and run from source code?",
    "What is the programming language used to write Knowhere?",
    "What should be ensured before running code coverage?",
]
ground_truth_list = [
    "If you want to build Milvus and run from source code, the recommended hardware requirements specification is:\n\n- 8GB of RAM\n- 50GB of free disk space.",
    "The programming language used to write Knowhere is C++.",
    "Before running code coverage, you should make sure that your code changes are covered by unit tests.",
]
contexts_list = []
answer_list = []
for question in tqdm(question_list, desc="Answering questions"):
    answer, contexts = my_rag.answer(question, return_retrieved_text=True)
    contexts_list.append(contexts)
    answer_list.append(answer)

df = pd.DataFrame(
    {
        "question": question_list,
        "contexts": contexts_list,
        "answer": answer_list,
        "ground_truth": ground_truth_list,
    }
)
rag_results = Dataset.from_pandas(df)
df
Answering questions: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00,  1.29s/it]
問題 上下文 答案 地面真相
0 什麼是硬體需求規格? [Hardware Requirements/硬體需求規格]以下是... 硬體需求規格是什麼? 如果您想建立 Milvus 並從來源執行...
1 用什麼編程語言來寫... [CMake & Conan\n\nMilvus 的演算法函式庫... 用來編寫知乎的程式語言是什麼? 用什麼編程語言來寫Knowher...
2 在運行代碼覆蓋之前應確保哪些... [Code coverage\n/nBefore submitting your pull ... 在執行程式碼覆蓋之前,您應該確保... 在執行程式碼覆蓋之前,您應該確保 ...

使用 Ragas 進行評估

我們使用 Ragas 來評估 RAG 管道結果的效能。

Ragas 提供了一套易於使用的度量指標。我們以Answer relevancy,Faithfulness,Context recall, 和Context precision 作為評估 RAG 管道的指標。有關指標的詳細資訊,請參閱Ragas Metrics

from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)

result = evaluate(
    rag_results,
    metrics=[
        answer_relevancy,
        faithfulness,
        context_recall,
        context_precision,
    ],
)

result
Evaluating:   0%|          | 0/12 [00:00<?, ?it/s]





{'answer_relevancy': 0.9445, 'faithfulness': 1.0000, 'context_recall': 1.0000, 'context_precision': 1.0000}

免費嘗試托管的 Milvus

Zilliz Cloud 無縫接入,由 Milvus 提供動力,速度提升 10 倍。

開始使用
反饋

這個頁面有幫助嗎?