Build RAG with Milvus

In this tutorial, we will show you how to build a RAG (Retrieval-Augmented Generation) pipeline with Milvus.

A RAG system combines a retrieval system with a generative model. The system first retrieves relevant documents from a corpus using a vector similarity search engine like Milvus, then passes the retrieved documents to a generative model as context to answer the given prompt.
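To make the retrieve-then-generate flow concrete, here is a minimal, self-contained sketch. The toy embed function, the tiny corpus, and the bag-of-words vocabulary are illustrative stand-ins invented for this example; the actual pipeline below uses OpenAI embeddings and Milvus for these steps.

```python
# Hypothetical embedding: bag-of-words counts over a tiny made-up vocabulary
def embed(text):
    vocab = ["milvus", "vector", "build", "source"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

corpus = [
    "Milvus is a vector database",
    "Build Milvus from source",
    "Unrelated cooking notes",
]

question = "how to build milvus from source"
q_vec = embed(question)

# Step 1: retrieve — rank documents by similarity to the question
ranked = sorted(corpus, key=lambda doc: inner_product(embed(doc), q_vec), reverse=True)
context = ranked[0]

# Step 2: generate — assemble a prompt for the LLM from the retrieved context
prompt = f"<context>{context}</context>\n<question>{question}</question>"
print(context)  # the most relevant document
```

In the real pipeline, Milvus performs step 1 at scale over stored embeddings, and an OpenAI chat model performs step 2.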


Dependencies and Environment

$ pip install --upgrade pymilvus openai requests tqdm

If you are using Google Colab, you may need to restart the runtime to enable the dependencies you just installed (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).

We will use OpenAI as the LLM in this example. You should prepare the API key OPENAI_API_KEY as an environment variable.

import os

os.environ["OPENAI_API_KEY"] = "sk-***********"

Prepare the data

We use the Milvus development guide as the private knowledge base in our RAG pipeline; it is a good data source for a simple RAG pipeline.

Download it and save it as a local text file.

import json
import urllib.request

url = ""  # fill in the URL of the development guide markdown file
file_path = "./"  # fill in the local path to save the downloaded file

if not os.path.exists(file_path):
    urllib.request.urlretrieve(url, file_path)

We simply use "# " to separate the content in the file, which roughly splits out each main part of the markdown file.

with open(file_path, "r") as file:
    file_text = file.read()

text_lines = file_text.split("# ")
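To see what this naive split produces, here is a quick sketch on a small stand-in string (the sample text is made up for illustration):

```python
sample = "# Intro\nWelcome.\n# Build\nSteps here.\n# Test\nRun tests."
chunks = sample.split("# ")

# The first element is empty because the text starts with "# ".
# Note this naive split also fires on any "# " occurring mid-line.
print(len(chunks))  # 4
```

Each non-empty chunk begins with a section title followed by that section's body, which is a reasonable chunking granularity for this document.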

Prepare the Embedding Model

We initialize the OpenAI client to prepare the embedding model.

from openai import OpenAI

openai_client = OpenAI()

Define a function to generate text embeddings using the OpenAI client. We use the text-embedding-3-small model as an example.

def emb_text(text):
    return (
        openai_client.embeddings.create(input=text, model="text-embedding-3-small")
        .data[0]
        .embedding
    )

Generate a test embedding and print its dimension and first few elements.

test_embedding = emb_text("This is a test")
embedding_dim = len(test_embedding)
print(embedding_dim)
print(test_embedding[:10])

1536
[0.009907577186822891, -0.0055520725436508656, 0.006800490897148848, -0.0380667969584465, -0.018235687166452408, -0.04122573509812355, -0.007634099572896957, 0.03221159428358078, 0.0189057644456625, 9.491520904703066e-05]
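A note on the distance metric used later in this tutorial: OpenAI embeddings are normalized to unit length, so inner product and cosine similarity give the same ranking, which is why metric_type="IP" works well here. A small sketch with made-up unit vectors illustrates the equivalence:

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return inner_product(a, b) / (norm_a * norm_b)

# Two made-up unit-length vectors (||v|| = 1)
a = [0.6, 0.8]
b = [0.8, 0.6]

# For unit vectors the two scores coincide (≈ 0.96 here)
print(inner_product(a, b))
print(cosine_similarity(a, b))
```

For non-normalized embeddings the two metrics can disagree, so the metric choice would matter more.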

Load data into Milvus

Create the Collection

from pymilvus import MilvusClient

milvus_client = MilvusClient("./milvus_demo.db")

collection_name = "my_rag_collection"

Check if the collection already exists and drop it if it does.

if milvus_client.has_collection(collection_name):
    milvus_client.drop_collection(collection_name)

Create a new collection with specified parameters.

If we don't specify any field information, Milvus will automatically create a default id field for primary key, and a vector field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values.

milvus_client.create_collection(
    collection_name=collection_name,
    dimension=embedding_dim,
    metric_type="IP",  # Inner product distance
    consistency_level="Strong",  # Strong consistency level
)

Insert data

Iterate through the text lines, create embeddings, and then insert the data into Milvus.

Here we introduce a new field text, which is not defined in the collection schema. It will be automatically added to the reserved JSON dynamic field, which can be treated as a normal field at a high level.

from tqdm import tqdm

data = []

for i, line in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    data.append({"id": i, "vector": emb_text(line), "text": line})

milvus_client.insert(collection_name=collection_name, data=data)
Creating embeddings: 100%|█| 47/47 [00:16<00:00,  

{'insert_count': 47,
 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46],
 'cost': 0}

Build RAG

Retrieve data for a query

Let's define a query question about the content of the development guide documentation.

question = "what is the hardware requirements specification if I want to build Milvus and run from source code?"

Search for the question in the collection and retrieve the semantic top-3 matches.

search_res = milvus_client.search(
    collection_name=collection_name,
    data=[
        emb_text(question)
    ],  # Use the `emb_text` function to convert the question to an embedding vector
    limit=3,  # Return top 3 results
    search_params={"metric_type": "IP", "params": {}},  # Inner product distance
    output_fields=["text"],  # Return the text field
)

Let's take a look at the search results of the query.

retrieved_lines_with_distances = [
    (res["entity"]["text"], res["distance"]) for res in search_res[0]
]
print(json.dumps(retrieved_lines_with_distances, indent=4))
        "Hardware Requirements\n\nThe following specification (either physical or virtual machine resources) is recommended for Milvus to build and run from source code.\n\n```\n- 8GB of RAM\n- 50GB of free disk space\n```\n\n##",
        "Building Milvus on a local OS/shell environment\n\nThe details below outline the hardware and software requirements for building on Linux and MacOS.\n\n##",
        "Software Requirements\n\nAll Linux distributions are available for Milvus development. However a majority of our contributor worked with Ubuntu or CentOS systems, with a small portion of Mac (both x86_64 and Apple Silicon) contributors. If you would like Milvus to build and run on other distributions, you are more than welcome to file an issue and contribute!\n\nHere's a list of verified OS types where Milvus can successfully build and run:\n\n- Debian/Ubuntu\n- Amazon Linux\n- MacOS (x86_64)\n- MacOS (Apple Silicon)\n\n##",
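Under the hood, the search ranks stored vectors by inner product against the query vector and keeps the top limit hits. Here is a pure-Python sketch of that ranking step, using made-up 2-d vectors in place of real embeddings:

```python
import heapq

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made-up stored vectors keyed by entity id (real ones come from emb_text)
stored = {0: [0.9, 0.1], 1: [0.2, 0.8], 2: [0.7, 0.7]}
query = [1.0, 0.0]

# Keep the ids of the top-2 vectors by inner product, highest score first
top_ids = heapq.nlargest(2, stored, key=lambda i: inner_product(stored[i], query))
print(top_ids)  # [0, 2]
```

Milvus performs this ranking with approximate-nearest-neighbor indexes rather than an exhaustive scan, which is what makes it scale to large corpora.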

Use LLM to get a RAG response

Convert the retrieved documents into a string format.

context = "\n".join(
    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]
)

Define system and user prompts for the Language Model. This prompt is assembled with the retrieved documents from Milvus.

SYSTEM_PROMPT = """
Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.
"""
USER_PROMPT = f"""
Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.
<context>
{context}
</context>
<question>
{question}
</question>
"""

Use OpenAI ChatGPT to generate a response based on the prompts.

response = openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
print(response.choices[0].message.content)
The hardware requirements specification for building Milvus and running it from the source code are as follows:

- 8GB of RAM
- 50GB of free disk space

