We are happy to announce the launch of Milvus 2.4, a major advancement in enhancing search capabilities for large-scale datasets. This latest release adds new features, such as support for the GPU-based CAGRA index, beta support for [sparse embeddings](https://zilliz.com/learn/sparse-and-dense-embeddings), group search, and various other improvements in search capabilities. These developments reinforce our commitment to the community by offering developers like you a powerful and efficient tool for handling and querying vector data. Let's jump into the key benefits of Milvus 2.4 together.


## Enabled Multi-vector Search for Simplified Multimodal Searches

Milvus 2.4 provides multivector search capability, allowing simultaneous search and reranking of different vector types within the same Milvus system. This feature streamlines multimodal searches, significantly enhancing recall rates and enabling developers to effortlessly manage intricate AI applications with varied data types. Additionally, this functionality simplifies the integration and fine-tuning of custom reranking models, aiding in the creation of advanced search functions like precise [recommender systems](https://zilliz.com/vector-database-use-cases/recommender-system) that utilize insights from multidimensional data.

![How the Milti-Vector Search Feature Works](https://assets.zilliz.com/How_the_multi_vector_search_feature_works_6c85961349.png)

Multivector support in Milvus has two components: 

1. The ability to store/query multiple vectors for a single entity within a collection, which is a more natural way to organize data 

2. The ability to build/optimize a reranking algorithm by leveraging the prebuilt reranking algorithms in Milvus

Besides being a highly [requested feature](https://github.com/milvus-io/milvus/issues/25639), we built this capability because the industry is moving towards multimodal models with the release of GPT-4 and Claude 3. Reranking is a commonly used technique to further improve query performance in search. We aimed to make it easy for developers to build and optimize their rerankers within the Milvus ecosystem.


## Grouping Search Support for Enhanced Compute Efficiency

Grouping Search is another often [requested feature](https://github.com/milvus-io/milvus/issues/25343) we added to Milvus 2.4. It integrates a group-by operation designed for fields of types BOOL, INT, or VARCHAR, filling a crucial efficiency gap in executing large-scale grouping queries. 

Traditionally, developers relied on extensive Top-K searches followed by manual post-processing to distill group-specific results, a compute-intensive and code-heavy method. Grouping Search refines this process by efficiently linking query outcomes to aggregate group identifiers like document or video names, streamlining the handling of segmented entities within larger datasets.

Milvus distinguishes its Grouping Search with an iterator-based implementation, offering a marked improvement in computational efficiency over similar technologies. This choice ensures superior performance scalability, particularly in production environments where compute resource optimization is paramount. By reducing data traversal and computation overhead, Milvus supports more efficient query processing, significantly reducing response times and operational costs compared to other vector databases. 

Grouping Search bolsters Milvus's capability to manage high-volume, complex queries and aligns with high-performance computing practices for robust data management solutions.

## Beta Support for Sparse Vector Embeddings

[Sparse embeddings](https://zilliz.com/learn/sparse-and-dense-embeddings) represent a paradigm shift from traditional dense vector approaches, catering to the nuances of semantic similarity rather than mere keyword frequency. This distinction allows for a more nuanced search capability, aligning closely with the semantic content of the query and the documents. Sparse vector models, particularly useful in information retrieval and natural language processing, offer powerful out-of-domain search capabilities and interpretability compared to their dense counterparts.

In Milvus 2.4, we have expanded the Hybrid Search to include sparse embeddings generated by advanced neural models like SPLADEv2 or statistical models such as BM25. In Milvus, sparse vectors are treated on par with dense vectors, enabling the creation of collections with sparse vector fields, data insertion, index building, and performing similarity searches. Notably, sparse embeddings in Milvus support the [Inner Product](https://zilliz.com/blog/similarity-metrics-for-vector-search#Inner-Product) (IP) distance metric, which is advantageous given their high-dimensional nature, making other metrics less effective. This functionality also supports data types with a dimension as an unsigned 32-bit integer and a 32-bit float for the value, thus facilitating a broad spectrum of applications, from nuanced text searches to elaborate [information retrieval](https://zilliz.com/learn/information-retrieval-metrics) systems.

With this new feature, Milvus allows for hybrid search methodologies that meld keyword and embedding-based techniques, offering a seamless transition for users moving from keyword-centric search frameworks seeking a comprehensive, low-maintenance solution.

We are labeling this feature as “Beta” to continue our performance testing of the feature and gather feedback from the community. The general availability (GA) of sparse vector support is anticipated with the release of Milvus 3.0.


## CAGRA Index Support for Advanced GPU-Accelerated Graph Indexing

Developed by NVIDIA, [CAGRA](https://arxiv.org/abs/2308.15136) (Cuda Anns GRAph-based) is a GPU-based graph indexing technology that significantly surpasses traditional CPU-based methods like the HNSW index in efficiency and performance, especially in high-throughput environments.

With the introduction of the CAGRA Index, Milvus 2.4 provides enhanced GPU-accelerated graph indexing capability. This enhancement is ideal for building similarity search applications requiring minimal latency. Additionally, Milvus 2.4 integrates a brute-force search with the CAGRA index to achieve maximum recall rates in applications. For detailed insights, explore the [introduction blog on CAGRA](https://zilliz.com/blog/Milvus-introduces-GPU-index-CAGRA).

![Milvus Raft CAGRA vs. Milvus HNSW](https://assets.zilliz.com/Milvus_raft_cagra_vs_milvus_hnsw_ffe0415ff5.png)

## Additional Enhancements and Features

Milvus 2.4 also includes other key enhancements, such as Regular Expression support for enhanced substring matching in [metadata filtering](https://zilliz.com/blog/metadata-filtering-with-zilliz-cloud-pipelines), a new scalar inverted index for efficient scalar data type filtering, and a Change Data Capture tool for monitoring and replicating changes in Milvus collections. These updates collectively enhance Milvus's performance and versatility, making it a comprehensive solution for complex data operations.

For more details, see [Milvus 2.4 documentation](https://milvus.io/docs/release_notes.md). 


## Stay Connected!

Excited to learn more about Milvus 2.4? [Join our upcoming webinar](https://zilliz.com/event/unlocking-advanced-search-capabilities-milvus) with James Luan, Zilliz’s VP of Engineering, for an in-depth discussion on the capabilities of this latest release. If you have questions or feedback, join our [Discord channel](https://discord.com/invite/8uyFbECzPX) to engage with our engineers and community members. Don’t forget to follow us on [Twitter](https://twitter.com/milvusio) or [LinkedIn](https://www.linkedin.com/company/the-milvus-project) for the latest news and updates about Milvus.


Milvus 2.4 Supports Multi-vector Search, Sparse Vector, CAGRA, and More!

Unveiling Milvus 2.4: Multi-vector Search, Sparse Vector, CAGRA Index, and More!


![](https://assets.zilliz.com/2_72e444c8dc.JPG)

We are excited to introduce [Milvus Lite](https://milvus.io/docs/milvus_lite.md), a lightweight vector database that runs locally within your Python application. Based on the popular open-source [Milvus](https://milvus.io/intro) vector database, Milvus Lite reuses the core components for vector indexing and query parsing while removing elements designed for high scalability in distributed systems. This design makes a compact and efficient solution ideal for environments with limited computing resources, such as laptops, Jupyter Notebooks, and mobile or edge devices.

Milvus Lite integrates with various AI development stacks like LangChain and LlamaIndex, enabling its use as a vector store in Retrieval Augmented Generation (RAG) pipelines without the need for server setup. Simply run `pip install pymilvus` (version 2.4.3 or above) to incorporate it into your AI application as a Python library.

Milvus Lite shares the Milvus API, ensuring that your client-side code works for both small-scale local deployments and Milvus servers deployed on Docker or Kubernetes with billions of vectors.

## Why We Built Milvus Lite

Many AI applications require vector similarity search for unstructured data, including text, images, voices, and videos, for applications such as chatbots and shopping assistants. Vector databases are crafted for storing and searching vector embeddings and are a crucial part of the AI development stack, particularly for generative AI use cases like [Retrieval Augmented Generation (RAG)](https://zilliz.com/learn/Retrieval-Augmented-Generation).

Despite the availability of numerous vector search solutions, an easy-to-start option that also works for large-scale production deployments was missing. As the creators of Milvus, we designed Milvus Lite to help AI developers build applications faster while ensuring a consistent experience across various deployment options, including Milvus on Kubernetes, Docker, and managed cloud services.

Milvus Lite is a crucial addition to our suite of offerings within the Milvus ecosystem. It provides developers with a versatile tool that supports every stage of their development journey. From prototyping to production environments and from edge computing to large-scale deployments, Milvus is now the only vector database that covers use cases of any size and all stages of development.

## How Milvus Lite Works 

Milvus Lite supports all the basic operations available in Milvus, such as creating collections and inserting, searching, and deleting vectors. It will soon support advanced features like hybrid search. Milvus Lite loads data into memory for efficient searches and persists it as an SQLite file.

Milvus Lite is included in the [Python SDK of Milvus](https://github.com/milvus-io/pymilvus) and can be deployed with a simple `pip install pymilvus`. The following code snippet demonstrates how to set up a vector database with Milvus Lite by specifying a local file name and then creating a new collection. For those familiar with the Milvus API, the only difference is that the `uri` refers to a local file name instead of a network endpoint, e.g., `"milvus_demo.db"` instead of `"http://localhost:19530"` for a Milvus server. Everything else remains the same. Milvus Lite also supports storing raw text and other labels as metadata, using a dynamic or explicitly defined schema, as shown below.

```
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")
# This collection can take input with mandatory fields named "id", "vector" and
# any other fields as "dynamic schema". You can also define the schema explicitly.
client.create_collection(
    collection_name="demo_collection",
    dimension=384  # Dimension for vectors.
)
```
For scalability, an AI application developed with Milvus Lite can easily transition to using Milvus deployed on Docker or Kubernetes by simply specifying the `uri` with the server endpoint.


## Integration with AI Development Stack

In addition to introducing Milvus Lite to make vector search easy to start with, Milvus also integrates with many frameworks and providers of the AI development stack, including [LangChain](https://python.langchain.com/v0.2/docs/integrations/vectorstores/milvus/), [LlamaIndex](https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo/), [Haystack](https://haystack.deepset.ai/integrations/milvus-document-store), [Voyage AI](https://blog.voyageai.com/2024/05/30/semantic-search-with-milvus-lite-and-voyage-ai/), [Ragas](https://milvus.io/docs/integrate_with_ragas.md), [Jina AI](https://jina.ai/news/implementing-a-chat-history-rag-with-jina-ai-and-milvus-lite/), [DSPy](https://dspy-docs.vercel.app/docs/deep-dive/retrieval_models_clients/MilvusRM), [BentoML](https://www.bentoml.com/blog/building-a-rag-app-with-bentocloud-and-milvus-lite), [WhyHow](https://chiajy.medium.com/70873c7576f1), [Relari AI](https://blog.relari.ai/case-study-using-synthetic-data-to-benchmark-rag-systems-be324904ace1), [Airbyte](https://docs.airbyte.com/integrations/destinations/milvus), [HuggingFace](https://milvus.io/docs/integrate_with_hugging-face.md) and [MemGPT](https://memgpt.readme.io/docs/storage#milvus). Thanks to their extensive tooling and services, these integrations simplify the development of AI applications with vector search capability.

And this is just the beginning—many more exciting integrations are coming soon! Stay tuned! 


## More Resources and Examples

Explore [Milvus quickstart documentation](https://milvus.io/docs/quickstart.md) for detailed guides and code examples on using Milvus Lite to build AI applications like Retrieval-Augmented Generation ([RAG](https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/build_RAG_with_milvus.ipynb)) and [image search](https://github.com/milvus-io/bootcamp/blob/master/bootcamp/tutorials/quickstart/image_search_with_milvus.ipynb).

Milvus Lite is an open-source project, and we welcome your contributions. Check out our [Contributing Guide](https://github.com/milvus-io/milvus-lite/blob/main/CONTRIBUTING.md) to get started. You can also report bugs or request features by filing an issue on the [Milvus Lite GitHub](https://github.com/milvus-io/milvus-lite) repository.


Introducing Milvus Lite: Start Building a GenAI Application in Seconds


[Milvus](https://milvus.io/intro) is an open-source vector database designed specifically for AI applications. Whether you're working on machine learning, deep learning, or any other AI-related project, Milvus offers a robust and efficient way to handle large-scale vector data.

Now, with the [model module integration](https://milvus.io/docs/embeddings.md) in PyMilvus, the Python SDK for Milvus, it's even easier to add Embedding and Reranking models. This integration simplifies transforming your data into searchable vectors or reranking results for more accurate outcomes, such as in [Retrieval Augmented Generation (RAG)](https://zilliz.com/learn/Retrieval-Augmented-Generation).

In this blog, we will review dense embedding models, sparse embedding models, and re-rankers and demonstrate how to use them in practice using [Milvus Lite](https://milvus.io/blog/introducing-milvus-lite.md), a lightweight version of Milvus that can run locally in your Python applications. 


## Dense vs Sparse Embeddings 

Before we walk you through how to use our integrations, let’s look at two main categories of vector embeddings. 

[Vector Embeddings](https://zilliz.com/glossary/vector-embeddings) generally fall into two main categories: [**Dense Embeddings** and **Sparse Embeddings**](https://zilliz.com/learn/sparse-and-dense-embeddings).

- Dense Embeddings are high-dimensional vectors in which most or all elements are non-zero, making them ideal for encoding text semantics or fuzzy meaning.

- Sparse Embeddings are high-dimensional vectors with many zero elements, better suited for encoding exact or adjacent concepts.

Milvus supports both types of embeddings and offers hybrid search. [Hybrid Search](https://zilliz.com/blog/hybrid-search-with-milvus) allows you to conduct searches across various vector fields within the same collection. These vectors can represent different facets of data, use diverse embedding models, or employ distinct data processing methods, combining the results using re-rankers.

## How to Use Our Embedding and Reranking Integrations

In the following sections, we’ll demonstrate three practical examples of using our integrations to generate embeddings and conduct vector searches. 

### Example 1: Use the Default Embedding Function to Generate Dense Vectors 

You must install the `pymilvus` client with the `model` package to use embedding and reranking functions with Milvus. 

```
pip install "pymilvus[model]"
```

This step will install [Milvus Lite](https://milvus.io/docs/quickstart.md), allowing you to run Milvus locally within your Python application. It also includes the model subpackage, which includes all utilities for Embedding and reranking.

The model subpackage supports various embedding models, including those from OpenAI, [Sentence Transformers](https://zilliz.com/learn/Sentence-Transformers-for-Long-Form-Text), [BGE-M3](https://zilliz.com/learn/bge-m3-and-splade-two-machine-learning-models-for-generating-sparse-embeddings), BM25, [SPLADE](https://zilliz.com/learn/bge-m3-and-splade-two-machine-learning-models-for-generating-sparse-embeddings), and Jina AI pre-trained models.

This example uses the `DefaultEmbeddingFunction`, based on the `all-MiniLM-L6-v2` Sentence Transformer model for simplicity. The model is about 70MB and will be downloaded during the first use:

```
from pymilvus import model

# This will download "all-MiniLM-L6-v2", a lightweight model.
ef = model.DefaultEmbeddingFunction()

# Data from which embeddings are to be generated
docs = [
   "Artificial intelligence was founded as an academic discipline in 1956.",
   "Alan Turing was the first person to conduct substantial research in AI.",
   "Born in Maida Vale, London, Turing was raised in southern England.",
]

embeddings = ef.encode_documents(docs)

print("Embeddings:", embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, embeddings[0].shape)
```

The expected output should be something like the following: 

```
Embeddings: [array([-3.09392996e-02, -1.80662833e-02,  1.34775648e-02,  2.77156215e-02,
      -4.86349640e-03, -3.12581174e-02, -3.55921760e-02,  5.76934684e-03,
       2.80773244e-03,  1.35783911e-01,  3.59678417e-02,  6.17732145e-02,
...
      -4.61330153e-02, -4.85207550e-02,  3.13997865e-02,  7.82178566e-02,
      -4.75336798e-02,  5.21207601e-02,  9.04406682e-02, -5.36676683e-02],
     dtype=float32)]
Dim: 384 (384,)
```


### Example 2: Generate Sparse Vectors Using The BM25 Model

BM25 is a well-known method that uses word occurrence frequencies to determine the relevance between queries and documents. In this example, we’ll show how to use `BM25EmbeddingFunction` to generate sparse embeddings for queries and documents.

In BM25, it's important to calculate the statistics in your documents to obtain the IDF (Inverse Document Frequency), which can represent the patterns in your documents. The IDF measures how much information a word provides, whether it's common or rare across all documents.

```
from pymilvus.model.sparse import BM25EmbeddingFunction

# 1. Prepare a small corpus to search
docs = [
   "Artificial intelligence was founded as an academic discipline in 1956.",
   "Alan Turing was the first person to conduct substantial research in AI.",
   "Born in Maida Vale, London, Turing was raised in southern England.",
]
query = "Where was Turing born?"
bm25_ef = BM25EmbeddingFunction()

# 2. Fit the corpus to get BM25 model parameters on your documents.
bm25_ef.fit(docs)

# 3. Store the fitted parameters to expedite future processing.
bm25_ef.save("bm25_params.json")

# 4. Load the saved params
new_bm25_ef = BM25EmbeddingFunction()
new_bm25_ef.load("bm25_params.json")

docs_embeddings = new_bm25_ef.encode_documents(docs)
query_embeddings = new_bm25_ef.encode_queries([query])
print("Dim:", new_bm25_ef.dim, list(docs_embeddings)[0].shape)
```

### Example 3: Using a ReRanker 

A search system aims to find the most relevant results quickly and efficiently. Traditionally, methods like BM25 or TF-IDF have been used to rank search results based on keyword matching. Recent methods, such as embedding-based cosine similarity, are straightforward but can sometimes miss the subtleties of language and, most importantly, the interaction between documents and a query's intent.

This is where using a [re-ranker](https://zilliz.com/learn/optimize-rag-with-rerankers-the-role-and-tradeoffs) helps. A re-ranker is an advanced AI model that takes the initial set of results from a search—often provided by an embeddings/token-based search—and re-evaluates them to ensure they align more closely with the user's intent. It looks beyond the surface-level matching of terms to consider the deeper interaction between the search query and the content of the documents.

For this example, we’ll use the [Jina AI Reranker](https://milvus.io/docs/integrate_with_jina.md).



```
from pymilvus.model.reranker import JinaRerankFunction

jina_api_key = "<YOUR_JINA_API_KEY>"

rf = JinaRerankFunction("jina-reranker-v1-base-en", jina_api_key)

query = "What event in 1956 marked the official birth of artificial intelligence as a discipline?"

documents = [
   "In 1950, Alan Turing published his seminal paper, 'Computing Machinery and Intelligence,' proposing the Turing Test as a criterion of intelligence, a foundational concept in the philosophy and development of artificial intelligence.",
   "The Dartmouth Conference in 1956 is considered the birthplace of artificial intelligence as a field; here, John McCarthy and others coined the term 'artificial intelligence' and laid out its basic goals.",
   "In 1951, British mathematician and computer scientist Alan Turing also developed the first program designed to play chess, demonstrating an early example of AI in game strategy.",
   "The invention of the Logic Theorist by Allen Newell, Herbert A. Simon, and Cliff Shaw in 1955 marked the creation of the first true AI program, which was capable of solving logic problems, akin to proving mathematical theorems."
]

results = rf(query, documents)

for result in results:
   print(f"Index: {result.index}")
   print(f"Score: {result.score:.6f}")
   print(f"Text: {result.text}\n")
```

The expected output is similar to the following:

```
Index: 1
Score: 0.937096
Text: The Dartmouth Conference in 1956 is considered the birthplace of artificial intelligence as a field; here, John McCarthy and others coined the term 'artificial intelligence' and laid out its basic goals.

Index: 3
Score: 0.354210
Text: The invention of the Logic Theorist by Allen Newell, Herbert A. Simon, and Cliff Shaw in 1955 marked the creation of the first true AI program, which was capable of solving logic problems, akin to proving mathematical theorems.

Index: 0
Score: 0.349866
Text: In 1950, Alan Turing published his seminal paper, 'Computing Machinery and Intelligence,' proposing the Turing Test as a criterion of intelligence, a foundational concept in the philosophy and development of artificial intelligence.

Index: 2
Score: 0.272896
Text: In 1951, British mathematician and computer scientist Alan Turing also developed the first program designed to play chess, demonstrating an early example of AI in game strategy.
```

## Star Us On GitHub and Join Our Discord!

If you liked this blog post, consider starring Milvus on [GitHub](https://github.com/milvus-io/milvus), and feel free to join our [Discord](https://discord.gg/FG6hMJStWu)! 💙


404: Page Not Found

Or, explore something new:

Unveiling Milvus 2.4: Multi-vector Search, Sparse Vector, CAGRA Index, and More!

Introducing Milvus Lite: Start Building a GenAI Application in Seconds

Introducing PyMilvus Integration with Embedding Models

What is Milvus?

Start Building Your GenAI App

RAG

Image Search

Multimodal Search

Hybrid Search

Graph RAG