Milvus in 2023: An Unprecedented Vector Database Amidst Tech Buzz
This post was written by James Luan with the help of ChatGPT. James primarily wrote prompts and reviewed and polished the AI-generated content.
2023 marks a pivotal turning point in artificial intelligence (AI). Large Language Models (LLMs) have taken center stage, garnering widespread recognition for their exceptional natural language processing capabilities. This surge in popularity has substantially expanded the possibilities of machine learning applications, enabling developers to construct more intelligent and interactive applications.
Amidst this revolution, vector databases have emerged as a crucial component, acting as the long-term memory for LLMs. The rise of Retrieval-Augmented Generation (RAG) models, intelligent agents, and multimodal retrieval apps has demonstrated the vast potential of vector databases in enhancing multimodal data retrieval efficiency, reducing hallucinations in LLMs, and supplementing domain knowledge.
The LLM evolution has also catalyzed significant advancements in embedding technologies. According to the Massive Text Embedding Benchmark (MTEB) Leaderboard on HuggingFace, leading embedding models such as UAE, VoyageAI, CohereV3, and BGE were all released in 2023. These advancements have bolstered the retrieval effectiveness of vector search technologies like Milvus, providing more precise and efficient data processing capabilities for AI applications.
However, with the growing popularity of vector databases, debates arose about the necessity of specialized solutions. Dozens of startups have entered the vector database arena. Many traditional relational and NoSQL databases have started treating vectors as a significant data type, and many claim to be capable of substituting specialized vector databases in every situation.
As we enter 2024, it's a sensible moment to reflect on the entire vector database industry, with a special focus on Milvus, a standout product in this landscape.
First launched in 2019, Milvus has pioneered the concept of vector databases and consistently maintained a reputation for high reliability, scalability, search quality, and performance. In 2023, Milvus achieved impressive results and underwent significant shifts, primarily driven by the rapid advancement of LLMs and the boom of AIGC applications. Here are some key figures that best represent Milvus's progress in 2023.
Developers who are new to vector databases tend to focus on functionality rather than operational maintenance. Many application developers also pay less attention to stability in their vector databases than in their transactional databases, since their applications are often in the early stages of exploration. However, stability becomes indispensable if you aim to deploy your AIGC application in a production environment and achieve the best user experience.
Milvus distinguishes itself by prioritizing not just functionality but also operational stability. We added rolling upgrades to Milvus starting from version 2.2.3. After continuous refinement, this feature can ensure zero downtime during upgrades without interrupting business processes.
Boosting vector search performance needs to be a primary goal for vector databases. Many vector search solutions chose to adapt the HNSW algorithm to get to market quickly; unfortunately, this leaves them facing significant challenges in real-world production environments, especially with highly filtered searches (where over 90% of the data is filtered out) and frequent data deletions. Milvus has considered performance from the start and optimizes it at every phase of development. In production environments, this work achieved a threefold improvement in search performance, most notably in filtered search and streaming insert/search situations.
To further assist the vector database community, we introduced VectorDBBench, an open-source benchmarking tool, last year. This tool is vital for early evaluations of vector databases across different conditions. Unlike traditional evaluation methods, VectorDBBench assesses databases using real-world data, including super large datasets or those closely resembling data from actual embedding models, providing users with more insightful information for informed decision-making.
While dense embeddings have proven effective in vector search, they fall short when searching for names, objects, abbreviations, and short query contexts. In response to these limitations, Milvus has introduced a hybrid query approach that integrates dense embeddings with sparse embeddings to enhance the quality of search results. Combining this hybrid solution with a reranking model has resulted in a substantial 5% improvement in the recall rate on the BEIR dataset, as validated by our tests.
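To make the hybrid idea concrete, here is a minimal sketch of merging a dense result list with a sparse one. It uses reciprocal rank fusion (RRF), a common fusion technique chosen here purely for illustration; the post does not say which fusion or reranking method Milvus uses. The document IDs and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID to keep output deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense-embedding search and a sparse-embedding search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_c`) outrank those found by only one retriever, which is the behavior a hybrid query is after. A reranking model would then reorder the top of this fused list.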
Going beyond improvements in search quality, Milvus has also unveiled a graph-based retrieval solution tailored for sparse embeddings, surpassing the performance of conventional search algorithms like WAND.
At the 2023 NeurIPS BigANN competition, Zihao Wang, a talented engineer at Zilliz, presented Pyanns, a search algorithm that demonstrated significant superiority over other entries in the sparse embedding search track. This breakthrough solution is a precursor to our sparse embedding search algorithms for production environments.
Retrieval Augmented Generation (RAG) was the most popular use case for vector databases in 2023. However, the growth in vector data volumes that comes with RAG applications presents a storage challenge. The challenge is especially acute when the volume of the generated vectors exceeds that of the original document chunks, which can drive up memory costs. For example, after dividing documents into chunks, a 1536-dimensional float32 vector (1536 × 4 bytes, roughly 6 KB) generated from a 500-token chunk (about 1 KB) is several times larger than the chunk itself.
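The storage arithmetic can be checked in a few lines. This is a back-of-the-envelope sketch: the per-vector size follows directly from the dimension times four bytes per float32 value, while the 1 KB chunk size is the estimate from the text and the one-million-chunk corpus is a hypothetical figure.

```python
def vector_bytes(dim, bytes_per_value=4):
    """Raw size of one dense vector; float32 uses 4 bytes per value."""
    return dim * bytes_per_value


dim = 1536
per_vector = vector_bytes(dim)      # 1536 * 4 bytes

chunk_bytes = 1024                  # estimate from the text: ~1 KB per 500-token chunk
ratio = per_vector / chunk_bytes    # how much larger the vector is than its chunk

num_chunks = 1_000_000              # hypothetical corpus size
total_gib = per_vector * num_chunks / 2**30

print(f"{per_vector} bytes per vector, {ratio:.1f}x the chunk size")
print(f"{total_gib:.2f} GiB of raw vectors for {num_chunks:,} chunks")
```

The total covers raw vectors only; index structures add their own overhead on top, which is why keeping everything in memory becomes costly at scale.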
Milvus is the first open-source vector database to support disk-based indexing, bringing about a remarkable 5x memory saving. By the close of 2023, we introduced Milvus 2.3.4, enabling the capability to load scalar and vector data/indexes onto the disk using memory-mapped files (MMap). This advancement offers more than a 10x reduction in memory usage compared to traditional in-memory indexing.
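As a rough illustration of why memory mapping helps, the sketch below writes vectors to a file and then reads a single one back through Python's standard `mmap` module. It is not how Milvus implements MMap internally; the file layout, the tiny dimension, and the `read_vector` helper are assumptions made for the demo.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                                 # hypothetical small dimension for the demo
RECORD = struct.Struct(f"<{DIM}f")      # one little-endian float32 vector

# Write 1,000 vectors to disk as fixed-size binary records.
vectors = [[float(i + j) for j in range(DIM)] for i in range(1000)]
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
with open(path, "wb") as f:
    for v in vectors:
        f.write(RECORD.pack(*v))


def read_vector(mm, index):
    """Decode one vector straight from the mapped file at its byte offset."""
    return list(RECORD.unpack_from(mm, index * RECORD.size))


with open(path, "rb") as f:
    # Map the file instead of reading it; the OS pages data in on demand.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        v42 = read_vector(mm, 42)
        file_size = len(mm)

print(v42, file_size)
```

Only the pages that are actually touched need to be resident in memory, so a dataset larger than RAM can still be served, with latency traded for lower memory cost.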
In 2023, Milvus underwent a transformative journey marked by significant milestones. Over the year, we launched 20 releases, a testament to the dedication of over 300 community developers and the realization of our commitment to a user-driven approach in development.
To illustrate, Milvus 2.2.9 introduced dynamic schema, marking a crucial shift from prioritizing performance to enhancing usability. Building on this, Milvus 2.3 introduced critical features such as Upsert, Range Search, Cosine metrics, and more, all driven by our user community's specific needs and feedback. This iterative development process underscores our commitment to continually aligning Milvus with the evolving requirements of our users.
Implementing multi-tenancy is crucial for developing RAG systems, AI agents, and other LLM applications, meeting the heightened user demands for data isolation. For B2C businesses, tenant numbers can skyrocket into the millions, making physical isolation of user data impractical (as an example, it's unlikely that anyone would create millions of tables in a relational database). Milvus introduced the Partition Key feature, allowing for efficient, logical isolation and data filtering based on partition keys, which is handy at a large scale.
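The idea of logical isolation can be sketched without any database at all. In the toy model below, every tenant ID is hashed into one of a fixed number of partitions, and a search scans only that tenant's partition before filtering on the key. The `TenantStore` class, the `partition_for` helper, and the partition count of 16 are hypothetical and only illustrate the concept; they are not the Milvus API or its internal design.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 16  # hypothetical fixed partition count


def partition_for(tenant_id, num_partitions=NUM_PARTITIONS):
    """Map a tenant ID to a partition with a stable hash."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class TenantStore:
    """Toy store: many tenants share a small, fixed set of partitions."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, tenant_id, item):
        self.partitions[partition_for(tenant_id)].append((tenant_id, item))

    def search(self, tenant_id):
        # Scan only the tenant's partition, then filter on the key itself.
        bucket = self.partitions[partition_for(tenant_id)]
        return [item for owner, item in bucket if owner == tenant_id]


store = TenantStore()
for n in range(1000):
    store.insert(f"tenant-{n}", f"doc-{n}")
store.insert("tenant-7", "doc-extra")

results = store.search("tenant-7")
print(results, len(store.partitions))
```

A thousand tenants end up in at most 16 partitions, yet each query touches only a fraction of the data and never returns another tenant's items. That is the trade-off that makes this approach workable when tenant counts reach the millions.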
Conversely, B2B enterprises, which typically deal with tens of thousands of tenants, benefit from a more nuanced strategy involving physical resource isolation. The latest Milvus 2.3.4 brings enhanced memory management, coroutine handling, and CPU optimization, making it easier to create tens of thousands of tables within a single cluster. This enhancement gives B2B businesses greater efficiency and control.
As 2023 drew to a close, Milvus reached an impressive milestone with 10 million Docker pull downloads. This accomplishment signals the increasing fascination of the developer community with Milvus and emphasizes its rising significance within the vector database domain.
As the world's first cloud-native vector database, Milvus boasts seamless integration with Kubernetes and the broader container ecosystem. Gazing into the future, one can't help but ponder the next focal point in the ever-evolving vector database landscape. Could it be the rise of Serverless services?
While scalability might not currently steal the spotlight in the AI phenomenon, it certainly plays a pivotal role, far from being a mere sideshow. The Milvus vector database can seamlessly scale out to accommodate billions of vectors without breaking a sweat. Take one of our LLM customers, for example. Milvus helped this customer store, process, and retrieve an astounding 10 billion data points. But how do you balance cost and performance when dealing with such a massive volume of data? Rest assured, Milvus has various capabilities to help you address that challenge and elevate your experience.
Beyond the numerical milestones, 2023 has enriched us with valuable insights. We've delved into the intricacies of the vector database landscape, moving beyond mere statistics to grasp the subtle nuances and evolving dynamics of vector search technology.
Reflecting on the early days of the mobile internet boom, many developers created simple apps like flashlights or weather forecasts, which eventually were integrated into smartphone operating systems. Last year, most AI Native applications, like AutoGPT, which rapidly hit 100,000 stars on GitHub, didn't deliver practical value but only represented meaningful experiments. For vector database applications, the current use cases may just be the first wave of AI Native transformations, and I eagerly anticipate more killer use cases to emerge.
Similar to the evolution of databases into categories like OLTP, OLAP, and NoSQL, vector databases show a clear trend toward diversification. Departing from the conventional focus on online services, offline analysis has gained significant traction. Another notable instance of this shift is the introduction of GPTCache, an open-sourced semantic cache released in 2023. It enhances the efficiency and speed of GPT-based applications by storing and retrieving responses generated by language models.
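A semantic cache can be sketched in a few dozen lines: store each response alongside the embedding of its query, and serve it again when a new query is similar enough. The code below is a toy illustration and not the GPTCache API. The `SemanticCache` class, the letter-frequency `toy_embed` function (a stand-in for a real embedding model), and the 0.9 threshold are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new query is close to a cached one."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (query embedding, response)

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

    def get(self, query):
        q = self.embed(query)
        # Linear scan for clarity; a vector index would replace this at scale.
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None


def toy_embed(text):
    """Toy stand-in for an embedding model: a letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


cache = SemanticCache(toy_embed)
cache.put("What is a vector database?", "A database built for similarity search.")

hit = cache.get("what is a vector database")      # near-duplicate phrasing
miss = cache.get("How tall is Mount Everest?")    # unrelated question
print(hit, miss)
```

On a cache hit the application skips the LLM call entirely, which is where the savings in latency and cost come from. The similarity lookup is itself a vector search, which is why a vector database is a natural backend for this kind of cache.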
We are hopeful and excited to witness even more diversified applications and system designs in vector databases in the coming year.
While supporting Approximate Nearest Neighbor (ANN) search is a defining feature of vector databases, it doesn't stand alone. The common belief that merely supporting nearest neighbor search is sufficient to classify a database as a vector or AI native database oversimplifies the intricacies of vector operations. Beyond the basic capabilities of hybrid scalar filtering and vector search, databases tailored for AI native applications should support more sophisticated semantic capabilities like NN Filtering, KNN Join, and cluster querying.
The exponential growth of AI applications, exemplified by ChatGPT amassing over 100 million monthly active users in two months, surpasses any prior business trajectory. Swiftly scaling from 1 million to 1 billion data points becomes paramount once businesses hit their stride in growth. AI application developers benefit from the pay-as-you-go service model set by LLM providers, leading to substantial reductions in operational costs. Similarly, storing data that aligns with this pricing model proves advantageous for developers, allowing them to channel more attention toward core business.
Unlike Large Language Models (LLMs) and various other technological systems, vector databases are stateful and require persistent data storage to function. Consequently, when selecting a vector database, it is crucial to prioritize elasticity and scalability. This prioritization ensures alignment with the dynamic demands of evolving AI applications, which need to adapt seamlessly to changing workloads.
In 2023, our substantial investment in the AI4DB (AI for Database) projects yielded remarkable success. As part of our endeavors, we introduced two pivotal capabilities to Zilliz Cloud, the fully managed Milvus solution: 1) AutoIndex, an auto-parameter-tuning index rooted in machine learning, and 2) a data partitioning strategy based on data clustering. Both innovations played a crucial role in significantly enhancing the search performance of Zilliz Cloud.
Closed-source LLMs like OpenAI's GPT series and Claude currently take the lead, placing the open-source community at a disadvantage due to the absence of comparable computational and data resources.
However, within vector databases, open source will eventually become the favored choice for users. Opting for open source introduces many advantages, including more diverse use cases, expedited iteration, and cultivating a more robust ecosystem. Furthermore, database systems are so intricate that they cannot afford the opacity often associated with LLMs. Users must thoroughly understand the database before choosing the most reasonable approach for its utilization. Moreover, the transparency ingrained in open source empowers users to possess the liberty and the control to customize the database according to their needs.
As 2023 swiftly passes amidst transformative changes, the story of vector databases is just beginning. Our journey with the Milvus vector database is not about getting lost in the hype of AIGC. Instead, we focus on meticulously developing our product, identifying and nurturing application use cases that align with our strengths, and unwaveringly serving our users. Our commitment to open source aims to bridge the gap between us and our users, allowing them to sense our dedication and craftsmanship, even from a distance.
2023 also saw many AI startups being founded and getting their first funding rounds. It is exciting to see the innovation from these developers, and it reminds me of why I got into VectorDB development in the first place. 2024 will be a year for all these innovative applications to gain real traction, attracting not just funding but real paying customers. Customer revenue will bring different requirements for these developers, as building a fully scalable solution with little to no downtime is paramount.
Let's make extraordinary things happen in 2024!