In a cluster deployment of Milvus, services are provided by a group of nodes to achieve high availability and easy scalability.
A collection in Milvus is equivalent to a table in a relational database management system (RDBMS). In Milvus, collections are used to store and manage entities.
An entity consists of a group of fields that represent a real-world object. Each entity in Milvus is identified by a unique row ID.
Fields are the units that make up entities. Fields can be structured data (e.g., numbers, strings) or vectors.
Normalization refers to the process of scaling an embedding (vector) so that its L2 norm equals one. If inner product (IP) is used to calculate embedding similarity, all embeddings must be normalized. After normalization, the inner product of two vectors equals their cosine similarity.
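A minimal sketch of why normalization matters for IP, written in plain Python with no Milvus dependency. The helper names `normalize`, `inner_product`, and `cosine_similarity` are hypothetical and used only for illustration. The code scales two vectors to unit length and checks that their inner product matches the cosine similarity of the original vectors.

```python
import math

def normalize(v):
    """Scale v so that its L2 norm equals one."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def inner_product(a, b):
    """Inner product (IP) of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity: IP divided by the product of the two norms."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

a = [3.0, 4.0]
b = [1.0, 2.0]

# IP of the normalized vectors equals the cosine similarity of the originals.
ip_normalized = inner_product(normalize(a), normalize(b))
cos = cosine_similarity(a, b)
print(ip_normalized, cos)
```

Without normalization, the raw IP of `a` and `b` (11.0) also reflects the vectors' lengths, which is why the two measures only agree once every embedding has unit norm.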
A partition is a division of a collection. Milvus supports dividing collection data into multiple parts on physical storage. This process is called partitioning, and each partition can contain multiple segments.
A segment is a data file automatically created by Milvus for holding inserted data. A collection can have multiple segments and a segment can have multiple entities. During vector similarity search, Milvus scans each segment and returns the search results.
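To illustrate the search-time behavior described above, the sketch below runs a brute-force inner-product search over each segment and then merges the per-segment top-k candidates into a global top-k. This is an assumption-laden illustration, not Milvus's internal code: segments are modeled as hypothetical in-memory lists of `(entity_id, vector)` pairs, and `search_segment` and `search_collection` are made-up helper names.

```python
import heapq

def search_segment(segment, query, k):
    """Brute-force top-k by inner product within a single segment.

    Returns (score, entity_id) tuples, highest score first.
    """
    scored = (
        (sum(q * x for q, x in zip(query, vec)), entity_id)
        for entity_id, vec in segment
    )
    return heapq.nlargest(k, scored)

def search_collection(segments, query, k):
    """Search every segment, then merge the partial results into a global top-k."""
    partial = []
    for segment in segments:
        partial.extend(search_segment(segment, query, k))
    return heapq.nlargest(k, partial)

# A hypothetical collection split into two segments.
segments = [
    [(1, [1.0, 0.0]), (2, [0.0, 1.0])],
    [(3, [0.6, 0.8]), (4, [-1.0, 0.0])],
]
results = search_collection(segments, [1.0, 0.0], k=2)
print(results)
```

Keeping k candidates per segment is enough: any entity in the global top-k must also be in the top-k of its own segment, so no correct answer is lost during the merge.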
Sharding refers to distributing write operations across different nodes to make full use of the parallel computing capacity of a Milvus cluster when writing data. By default, a single collection contains two shards. Milvus assigns data to shards by hashing the primary key. The Milvus development roadmap includes support for more flexible sharding methods, such as random and custom sharding.
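The sketch below shows the general idea of primary-key hash sharding: every key is hashed and taken modulo the shard count, so the same key always routes to the same shard while different keys spread across shards. The hash function is an assumption. CRC32 from `zlib` is used here only because it is stable across processes; this is not necessarily the hash Milvus uses. `shard_for` and `route_inserts` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 2  # matches the default of two shards per collection

def shard_for(primary_key, num_shards=NUM_SHARDS):
    """Map a primary key to a shard index using a stable hash.

    CRC32 is an illustrative choice, not necessarily Milvus's hash function.
    """
    digest = zlib.crc32(str(primary_key).encode("utf-8"))
    return digest % num_shards

def route_inserts(primary_keys, num_shards=NUM_SHARDS):
    """Group primary keys by the shard their writes would be sent to."""
    shards = defaultdict(list)
    for pk in primary_keys:
        shards[shard_for(pk, num_shards)].append(pk)
    return dict(shards)

routing = route_inserts(range(10))
print(routing)
```

Because routing depends only on the key, writes to different shards can proceed in parallel on different nodes, which is the benefit the sharding definition describes.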
In a standalone deployment of Milvus, all operations including data insertion, index building, and vector similarity search are completed in one single process.
A vector represents the features of unstructured data. It is usually generated by an AI or ML model. A vector takes the form of a high-dimensional numeric array. Each vector represents an object.
In the current version of Milvus, each entity can contain only one vector.
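A minimal sketch of the data model described by the definitions of entity, field, and vector, assuming a simple in-memory representation. The `Entity` class and its attribute names are hypothetical and do not correspond to any Milvus SDK type. It pairs a unique row ID with scalar fields and exactly one vector field, reflecting the one-vector-per-entity restriction.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Scalar = Union[int, float, str]

@dataclass
class Entity:
    """Illustrative entity: a unique row ID, named scalar fields, and one vector."""
    row_id: int
    vector: List[float]  # a single vector field per entity
    scalars: Dict[str, Scalar] = field(default_factory=dict)

    def dim(self) -> int:
        """Dimensionality of the entity's vector."""
        return len(self.vector)

book = Entity(row_id=1, vector=[0.1, 0.2, 0.3], scalars={"title": "Dune", "year": 1965})
print(book.dim(), book.scalars["title"])
```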
A vector index is a reorganized data structure derived from raw data that can greatly accelerate vector similarity search. Milvus supports several vector index types.
Learn how to select the ideal index for your application scenario.
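To show how reorganizing data can speed up search, here is a toy index modeled loosely on the inverted-file (IVF) family of vector indexes. It is a sketch under stated assumptions, not Milvus's implementation. Vectors are grouped into buckets by nearest centroid, and a query scans only the `nprobe` buckets closest to it instead of every vector. The centroids are supplied by hand (an assumption; real IVF indexes train them, typically with k-means), and `ToyIVFIndex` and its methods are hypothetical names.

```python
import heapq
import math

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToyIVFIndex:
    """Groups vectors by nearest centroid; searches only the closest buckets."""

    def __init__(self, centroids):
        # Assumption: centroids are given up front rather than trained.
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _nearest_centroids(self, vec, n):
        """Indices of the n centroids closest to vec."""
        return heapq.nsmallest(
            n, range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i])
        )

    def add(self, entity_id, vec):
        """Store the vector in the bucket of its nearest centroid."""
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((entity_id, vec))

    def search(self, query, k, nprobe=1):
        """Return the k closest (distance, entity_id) pairs among probed buckets."""
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        scored = [(l2(query, vec), entity_id) for entity_id, vec in candidates]
        return heapq.nsmallest(k, scored)

index = ToyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
for eid, vec in [(1, [0.5, 0.0]), (2, [1.0, 1.0]), (3, [9.0, 10.0]), (4, [10.0, 11.0])]:
    index.add(eid, vec)

hits = index.search([9.5, 9.5], k=1, nprobe=1)
print(hits)
```

With `nprobe=1`, the query above compares against only two of the four stored vectors. Raising `nprobe` scans more buckets, trading speed for a lower chance of missing a true nearest neighbor that landed in a different bucket.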