Manage Collections
This guide walks you through creating and managing collections using the SDK of your choice.
Before you start
-
You have installed Milvus standalone or Milvus cluster.
-
You have installed preferred SDKs. You can choose among various languages, including Python, Java, Go, and Node.js.
Overview
In Milvus, you store your vector embeddings in collections. All vector embeddings within a collection share the same dimensionality and distance metric for measuring similarity.
Milvus collections support dynamic fields (i.e., fields not pre-defined in the schema) and automatic incrementation of primary keys.
To accommodate different preferences, Milvus offers two methods for creating a collection. One provides a quick setup, while the other allows for detailed customization of the collection schema and index parameters.
Additionally, you can view, load, release, and drop a collection when necessary.
The code snippets on this page use the new MilvusClient (Python) to interact with Milvus. New MilvusClient SDKs for other languages will be released in future updates.
Create Collection
You can create a collection in either of the following ways:
-
Quick setup
In this manner, you can create a collection by simply giving it a name and specifying the number of dimensions of the vector embeddings to be stored in this collection. For details, refer to Quick setup.
-
Customized setup
Instead of letting Milvus decide almost everything for your collection, you can determine the schema and index parameters of the collection on your own. For details, refer to Customized setup.
Quick setup
Most developers just need a simple yet dynamic collection to start with. Milvus allows a quick setup of such a collection with just three arguments:
-
Name of the collection to create,
-
Dimension of the vector embeddings to insert, and
-
Metric type used to measure similarities between vector embeddings (the example below omits it and relies on the default).
from pymilvus import MilvusClient, DataType
# 1. Set up a Milvus client
client = MilvusClient(
uri="http://localhost:19530"
)
# 2. Create a collection in quick setup mode
client.create_collection(
collection_name="quick_setup",
dimension=5
)
res = client.get_load_state(
collection_name="quick_setup"
)
print(res)
# Output
#
# {
# "state": "<LoadState: Loaded>"
# }
The collection generated in the above code contains only two fields: id (as the primary key) and vector (as the vector field), with auto_id and enable_dynamic_field settings enabled by default.
-
auto_id
Enabling this setting ensures that the primary key increments automatically. There's no need for manual provision of primary keys during data insertion.
-
enable_dynamic_field
When enabled, all fields, excluding id and vector in the data to be inserted, are treated as dynamic fields. These additional fields are saved as key-value pairs within a special field named $meta. This feature allows the inclusion of extra fields during data insertion.
The automatically indexed and loaded collection from the provided code is ready for immediate data insertions.
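To make the effect of enable_dynamic_field more concrete, the following minimal sketch imitates the split in plain Python. It is a conceptual illustration only, not Milvus's storage code: the helper name split_dynamic_fields and the sample row are hypothetical, and the schema fields id and vector are the ones the quick setup creates.

```python
def split_dynamic_fields(row, schema_fields=("id", "vector")):
    """Separate schema-defined fields from extra (dynamic) fields.

    Hypothetical helper for illustration only: it imitates how undeclared
    keys end up grouped under "$meta", not how Milvus stores data internally.
    """
    declared = {k: v for k, v in row.items() if k in schema_fields}
    dynamic = {k: v for k, v in row.items() if k not in schema_fields}
    declared["$meta"] = dynamic
    return declared


# No "id" is supplied here because the quick setup enables auto_id.
row = {"vector": [0.1, 0.2, 0.3, 0.4, 0.5], "color": "red", "tag": 7}
stored = split_dynamic_fields(row)
print(stored["$meta"])
```

The keys color and tag are not part of the schema, so in this sketch they end up grouped under $meta, while vector stays a regular field.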
Customized setup
Instead of letting Milvus decide almost everything for your collection, you can determine the schema and index parameters of the collection on your own.
Step 1: Set up schema
A schema defines the structure of a collection. Within the schema, you have the option to enable or disable enable_dynamic_field, add pre-defined fields, and set attributes for each field. For a detailed explanation of the concept and available data types, refer to Schema Explained.
# 3. Create a collection in customized setup mode
# 3.1. Create schema
schema = MilvusClient.create_schema(
auto_id=False,
enable_dynamic_field=True,
)
# 3.2. Add fields to schema
schema.add_field(field_name="my_id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="my_vector", datatype=DataType.FLOAT_VECTOR, dim=5)
In the provided code snippet for Python, enable_dynamic_field is set to True and auto_id is set to False, so you must provide primary key values yourself during data insertion. The schema contains two fields: my_id, an INT64 field that serves as the primary key, and my_vector, a FLOAT_VECTOR field configured with a dimensionality of 5.
Step 2: Set up index parameters
Index parameters dictate how Milvus organizes your data within a collection. You can tailor the indexing process for specific fields by adjusting their metric_type and index_type. For the vector field, you can select COSINE, L2, or IP as the metric_type.
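As a rough guide to what the three metric types measure, here is a minimal sketch of their textbook definitions in plain Python. The function names and sample vectors are illustrative assumptions, and the values Milvus reports for a given index may be computed or scaled differently.

```python
import math


def inner_product(a, b):
    """IP: sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """L2: Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """COSINE: inner product of the two vectors after length normalization."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0, 2.0]
b = [2.0, 0.0, 4.0]  # same direction as a, twice the length
print(inner_product(a, b), l2_distance(a, b), cosine_similarity(a, b))
```

Because b points in the same direction as a, the cosine similarity is at its maximum even though the L2 distance is non-zero, which is why the choice of metric should match how the embeddings were trained.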
# 3.3. Prepare index parameters
index_params = client.prepare_index_params()
# 3.4. Add indexes
index_params.add_index(
field_name="my_id",
index_type="STL_SORT"
)
index_params.add_index(
field_name="my_vector",
index_type="IVF_FLAT",
metric_type="IP",
params={ "nlist": 128 }
)
The code snippet above demonstrates how to set up index parameters for the vector field and a scalar field, respectively. For the vector field, set both the metric type and the index type. For a scalar field, set only the index type. It is recommended to create an index for the vector field and any scalar fields that are frequently used for filtering.
Step 3: Create the collection
You have the option to create a collection and an index file separately or to create a collection with the index loaded simultaneously upon creation.
-
Create a collection with the index loaded simultaneously upon creation.
# 3.5. Create a collection with the index loaded simultaneously
import time

client.create_collection(
collection_name="customized_setup_1",
schema=schema,
index_params=index_params
)
# Give Milvus a moment to finish loading before checking the state
time.sleep(5)
res = client.get_load_state(
collection_name="customized_setup_1"
)
print(res)
# Output
#
# {
# "state": "<LoadState: Loaded>"
# }
The collection created above is loaded automatically. To learn more about loading and releasing a collection, refer to Load & Release Collection.
-
Create a collection and an index file separately.
# 3.6. Create a collection and index it separately
client.create_collection(
collection_name="customized_setup_2",
schema=schema,
)
res = client.get_load_state(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "state": "<LoadState: NotLoad>"
# }
The collection created above is not loaded automatically. You can create an index for the collection as follows. Creating the index separately does not load the collection either. For details, refer to Load & Release Collection.
# 3.6 Create index
client.create_index(
collection_name="customized_setup_2",
index_params=index_params
)
res = client.get_load_state(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "state": "<LoadState: NotLoad>"
# }
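The customized_setup_1 example above pauses for a fixed five seconds before checking the load state. As an alternative, the following sketch polls get_load_state() until the collection reports Loaded or a timeout expires. The helper name wait_until_loaded and its timeout and interval parameters are hypothetical, and the sketch assumes the returned state is an enum-like value whose name is "Loaded", matching the <LoadState: Loaded> output shown above.

```python
import time


def wait_until_loaded(client, collection_name, timeout=30.0, interval=0.5):
    """Poll the load state until it is Loaded; return False on timeout.

    Hypothetical helper: `client` is any object exposing
    get_load_state(collection_name=...) that returns {"state": ...}.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = client.get_load_state(collection_name=collection_name)["state"]
        # Assumption: the state is an enum member with a .name attribute;
        # fall back to its string form for other types.
        if getattr(state, "name", str(state)) == "Loaded":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with the client from this guide (requires a running Milvus instance):
# wait_until_loaded(client, "customized_setup_1")
```

Polling returns as soon as the collection is ready and reports a failure explicitly, rather than assuming five seconds is always enough.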
View Collections
You can check the details of an existing collection as follows:
# 5. View Collections
res = client.describe_collection(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "collection_name": "customized_setup_2",
# "auto_id": false,
# "num_shards": 1,
# "description": "",
# "fields": [
# {
# "field_id": 100,
# "name": "my_id",
# "description": "",
# "type": 5,
# "params": {},
# "element_type": 0,
# "is_primary": true
# },
# {
# "field_id": 101,
# "name": "my_vector",
# "description": "",
# "type": 101,
# "params": {
# "dim": 5
# },
# "element_type": 0
# }
# ],
# "aliases": [],
# "collection_id": 448143479230158446,
# "consistency_level": 2,
# "properties": {},
# "num_partitions": 1,
# "enable_dynamic_field": true
# }
To list all existing collections, you can do as follows:
# 6. List all collection names
res = client.list_collections()
print(res)
# Output
#
# [
# "customized_setup_2",
# "quick_setup",
# "customized_setup_1"
# ]
Load & Release Collection
During the loading process of a collection, Milvus loads the collection's index file into memory. Conversely, when releasing a collection, Milvus unloads the index file from memory. Before conducting searches in a collection, ensure that the collection is loaded.
Load a collection
To load a collection, use the load_collection() method, specifying the collection name. You can also set replica_number to determine how many in-memory replicas of data segments to create on query nodes when the collection is loaded.
- Milvus Standalone: The maximum allowed value for replica_number is 1.
- Milvus Cluster: The maximum value should not exceed the queryNode.replicas set in your Milvus configurations. For additional details, refer to Query Node-related Configurations.
# 7. Load the collection
client.load_collection(
collection_name="customized_setup_2",
replica_number=1 # Number of replicas to create on query nodes. Max value is 1 for Milvus Standalone, and no greater than `queryNode.replicas` for Milvus Cluster.
)
res = client.get_load_state(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "state": "<LoadState: Loaded>"
# }
Release a collection
# 8. Release the collection
client.release_collection(
collection_name="customized_setup_2"
)
res = client.get_load_state(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "state": "<LoadState: NotLoad>"
# }
Set up aliases
You can assign aliases for collections to make them more meaningful in a specific context. You can assign multiple aliases for a collection, but multiple collections cannot share an alias.
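The alias rule can be pictured as a mapping from each alias to exactly one collection. The following toy model in plain Python is a sketch of that rule, not part of pymilvus: the AliasRegistry class and the exceptions it raises are assumptions made for illustration, and real Milvus error behavior may differ.

```python
class AliasRegistry:
    """Toy in-memory model of the alias rules; not a pymilvus class."""

    def __init__(self):
        self._target = {}  # alias -> collection name (one target per alias)

    def create_alias(self, collection_name, alias):
        # An alias that is already in use cannot be created again.
        if alias in self._target:
            raise ValueError(f"alias {alias!r} is already assigned")
        self._target[alias] = collection_name

    def alter_alias(self, collection_name, alias):
        # Reassigning moves an existing alias to another collection.
        if alias not in self._target:
            raise KeyError(alias)
        self._target[alias] = collection_name

    def list_aliases(self, collection_name):
        return [a for a, c in self._target.items() if c == collection_name]


registry = AliasRegistry()
registry.create_alias("customized_setup_2", "bob")
registry.create_alias("customized_setup_2", "alice")  # many aliases, one collection
registry.alter_alias("customized_setup_1", "alice")   # move "alice" elsewhere
print(registry.list_aliases("customized_setup_2"))
print(registry.list_aliases("customized_setup_1"))
```

Because each alias maps to a single collection, reassigning alice removes it from customized_setup_2 in the same step.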
Create aliases
# 9.1. Create aliases
client.create_alias(
collection_name="customized_setup_2",
alias="bob"
)
client.create_alias(
collection_name="customized_setup_2",
alias="alice"
)
List aliases
# 9.2. List aliases
res = client.list_aliases(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "aliases": [
# "bob",
# "alice"
# ],
# "collection_name": "customized_setup_2",
# "db_name": "default"
# }
Describe aliases
# 9.3. Describe aliases
res = client.describe_alias(
alias="bob"
)
print(res)
# Output
#
# {
# "alias": "bob",
# "collection_name": "customized_setup_2",
# "db_name": "default"
# }
Reassign aliases
# 9.4 Reassign aliases to other collections
client.alter_alias(
collection_name="customized_setup_1",
alias="alice"
)
res = client.list_aliases(
collection_name="customized_setup_1"
)
print(res)
# Output
#
# {
# "aliases": [
# "alice"
# ],
# "collection_name": "customized_setup_1",
# "db_name": "default"
# }
res = client.list_aliases(
collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
# "aliases": [
# "bob"
# ],
# "collection_name": "customized_setup_2",
# "db_name": "default"
# }
Drop aliases
# 9.5 Drop aliases
client.drop_alias(
alias="bob"
)
client.drop_alias(
alias="alice"
)
Drop a Collection
If a collection is no longer needed, you can drop the collection.
# 10. Drop the collections
client.drop_collection(
collection_name="quick_setup"
)
client.drop_collection(
collection_name="customized_setup_1"
)
client.drop_collection(
collection_name="customized_setup_2"
)
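When cleaning up several collections at once, it can help to drop only those that actually exist. The sketch below builds on the list_collections() and drop_collection() calls used earlier on this page. The helper name drop_if_exists is hypothetical, and the client argument is assumed to be the MilvusClient instance created at the start of this guide.

```python
def drop_if_exists(client, names):
    """Drop each named collection that exists; return the names dropped.

    Hypothetical helper: `client` is any object exposing list_collections()
    and drop_collection(collection_name=...).
    """
    existing = set(client.list_collections())
    dropped = []
    for name in names:
        if name in existing:
            client.drop_collection(collection_name=name)
            dropped.append(name)
    return dropped


# Usage with the client from this guide (requires a running Milvus instance):
# drop_if_exists(client, ["quick_setup", "customized_setup_1", "customized_setup_2"])
```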