You have installed preferred SDKs. You can choose among various languages, including Python, Java, Go, and Node.js.
Overview
In Milvus, you store your vector embeddings in collections. All vector embeddings within a collection share the same dimensionality and distance metric for measuring similarity.
Milvus collections support dynamic fields (i.e., fields not pre-defined in the schema) and automatic incrementation of primary keys.
To accommodate different preferences, Milvus offers two methods for creating a collection. One provides a quick setup, while the other allows for detailed customization of the collection schema and index parameters.
Additionally, you can view, load, release, and drop a collection when necessary.
Create Collection
You can create a collection in either of the following manners:
Quick setup
In this manner, you can create a collection by simply giving it a name and specifying the number of dimensions of the vector embeddings to be stored in this collection. For details, refer to Quick setup.
Customized setup
Instead of letting Milvus decide almost everything for your collection, you can determine the schema and index parameters of the collection on your own. For details, refer to Customized setup.
Quick setup
Most developers only need a simple yet dynamic collection to get started. Milvus allows a quick setup of such a collection with just three arguments:
Name of the collection to create,
Dimension of the vector embeddings to insert, and
Metric type used to measure similarities between vector embeddings.
For quick setup, create a collection with the specified name and dimension using the method that matches your SDK:
Python: the create_collection() method of the MilvusClient class.
Java: the createCollection() method of the MilvusClientV2 class.
Node.js: the createCollection() method of the MilvusClient class.
Go: the CreateCollection() method on a Client instance obtained from NewClient().
The collection generated in the above code contains only two fields: id (as the primary key) and vector (as the vector field), with auto_id and enable_dynamic_field settings enabled by default.
auto_id
Enabling this setting ensures that the primary key increments automatically. There’s no need for manual provision of primary keys during data insertion.
enable_dynamic_field
When enabled, all fields, excluding id and vector in the data to be inserted, are treated as dynamic fields. These additional fields are saved as key-value pairs within a special field named $meta. This feature allows the inclusion of extra fields during data insertion.
The automatically indexed and loaded collection from the provided code is ready for immediate data insertions.
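As a rough mental model of the enable_dynamic_field setting described above, the following pure-Python sketch separates schema-defined fields from undefined ones and groups the latter under $meta. It does not call Milvus: split_dynamic_fields() is a hypothetical helper written for illustration, and the way Milvus actually stores dynamic fields internally may differ.

```python
def split_dynamic_fields(row, schema_fields):
    """Keep schema-defined fields as-is and group all other keys under "$meta"."""
    stored = {k: v for k, v in row.items() if k in schema_fields}
    extras = {k: v for k, v in row.items() if k not in schema_fields}
    if extras:
        # Undefined fields are kept together as key-value pairs.
        stored["$meta"] = extras
    return stored


# A quick-setup collection defines only "id" and "vector".
# "id" is omitted from the row because auto_id is assumed to be enabled.
row = {"vector": [0.1, 0.2], "color": "red", "tag": 7}
stored = split_dynamic_fields(row, schema_fields={"id", "vector"})
print(stored)
```

In this model, color and tag are not part of the schema, so they end up together inside $meta, while vector stays a regular field.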
Customized setup
Instead of letting Milvus decide almost everything for your collection, you can determine the schema and index parameters of the collection on your own.
Step 1: Set up schema
A schema defines the structure of a collection. Within the schema, you have the option to enable or disable enable_dynamic_field, add pre-defined fields, and set attributes for each field. For a detailed explanation of the concept and available data types, refer to Schema Explained.
To set up a schema, use create_schema() to create a schema object and add_field() to add fields to the schema.
To set up a schema, use createSchema() to create a schema object and addField() to add fields to the schema.
To set up a schema, use entity.NewSchema() to create a schema object and schema.WithField() to add fields to the schema.
To set up a schema, you need to define a JSON object that follows the schema format as displayed on the POST /v2/vectordb/collections/create API endpoint reference page.
auto_id
Determines if the primary field automatically increments. Setting this to True makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors. The auto-generated IDs have a fixed length and cannot be altered.
enable_dynamic_field
Determines if Milvus saves the values of undefined fields in a dynamic field if the data being inserted into the target collection includes fields that are not defined in the collection's schema. When you set this to True, Milvus will create a field called $meta to store any undefined fields and their values from the data that is inserted.
field_name
The name of the field.
datatype
The data type of the field. For a list of available data types, refer to DataType.
is_primary
Whether the current field is the primary field in a collection. Each collection has only one primary field. A primary field should be of either the DataType.INT64 type or the DataType.VARCHAR type.
dim
The dimension of the vector embeddings. This is mandatory for a field of the DataType.FLOAT_VECTOR, DataType.BINARY_VECTOR, DataType.FLOAT16_VECTOR, or DataType.BFLOAT16_VECTOR type. If you use DataType.SPARSE_FLOAT_VECTOR, omit this parameter.
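The field rules listed above (one primary field, a primary field of the INT64 or VARCHAR type, dim required for dense vector types and omitted for sparse vectors) can be expressed as a small pre-flight check. The sketch below is plain Python and does not use the SDK: FieldSpec and check_schema() are hypothetical names, and data types are written as strings rather than DataType members.

```python
from dataclasses import dataclass
from typing import Optional

# Type names mirror the DataType members mentioned in the parameter descriptions.
DENSE_VECTOR_TYPES = {"FLOAT_VECTOR", "BINARY_VECTOR", "FLOAT16_VECTOR", "BFLOAT16_VECTOR"}
PRIMARY_KEY_TYPES = {"INT64", "VARCHAR"}


@dataclass
class FieldSpec:
    field_name: str
    datatype: str
    is_primary: bool = False
    dim: Optional[int] = None


def check_schema(fields):
    """Return a list of human-readable problems; an empty list means the rules hold."""
    problems = []
    primaries = [f for f in fields if f.is_primary]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary field, found {len(primaries)}")
    for f in primaries:
        if f.datatype not in PRIMARY_KEY_TYPES:
            problems.append(f"primary field '{f.field_name}' must be INT64 or VARCHAR")
    for f in fields:
        if f.datatype in DENSE_VECTOR_TYPES and f.dim is None:
            problems.append(f"vector field '{f.field_name}' requires dim")
        if f.datatype == "SPARSE_FLOAT_VECTOR" and f.dim is not None:
            problems.append(f"sparse vector field '{f.field_name}' must omit dim")
    return problems


good_problems = check_schema([
    FieldSpec("my_id", "INT64", is_primary=True),
    FieldSpec("my_vector", "FLOAT_VECTOR", dim=5),
])
bad_problems = check_schema([
    FieldSpec("my_id", "FLOAT", is_primary=True),
    FieldSpec("my_vector", "FLOAT_VECTOR"),
    FieldSpec("my_sparse", "SPARSE_FLOAT_VECTOR", dim=8),
])
print(good_problems)
print(bad_problems)
```

Running a check like this before building the real schema is optional; Milvus performs its own validation when the collection is created.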
Parameter
Description
fieldName
The name of the field.
dataType
The data type of the field. For a list of available data types, refer to DataType.
isPrimaryKey
Whether the current field is the primary field in a collection. Each collection has only one primary field. A primary field should be of either the DataType.Int64 type or the DataType.VarChar type.
autoID
Whether to allow the primary field to automatically increment. Setting this to true makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors.
dimension
The dimension of the vector embeddings. This is mandatory for a field of the DataType.FloatVector, DataType.BinaryVector, DataType.Float16Vector, or DataType.BFloat16Vector type.
Parameter
Description
name
The name of the field.
data_type
The data type of the field. For an enumeration of all available data types, refer to DataType.
is_primary_key
Whether the current field is the primary field in a collection. Each collection has only one primary field. A primary field should be of either the DataType.INT64 type or the DataType.VARCHAR type.
auto_id
Whether the primary field automatically increments upon data insertions into this collection. The value defaults to False. Setting this to True makes the primary field automatically increment. Skip this parameter if you need to set up a collection with a customized schema.
dim
The dimensionality of the collection field that holds vector embeddings. The value should be an integer greater than 1 and is usually determined by the model you use to generate vector embeddings.
Parameter
Description
WithName()
The name of the field.
WithDataType()
The data type of the field.
WithIsPrimaryKey()
Whether the current field is the primary field in a collection. Each collection has only one primary field. A primary field should be of either the entity.FieldTypeInt64 type or the entity.FieldTypeVarChar type.
WithIsAutoID()
Whether the primary field automatically increments upon data insertions into this collection. The value defaults to false. Setting this to true makes the primary field automatically increment. Skip this parameter if you need to set up a collection with a customized schema.
WithDim()
The dimensionality of the collection field that holds vector embeddings. The value should be an integer greater than 1 and is usually determined by the model you use to generate vector embeddings.
Parameter
Description
fieldName
The name of the field to create in the target collection.
dataType
The data type of the field values.
isPrimary
Whether the current field is the primary field. Setting this to True makes the current field the primary field.
elementTypeParams
Extra field parameters.
dim
An optional parameter for FloatVector or BinaryVector fields that determines the vector dimension.
Step 2: Set up index parameters
Index parameters dictate how Milvus organizes your data within a collection. You can tailor the indexing process for specific fields by adjusting their metric_type and index_type. For the vector field, you have the flexibility to select COSINE, L2, IP, HAMMING, or JACCARD as the metric_type, depending on the type of vectors you are working with. For more information, refer to Similarity Metrics.
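To make the metric names concrete, here is a minimal sketch of the textbook definitions behind L2, IP, COSINE, HAMMING, and JACCARD in plain Python. The function names are illustrative, the binary metrics take lists of 0/1 values, and the exact values Milvus reports for a given metric may differ from these formulas in scaling or sign convention.

```python
import math


def l2(a, b):
    """Euclidean distance between two float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip(a, b):
    """Inner product of two float vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: inner product divided by the product of the norms."""
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))


def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def jaccard_distance(a, b):
    """1 minus (size of intersection / size of union) of the set bits."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1 - inter / union if union else 0.0


print(l2([0, 0], [3, 4]))                              # 5.0
print(ip([1, 2], [3, 4]))                              # 11
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))             # 2
print(jaccard_distance([1, 0, 1, 1], [1, 1, 0, 1]))    # 0.5
```

L2, IP, and COSINE apply to float vectors, while HAMMING and JACCARD apply to binary vectors, which is why the choice of metric_type depends on the type of vectors you store.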
To set up index parameters, you need to define a JSON object that follows the index parameters format as displayed on the POST /v2/vectordb/collections/create API endpoint reference page.
field_name
The name of the target field to which this object applies.
index_type
The name of the algorithm used to arrange data in the specific field. For applicable algorithms, refer to In-memory Index and On-disk Index.
metric_type
The algorithm that is used to measure similarity between vectors. Possible values are IP, L2, COSINE, JACCARD, HAMMING. This is available only when the specified field is a vector field. For more information, refer to Indexes supported in Milvus.
params
The fine-tuning parameters for the specified index type. For details on possible keys and value ranges, refer to In-memory Index.
Parameter
Description
fieldName
The name of the target field to which this IndexParam object applies.
indexType
The name of the algorithm used to arrange data in the specific field. For applicable algorithms, refer to In-memory Index and On-disk Index.
metricType
The distance metric to use for the index. Possible values are IP, L2, COSINE, JACCARD, HAMMING.
The name of the target field on which an index is to be created.
index_type
The name of the algorithm used to arrange data in the specific field. For applicable algorithms, refer to In-memory Index and On-disk Index.
metric_type
The algorithm that is used to measure similarity between vectors. Possible values are IP, L2, COSINE, JACCARD, HAMMING. This is available only when the specified field is a vector field. For more information, refer to Indexes supported in Milvus.
params
The fine-tuning parameters for the specified index type. For details on possible keys and value ranges, refer to In-memory Index.
Parameter
Description
index_type
The name of the algorithm used to arrange data in the specific field. For applicable algorithms, refer to In-memory Index and On-disk Index.
metric_type
The algorithm that is used to measure similarity between vectors. Possible values are IP, L2, COSINE, JACCARD, HAMMING. This is available only when the specified field is a vector field. For more information, refer to Indexes supported in Milvus.
nlist
Number of cluster units. Cluster units are used in IVF (Inverted File) based indexes in Milvus. For IVF_FLAT, the index divides vector data into `nlist` cluster units, and then compares distances between the target input vector and the center of each cluster. The value must be between 1 and 65536.
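The role of nlist can be pictured with a toy inverted-file index. The sketch below is a simplification under stated assumptions, not how Milvus implements IVF_FLAT: the cluster centers are supplied by hand (a real index learns them from the data), only the single nearest cluster is scanned, and build_ivf(), search(), and nearest() are hypothetical helper names.

```python
import math


def nearest(point, candidates):
    """Index of the candidate closest to point by Euclidean distance."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def build_ivf(vectors, centroids):
    """Assign every vector ID to the cluster whose center is closest."""
    nlist = len(centroids)
    if not 1 <= nlist <= 65536:
        raise ValueError("nlist must be between 1 and 65536")
    clusters = {i: [] for i in range(nlist)}
    for vector_id, vector in enumerate(vectors):
        clusters[nearest(vector, centroids)].append(vector_id)
    return clusters


def search(query, vectors, centroids, clusters):
    """Compare the query with each cluster center, then scan only the closest cluster."""
    members = clusters[nearest(query, centroids)]
    if not members:
        return None
    return min(members, key=lambda vector_id: math.dist(query, vectors[vector_id]))


vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
centroids = [[0, 0.5], [10, 10.5]]  # nlist = 2, chosen by hand for the example
clusters = build_ivf(vectors, centroids)
best = search([9, 9], vectors, centroids, clusters)
print(clusters, best)
```

A larger nlist produces smaller clusters, so each search scans fewer vectors but has to compare the query against more cluster centers first.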
Parameter
Description
fieldName
The name of the target field on which an index is to be created.
indexName
The name of the index to create. The value defaults to the target field name.
metricType
The algorithm that is used to measure similarity between vectors. Possible values are IP, L2, COSINE, JACCARD, HAMMING. This is available only when the specified field is a vector field. For more information, refer to Indexes supported in Milvus.
params
The index type and related settings. For details, refer to In-memory Index.
params.index_type
The type of the index to create.
params.nlist
The number of cluster units. This applies to IVF-related index types.
The code snippet above demonstrates how to set up index parameters for the vector field and a scalar field, respectively. For the vector field, set both the metric type and the index type. For a scalar field, set only the index type. It is recommended to create an index for the vector field and any scalar fields that are frequently used for filtering.
Step 3: Create the collection
You can either create a collection and its index separately, or create a collection with its index built and loaded at creation time.
In Python, use create_collection() to create a collection with the specified schema and index parameters and get_load_state() to check the load state of the collection.
In Java and Node.js, use createCollection() and getLoadState() for the same purposes.
Create a collection with the index loaded simultaneously upon creation.
import time
# 3.5. Create a collection with the index loaded simultaneously
client.create_collection(
    collection_name="customized_setup_1",
    schema=schema,
    index_params=index_params
)
# Give Milvus a moment to finish loading before checking the state
time.sleep(5)
res = client.get_load_state(
    collection_name="customized_setup_1"
)
print(res)
# Output
#
# {
#     "state": "<LoadState: Loaded>"
# }
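The fixed time.sleep(5) in the Python snippet above is a simple way to wait for loading to finish. An alternative is to poll until a condition holds or a timeout expires. The sketch below shows that pattern in plain Python; wait_until() is a hypothetical helper, and fake_is_loaded() merely stands in for a real check, such as a function that inspects the result of get_load_state().

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real load-state check: reports "loaded" on the third call.
calls = {"n": 0}


def fake_is_loaded():
    calls["n"] += 1
    return calls["n"] >= 3


loaded = wait_until(fake_is_loaded, timeout=2.0, interval=0.01)
print(loaded, calls["n"])
```

Polling returns as soon as the collection is ready and fails explicitly when it is not, instead of assuming that a fixed delay is always long enough.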
import io.milvus.v2.service.collection.request.CreateCollectionReq;
import io.milvus.v2.service.collection.request.GetLoadStateReq;
// 3.4 Create a collection with schema and index parameters
CreateCollectionReq customizedSetupReq1 = CreateCollectionReq.builder()
    .collectionName("customized_setup_1")
    .collectionSchema(schema)
    .indexParams(indexParams)
    .build();
client.createCollection(customizedSetupReq1);
// Thread.sleep(5000);
// 3.5 Get load state of the collection
GetLoadStateReq customSetupLoadStateReq1 = GetLoadStateReq.builder()
    .collectionName("customized_setup_1")
    .build();
res = client.getLoadState(customSetupLoadStateReq1);
System.out.println(res);
// Output:
// true
// 3.3 Create a collection with fields and index parameters
res = await client.createCollection({
    collection_name: "customized_setup_1",
    fields: fields,
    index_params: index_params,
})
console.log(res.error_code)
// Output
//
// Success
//
res = await client.getLoadState({
    collection_name: "customized_setup_1"
})
console.log(res.state)
// Output
//
// LoadStateLoaded
//
# 3.6. Create a collection and index it separately
client.create_collection(
    collection_name="customized_setup_2",
    schema=schema,
)
res = client.get_load_state(
    collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
#     "state": "<LoadState: NotLoad>"
# }
// 3.6 Create a collection and index it separately
CreateCollectionReq customizedSetupReq2 = CreateCollectionReq.builder()
    .collectionName("customized_setup_2")
    .collectionSchema(schema)
    .build();
client.createCollection(customizedSetupReq2);
// 3.4 Create a collection and index it separately
res = await client.createCollection({
    collection_name: "customized_setup_2",
    fields: fields,
})
console.log(res.error_code)
// Output
//
// Success
//
res = await client.getLoadState({
    collection_name: "customized_setup_2"
})
console.log(res.state)
// Output
//
// LoadStateNotLoad
//
// 3.4 Create a collection and index it separately
schema.CollectionName = "customized_setup_2"
client.CreateCollection(ctx, schema, entity.DefaultShardNumber)
stateLoad, err := client.GetLoadState(context.Background(), "customized_setup_2", []string{})
if err != nil {
    log.Fatal("failed to get load state:", err.Error())
}
fmt.Println(stateLoad)
// Output
// 1
// LoadStateNotExist -> LoadState = 0
// LoadStateNotLoad -> LoadState = 1
// LoadStateLoading -> LoadState = 2
// LoadStateLoaded -> LoadState = 3
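The Go snippet prints the load state as an integer. As a reading aid, the mapping listed in its output comment can be written as a small enumeration. The sketch below uses Python's standard IntEnum; the LoadState class and describe() function here are illustrative and are not part of any Milvus SDK.

```python
from enum import IntEnum


class LoadState(IntEnum):
    """Numeric load states as listed in the Go snippet's output comment."""
    NOT_EXIST = 0
    NOT_LOAD = 1
    LOADING = 2
    LOADED = 3


def describe(code):
    """Translate a numeric load state into its symbolic name."""
    return LoadState(code).name


print(describe(1))  # prints NOT_LOAD
```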
Parameter
Description
collection_name
The name of the collection.
schema
The schema of this collection. Setting this to None indicates this collection will be created with default settings. To set up a collection with a customized schema, you need to create a CollectionSchema object and reference it here. In this case, Milvus ignores all other schema-related settings carried in the request.
index_params
The parameters for building the index on the vector field in this collection. To set up a collection with a customized schema and automatically load the collection to memory, you need to create an IndexParams object and reference it here. You should at least add an index for the vector field in this collection. You can also skip this parameter if you prefer to set up the index parameters later on.
Parameter
Description
collectionName
The name of the collection.
collectionSchema
The schema of this collection. Leaving it empty indicates this collection will be created with default settings. To set up a collection with a customized schema, you need to create a CollectionSchema object and reference it here.
indexParams
The parameters for building the index on the vector field in this collection. To set up a collection with a customized schema and automatically load the collection to memory, create an IndexParams object with a list of IndexParam objects and reference it here.
Parameter
Description
collection_name
The name of the collection.
fields
The fields in the collection.
index_params
The index parameters for the collection to create.
Parameter
Description
schema.CollectionName
The name of the collection.
schema
The schema of this collection.
index_params
The index parameters for the collection to create.
Parameter
Description
collectionName
The name of the collection.
schema
The schema is responsible for organizing data in the target collection. A valid schema should have multiple fields, which must include a primary key, a vector field, and several scalar fields.
schema.autoID
Whether to allow the primary field to automatically increment. Setting this to True makes the primary field automatically increment. In this case, the primary field should not be included in the data to insert to avoid errors. Set this parameter in the field with is_primary set to True.
schema.enableDynamicField
Whether to use the reserved $meta field to hold non-schema-defined fields as key-value pairs.
fields
A list of field objects.
fields.fieldName
The name of the field to create in the target collection.
fields.dataType
The data type of the field values.
fields.isPrimary
Whether the current field is the primary field. Setting this to True makes the current field the primary field.
fields.elementTypeParams
Extra field parameters.
fields.elementTypeParams.dim
An optional parameter for FloatVector or BinaryVector fields that determines the vector dimension.
The collection created above is not loaded automatically. You can create an index for it as follows. Creating the index separately does not load the collection either. For details, refer to Load & Release Collection.
import io.milvus.v2.service.index.request.CreateIndexReq;
CreateIndexReq createIndexReq = CreateIndexReq.builder()
    .collectionName("customized_setup_2")
    .indexParams(indexParams)
    .build();
client.createIndex(createIndexReq);
// Thread.sleep(1000);
// 3.7 Get load state of the collection
GetLoadStateReq customSetupLoadStateReq2 = GetLoadStateReq.builder()
    .collectionName("customized_setup_2")
    .build();
res = client.getLoadState(customSetupLoadStateReq2);
System.out.println(res);
// Output:
// false
An IndexParams object containing a list of IndexParam objects.
Parameter
Description
collectionName
The name of the collection.
indexParams
A list of IndexParam objects.
Parameter
Description
collection_name
The name of the collection.
field_name
The name of the field in which to create an index.
index_type
The name of the algorithm used to arrange data in the specific field. For applicable algorithms, refer to In-memory Index and On-disk Index.
metric_type
The algorithm that is used to measure similarity between vectors. Possible values are IP, L2, COSINE, JACCARD, HAMMING. This is available only when the specified field is a vector field. For more information, refer to Indexes supported in Milvus.
params
The fine-tuning parameters for the specified index type. For details on possible keys and value ranges, refer to In-memory Index.
Parameter
Description
collName
The name of the collection.
fieldName
The name of the field in which to create an index.
idx
The name of the algorithm used to arrange data in the specific field. For applicable algorithms, refer to In-memory Index and On-disk Index.
async
Whether this operation is asynchronous.
opts
The fine-tuning parameters for the specified index type. You can include multiple `entity.IndexOption` in this request. For details on possible keys and value ranges, refer to In-memory Index.
Parameter
Description
collectionName
The name of the collection.
indexParams
The index parameters for the collection to create.
indexParams.metricType
The similarity metric type used to build the index. The value defaults to COSINE.
indexParams.fieldName
The name of the target field on which an index is to be created.
indexParams.indexName
The name of the index to create. The value defaults to the target field name.
indexParams.indexConfig.index_type
The type of the index to create.
indexParams.indexConfig.nlist
The number of cluster units. This applies to IVF-related index types.
# 6. List all collection names
res = client.list_collections()
print(res)
# Output
#
# [
#     "customized_setup_2",
#     "quick_setup",
#     "customized_setup_1"
# ]
import io.milvus.v2.service.collection.response.ListCollectionsResp;
// 5. List all collection names
ListCollectionsResp listCollectionsRes = client.listCollections();
System.out.println(listCollectionsRes.getCollectionNames());
// Output:
// [
//     "customized_setup_2",
//     "quick_setup",
//     "customized_setup_1"
// ]
// 5. List all collection names
collections, err := client.ListCollections(ctx)
if err != nil {
    log.Fatal("failed to list collection:", err.Error())
}
for _, c := range collections {
    log.Println(c.Name)
}
// Output:
// customized_setup_2
// quick_setup
During the loading process of a collection, Milvus loads the collection’s index file into memory. Conversely, when releasing a collection, Milvus unloads the index file from memory. Before conducting searches in a collection, ensure that the collection is loaded.
Load a collection
To load a collection, use the load_collection() method, specifying the collection name. You can also set replica_number to determine how many in-memory replicas of data segments to create on query nodes when the collection is loaded.
Milvus Standalone: The maximum allowed value for replica_number is 1.
Milvus Cluster: The maximum value should not exceed the queryNode.replicas set in your Milvus configurations. For additional details, refer to Query Node-related Configurations.
In Java and Node.js, use the loadCollection() method, specifying the collection name. In Go, use the LoadCollection() method.
# 7. Load the collection
client.load_collection(
    collection_name="customized_setup_2",
    replica_number=1  # Number of replicas to create on query nodes. Max value is 1 for Milvus Standalone, and no greater than `queryNode.replicas` for Milvus Cluster.
)
res = client.get_load_state(
    collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
#     "state": "<LoadState: Loaded>"
# }
import io.milvus.v2.service.collection.request.LoadCollectionReq;
// 6. Load the collection
LoadCollectionReq loadCollectionReq = LoadCollectionReq.builder()
    .collectionName("customized_setup_2")
    .build();
client.loadCollection(loadCollectionReq);
// Thread.sleep(5000);
// 7. Get load state of the collection
GetLoadStateReq loadStateReq = GetLoadStateReq.builder()
    .collectionName("customized_setup_2")
    .build();
res = client.getLoadState(loadStateReq);
System.out.println(res);
// Output:
// true
// 7. Load the collection
res = await client.loadCollection({
    collection_name: "customized_setup_2"
})
console.log(res.error_code)
// Output
//
// Success
//
await sleep(3000) // sleep() is assumed to be a promise-based delay helper defined elsewhere
res = await client.getLoadState({
    collection_name: "customized_setup_2"
})
console.log(res.state)
// Output
//
// LoadStateLoaded
//
// 6. Load the collection
err = client.LoadCollection(ctx, "customized_setup_2", false)
if err != nil {
    log.Fatal("failed to load collection:", err.Error())
}
// 7. Get load state of the collection
stateLoad, err := client.GetLoadState(context.Background(), "customized_setup_2", []string{})
if err != nil {
    log.Fatal("failed to get load state:", err.Error())
}
fmt.Println(stateLoad)
// Output:
// 3
// LoadStateNotExist -> LoadState = 0
// LoadStateNotLoad -> LoadState = 1
// LoadStateLoading -> LoadState = 2
// LoadStateLoaded -> LoadState = 3
This feature is currently in public preview. The API and functionality may change in the future.
Upon receiving your load request, Milvus loads all vector field indexes and all scalar field data into memory. If some fields will not be involved in searches and queries, you can exclude them from loading to reduce memory usage and improve search performance.
# 7. Load the collection
client.load_collection(
    collection_name="customized_setup_2",
    load_fields=["my_id", "my_vector"],  # Load only the specified fields
    skip_load_dynamic_field=True  # Skip loading the dynamic field
)
res = client.get_load_state(
    collection_name="customized_setup_2"
)
print(res)
# Output
#
# {
#     "state": "<LoadState: Loaded>"
# }
Note that only the fields listed in load_fields can be used as filtering conditions and output fields in searches and queries. You should always include the primary key in the list. The field names excluded from loading will not be available for filtering or output.
You can use skip_load_dynamic_field=True to skip loading the dynamic field. Milvus treats the dynamic field as a single field, so all the keys in the dynamic field will be included or excluded together.
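The constraints on partial loading described above can be checked before issuing a search or query. The following plain-Python sketch assumes a hypothetical helper, check_partial_load(), that compares the fields a request refers to against the load_fields list; the field name color is made up for the example, and the function does not contact Milvus.

```python
def check_partial_load(load_fields, primary_key, filter_fields=(), output_fields=()):
    """Return a list of problems with a partial-load plan; empty means it is consistent."""
    loaded = set(load_fields)
    problems = []
    if primary_key not in loaded:
        problems.append(f"primary key '{primary_key}' should be included in load_fields")
    for name in list(filter_fields) + list(output_fields):
        if name not in loaded:
            problems.append(f"field '{name}' is not loaded and cannot be used for filtering or output")
    return problems


ok = check_partial_load(
    load_fields=["my_id", "my_vector"],
    primary_key="my_id",
    filter_fields=["my_id"],
    output_fields=["my_vector"],
)
broken = check_partial_load(
    load_fields=["my_vector"],
    primary_key="my_id",
    filter_fields=["color"],  # hypothetical scalar field that was not loaded
)
print(ok)
print(broken)
```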
Release a collection
To release a collection, specify the collection name in a call to release_collection() (Python), releaseCollection() (Java and Node.js), or ReleaseCollection() (Go).
You can assign aliases for collections to make them more meaningful in a specific context. You can assign multiple aliases for a collection, but multiple collections cannot share an alias.
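The alias rule above (many aliases per collection, but each alias pointing to a single collection) behaves like a dictionary keyed by alias. The sketch below models that rule in plain Python; AliasRegistry is a hypothetical class for illustration, the alias names bob and alice are made up, and the error Milvus actually raises on a conflict may differ.

```python
class AliasRegistry:
    """Toy model: each alias maps to one collection; a collection may have many aliases."""

    def __init__(self):
        self._alias_to_collection = {}

    def create_alias(self, collection_name, alias):
        owner = self._alias_to_collection.get(alias)
        if owner is not None and owner != collection_name:
            raise ValueError(f"alias '{alias}' is already assigned to '{owner}'")
        self._alias_to_collection[alias] = collection_name

    def aliases_of(self, collection_name):
        return sorted(a for a, c in self._alias_to_collection.items() if c == collection_name)


registry = AliasRegistry()
registry.create_alias("customized_setup_2", "bob")
registry.create_alias("customized_setup_2", "alice")
print(registry.aliases_of("customized_setup_2"))

# Reusing an alias for a different collection is rejected in this model.
try:
    registry.create_alias("customized_setup_1", "bob")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print(conflict_rejected)
```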
Create aliases
To create aliases, specify the collection name and the alias in a call to create_alias() (Python) or createAlias() (Java and Node.js).
You can set properties for a collection, such as ttl.seconds and mmap.enabled. For more information, refer to set_properties().
The code snippets in this section use the PyMilvus ORM module to interact with Milvus. Code snippets with the new MilvusClient SDK will be available soon.
Set TTL
Set the Time-To-Live (TTL) for the data in the collection, which specifies how long the data should be retained before it is automatically deleted.
from pymilvus import Collection, connections
# Connect to Milvus server
connections.connect(host="localhost", port="19530")  # Change to your Milvus server IP and port
# Get existing collection
collection = Collection("quick_setup")
# Set the TTL for the data in the collection
collection.set_properties(
    properties={
        "collection.ttl.seconds": 60
    }
)
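The retention rule behind TTL can be pictured as a comparison between the age of an entity and the configured number of seconds. The sketch below is only a mental model in plain Python: is_expired() is a hypothetical function, treating a non-positive TTL as "never expires" is an assumption of this sketch, and Milvus decides on its own schedule when expired data is actually removed.

```python
def is_expired(inserted_at, ttl_seconds, now):
    """Return True when an entity has been retained for at least ttl_seconds.

    All arguments are in seconds. A non-positive ttl_seconds is treated as
    "no expiry", which is an assumption made for this illustration.
    """
    if ttl_seconds <= 0:
        return False
    return now - inserted_at >= ttl_seconds


print(is_expired(inserted_at=0, ttl_seconds=60, now=30))  # False
print(is_expired(inserted_at=0, ttl_seconds=60, now=61))  # True
```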
Set MMAP
Configure the memory mapping (MMAP) property for the collection, which determines whether data is mapped into memory to improve query performance. For more information, refer to Configure memory mapping.
Before setting the MMAP property, release the collection first. Otherwise, an error will occur.
from pymilvus import Collection, connections
# Connect to Milvus server
connections.connect(host="localhost", port="19530")  # Change to your Milvus server IP and port
# Get existing collection
collection = Collection("quick_setup")
# Before setting the memory mapping property, release the collection first
collection.release()
# Set the memory mapping property to True or False
collection.set_properties(
    properties={
        "mmap.enabled": True
    }
)
Drop a Collection
If a collection is no longer needed, you can drop the collection.
To drop a collection, specify the collection name in a call to drop_collection() (Python), dropCollection() (Java and Node.js), or DropCollection() (Go).
// 10. Drop the collection
res = await client.dropCollection({
    collection_name: "customized_setup_2"
})
console.log(res.error_code)
// Output
//
// Success
//
res = await client.dropCollection({
    collection_name: "customized_setup_1"
})
console.log(res.error_code)
// Output
//
// Success
//
res = await client.dropCollection({
    collection_name: "quick_setup"
})
console.log(res.error_code)
// Output
//
// Success
//
// 10. Drop collections
err = client.DropCollection(ctx, "quick_setup")
if err != nil {
log.Fatal("failed to drop collection:", err.Error())
}
err = client.DropCollection(ctx, "customized_setup_2")
if err != nil {
log.Fatal("failed to drop collection:", err.Error())
}