Create a Collection

This topic describes how to create a collection in Milvus.

A collection consists of one or more partitions. When you create a collection, Milvus also creates a default partition named _default. See Glossary - Collection for more information.

The following example builds a two-shard collection named book, with a primary key field named book_id, an INT64 scalar field named word_count, and a two-dimensional floating-point vector field named book_intro. The Python and Node.js snippets also define a VARCHAR field named book_name. Real applications typically use vectors with far more dimensions than this example.

When interacting with Milvus from Python, you can use either PyMilvus or the newer MilvusClient. For more information, refer to Python SDK.

Prepare Schema

The collection to create must contain a primary key field and a vector field. The primary key field supports the INT64 and VarChar data types.

First, prepare necessary parameters, including the field schema, collection schema, and collection name.

Before defining a collection schema, create a schema for each field in the collection. To simplify data inserts, Milvus lets you specify a default value for each scalar field except the primary key field. If you leave such a field empty when inserting data, Milvus uses the default value configured for it during field schema creation.

from pymilvus import CollectionSchema, FieldSchema, DataType
book_id = FieldSchema(
  name="book_id",
  dtype=DataType.INT64,
  is_primary=True,
)
book_name = FieldSchema(
  name="book_name",
  dtype=DataType.VARCHAR,
  max_length=200,
  # The default value will be used if this field is left empty during data inserts or upserts.
  # The data type of `default_value` must be the same as that specified in `dtype`.
  default_value="Unknown"
)
word_count = FieldSchema(
  name="word_count",
  dtype=DataType.INT64,
  # The default value will be used if this field is left empty during data inserts or upserts.
  # The data type of `default_value` must be the same as that specified in `dtype`.
  default_value=9999
)
book_intro = FieldSchema(
  name="book_intro",
  dtype=DataType.FLOAT_VECTOR,
  dim=2
)
schema = CollectionSchema(
  fields=[book_id, book_name, word_count, book_intro],
  description="Test book search",
  enable_dynamic_field=True
)
collection_name = "book"
import { DataType } from "@zilliz/milvus2-sdk-node";
const params = {
  collection_name: "book",
  description: "Test book search",
  fields: [
    {
      name: "book_intro",
      description: "",
      data_type: DataType.FloatVector,
      dim: 2,
    },
    {
      name: "book_id",
      data_type: DataType.Int64,
      is_primary_key: true,
      description: "",
    },
    {
      name: "book_name",
      data_type: DataType.VarChar,
      max_length: 256,
      description: "",
    },
    {
      name: "word_count",
      data_type: DataType.Int64,
      description: "",
    },
  ],
  enableDynamicField: true
};
var (
    collectionName = "book"
)
schema := &entity.Schema{
  CollectionName: collectionName,
  Description:    "Test book search",
  Fields: []*entity.Field{
    {
      Name:       "book_id",
      DataType:   entity.FieldTypeInt64,
      PrimaryKey: true,
      AutoID:     false,
    },
    {
      Name:       "word_count",
      DataType:   entity.FieldTypeInt64,
      PrimaryKey: false,
      AutoID:     false,
    },
    {
      Name:     "book_intro",
      DataType: entity.FieldTypeFloatVector,
      TypeParams: map[string]string{
          "dim": "2",
      },
    },
  },
  EnableDynamicField: true,
}
FieldType fieldType1 = FieldType.newBuilder()
        .withName("book_id")
        .withDataType(DataType.Int64)
        .withPrimaryKey(true)
        .withAutoID(false)
        .build();
FieldType fieldType2 = FieldType.newBuilder()
        .withName("word_count")
        .withDataType(DataType.Int64)
        .build();
FieldType fieldType3 = FieldType.newBuilder()
        .withName("book_intro")
        .withDataType(DataType.FloatVector)
        .withDimension(2)
        .build();
CreateCollectionParam createCollectionReq = CreateCollectionParam.newBuilder()
        .withCollectionName("book")
        .withDescription("Test book search")
        .withShardsNum(2)
        .addFieldType(fieldType1)
        .addFieldType(fieldType2)
        .addFieldType(fieldType3)
        .withEnableDynamicField(true)
        .build();
create collection -c book -f book_id:INT64:book_id -f word_count:INT64:word_count -f book_intro:FLOAT_VECTOR:2 -p book_id
curl -X 'POST' \
  "${MILVUS_HOST}:${MILVUS_PORT}/v1/vector/collections/create" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
       "dbName": "default",   
       "collectionName": "medium_articles",
       "dimension": 256,
       "metricType": "L2",
       "primaryField": "id",
       "vectorField": "vector"
      }'
Output:
{
    "code": 200,
    "data": {}
}
Schema Type Parameter Description Option
FieldSchema name Name of the field to create. N/A
dtype Data type of the field to create. For primary key field:
  • DataType.INT64 (numpy.int64)
  • DataType.VARCHAR (VARCHAR)
For scalar field:
  • DataType.BOOL (Boolean)
  • DataType.INT8 (numpy.int8)
  • DataType.INT16 (numpy.int16)
  • DataType.INT32 (numpy.int32)
  • DataType.INT64 (numpy.int64)
  • DataType.FLOAT (numpy.float32)
  • DataType.DOUBLE (numpy.double)
  • DataType.VARCHAR (VARCHAR)
  • DataType.JSON (JSON)
  • DataType.ARRAY
For vector field:
  • BINARY_VECTOR (Binary vector)
  • FLOAT_VECTOR (Float vector)
element_type (Mandatory for ARRAY field) Data type of array elements to create. The data type of all elements in an array field must be the same. Valid values:
  • DataType.Int8
  • DataType.Int16
  • DataType.Int32
  • DataType.Int64
  • DataType.VARCHAR
  • DataType.BOOL
  • DataType.FLOAT
  • DataType.DOUBLE
is_primary Switch to control if the field is the primary key field. This parameter is mandatory for the primary key field. True or False
auto_id Switch to enable or disable automatic ID (primary key) allocation. This parameter is mandatory for the primary key field and defaults to False. True or False
max_length (Mandatory for VARCHAR field) Maximum length of strings allowed to be inserted. [1, 65,535]
max_capacity (Mandatory for ARRAY field) Maximum number of elements allowed for an array field. [1, 4,096]
default_value Default value of the field. This parameter is available only for non-array and non-JSON scalar fields. You cannot specify a default value for a primary key field. Refer to Parameter default_value for more information. N/A
dim (Mandatory for vector field) Dimension of the vector. [1, 32,768]
description (Optional) Description of the field. N/A
CollectionSchema fields Fields of the collection to create. N/A
description (Optional) Description of the collection to create. N/A
enable_dynamic_field Whether to enable dynamic schema. Data type: Boolean (true or false).
Optional, defaults to False.
For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections.
collection_name Name of the collection to create. N/A
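
The field rules in the table above can be checked on the client side before a schema is built. The following is a minimal sketch in plain Python, not part of PyMilvus: the dictionary keys and the check_fields helper are hypothetical names chosen for illustration, and Milvus performs its own validation regardless.

```python
# Illustrative client-side checks only; Milvus validates schemas itself.
PRIMARY_KEY_TYPES = {"INT64", "VARCHAR"}
VECTOR_TYPES = {"BINARY_VECTOR", "FLOAT_VECTOR"}


def check_fields(fields):
    """Return a list of problems found in a list of field-spec dicts.

    Each dict uses hypothetical keys: name, dtype, is_primary, max_length, dim.
    """
    problems = []
    primary = [f for f in fields if f.get("is_primary")]
    if len(primary) != 1:
        problems.append("exactly one primary key field is required")
    for f in primary:
        if f["dtype"] not in PRIMARY_KEY_TYPES:
            problems.append(f"{f['name']}: primary key must be INT64 or VARCHAR")
    if not any(f["dtype"] in VECTOR_TYPES for f in fields):
        problems.append("at least one vector field is required")
    for f in fields:
        # Range checks follow the Option column of the table above.
        if f["dtype"] == "VARCHAR" and not 1 <= f.get("max_length", 0) <= 65535:
            problems.append(f"{f['name']}: max_length must be in [1, 65535]")
        if f["dtype"] in VECTOR_TYPES and not 1 <= f.get("dim", 0) <= 32768:
            problems.append(f"{f['name']}: dim must be in [1, 32768]")
    return problems


book_fields = [
    {"name": "book_id", "dtype": "INT64", "is_primary": True},
    {"name": "book_name", "dtype": "VARCHAR", "max_length": 200},
    {"name": "word_count", "dtype": "INT64"},
    {"name": "book_intro", "dtype": "FLOAT_VECTOR", "dim": 2},
]
print(check_fields(book_fields))  # prints []
```

An empty list means the planned fields satisfy the rules covered by this sketch; it does not cover ARRAY or JSON fields.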
Type Parameter Description Option
entity.Schema CollectionName Name of the collection to create. N/A
Description Description of the collection to create. N/A
AutoID Switch to enable or disable Automatic ID (primary key) allocation. True or False
Fields Schema of the fields within the collection to create. Refer to Schema for more information. N/A
EnableDynamicField Whether to enable dynamic schema or not. For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections. N/A
entity.Field Name Name of the field to create. N/A
PrimaryKey Whether this field is the primary key. This is mandatory for the primary key. N/A
AutoID Whether this field value automatically increments. This is mandatory for the primary key. N/A
Description Description of the field. N/A
DataType Data type of the field to create. For primary key field:
  • entity.FieldTypeInt64 (numpy.int64)
  • entity.FieldTypeVarChar (VARCHAR)
For scalar field:
  • entity.FieldTypeBool (Boolean)
  • entity.FieldTypeInt8 (numpy.int8)
  • entity.FieldTypeInt16 (numpy.int16)
  • entity.FieldTypeInt32 (numpy.int32)
  • entity.FieldTypeInt64 (numpy.int64)
  • entity.FieldTypeFloat (numpy.float32)
  • entity.FieldTypeDouble (numpy.double)
  • entity.FieldTypeVarChar (VARCHAR)
For vector field:
  • entity.FieldTypeBinaryVector (Binary vector)
  • entity.FieldTypeFloatVector (Float vector)
TypeParams A string mapping to set parameters for a specific data type. N/A
IndexParams A string mapping to set parameters for the index of the collection. N/A
IsDynamic Whether dynamic schema is enabled on this field. N/A
IsPartitionKey Whether this field acts as the partition key. N/A
Interface Parameter Description Option
CreateCollectionReq collection_name Name of the collection to create. N/A
shards_num Number of shards to create along with the collection. [1, 16]
description Description of the collection to create. N/A
consistency_level Consistency level of the collection. For details, refer to Consistency Level. Possible values are as follows:
  • Strong
  • Session
  • Bounded
  • Eventually
  • Customized
fields Schema of the field and the collection to create. Refer to Schema for more information.
num_partitions Number of partitions to create within the collection. [1, 4096]
partition_key_field Name of the field that is designed to act as the partition key. For details, refer to Use Partition Key. N/A
enable_dynamic_field | enableDynamicField Whether to enable dynamic schema for this collection. N/A
FieldType name Name of the field. N/A
description Description of the field. N/A
data_type | DataType Data type of the field to create. Refer to data type reference number for more information.
is_primary_key Switch to control if the field is the primary key field. This is mandatory for the primary key. true or false
is_partition_key Switch to control if the field acts as the partition key. true or false
is_dynamic Switch to control if the field is a dynamic field. true or false
autoID Switch to enable or disable Automatic ID (primary key) allocation. true or false
dim Dimension of the vector. This is mandatory for a vector field. [1, 32768]
max_length Maximum length of strings allowed to be inserted. This is mandatory for a string field. [1, 65535]
default_value (Optional) Default value that applies if not specified. N/A
Class Parameter Description Option
CreateCollectionParam.newBuilder() withCollectionName(String collectionName) Name of the collection to create. N/A
withDatabaseName(String databaseName) Name of the database in which to create the collection. N/A
withShardsNum(int shardsNum) Number of shards to create along with the collection. [1, 16]
withEnableDynamicField(boolean enableDynamicField) Whether to enable dynamic field for this collection. N/A
withDescription(String description) Description of this collection. N/A
withFieldTypes(List fieldType) Fields in this collection N/A
withConsistencyLevel(ConsistencyLevelEnum consistencyLevel) Consistency level of this collection. Possible values are as follows:
  • STRONG
  • BOUNDED
  • EVENTUALLY
withPartitionsNum(int partitionsNum) Number of partitions to create in this collection. [1, 4096]
FieldType.newBuilder() withName(String name) Name of the field to create. N/A
withIsDynamic(boolean isDynamic) Whether this field is a dynamic field. N/A
withPrimaryKey(boolean primaryKey) Whether this field is the primary key. N/A
withDescription(String description) Description of this field. N/A
withDataType(DataType datatype) Data type of the field to create. For primary key field:
  • DataType.Int64 (numpy.int64)
  • DataType.VarChar (VARCHAR)
For scalar field:
  • DataType.Bool (Boolean)
  • DataType.Int8 (numpy.int8)
  • DataType.Int16 (numpy.int16)
  • DataType.Int32 (numpy.int32)
  • DataType.Int64 (numpy.int64)
  • DataType.Float (numpy.float32)
  • DataType.Double (numpy.double)
  • DataType.VarChar (VARCHAR)
For vector field:
  • DataType.BinaryVector (Binary vector)
  • DataType.FloatVector (Float vector)
withAutoID(boolean autoId) Switch to enable or disable Automatic ID (primary key) allocation. True or False
withDimension(int dimension) (Mandatory for vector field) Dimension of the vector. [1, 32768]
Parameter Description Option
dbName The name of the database to which the collection to create belongs. N/A
collectionName (Required) The name of the collection to create. N/A
dimension (Required) The number of dimensions for the vector field of the collection. The value ranges from 32 to 32768. N/A
metricType The distance metric used for the collection. The value defaults to L2. N/A
primaryField The name of the primary key field. The value defaults to id. N/A
vectorField(field) The name of the vector field. The value defaults to vector. N/A
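
For readers who prefer Python to curl, the same RESTful request can be assembled with the standard library. This is a sketch under stated assumptions: build_create_request is a hypothetical helper, and the base URL (including the http scheme and port) and token are placeholders rather than values taken from this page. The endpoint path, headers, and body keys mirror the curl example.

```python
import json
import urllib.request


def build_create_request(base_url, token, collection_name, dimension,
                         metric_type="L2", primary_field="id",
                         vector_field="vector", db_name="default"):
    """Build (but do not send) the POST request for the create endpoint."""
    payload = {
        "dbName": db_name,
        "collectionName": collection_name,
        "dimension": dimension,
        "metricType": metric_type,
        "primaryField": primary_field,
        "vectorField": vector_field,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/vector/collections/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder address and token; replace them with your deployment's values.
req = build_create_request("http://localhost:19530", "YOUR_TOKEN",
                           "medium_articles", 256)
print(req.get_method(), req.full_url)
# To send the request, pass req to urllib.request.urlopen and read the body.
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the server.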

Create a collection with the schema

Then, create a collection with the schema you specified above.

from pymilvus import Collection
collection = Collection(
    name=collection_name,
    schema=schema,
    using='default',
    shards_num=2
    )
await milvusClient.createCollection(params);
err = milvusClient.CreateCollection(
    context.Background(), // ctx
    schema,
    2, // shardNum
)
if err != nil {
    log.Fatal("failed to create collection:", err.Error())
}
milvusClient.createCollection(createCollectionReq);
Parameter Description Option
using (optional) By specifying the server alias here, you can choose in which Milvus server you create a collection. N/A
shards_num (optional) Number of the shards for the collection to create. [1,16]
num_partitions (optional) Number of logical partitions for the collection to create. [1,4096]
*kwargs: collection.ttl.seconds (optional) Time to live (TTL) of the collection, in seconds. Data in an expired collection is cleaned up and excluded from searches and queries. The value must be 0 or greater; 0 disables TTL.
Parameter Description Option
ctx Context to control API invocation process. N/A
shardNum Number of the shards for the collection to create. [1,16]
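
The optional Python parameters described above can be gathered into one set of keyword arguments before calling Collection. The sketch below makes two assumptions worth verifying against your PyMilvus version: that the TTL is passed through a properties dictionary keyed by collection.ttl.seconds, and that a connection with the alias default is already open. Both helper names are hypothetical.

```python
def collection_kwargs(shards_num=2, ttl_seconds=0):
    """Build optional keyword arguments for pymilvus.Collection."""
    if not 1 <= shards_num <= 16:
        raise ValueError("shards_num must be in [1, 16]")
    if ttl_seconds < 0:
        raise ValueError("TTL must be 0 or greater")
    kwargs = {"using": "default", "shards_num": shards_num}
    if ttl_seconds > 0:  # 0 means TTL is disabled, so the property is omitted
        # Assumption: TTL is supplied via the `properties` keyword argument.
        kwargs["properties"] = {"collection.ttl.seconds": ttl_seconds}
    return kwargs


def create_collection(name, schema, **options):
    """Create the collection; needs pymilvus and an open connection."""
    # Imported lazily so collection_kwargs can be used without pymilvus.
    from pymilvus import Collection
    return Collection(name=name, schema=schema, **collection_kwargs(**options))


print(collection_kwargs(shards_num=2, ttl_seconds=1800))
```

Calling create_collection("book", schema, ttl_seconds=1800) would then create the collection with a 30-minute TTL, assuming schema is the CollectionSchema prepared earlier.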

Limits

Resource configuration

Feature | Maximum limit
Length of a collection name | 255 characters
Number of partitions in a collection | 4,096
Number of fields in a collection | 64
Number of shards in a collection | 16
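
These limits can be checked before a create request is sent. The following is a small plain-Python sketch; check_limits and the LIMITS keys are hypothetical names, and the numbers are copied from the table above.

```python
# Maximum values taken from the resource configuration table.
LIMITS = {
    "collection_name_length": 255,
    "partitions": 4096,
    "fields": 64,
    "shards": 16,
}


def check_limits(collection_name, num_fields, num_partitions=1, shards_num=1):
    """Return the names of any limits that the planned collection exceeds."""
    planned = {
        "collection_name_length": len(collection_name),
        "partitions": num_partitions,
        "fields": num_fields,
        "shards": shards_num,
    }
    return [key for key, value in planned.items() if value > LIMITS[key]]


print(check_limits("book", num_fields=4, shards_num=2))    # prints []
print(check_limits("book", num_fields=65, shards_num=32))  # prints ['fields', 'shards']
```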

Parameter default_value

  • default_value is available only for non-array and non-JSON scalar fields.
  • default_value does not apply to the primary key.
  • The data type of default_value must be the same as that specified in dtype. Otherwise, an error can occur.
  • When auto_id is enabled, the remaining fields cannot all rely on default values. Each insert or upsert operation must specify a value for at least one field. Otherwise, an error can occur.
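
The rules above can be illustrated with a short plain-Python sketch. It models the documented behavior and is not Milvus's implementation: check_default, fill_defaults, and the type map are hypothetical, and the map covers only a subset of scalar types.

```python
# Hypothetical mapping from scalar dtype names to Python types.
PYTHON_TYPES = {
    "BOOL": bool, "INT8": int, "INT16": int, "INT32": int, "INT64": int,
    "FLOAT": float, "DOUBLE": float, "VARCHAR": str,
}


def check_default(dtype, default_value, is_primary=False):
    """Return an error message, or None if the default value is acceptable."""
    if is_primary:
        return "default_value does not apply to the primary key"
    if dtype not in PYTHON_TYPES:  # for example ARRAY or JSON
        return f"default_value is not available for {dtype} fields"
    # Exact type comparison, so that True is not accepted for an INT64 field.
    if type(default_value) is not PYTHON_TYPES[dtype]:
        return f"default_value must match the {dtype} data type"
    return None


def fill_defaults(row, defaults, auto_id=False):
    """Return a copy of row with missing fields filled from defaults."""
    if auto_id and not row:
        raise ValueError("specify a value for at least one field")
    filled = dict(defaults)
    filled.update(row)  # explicitly supplied values win over defaults
    return filled


defaults = {"book_name": "Unknown", "word_count": 9999}
print(check_default("INT64", 9999))    # prints None
print(check_default("INT64", "9999"))  # prints an error message
print(fill_defaults({"book_id": 1, "book_intro": [0.1, 0.2]}, defaults))
```

In the last call, book_name and word_count are left empty, so the result carries "Unknown" and 9999 for them alongside the supplied values.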
