
Create a Collection

This topic describes how to create a collection in Milvus.

A collection consists of one or more partitions. When you create a new collection, Milvus automatically creates a default partition named _default. See Glossary - Collection for more information.

The following example builds a two-shard collection named book with a primary key field named book_id, a VARCHAR scalar field named book_name, an INT64 scalar field named word_count, and a two-dimensional floating-point vector field named book_intro. Real applications are likely to use vectors of much higher dimension than this example.
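To make the shape of this collection concrete, here is a minimal sketch in plain Python (no Milvus connection needed) that generates a few sample rows whose keys and value types follow the book schema defined below. The helper make_book_rows and its random values are hypothetical illustrations, not part of pymilvus.

```python
import random

DIM = 2  # must equal dim=2 declared on the book_intro field


def make_book_rows(n, dim=DIM, seed=0):
    """Return n hypothetical rows whose keys and value types follow the book schema."""
    rng = random.Random(seed)  # seeded so the sample data is reproducible
    rows = []
    for i in range(n):
        rows.append({
            "book_id": i,                                      # INT64 primary key
            "book_name": f"book {i}",                          # VARCHAR, at most 200 characters
            "word_count": rng.randint(1000, 10000),            # INT64 scalar
            "book_intro": [rng.random() for _ in range(dim)],  # float vector with dim elements
        })
    return rows


rows = make_book_rows(3)
print(rows[0]["book_id"], len(rows[0]["book_intro"]))  # prints: 0 2
```

Every row carries a vector of exactly dim elements; a vector of any other length would not match the book_intro field.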

Prepare Schema

The collection must contain a primary key field and a vector field. The primary key field must be of type INT64 or VarChar.
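The two rules above can be expressed as a small check. This is a hypothetical sketch over a simplified, plain-dict description of the fields (the dtype names are plain strings, not pymilvus enums); it also assumes, as its own rule, that exactly one field is marked as the primary key. Milvus performs its own validation when the collection is created.

```python
# Simplified stand-ins for DataType names; plain strings, not pymilvus enums.
PRIMARY_KEY_TYPES = {"INT64", "VARCHAR"}
VECTOR_TYPES = {"FLOAT_VECTOR", "BINARY_VECTOR"}


def check_schema(fields):
    """Return a list of rule violations for a simplified schema description.

    Each field is a dict with "name", "dtype" and an optional "is_primary" flag.
    """
    problems = []
    primaries = [f for f in fields if f.get("is_primary")]
    # Assumption of this sketch: exactly one field is marked as primary key.
    if len(primaries) != 1:
        problems.append(f"expected one primary key field, found {len(primaries)}")
    for f in primaries:
        if f["dtype"] not in PRIMARY_KEY_TYPES:
            problems.append(f"primary key {f['name']!r} has unsupported type {f['dtype']}")
    if not any(f["dtype"] in VECTOR_TYPES for f in fields):
        problems.append("collection has no vector field")
    return problems


book_fields = [
    {"name": "book_id", "dtype": "INT64", "is_primary": True},
    {"name": "book_name", "dtype": "VARCHAR"},
    {"name": "word_count", "dtype": "INT64"},
    {"name": "book_intro", "dtype": "FLOAT_VECTOR"},
]
print(check_schema(book_fields))  # prints: []
```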

First, prepare the necessary parameters: the field schemas, the collection schema, and the collection name.

from pymilvus import CollectionSchema, FieldSchema, DataType
book_id = FieldSchema(
  name="book_id",
  dtype=DataType.INT64,
  is_primary=True,
)
book_name = FieldSchema(
  name="book_name",
  dtype=DataType.VARCHAR,
  max_length=200,
)
word_count = FieldSchema(
  name="word_count",
  dtype=DataType.INT64,
)
book_intro = FieldSchema(
  name="book_intro",
  dtype=DataType.FLOAT_VECTOR,
  dim=2
)
schema = CollectionSchema(
  fields=[book_id, book_name, word_count, book_intro],
  description="Test book search",
  enable_dynamic_field=True
)
collection_name = "book"
import { DataType } from "@zilliz/milvus2-sdk-node";
const params = {
  collection_name: "book",
  description: "Test book search",
  shards_num: 2,
  fields: [
    {
      name: "book_intro",
      description: "",
      data_type: DataType.FloatVector,
      dim: 2,
    },
    {
      name: "book_id",
      data_type: DataType.Int64,
      is_primary_key: true,
      description: "",
    },
    {
      name: "book_name",
      data_type: DataType.VarChar,
      max_length: 200,
      description: "",
    },
    {
      name: "word_count",
      data_type: DataType.Int64,
      description: "",
    },
  ],
  enableDynamicField: true
};
var (
    collectionName = "book"
)
schema := &entity.Schema{
  CollectionName: collectionName,
  Description:    "Test book search",
  Fields: []*entity.Field{
    {
      Name:       "book_id",
      DataType:   entity.FieldTypeInt64,
      PrimaryKey: true,
      AutoID:     false,
    },
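    {
      Name:     "book_name",
      DataType: entity.FieldTypeVarChar,
      TypeParams: map[string]string{
          "max_length": "200",
      },
    },
    {
      Name:       "word_count",
      DataType:   entity.FieldTypeInt64,
      PrimaryKey: false,
      AutoID:     false,
    },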
    {
      Name:     "book_intro",
      DataType: entity.FieldTypeFloatVector,
      TypeParams: map[string]string{
          "dim": "2",
      },
    },
  },
  EnableDynamicField: true,
}
FieldType fieldType1 = FieldType.newBuilder()
        .withName("book_id")
        .withDataType(DataType.Int64)
        .withPrimaryKey(true)
        .withAutoID(false)
        .build();
FieldType fieldType2 = FieldType.newBuilder()
        .withName("book_name")
        .withDataType(DataType.VarChar)
        .withMaxLength(200)
        .build();
FieldType fieldType3 = FieldType.newBuilder()
        .withName("word_count")
        .withDataType(DataType.Int64)
        .build();
FieldType fieldType4 = FieldType.newBuilder()
        .withName("book_intro")
        .withDataType(DataType.FloatVector)
        .withDimension(2)
        .build();
CreateCollectionParam createCollectionReq = CreateCollectionParam.newBuilder()
        .withCollectionName("book")
        .withDescription("Test book search")
        .withShardsNum(2)
        .addFieldType(fieldType1)
        .addFieldType(fieldType2)
        .addFieldType(fieldType3)
        .addFieldType(fieldType4)
        .withEnableDynamicField(true)
        .build();
curl -X 'POST' \
  "${MILVUS_HOST}:${MILVUS_PORT}/v1/vector/collections/create" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
       "dbName": "default",
       "collectionName": "medium_articles",
       "dimension": 256,
       "metricType": "L2",
       "primaryField": "id",
       "vectorField": "vector"
      }'

Output:

{
    "code": 200,
    "data": {}
}
var schema = new CollectionSchema
{
    Fields =
    {
        FieldSchema.Create<long>("book_id", isPrimaryKey: true),
        FieldSchema.CreateVarchar("book_name", maxLength: 200),
        FieldSchema.Create<long>("word_count"),
        FieldSchema.CreateFloatVector("book_intro", dimension: 2)
    },
    Description = "Test book search",
    EnableDynamicFields = true
};
Parameter Description Option
FieldSchema Schema of the fields within the collection to create. Refer to Schema for more information. N/A
name Name of the field to create. N/A
dtype Data type of the field to create. For primary key field:
  • DataType.INT64 (numpy.int64)
  • DataType.VARCHAR (VARCHAR)
For scalar field:
  • DataType.BOOL (Boolean)
  • DataType.INT8 (numpy.int8)
  • DataType.INT16 (numpy.int16)
  • DataType.INT32 (numpy.int32)
  • DataType.INT64 (numpy.int64)
  • DataType.FLOAT (numpy.float32)
  • DataType.DOUBLE (numpy.double)
  • DataType.VARCHAR (VARCHAR)
  • DataType.JSON (JSON)
For vector field:
  • DataType.BINARY_VECTOR (Binary vector)
  • DataType.FLOAT_VECTOR (Float vector)
is_primary (Mandatory for primary key field) Whether the field is the primary key field. True or False
auto_id (Mandatory for primary key field) Whether to enable automatic ID (primary key) allocation. True or False
max_length (Mandatory for VARCHAR field) Maximum length of strings allowed to be inserted. [1, 65,535]
dim (Mandatory for vector field) Dimension of the vector. [1, 32,768]
description (Optional) Description of the field. N/A
CollectionSchema Schema of the collection to create. Refer to Schema for more information. N/A
fields Fields of the collection to create. N/A
description (Optional) Description of the collection to create. N/A
enable_dynamic_field (Optional) Whether to enable dynamic schema. True or False.
Defaults to False.
For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections.
collection_name Name of the collection to create. N/A
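The ranges in the table above can be checked before a schema is sent to Milvus. The following is a hypothetical pre-flight sketch in plain Python; the function name and the string dtype names are assumptions of this example, and it checks only the max_length and dim ranges listed in the table, not every rule Milvus enforces.

```python
MAX_VARCHAR_LENGTH = 65_535  # upper bound for max_length in the table above
MAX_VECTOR_DIM = 32_768      # upper bound for dim in the table above


def check_field_params(dtype, max_length=None, dim=None):
    """Raise ValueError if a mandatory field parameter is missing or out of range.

    dtype is a plain string such as "VARCHAR" or "FLOAT_VECTOR" (not a pymilvus enum).
    Only the range rules from the table are checked.
    """
    if dtype == "VARCHAR":
        if max_length is None:
            raise ValueError("max_length is mandatory for VARCHAR fields")
        if not 1 <= max_length <= MAX_VARCHAR_LENGTH:
            raise ValueError(f"max_length must be in [1, {MAX_VARCHAR_LENGTH}], got {max_length}")
    elif dtype in ("FLOAT_VECTOR", "BINARY_VECTOR"):
        if dim is None:
            raise ValueError("dim is mandatory for vector fields")
        if not 1 <= dim <= MAX_VECTOR_DIM:
            raise ValueError(f"dim must be in [1, {MAX_VECTOR_DIM}], got {dim}")


check_field_params("VARCHAR", max_length=200)  # book_name: passes
check_field_params("FLOAT_VECTOR", dim=2)      # book_intro: passes
try:
    check_field_params("FLOAT_VECTOR", dim=40_000)
except ValueError as err:
    print(err)  # prints: dim must be in [1, 32768], got 40000
```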
Parameter Description Option
CollectionName Name of the collection to create. N/A
Description Description of the collection to create. N/A
Fields Schema of the fields within the collection to create. Refer to Schema for more information. N/A
Name Name of the field to create. N/A
DataType Data type of the field to create. For primary key field:
  • entity.FieldTypeInt64 (numpy.int64)
  • entity.FieldTypeVarChar (VARCHAR)
For scalar field:
  • entity.FieldTypeBool (Boolean)
  • entity.FieldTypeInt8 (numpy.int8)
  • entity.FieldTypeInt16 (numpy.int16)
  • entity.FieldTypeInt32 (numpy.int32)
  • entity.FieldTypeInt64 (numpy.int64)
  • entity.FieldTypeFloat (numpy.float32)
  • entity.FieldTypeDouble (numpy.double)
  • entity.FieldTypeVarChar (VARCHAR)
For vector field:
  • entity.FieldTypeBinaryVector (Binary vector)
  • entity.FieldTypeFloatVector (Float vector)
PrimaryKey (Mandatory for primary key field) Whether the field is the primary key field. true or false
AutoID (Mandatory for primary key field) Whether to enable automatic ID (primary key) allocation. true or false
dim (Mandatory for vector field) Dimension of the vector, set as a string in TypeParams. [1, 32768]
Parameter Description Option
collection_name Name of the collection to create. N/A
description Description of the collection to create. N/A
fields Schemas of the fields in the collection to create. Refer to Schema for more information.
data_type Data type of the field to create. Refer to the data type reference numbers for more information.
is_primary_key (Mandatory for primary key field) Whether the field is the primary key field. true or false
autoID Whether to enable automatic ID (primary key) allocation. true or false
dim (Mandatory for vector field) Dimension of the vector. [1, 32768]
max_length (Mandatory for VarChar field) Maximum length of strings allowed to be inserted. [1, 65,535]
description (Optional) Description of the field. N/A
Parameter Description Option
Name Name of the field to create. N/A
Description Description of the field to create. N/A
DataType Data type of the field to create. For primary key field:
  • DataType.Int64 (numpy.int64)
  • DataType.VarChar (VARCHAR)
For scalar field:
  • DataType.Bool (Boolean)
  • DataType.Int8 (numpy.int8)
  • DataType.Int16 (numpy.int16)
  • DataType.Int32 (numpy.int32)
  • DataType.Int64 (numpy.int64)
  • DataType.Float (numpy.float32)
  • DataType.Double (numpy.double)
  • DataType.VarChar (VARCHAR)
For vector field:
  • DataType.BinaryVector (Binary vector)
  • DataType.FloatVector (Float vector)
PrimaryKey (Mandatory for primary key field) Whether the field is the primary key field. true or false
AutoID Whether to enable automatic ID (primary key) allocation. true or false
Dimension (Mandatory for vector field) Dimension of the vector. [1, 32768]
CollectionName Name of the collection to create. N/A
Description (Optional) Description of the collection to create. N/A
ShardsNum Number of the shards for the collection to create. [1,16]
PartitionsNum Number of the logical partitions for the collection to create. [1,4096]
Parameter Description Option
dbName The name of the database to which the collection belongs. N/A
collectionName (Required) The name of the collection to create. N/A
dimension (Required) The number of dimensions of the collection's vector field. The value ranges from 32 to 32768. N/A
metricType The distance metric used for the collection. Defaults to L2. N/A
primaryField The name of the primary key field. Defaults to id. N/A
vectorField The name of the vector field. Defaults to vector. N/A
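As a sketch of how these RESTful parameters fit together, the hypothetical helper below builds the JSON body for the create-collection request, applying the defaults and the dimension range from the table. Passing db_name="default" only mirrors the curl example; it is not a documented default. The returned string could be supplied to curl's -d option.

```python
import json


def build_create_body(collection_name, dimension, db_name="default",
                      metric_type="L2", primary_field="id", vector_field="vector"):
    """Return the JSON body for the RESTful create-collection request.

    Defaults for metric_type, primary_field and vector_field follow the table above;
    db_name="default" simply mirrors the curl example.
    """
    if not 32 <= dimension <= 32768:
        raise ValueError(f"dimension must be in [32, 32768], got {dimension}")
    body = {
        "dbName": db_name,
        "collectionName": collection_name,
        "dimension": dimension,
        "metricType": metric_type,
        "primaryField": primary_field,
        "vectorField": vector_field,
    }
    return json.dumps(body, indent=2)


print(build_create_body("medium_articles", 256))
```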
Class Description Option
CollectionSchema The logical definition of a collection, describing the fields that make it up. Possible parameters:
  • Fields: A list of Fields derived from the FieldSchema class.
  • Description: (Optional) Description of the collection schema.
  • EnableDynamicFields: (Optional) Whether to enable dynamic fields.
FieldSchema The logical definition of a single field within a collection. Possible methods:
  • Create(string name, MilvusDataType dataType, bool isPrimaryKey, bool autoId, bool isPartitionKey, string description)
  • Create(string name, bool isPrimaryKey, bool autoId, bool isPartitionKey, string description)
  • CreateVarchar(string name, int maxLength, bool isPrimaryKey, bool autoId, bool isPartitionKey, string description)
  • CreateFloatVector(string name, int dimension, string description)
  • CreateBinaryVector(string name, int dimension, string description)
  • CreateJson(string name)
Create a Collection with the Schema

Then, create a collection with the schema you specified above.

from pymilvus import Collection
collection = Collection(
    name=collection_name,
    schema=schema,
    using='default',
    shards_num=2
    )
await milvusClient.createCollection(params);
err = milvusClient.CreateCollection(
    context.Background(), // ctx
    schema,
    2, // shardNum
)
if err != nil {
    log.Fatal("failed to create collection:", err.Error())
}
milvusClient.createCollection(createCollectionReq);
var collection = await milvusClient.CreateCollectionAsync("book", schema, shardsNum: 2);
curl --request POST \
     --url "${MILVUS_HOST}:${MILVUS_PORT}/v1/vector/collections/create" \
     --header "Authorization: Bearer ${TOKEN}" \
     --header "accept: application/json" \
     --header "content-type: application/json" \
     -d '{
       "dbName": "default",   
       "collectionName": "medium_articles",
       "dimension": 256,
       "metricType": "L2",
       "primaryField": "id",
       "vectorField": "vector"
      }'
Parameter Description Option
using (optional) Alias of the Milvus server connection on which to create the collection. N/A
shards_num (optional) Number of the shards for the collection to create. [1,16]
num_partitions (optional) Number of logical partitions for the collection to create. [1,4096]
**kwargs: collection.ttl.seconds (optional) Collection time to live (TTL), in seconds, is the expiration time of a collection. Data in an expired collection is cleaned up and excluded from searches and queries. The value must be 0 or greater; 0 disables TTL.
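The TTL rule can be modeled with a few lines of arithmetic. This is a simplified, hypothetical sketch: it assumes expiry is measured from the moment data was written and treats reaching the TTL exactly as expired, which is a choice of this example. It does not model when Milvus actually performs the cleanup.

```python
import time


def is_expired(written_at, ttl_seconds, now=None):
    """Simplified model of collection.ttl.seconds.

    written_at and now are Unix timestamps in seconds. A TTL of 0 disables
    expiry; negative values are rejected, as the table requires 0 or greater.
    """
    if ttl_seconds < 0:
        raise ValueError("collection.ttl.seconds must be 0 or greater")
    if ttl_seconds == 0:
        return False  # TTL disabled: data never expires
    if now is None:
        now = time.time()
    return now - written_at >= ttl_seconds


print(is_expired(written_at=0, ttl_seconds=3600, now=1800))  # prints: False
print(is_expired(written_at=0, ttl_seconds=3600, now=7200))  # prints: True
print(is_expired(written_at=0, ttl_seconds=0, now=10**9))    # prints: False
```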
Parameter Description Option
ctx Context to control API invocation process. N/A
shardNum Number of the shards for the collection to create. [1,16]
Parameter Description Option
collectionName Name of the collection. N/A
schema Schema of the collection. Should be a CollectionSchema object. N/A
consistencyLevel Consistency level of the collection. Possible values are:
  • ConsistencyLevel.Strong
  • ConsistencyLevel.Session
  • ConsistencyLevel.BoundedStaleness
  • ConsistencyLevel.Eventually
  • ConsistencyLevel.Customized
shardsNum Number of shards to create. N/A

Limits

Feature Maximum limit
Length of a collection name 255 characters
Number of partitions in a collection 4,096
Number of fields in a collection 64
Number of shards in a collection 16
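The limits above, together with the lower bounds of 1 from the parameter tables, can be checked ahead of time. The following is a hypothetical plain-Python sketch; the function name and arguments are illustrative only and are not part of any Milvus SDK.

```python
MAX_COLLECTION_NAME_LENGTH = 255
MAX_PARTITIONS = 4096
MAX_FIELDS = 64
MAX_SHARDS = 16


def check_limits(collection_name, num_fields, shards_num, num_partitions):
    """Return the limits from the table above that a planned collection would exceed."""
    violations = []
    if len(collection_name) > MAX_COLLECTION_NAME_LENGTH:
        violations.append(f"collection name longer than {MAX_COLLECTION_NAME_LENGTH} characters")
    if num_fields > MAX_FIELDS:
        violations.append(f"more than {MAX_FIELDS} fields")
    if not 1 <= shards_num <= MAX_SHARDS:
        violations.append(f"shards_num outside [1, {MAX_SHARDS}]")
    if not 1 <= num_partitions <= MAX_PARTITIONS:
        violations.append(f"partition count outside [1, {MAX_PARTITIONS}]")
    return violations


# The book example: 4 fields, 2 shards, only the _default partition.
print(check_limits("book", num_fields=4, shards_num=2, num_partitions=1))  # prints: []
```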
