Manage Schema

This topic introduces schemas in Milvus. A schema defines the properties of a collection and of the fields within it.

Field schema

A field schema is the logical definition of a field. Define your field schemas first, before you define a collection schema and manage collections.

Milvus supports only one primary key field in a collection.
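The one-primary-key rule can be checked before a schema is sent to the server. The following is a minimal sketch, not part of pymilvus: the helper names `count_primary_fields` and `check_single_primary` are hypothetical, and the sketch assumes each field object exposes an `is_primary` attribute (stand-in objects are used here so the snippet runs without a Milvus installation).

```python
from types import SimpleNamespace


def count_primary_fields(fields):
    """Return how many fields are flagged as the primary key."""
    return sum(1 for f in fields if getattr(f, "is_primary", False))


def check_single_primary(fields):
    """Raise ValueError unless exactly one field is the primary key."""
    n = count_primary_fields(fields)
    if n != 1:
        raise ValueError(f"expected exactly 1 primary key field, found {n}")


# Stand-in field objects (hypothetical); real field schemas are assumed
# to carry the same `is_primary` attribute.
fields = [
    SimpleNamespace(name="id", is_primary=True),
    SimpleNamespace(name="age", is_primary=False),
]
check_single_primary(fields)  # passes without raising
```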

Field schema properties

Property	Description	Note
`name`	Name of the field in the collection to create	Data type: String. Mandatory
`dtype`	Data type of the field	Mandatory
`description`	Description of the field	Data type: String. Optional
`is_primary`	Whether to set the field as the primary key field or not	Data type: Boolean (`true` or `false`). Mandatory for the primary key field
`auto_id`	Switch to enable or disable automatic ID (primary key) allocation	Data type: Boolean (`True` or `False`). Mandatory for the primary key field
`max_length`	Maximum byte length of strings that can be inserted. Multibyte characters (for example, non-ASCII Unicode characters) occupy more than one byte each, so make sure the byte length of inserted strings does not exceed the specified limit.	Data type: Integer in the range [1, 65535]. Mandatory for a VARCHAR field
`dim`	Dimension of the vector	Data type: Integer in the range [1, 32768]. Mandatory for a dense vector field. Omit for a sparse vector field.
`is_partition_key`	Whether this field is a partition-key field	Data type: Boolean (`true` or `false`)
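Because `max_length` counts bytes rather than characters, it can help to measure strings before inserting them. The sketch below assumes strings are encoded as UTF-8; the helper name `fits_max_length` is hypothetical and not a pymilvus function.

```python
def fits_max_length(text, max_length):
    """Return True if the UTF-8 encoding of `text` fits in `max_length` bytes."""
    return len(text.encode("utf-8")) <= max_length


# Five ASCII characters occupy five bytes.
ascii_ok = fits_max_length("hello", 5)

# Also five characters, but the accented letter needs two bytes in UTF-8,
# so the string occupies six bytes and exceeds a limit of 5.
accented_ok = fits_max_length("h\u00e9llo", 5)
```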

Create a field schema

To simplify data inserts, Milvus allows you to specify a default value for each scalar field, except the primary key field, when you create the field schema. If you leave such a field empty when inserting data, the default value you specified for it applies.

Create a regular field schema:

from pymilvus import FieldSchema, DataType

id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")

# The following creates a field and uses it as the partition key
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)

Create a field schema with default field values:

from pymilvus import FieldSchema, DataType

fields = [
  FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
  # configure default value `25` for field `age`
  FieldSchema(name="age", dtype=DataType.INT64, default_value=25, description="age"),
  FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
]
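To make the effect of a default value concrete, here is a plain-Python illustration of the behavior described above. It is only a sketch of the semantics, not how Milvus implements them: the helper `apply_defaults` and the sample rows are hypothetical, and the real substitution happens inside Milvus when data is inserted.

```python
def apply_defaults(row, defaults):
    """Return a copy of `row` with absent fields filled in from `defaults`."""
    filled = dict(row)
    for field, value in defaults.items():
        if field not in filled:
            filled[field] = value
    return filled


# Mirrors the schema above: `age` defaults to 25.
defaults = {"age": 25}

row_with_age = apply_defaults({"id": 1, "age": 31}, defaults)
row_without_age = apply_defaults({"id": 2}, defaults)
```

A row that already carries `age` keeps its own value; only the row that omits it receives `25`.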

Supported data types

DataType defines the kind of data a field contains. Different fields support different data types.

Primary key field supports:
- INT64: numpy.int64
- VARCHAR: VARCHAR
Scalar field supports:
- BOOL: Boolean (true or false)
- INT8: numpy.int8
- INT16: numpy.int16
- INT32: numpy.int32
- INT64: numpy.int64
- FLOAT: numpy.float32
- DOUBLE: numpy.double
- VARCHAR: VARCHAR
- JSON: JSON
- ARRAY: Array
JSON is available as a composite data type. A JSON field comprises key-value pairs. Each key is a string, and a value can be a number, string, boolean value, or array. For details, refer to JSON: a new data type.
Vector field supports:
- BINARY_VECTOR: Stores binary data as a sequence of 0s and 1s, used for compact feature representation in image processing and information retrieval.
- FLOAT_VECTOR: Stores 32-bit floating-point numbers, commonly used in scientific computing and machine learning for representing real numbers.
- FLOAT16_VECTOR: Stores 16-bit half-precision floating-point numbers, used in deep learning and GPU computations for memory and bandwidth efficiency.
- BFLOAT16_VECTOR: Stores 16-bit floating-point numbers with reduced precision but the same exponent range as Float32, popular in deep learning for reducing memory and computational requirements without significantly impacting accuracy.
- SPARSE_FLOAT_VECTOR: Stores a list of non-zero elements and their corresponding indices, used for representing sparse vectors. For more information, refer to Sparse Vectors.
Milvus supports multiple vector fields in a collection. For more information, refer to Hybrid Search.
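The dense vector types differ mainly in how many bits each element occupies, which determines the raw size of one vector. The sketch below computes that size from the element widths listed above and shows one way to picture a sparse vector as a mapping from index to non-zero value. The helper names are hypothetical, the figures cover raw vector data only (no index or metadata overhead), and the dictionary form of the sparse vector is an illustration rather than a statement of Milvus's storage format.

```python
# Bits per element for each dense vector type described above.
BITS_PER_ELEMENT = {
    "BINARY_VECTOR": 1,
    "FLOAT_VECTOR": 32,
    "FLOAT16_VECTOR": 16,
    "BFLOAT16_VECTOR": 16,
}


def dense_vector_bytes(dtype, dim):
    """Return the raw size in bytes of one dense vector of dimension `dim`."""
    total_bits = BITS_PER_ELEMENT[dtype] * dim
    if total_bits % 8 != 0:
        raise ValueError("dimension does not fill a whole number of bytes")
    return total_bits // 8


def to_sparse(dense):
    """Keep only the non-zero elements of `dense`, keyed by their index."""
    return {i: v for i, v in enumerate(dense) if v != 0}


sizes = {name: dense_vector_bytes(name, 128) for name in BITS_PER_ELEMENT}
sparse = to_sparse([0.0, 0.5, 0.0, 1.2])
```

For a dimension of 128, a FLOAT_VECTOR takes 512 bytes, the two 16-bit types take 256 bytes each, and a BINARY_VECTOR takes 16 bytes.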

Collection schema

A collection schema is the logical definition of a collection. You usually define the field schemas before defining the collection schema and managing collections.

Collection schema properties

Properties	Description	Note
`fields`	Fields in the collection to create	Mandatory
`description`	Description of the collection	Data type: String. Optional
`partition_key_field`	Name of a field that is designed to act as the partition key.	Data type: String. Optional
`enable_dynamic_field`	Whether to enable dynamic schema or not	Data type: Boolean (`true` or `false`). Optional, defaults to `False`. For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections.

Create a collection schema

Define the field schemas before defining a collection schema.

from pymilvus import FieldSchema, CollectionSchema, DataType

id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")

# Enable partition key on a field if you need to implement multi-tenancy based on the partition-key field
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)

# Set enable_dynamic_field to True if you need to use dynamic fields.
# position_field is included so that the partition key defined above takes effect.
schema = CollectionSchema(fields=[id_field, age_field, embedding_field, position_field], auto_id=False, enable_dynamic_field=True, description="desc of a collection")

Create a collection with the schema specified:

from pymilvus import Collection, connections

# Connect to the Milvus server; the connection is registered under the alias 'default'
connections.connect(host="127.0.0.1", port=19530)
collection_name1 = "tutorial_1"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)

- You can define the shard number with shards_num.
- You can choose the Milvus server on which to create the collection by specifying its connection alias in using.
- You can enable the partition key feature by setting is_partition_key to True on a field if you need to implement partition-key-based multi-tenancy.
- You can enable dynamic schema by setting enable_dynamic_field to True in the collection schema if you need to use dynamic fields.
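With dynamic schema enabled, a row may carry keys that are not declared in the collection schema. The following plain-Python sketch only illustrates that idea by separating declared fields from extra ones; the helper `split_dynamic` and the sample row are hypothetical, and how Milvus stores the extra keys internally is not shown here.

```python
def split_dynamic(row, schema_fields):
    """Split `row` into (declared fields, extra dynamic fields)."""
    declared = {k: v for k, v in row.items() if k in schema_fields}
    dynamic = {k: v for k, v in row.items() if k not in schema_fields}
    return declared, dynamic


# Field names declared in the collection schema above.
schema_fields = {"id", "age", "embedding"}

# `nickname` is not declared, so it counts as a dynamic field.
row = {"id": 1, "age": 30, "embedding": [0.1, 0.2], "nickname": "sam"}
declared, dynamic = split_dynamic(row, schema_fields)
```

When enable_dynamic_field is left at its default of False, a row like this one would not match the schema, so the split is a quick way to spot undeclared keys before inserting.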

You can also create a collection with Collection.construct_from_dataframe, which automatically generates a collection schema from a pandas DataFrame and creates the collection.

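import random

import pandas as pd
from pymilvus import Collection

# Example values: number of rows and vector dimension.
# Requires an active connection (see connections.connect above).
nb = 100
dim = 128

df = pd.DataFrame({
    "id": [i for i in range(nb)],
    "age": [random.randint(20, 40) for i in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
    "position": "test_pos"
})

collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
    )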

What’s next

Learn how to prepare schema when managing collections.
Read more about dynamic schema.
Read more about partition-key in Multi-tenancy.
