Manage Schema

This topic introduces schemas in Milvus. A schema defines the properties of a collection and the fields within it.

Field schema

A field schema is the logical definition of a field. Define your field schemas first, before you define a collection schema or manage collections.

Milvus supports only one primary key field in a collection.

Field schema properties

Properties | Description | Note
name | Name of the field in the collection to create | Data type: String. Mandatory.
dtype | Data type of the field | Mandatory.
description | Description of the field | Data type: String. Optional.
is_primary | Whether to set the field as the primary key field or not | Data type: Boolean (true or false). Mandatory for the primary key field.
auto_id | Switch to enable or disable automatic ID (primary key) allocation | True or False. Mandatory for the primary key field.
max_length | Maximum byte length for strings allowed to be inserted. Note that multibyte characters (e.g., Unicode characters) may occupy more than one byte each, so ensure the byte length of inserted strings does not exceed the specified limit. | Range: [1, 65,535]. Mandatory for a VARCHAR field.
dim | Dimension of the vector | Data type: Integer ∈ [1, 32768]. Mandatory for a dense vector field. Omit for a sparse vector field.
is_partition_key | Whether this field is a partition-key field | Data type: Boolean (true or false).
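
Because max_length counts bytes rather than characters, it can help to measure a string's encoded size before inserting it. The following is a minimal sketch, assuming strings are encoded as UTF-8; the helper fits_max_length is a hypothetical name used for illustration and is not part of pymilvus.

```python
def fits_max_length(value: str, max_length: int) -> bool:
    """Return True if the UTF-8 encoding of `value` fits in `max_length` bytes.

    Assumption: the byte length is measured on the UTF-8 encoding.
    """
    return len(value.encode("utf-8")) <= max_length


# ASCII characters take one byte each; many other characters take more.
ascii_text = "milvus"
cjk_text = "向量数据库"  # 5 characters

# Print character count next to byte count for each string.
print(len(ascii_text), len(ascii_text.encode("utf-8")))
print(len(cjk_text), len(cjk_text.encode("utf-8")))
```

A check like this could run on the client before an insert, so that oversized strings are caught early rather than rejected later.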

Create a field schema

To simplify data inserts, Milvus lets you specify a default value for each scalar field, except the primary key field, when you create the field schema. If you leave such a field empty when inserting data, the default value you specified for it applies.

Create a regular field schema:

from pymilvus import FieldSchema, DataType

id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")

# The following creates a field and uses it as the partition key
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)

Create a field schema with default field values:

from pymilvus import FieldSchema, DataType

fields = [
  FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
  # configure default value `25` for field `age`
  FieldSchema(name="age", dtype=DataType.INT64, default_value=25, description="age"),
  FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
]
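
The effect of a default value can be pictured with plain Python dictionaries. This sketch does not call Milvus; apply_defaults and the defaults mapping are hypothetical names that only mimic the behavior described above, where a scalar field left empty falls back to its configured default.

```python
# Hypothetical stand-in for the default declared on the `age` field above.
defaults = {"age": 25}


def apply_defaults(row: dict, defaults: dict) -> dict:
    """Return a copy of `row` with missing fields filled from `defaults`."""
    filled = dict(defaults)
    filled.update(row)  # values supplied by the caller take precedence
    return filled


rows = [
    {"id": 1, "age": 31},
    {"id": 2},  # `age` left empty, so the default applies
]
filled_rows = [apply_defaults(r, defaults) for r in rows]
print(filled_rows)
```

The real substitution happens on the Milvus side; the sketch is only meant to show which value ends up stored for each row.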

Supported data types

DataType defines the kind of data a field contains. Different fields support different data types.

  • Primary key field supports:

    • INT64: numpy.int64
    • VARCHAR: VARCHAR
  • Scalar field supports:

    • BOOL: Boolean (true or false)
    • INT8: numpy.int8
    • INT16: numpy.int16
    • INT32: numpy.int32
    • INT64: numpy.int64
    • FLOAT: numpy.float32
    • DOUBLE: numpy.double
    • VARCHAR: VARCHAR
    • JSON: JSON
    • ARRAY: Array

    JSON is available as a composite data type. A JSON field comprises key-value pairs. Each key is a string, and a value can be a number, string, boolean value, array, or list. For details, refer to JSON: a new data type.

  • Vector field supports:

    • BINARY_VECTOR: Stores binary data as a sequence of 0s and 1s, used for compact feature representation in image processing and information retrieval.
    • FLOAT_VECTOR: Stores 32-bit floating-point numbers, commonly used in scientific computing and machine learning for representing real numbers.
    • FLOAT16_VECTOR: Stores 16-bit half-precision floating-point numbers, used in deep learning and GPU computations for memory and bandwidth efficiency.
    • BFLOAT16_VECTOR: Stores 16-bit floating-point numbers with reduced precision but the same exponent range as Float32, popular in deep learning for reducing memory and computational requirements without significantly impacting accuracy.
    • SPARSE_FLOAT_VECTOR: Stores a list of non-zero elements and their corresponding indices, used for representing sparse vectors. For more information, refer to Sparse Vectors.

    Milvus supports multiple vector fields in a collection. For more information, refer to Hybrid Search.
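
The element widths listed above translate directly into storage per vector. The sketch below estimates the raw payload size of one dense vector, assuming one bit per dimension for BINARY_VECTOR, 4 bytes per element for FLOAT_VECTOR, and 2 bytes per element for the two 16-bit types. It ignores index and metadata overhead. The helpers raw_vector_bytes and to_sparse are hypothetical names, and the index-to-value dictionary is just one illustrative way to hold the non-zero elements of a sparse vector.

```python
BYTES_PER_ELEMENT = {
    "FLOAT_VECTOR": 4,     # 32-bit floating point
    "FLOAT16_VECTOR": 2,   # 16-bit half precision
    "BFLOAT16_VECTOR": 2,  # 16-bit bfloat
}


def raw_vector_bytes(dtype: str, dim: int) -> int:
    """Estimate the raw payload size of one dense vector, in bytes."""
    if dtype == "BINARY_VECTOR":
        # Assumption: one bit per dimension, rounded up to whole bytes.
        return (dim + 7) // 8
    return BYTES_PER_ELEMENT[dtype] * dim


def to_sparse(dense: list) -> dict:
    """Keep only the non-zero elements, keyed by their index."""
    return {i: v for i, v in enumerate(dense) if v != 0}


for dtype in ("BINARY_VECTOR", "FLOAT_VECTOR", "FLOAT16_VECTOR", "BFLOAT16_VECTOR"):
    print(dtype, raw_vector_bytes(dtype, 128))

print(to_sparse([0.0, 0.5, 0.0, 1.5]))
```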

Collection schema

A collection schema is the logical definition of a collection. Usually you need to define the field schema before defining a collection schema and managing collections.

Collection schema properties

Properties | Description | Note
fields | Fields in the collection to create | Mandatory.
description | Description of the collection | Data type: String. Optional.
partition_key_field | Name of a field that is designed to act as the partition key | Data type: String. Optional.
enable_dynamic_field | Whether to enable dynamic schema or not | Data type: Boolean (true or false). Optional, defaults to False. For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections.
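
Two collection-level rules from this topic can be checked before anything is sent to the server: a collection has a single primary key field, and a partition key must name a field that exists. The following sketch uses plain dictionaries rather than pymilvus objects; check_collection_spec and the dictionary layout are hypothetical and only mirror the property names in the tables above.

```python
def check_collection_spec(fields, partition_key_field=None):
    """Return a list of problems found in a plain-dict collection spec."""
    problems = []
    names = [f["name"] for f in fields]
    primary = [f["name"] for f in fields if f.get("is_primary", False)]

    # Assumption: a collection needs exactly one primary key field.
    if len(primary) != 1:
        problems.append(
            f"expected exactly one primary key field, found {len(primary)}"
        )

    # A partition key has to refer to a field defined in the collection.
    if partition_key_field is not None and partition_key_field not in names:
        problems.append(
            f"partition key field {partition_key_field!r} is not defined"
        )
    return problems


fields = [
    {"name": "id", "is_primary": True},
    {"name": "age"},
    {"name": "embedding"},
    {"name": "position"},
]

ok_result = check_collection_spec(fields, partition_key_field="position")
bad_result = check_collection_spec(fields[1:], partition_key_field="tenant")
print(ok_result)
print(bad_result)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a spec at once.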

Create a collection schema

Define the field schemas before defining a collection schema:

from pymilvus import FieldSchema, CollectionSchema, DataType

id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")

# Enable partition key on a field if you need to implement multi-tenancy based on the partition-key field.
# Append position_field to the fields list below to include it in the collection.
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)

# Set enable_dynamic_field to True if you need to use dynamic fields.
schema = CollectionSchema(fields=[id_field, age_field, embedding_field], auto_id=False, enable_dynamic_field=True, description="desc of a collection")

Create a collection with the schema specified:

from pymilvus import Collection, connections

# Register a connection under the alias "default"
connections.connect(alias="default", host="127.0.0.1", port=19530)
collection_name1 = "tutorial_1"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)

  • You can define the shard number with shards_num.
  • You can choose the Milvus server on which to create the collection by specifying its connection alias in using.
  • You can enable the partition key feature by setting is_partition_key to True on a field if you need partition-key-based multi-tenancy.
  • You can enable dynamic schema by setting enable_dynamic_field to True in the collection schema.


You can also create a collection with Collection.construct_from_dataframe, which automatically generates a collection schema from a pandas DataFrame and creates the collection.

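import random

import pandas as pd
from pymilvus import Collection

nb = 10    # number of rows (example value, not given in the original)
dim = 128  # vector dimension (example value, not given in the original)

df = pd.DataFrame({
    "id": list(range(nb)),
    "age": [random.randint(20, 40) for _ in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
    "position": "test_pos"
})

# Requires an active connection (see connections.connect above)
collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
)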
