Schema
This topic introduces schema in Milvus. Schema is used to define the properties of a collection and the fields within.
Field schema
A field schema is the logical definition of a field. It is the first thing you need to define before defining a collection schema and creating a collection.
Milvus supports only one primary key field in a collection.
Field schema properties
Properties | Description | Note |
---|---|---|
name | Name of the field in the collection to create | Data type: String. Mandatory |
dtype | Data type of the field | Mandatory |
description | Description of the field | Data type: String. Optional |
is_primary | Whether to set the field as the primary key field or not | Data type: Boolean (true or false). Mandatory for the primary key field |
auto_id | Switch to enable or disable automatic ID (primary key) allocation | True or False. Mandatory for the primary key field |
max_length | Maximum length of strings allowed to be inserted | [1, 65,535]. Mandatory for VARCHAR field |
dim | Dimension of the vector | Data type: Integer ∈ [1, 32768]. Mandatory for the vector field |
is_partition_key | Whether this field is a partition-key field | Data type: Boolean (true or false) |
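The limits listed in the table can be checked before a field schema is built. The following is a minimal sketch in plain Python, not part of pymilvus: the helper name check_field_properties is hypothetical, and data types are passed as plain strings that stand in for DataType members. Milvus performs its own validation; this only mirrors the ranges stated above.

```python
# Hypothetical helper, not a pymilvus API. It mirrors the ranges in the table above.
MAX_VARCHAR_LENGTH = 65_535   # upper bound for max_length
MAX_VECTOR_DIM = 32_768       # upper bound for dim


def check_field_properties(name, dtype, max_length=None, dim=None):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    # name is mandatory and must be a string
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    # max_length is mandatory for VARCHAR fields
    if dtype == "VARCHAR":
        if not isinstance(max_length, int) or not 1 <= max_length <= MAX_VARCHAR_LENGTH:
            problems.append("max_length must be an integer in [1, 65535]")
    # dim is mandatory for vector fields
    if dtype in ("FLOAT_VECTOR", "BINARY_VECTOR"):
        if not isinstance(dim, int) or not 1 <= dim <= MAX_VECTOR_DIM:
            problems.append("dim must be an integer in [1, 32768]")
    return problems


print(check_field_properties("position", "VARCHAR", max_length=256))  # prints []
print(check_field_properties("embedding", "FLOAT_VECTOR", dim=40_000))
```

The second call reports one problem because 40,000 exceeds the maximum vector dimension.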
Create a field schema
To simplify data inserts, Milvus allows you to specify a default value for each scalar field, except the primary key field, when you create the field schema. If you leave such a field empty when inserting data, the default value you specified for it applies.
Create a regular field schema:
from pymilvus import FieldSchema, DataType
# Primary key field
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
# Scalar field
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
# Vector field; dim is mandatory
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# The following creates a field and uses it as the partition key
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
Create a field schema with default field values:
from pymilvus import FieldSchema, DataType
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    # configure default value `25` for field `age`
    FieldSchema(name="age", dtype=DataType.INT64, default_value=25, description="age"),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
]
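The effect of a default value can be pictured with a short plain-Python sketch. It does not call Milvus: the helper apply_defaults is hypothetical, and it assumes that "leaving a field empty" means omitting the key from the inserted row. Milvus applies the default itself; the sketch only mirrors the behavior described above.

```python
def apply_defaults(row, defaults):
    """Return a copy of row with missing fields filled from defaults."""
    filled = dict(row)
    for field, value in defaults.items():
        # Only fill fields the caller left out; supplied values are kept
        filled.setdefault(field, value)
    return filled


# Default taken from the `age` field schema above
defaults = {"age": 25}

rows = [
    {"id": 1, "age": 31},  # age supplied, so the default is not used
    {"id": 2},             # age omitted, so the default applies
]

filled_rows = [apply_defaults(row, defaults) for row in rows]
print(filled_rows)  # prints [{'id': 1, 'age': 31}, {'id': 2, 'age': 25}]
```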
Supported data types
DataType defines the kind of data a field contains. Different fields support different data types.
- Primary key field supports:
  - INT64: numpy.int64
  - VARCHAR: VARCHAR
- Scalar field supports:
- Vector field supports:
  - BINARY_VECTOR: Binary vector
  - FLOAT_VECTOR: Float vector
JSON is available as a composite data type. A JSON field comprises key-value pairs. Each key is a string, and a value can be a number, string, boolean value, array, or list. For details, refer to JSON: a new data type.
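As an illustration of the shape such a value takes, the following sketch builds a Python dict that fits the description above and round-trips it through the standard json module. The key names and values are made up for the example, and nothing here is Milvus-specific.

```python
import json

# Illustrative value for a JSON field; all keys and values are hypothetical
metadata = {
    "title": "example",    # string
    "views": 1024,         # number (integer)
    "rating": 4.5,         # number (float)
    "published": True,     # boolean
    "tags": ["a", "b"],    # array
}

# Every key must be a string
assert all(isinstance(key, str) for key in metadata)

# A value that survives a JSON round trip unchanged is JSON-compatible
encoded = json.dumps(metadata)
decoded = json.loads(encoded)
print(decoded == metadata)  # prints True
```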
Collection schema
A collection schema is the logical definition of a collection. Usually you need to define the field schema before defining a collection schema and creating a collection.
Collection schema properties
Properties | Description | Note |
---|---|---|
fields | Fields in the collection to create | Mandatory |
description | Description of the collection | Data type: String. Optional |
partition_key_field | Name of a field that is designed to act as the partition key | Data type: String. Optional |
enable_dynamic_field | Whether to enable dynamic schema or not | Data type: Boolean (true or false). Optional, defaults to False. For details on dynamic schema, refer to Dynamic Schema and the user guides for managing collections. |
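Two collection-level rules follow from the text: a collection has exactly one primary key field, and partition_key_field names one of its fields. The sketch below checks both in plain Python. The helper check_collection_fields is hypothetical, and fields are represented as plain dicts rather than FieldSchema objects, so this illustrates the rules and not the checks pymilvus itself runs.

```python
def check_collection_fields(fields, partition_key_field=None):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    names = [field["name"] for field in fields]
    primary = [field["name"] for field in fields if field.get("is_primary")]
    # Milvus supports only one primary key field in a collection
    if len(primary) != 1:
        problems.append(f"expected exactly one primary key field, found {len(primary)}")
    # The partition key, if given, must refer to a field in the collection
    if partition_key_field is not None and partition_key_field not in names:
        problems.append(f"partition_key_field '{partition_key_field}' is not a field name")
    return problems


# Plain-dict stand-ins for field schemas (assumption for this sketch)
fields = [
    {"name": "id", "is_primary": True},
    {"name": "age"},
    {"name": "embedding"},
    {"name": "position"},
]

print(check_collection_fields(fields, partition_key_field="position"))  # prints []
print(check_collection_fields(fields[1:], partition_key_field="city"))
```

The second call reports two problems: no primary key field remains, and "city" is not among the field names.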
Create a collection schema
from pymilvus import FieldSchema, CollectionSchema, DataType
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# Enable partition key on a field if you need to implement multi-tenancy based on the partition-key field
position_field = FieldSchema(name="position", dtype=DataType.VARCHAR, max_length=256, is_partition_key=True)
# Set enable_dynamic_field to True if you need to use dynamic fields.
# position_field is included so that the partition key is part of the schema.
schema = CollectionSchema(fields=[id_field, age_field, embedding_field, position_field], auto_id=False, enable_dynamic_field=True, description="desc of a collection")
Create a collection with the schema specified:
from pymilvus import Collection
collection_name1 = "tutorial_1"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)
- You can define the shard number with shards_num.
- You can define the Milvus server on which you wish to create a collection by specifying the alias in using.
- You can enable the partition key feature on a field by setting is_partition_key to True on the field if you need to implement partition-key-based multi-tenancy.
- You can enable dynamic schema by setting enable_dynamic_field to True in the collection schema if you need to use dynamic fields.
You can also create a collection with Collection.construct_from_dataframe, which automatically generates a collection schema from a DataFrame and creates a collection.
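import random
import pandas as pd
from pymilvus import Collection

# Example values; adjust them to your data
nb = 3000   # number of rows
dim = 128   # vector dimension

df = pd.DataFrame({
    "id": [i for i in range(nb)],
    "age": [random.randint(20, 40) for i in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
    "position": "test_pos"
})

# Generate the schema from the DataFrame and create the collection
collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
)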
What’s next
- Learn how to prepare schema when creating a collection.
- Read more about dynamic schema.
- Read more about partition-key in Multi-tenancy.