
Schema

This topic introduces schemas in Milvus. A schema defines the properties of a collection and of the fields within it.

Field schema

A field schema is the logical definition of a field. It is the first thing you need to define before defining a collection schema and creating a collection.

Milvus 2.0 supports only one primary key field in a collection.
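The one-primary-key rule can be checked before a collection is created. The sketch below is a hypothetical helper, not part of pymilvus: it describes fields as plain dictionaries that mirror the FieldSchema properties, and it counts how many are flagged with is_primary.

```python
from typing import Dict, List


def primary_key_names(fields: List[Dict]) -> List[str]:
    """Hypothetical helper: names of all fields flagged is_primary=True."""
    return [f["name"] for f in fields if f.get("is_primary", False)]


def has_single_primary_key(fields: List[Dict]) -> bool:
    """True only if exactly one field is the primary key, as Milvus 2.0 requires."""
    return len(primary_key_names(fields)) == 1


# Plain-dict stand-ins for the field schemas defined later in this topic
fields = [
    {"name": "id", "dtype": "INT64", "is_primary": True},
    {"name": "age", "dtype": "INT64"},
    {"name": "embedding", "dtype": "FLOAT_VECTOR", "dim": 128},
]
print(has_single_primary_key(fields))
```

Both "no primary key" and "two primary keys" make the check fail.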

Field schema properties

Property | Description | Note
--- | --- | ---
name | Name of the field in the collection to create | Data type: String. Mandatory.
dtype | Data type of the field | Mandatory.
description | Description of the field | Data type: String. Optional.
is_primary | Whether to set the field as the primary key field | Data type: Boolean (true or false). Mandatory for the primary key field.
dim | Dimension of the vector | Data type: Integer ∈ [1, 32768]. Mandatory for the vector field.
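The rules in the property table can be expressed as a small validation function. This is a hypothetical sketch, not a pymilvus API: it takes a plain dictionary of field properties and returns a list of rule violations. Only the constraints stated in the table are checked.

```python
from typing import Dict, List

VECTOR_TYPES = {"FLOAT_VECTOR", "BINARY_VECTOR"}


def field_property_errors(field: Dict) -> List[str]:
    """Hypothetical helper: list violations of the field-property rules above."""
    errors = []
    name = field.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name is mandatory and must be a non-empty string")
    if "dtype" not in field:
        errors.append("dtype is mandatory")
    if "description" in field and not isinstance(field["description"], str):
        errors.append("description must be a string")
    if "is_primary" in field and not isinstance(field["is_primary"], bool):
        errors.append("is_primary must be a Boolean")
    if field.get("dtype") in VECTOR_TYPES:
        dim = field.get("dim")
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(dim, int) or isinstance(dim, bool) or not 1 <= dim <= 32768:
            errors.append("dim is mandatory for vector fields and must be an integer in [1, 32768]")
    return errors


print(field_property_errors({"name": "embedding", "dtype": "FLOAT_VECTOR", "dim": 128}))
print(field_property_errors({"name": "embedding", "dtype": "FLOAT_VECTOR"}))
```

The first call returns an empty list. The second reports the missing dim.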

Create a field schema

from pymilvus import DataType, FieldSchema
# Primary key field
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
# Scalar field
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
# Vector field: dim is mandatory
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")

Supported data type

DataType defines the kind of data a field contains. Different fields support different data types.

  • Primary key field supports:
    • INT64: numpy.int64
  • Scalar field supports:
    • BOOL: Boolean (true or false)
    • INT8: numpy.int8
    • INT16: numpy.int16
    • INT32: numpy.int32
    • INT64: numpy.int64
    • FLOAT: numpy.float32
    • DOUBLE: numpy.double
  • Vector field supports:
    • BINARY_VECTOR: Binary vector
    • FLOAT_VECTOR: Float vector
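Each scalar type corresponds to a fixed-width NumPy type, so a Python value fits a field only if it falls within that type's range. The hypothetical helper below is not a pymilvus function. It checks a value against the widths implied by the list above: signed 8/16/32/64-bit integers, 32-bit FLOAT, and 64-bit DOUBLE.

```python
# Inclusive value ranges of the signed integer types listed above
INT_RANGES = {
    "INT8": (-2**7, 2**7 - 1),
    "INT16": (-2**15, 2**15 - 1),
    "INT32": (-2**31, 2**31 - 1),
    "INT64": (-2**63, 2**63 - 1),
}
FLOAT32_MAX = 3.4028234663852886e38  # largest finite numpy.float32 value


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def fits_scalar_type(dtype: str, value) -> bool:
    """Hypothetical helper: can `value` be stored in a scalar field of `dtype`?"""
    if dtype == "BOOL":
        return isinstance(value, bool)
    if dtype in INT_RANGES:
        lo, hi = INT_RANGES[dtype]
        return isinstance(value, int) and not isinstance(value, bool) and lo <= value <= hi
    if dtype == "FLOAT":
        return _is_number(value) and abs(value) <= FLOAT32_MAX
    if dtype == "DOUBLE":
        return _is_number(value)
    raise ValueError(f"not a scalar data type: {dtype}")


print(fits_scalar_type("INT8", 127), fits_scalar_type("INT8", 128))
```

For example, 128 overflows INT8, and 1e39 exceeds the float32 range of FLOAT but fits DOUBLE.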

Collection schema

A collection schema is the logical definition of a collection. You need to define the field schemas before defining a collection schema and creating a collection.

Collection schema properties

Property | Description | Note
--- | --- | ---
fields | Fields in the collection to create | Mandatory.
description | Description of the collection | Data type: String. Optional.
auto_id | Whether to enable automatic ID (primary key) allocation | Data type: Boolean (true or false). Optional.
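The auto_id property determines whether your data supplies primary key values. The sketch below is a hypothetical helper, not a pymilvus API. It assumes that when auto_id is enabled, Milvus allocates the primary key and the inserted data omits that column. When auto_id is disabled, every column, including the primary key, must be provided. It also enforces the table's rule that fields is mandatory.

```python
from typing import Dict, List


def required_insert_columns(fields: List[Dict], auto_id: bool) -> List[str]:
    """Hypothetical helper: columns a caller supplies when inserting data.

    Assumption: with auto_id=True the primary key is generated by Milvus,
    so it is excluded; with auto_id=False every field must be provided.
    """
    if not fields:
        raise ValueError("fields is mandatory: a collection needs at least one field")
    if not isinstance(auto_id, bool):
        raise TypeError("auto_id must be a Boolean")
    return [f["name"] for f in fields if not (auto_id and f.get("is_primary", False))]


fields = [
    {"name": "id", "dtype": "INT64", "is_primary": True},
    {"name": "age", "dtype": "INT64"},
    {"name": "embedding", "dtype": "FLOAT_VECTOR", "dim": 128},
]
print(required_insert_columns(fields, auto_id=False))
print(required_insert_columns(fields, auto_id=True))
```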

Create a collection schema

Define the field schemas before defining a collection schema:
from pymilvus import CollectionSchema, DataType, FieldSchema
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, description="primary id")
age_field = FieldSchema(name="age", dtype=DataType.INT64, description="age")
embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128, description="vector")
# auto_id=False: primary key values are supplied with the data
schema = CollectionSchema(fields=[id_field, age_field, embedding_field], auto_id=False, description="desc of a collection")

Create a collection with the schema specified:

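from pymilvus import Collection, connections
# Connect to a Milvus server under the alias "default" (host and port are example values)
connections.connect(alias="default", host="localhost", port="19530")
collection_name1 = "tutorial_1"
# Create the collection with 2 shards on the server registered as "default"
collection1 = Collection(name=collection_name1, schema=schema, using='default', shards_num=2)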
Use shards_num to set the number of shards. Set using to the alias of the connection to the Milvus server where you want to create the collection.

You can also create a collection with Collection.construct_from_dataframe. This method generates a collection schema from a pandas DataFrame and creates the collection in one step.
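import random

import pandas as pd
from pymilvus import Collection

nb = 3000  # number of rows (example value)
dim = 8    # vector dimension (example value)
df = pd.DataFrame({
    "id": [i for i in range(nb)],
    "age": [random.randint(20, 40) for _ in range(nb)],
    "embedding": [[random.random() for _ in range(dim)] for _ in range(nb)],
})
# Returns the new collection and the result of inserting the DataFrame's rows
collection, ins_res = Collection.construct_from_dataframe(
    'my_collection',
    df,
    primary_field='id',
    auto_id=False
)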
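To show what generating a schema from a DataFrame involves, the sketch below infers field definitions from column values. It is a hypothetical, simplified illustration, not pymilvus's actual inference logic. Plain Python lists stand in for DataFrame columns, and the type mapping is assumed: bool to BOOL, int to INT64, float to DOUBLE, and a list of floats to FLOAT_VECTOR with dim set to the list length.

```python
from typing import Dict, List


def infer_field(name: str, values: list) -> Dict:
    """Hypothetical, simplified type inference for one column."""
    if not values:
        raise ValueError(f"column {name!r} is empty")
    first = values[0]
    if isinstance(first, bool):  # check bool before int: bool subclasses int
        return {"name": name, "dtype": "BOOL"}
    if isinstance(first, int):
        return {"name": name, "dtype": "INT64"}
    if isinstance(first, float):
        return {"name": name, "dtype": "DOUBLE"}
    if isinstance(first, list) and first and all(isinstance(x, float) for x in first):
        return {"name": name, "dtype": "FLOAT_VECTOR", "dim": len(first)}
    raise TypeError(f"cannot infer a data type for column {name!r}")


def infer_schema(columns: Dict[str, list], primary_field: str) -> List[Dict]:
    """Infer one field per column and flag the named primary key field."""
    if primary_field not in columns:
        raise KeyError(primary_field)
    fields = [infer_field(name, values) for name, values in columns.items()]
    for f in fields:
        f["is_primary"] = f["name"] == primary_field
    return fields


columns = {
    "id": [0, 1, 2],
    "age": [25, 31, 40],
    "embedding": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
}
for field in infer_schema(columns, primary_field="id"):
    print(field)
```

As with construct_from_dataframe, you must name the primary key column explicitly, because the column values alone do not identify it.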