Primary Field & AutoID
Every collection in Milvus must have a primary field to uniquely identify each entity. This field ensures that every entity can be inserted, updated, queried, or deleted without ambiguity.
Depending on your use case, you can either let Milvus automatically generate IDs (AutoID) or assign your own IDs manually.
What is a primary field?
A primary field acts as the unique key for each entity in a collection, similar to a primary key in a traditional database. Milvus uses the primary field to manage entities during insert, upsert, delete, and query operations.
Key requirements:
Each collection must have exactly one primary field.
Primary field values cannot be null.
The data type must be specified at creation and cannot be changed later.
Supported data types
The primary field must use a supported scalar data type that can uniquely identify entities.
Data Type |
Description |
---|---|
|
64-bit integer type, commonly used with AutoID. This is the recommended option for most use cases. |
|
Variable-length string type. Use this when entity identifiers come from external systems (for example, product codes or user IDs). Requires the |
Choose between AutoID and Manual IDs
Milvus supports two modes for assigning primary key values.
Mode |
Description |
Recommended For |
---|---|---|
AutoID |
Milvus automatically generates unique identifiers for inserted or imported entities. |
Most scenarios where you don’t need to manage IDs manually. |
Manual ID |
You provide unique IDs yourself when inserting or importing data. |
When IDs must align with external systems or pre-existing datasets. |
If you are unsure which mode to choose, start with AutoID for simpler ingestion and guaranteed uniqueness.
Quickstart: Use AutoID
You can let Milvus handle ID generation automatically.
Step 1: Create a collection with AutoID
Enable auto_id=True
in your primary field definition. Milvus will handle ID generation automatically.
from pymilvus import MilvusClient, DataType
client = MilvusClient(uri="http://localhost:19530")
schema = client.create_schema()
# Define primary field with AutoID enabled
schema.add_field(
field_name="id", # Primary field name
is_primary=True,
auto_id=True, # Milvus generates IDs automatically; Defaults to False
datatype=DataType.INT64
)
# Define the other fields
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=4) # Vector field
schema.add_field(field_name="category", datatype=DataType.VARCHAR, max_length=1000) # Scalar field of the VARCHAR type
# Create the collection
if client.has_collection("demo_autoid"):
client.drop_collection("demo_autoid")
client.create_collection(collection_name="demo_autoid", schema=schema)
// java
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";
const client = new MilvusClient({
address: "localhost:19530",
});
// Define schema fields
const schema = [
{
name: "id",
description: "Primary field",
data_type: DataType.Int64,
is_primary_key: true,
autoID: true, // Milvus generates IDs automatically
},
{
name: "embedding",
description: "Vector field",
data_type: DataType.FloatVector,
dim: 4,
},
{
name: "category",
description: "Scalar field",
data_type: DataType.VarChar,
max_length: 1000,
},
];
// Create the collection
await client.createCollection({
collection_name: "demo_autoid",
fields: schema,
});
// go
# restful
Step 2: Insert Data
Important: Do not include the primary field column in your data. Milvus generates IDs automatically.
data = [
{"embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
{"embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
]
res = client.insert(collection_name="demo_autoid", data=data)
print("Generated IDs:", res.get("ids"))
# Output example:
# Generated IDs: [461526052788333649, 461526052788333650]
// java
const data = [
{"embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
{"embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
];
const res = await client.insert({
collection_name: "demo_autoid",
fields_data: data,
});
console.log(res);
// go
# restful
Use upsert()
instead of insert()
when working with existing entities to avoid duplicate ID errors.
Use manual IDs
If you need to control IDs manually, disable AutoID and provide your own values.
Step 1: Create a collection without AutoID
from pymilvus import MilvusClient, DataType
client = MilvusClient(uri="http://localhost:19530")
schema = client.create_schema()
# Define the primary field without AutoID
schema.add_field(
field_name="product_id",
is_primary=True,
auto_id=False, # You'll provide IDs manually at data ingestion
datatype=DataType.VARCHAR,
max_length=100 # Required when datatype is VARCHAR
)
# Define the other fields
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=4) # Vector field
schema.add_field(field_name="category", datatype=DataType.VARCHAR, max_length=1000) # Scalar field of the VARCHAR type
# Create the collection
if client.has_collection("demo_manual_ids"):
client.drop_collection("demo_manual_ids")
client.create_collection(collection_name="demo_manual_ids", schema=schema)
// java
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";
const client = new MilvusClient({
address: "localhost:19530",
username: "username",
password: "Aa12345!!",
});
const schema = [
{
name: "product_id",
data_type: DataType.VARCHAR,
is_primary_key: true,
autoID: false,
},
{
name: "embedding",
data_type: DataType.FLOAT_VECTOR,
dim: 4,
},
{
name: "category",
data_type: DataType.VARCHAR,
max_length: 1000,
},
];
const res = await client.createCollection({
collection_name: "demo_autoid",
schema: schema,
});
// go
# restful
Step 2: Insert data with your IDs
You must include the primary field column in every insert operation.
# Each entity must contain the primary field `product_id`
data = [
{"product_id": "PROD-001", "embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
{"product_id": "PROD-002", "embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
]
res = client.insert(collection_name="demo_manual_ids", data=data)
print("Generated IDs:", res.get("ids"))
# Output example:
# Generated IDs: ['PROD-001', 'PROD-002']
// java
const data = [
{"product_id": "PROD-001", "embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
{"product_id": "PROD-002", "embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
];
const insert = await client.insert({
collection_name: "demo_autoid",
fields_data: data,
});
console.log(insert);
// go
# restful
Your responsibilities:
Ensure all IDs are unique across all entities
Include the primary field in every insert/import operation
Handle ID conflicts and duplicate detection yourself
Advanced usage
Migrate data with existing AutoIDs
To preserve existing IDs during data migration, enable the allow_insert_auto_id
property by making the alter_collection_properties
call. When set to true, Milvus accepts user-provided IDs even if AutoID is enabled.
For configuration details, refer to Modify Collection.
Ensure global AutoID uniqueness across clusters
When running multiple Milvus clusters, configure a unique cluster ID for each to ensure AutoIDs never overlap.
Configuration: Edit the common.clusterID
config in milvus.yaml
before initializing your cluster:
common:
clusterID: 3 # Must be unique across all clusters (Range: 0-7)
In this config, clusterID
specifies the unique identifier used in AutoID generation, ranging from 0 to 7 (supports up to eight clusters).
Milvus handles bit-reversal internally to enable future expansion without ID overlap. No manual configuration needed beyond setting the cluster ID.
Reference: How AutoID works
Understanding how AutoID generates unique identifiers internally can help you configure cluster IDs correctly and troubleshoot ID-related issues.
AutoID uses a structured 64-bit format to guarantee uniqueness:
[sign_bit][cluster_id][physical_ts][logical_ts]
Segment |
Description |
---|---|
|
Reserved for internal use |
|
Identifies which cluster generated the ID (value range: 0-7) |
|
Timestamp in milliseconds when the ID was generated |
|
Counter to distinguish IDs created in the same millisecond |
Even when AutoID is enabled with VARCHAR
as the data type, Milvus still generates numeric IDs. These are stored as numeric strings with a maximum length of 20 characters (uint64 range).