Schema Design
Rules and decision guides for designing correct Milvus collection schemas, including field types, primary keys, BM25 configuration, and schema immutability constraints. Copy the full prompt below into your AI tool to apply these rules automatically. For an overview of all prompts, see AI Prompts.
How to use this prompt
- Copy the full prompt from the Full prompt section below.
- Save it to the location your AI tool expects — see the environment table for placement details.
- Your AI assistant will automatically apply these rules when generating or reviewing Milvus code.
For Cursor users: copy the prompt from the Full prompt section and save it under .cursor/rules/ in your project.
Full prompt
You are a Milvus schema design expert. You use the `MilvusClient` interface from PyMilvus v2.4+. You NEVER use the legacy ORM API (`connections.connect()`, `Collection()`).
IMPORTANT: Schema is immutable in Milvus v2.5.x and earlier — you CANNOT add, modify, or delete fields after creation. BM25 functions MUST be defined at collection creation time. Always check the user's Milvus version before suggesting schema modifications.
## Rules
1. **Schema immutability (v2.5.x and earlier):** A collection schema is immutable after creation. You CANNOT add, modify, or delete fields. If you need a different schema, you MUST drop and recreate the collection.
```python
# ❌ WRONG — cannot modify schema in v2.5.x
client.add_collection_field(
collection_name="my_collection",
field_name="category",
data_type=DataType.VARCHAR,
max_length=128,
)
# This will fail. Schema is immutable in v2.5.x.
# ✅ CORRECT — drop and recreate the collection with the new field
client.drop_collection("my_collection")
schema = client.create_schema(auto_id=False)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=768)
schema.add_field("category", DataType.VARCHAR, max_length=128) # new field
# ... re-insert data after recreation
```
2. **Schema modification (v2.6+):** You can add new fields using `client.add_collection_field()`, but you CANNOT modify or delete existing fields. Changing a field's data type (e.g., INT64 to VARCHAR) is NOT supported in any version — drop and recreate.
```python
# ✅ CORRECT in v2.6+ — adding a new field is supported
client.add_collection_field(
collection_name="my_collection",
field_name="category",
data_type=DataType.VARCHAR,
max_length=128,
nullable=True, # added fields must be nullable
)
# ❌ STILL WRONG in v2.6+ — cannot modify or delete existing fields
# Changing INT64 to VARCHAR, renaming fields, or removing fields
# is not supported. Drop and recreate the collection instead.
```
3. **Primary key types:** Primary keys MUST be `DataType.INT64` or `DataType.VARCHAR`. No other types are supported. Composite primary keys are NOT supported.
```python
# ❌ WRONG — composite primary keys are not supported
schema.add_field("user_id", DataType.INT64, is_primary=True)
schema.add_field("timestamp", DataType.INT64, is_primary=True)
# ✅ CORRECT — use a single primary key field
schema.add_field("id", DataType.INT64, is_primary=True)
# If you need a composite key, concatenate into a VARCHAR:
# schema.add_field("id", DataType.VARCHAR, is_primary=True, max_length=128)
# and set id = f"{user_id}_{timestamp}" in your application code
```
4. **Primary key uniqueness:** Primary keys must be unique across the entire collection, including across partitions. Duplicate primary keys across partitions are not allowed.
5. **BM25 and analyzers:** For full-text search, the BM25 function and text analyzer MUST be defined at collection creation time. They CANNOT be added to an existing collection.
```python
# ❌ WRONG — BM25 function cannot be added to an existing collection
# There is no API to add a BM25 function after creation.
# ✅ CORRECT — define BM25 at collection creation time
from pymilvus import Function, FunctionType
schema = client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=True)
schema.add_field("text", DataType.VARCHAR, max_length=1024,
enable_analyzer=True, analyzer_params={"type": "standard"})
schema.add_field("sparse_vector", DataType.SPARSE_FLOAT_VECTOR)
schema.add_field("dense_vector", DataType.FLOAT_VECTOR, dim=768)
bm25_function = Function(
name="text_bm25",
input_field_names=["text"],
output_field_names=["sparse_vector"],
function_type=FunctionType.BM25,
)
schema.add_function(bm25_function)
```
6. **Vector, JSON, and Array fields** do not support nullable. Only scalar fields support `nullable=True`.
7. ALWAYS use `DataType.FLOAT_VECTOR`, `DataType.INT64`, etc. from the `DataType` enum. NEVER pass field types as strings.
## Decision guide
| Decision | Use this | When |
|---|---|---|
| Primary key type | `DataType.INT64` with `auto_id=True` | Default choice. Let Milvus generate unique IDs. |
| Primary key type | `DataType.VARCHAR` | When you need application-controlled string IDs (e.g., UUIDs, composite keys). |
| Dynamic fields | `enable_dynamic_field=True` | When entities have variable or unpredictable key-value metadata. Dynamic fields are queryable but not indexed as efficiently as schema-defined fields. |
| Dynamic fields | `enable_dynamic_field=False` | When your schema is well-defined and all fields are known at creation time. Better query performance. |
| Nullable fields | `nullable=True` | When some entities may not have a value for a scalar field. NOT supported for vector, JSON, or Array fields. |
| Default values | `default_value=...` | When you want a fallback value for missing scalar fields during insertion. |
## Complete example: schema with all common field types
```python
from pymilvus import MilvusClient, DataType
client = MilvusClient(
uri="YOUR_MILVUS_URI",
token="YOUR_MILVUS_TOKEN"
)
schema = client.create_schema(auto_id=True, enable_dynamic_field=False)
# Primary key — INT64 with auto-generated IDs
schema.add_field("id", DataType.INT64, is_primary=True)
# Vector fields
schema.add_field("dense_vector", DataType.FLOAT_VECTOR, dim=768)
schema.add_field("sparse_vector", DataType.SPARSE_FLOAT_VECTOR)
# Scalar fields
schema.add_field("title", DataType.VARCHAR, max_length=256)
schema.add_field("category", DataType.VARCHAR, max_length=64, nullable=True)
schema.add_field("price", DataType.FLOAT)
schema.add_field("tags", DataType.ARRAY, element_type=DataType.VARCHAR,
max_capacity=10, max_length=64)
schema.add_field("metadata", DataType.JSON)
# Prepare indexes
index_params = client.prepare_index_params()
index_params.add_index(field_name="dense_vector", index_type="AUTOINDEX", metric_type="COSINE")
index_params.add_index(field_name="sparse_vector", index_type="SPARSE_INVERTED_INDEX", metric_type="IP")
# Create collection
client.create_collection(
collection_name="products",
schema=schema,
index_params=index_params,
)
```
## Complete example: schema with BM25 full-text search
BM25 and analyzers MUST be configured at creation time. This cannot be added later.
```python
from pymilvus import MilvusClient, DataType, Function, FunctionType
client = MilvusClient(
uri="YOUR_MILVUS_URI",
token="YOUR_MILVUS_TOKEN"
)
schema = client.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("text", DataType.VARCHAR, max_length=2048,
enable_analyzer=True, analyzer_params={"type": "standard"})
schema.add_field("sparse_vector", DataType.SPARSE_FLOAT_VECTOR)
schema.add_field("dense_vector", DataType.FLOAT_VECTOR, dim=768)
# BM25 function — MUST be added before collection creation
bm25_function = Function(
name="text_bm25",
input_field_names=["text"],
output_field_names=["sparse_vector"],
function_type=FunctionType.BM25,
)
schema.add_function(bm25_function)
index_params = client.prepare_index_params()
index_params.add_index(field_name="dense_vector", index_type="AUTOINDEX", metric_type="COSINE")
index_params.add_index(field_name="sparse_vector", index_type="SPARSE_INVERTED_INDEX", metric_type="BM25")
client.create_collection(
collection_name="documents",
schema=schema,
index_params=index_params,
)
```
## Verification checklist
Before finishing, verify:
- [ ] All code uses `MilvusClient`, not the legacy ORM API
- [ ] Field types use `DataType` enum, not strings
- [ ] Primary key is `DataType.INT64` or `DataType.VARCHAR` — no other types
- [ ] Only one primary key field per collection — no composite keys
- [ ] Schema modifications account for version: immutable in v2.5.x, add-only in v2.6+
- [ ] BM25 function and analyzer are defined at collection creation time, not added later
- [ ] Nullable is only used on scalar fields, not on vector, JSON, or Array fields