About Milvus
Get Started
Concepts
User Guide
Data Import
Administration Guide
Tools
Integrations
Tutorials
FAQs
API Reference

Home
Docs
User Guide
Schema & Data Fields
JSON Field
JSON Field Overview

JSON Field Overview

When building applications like product catalogs, content management systems, or user preference engines, you often need to store flexible metadata alongside your vector embeddings. Product attributes vary by category, user preferences evolve over time, and document properties have complex nested structures. JSON fields in Milvus solve this challenge by allowing you to store and query flexible structured data without sacrificing performance.

What is a JSON field?

A JSON field is a schema-defined data type (DataType.JSON) in Milvus that stores structured key-value data. Unlike traditional rigid database columns, JSON fields accommodate nested objects, arrays, and mixed data types while providing multiple indexing options for fast queries.

Example JSON field structure:

{
  "metadata": { 
    "category": "electronics",
    "brand": "BrandA",
    "in_stock": true,
    "price": 99.99,
    "string_price": "99.99",
    "tags": ["clearance", "summer_sale"],
    "supplier": {
      "name": "SupplierX",
      "country": "USA",
      "contact": {
        "email": "support@supplierx.com",
        "phone": "+1-800-555-0199"
      }
    }
  }
}

In this example, metadata is a single JSON field that contains a mix of flat values (e.g. category, in_stock), arrays (tags), and nested objects (supplier).

Naming convention: Use only letters, numbers, and underscores in JSON keys. Avoid special characters, spaces, or dots as they may cause parsing issues in queries.

JSON field vs. dynamic field

A common point of confusion is the difference between a JSON field and the dynamic field. While both are related to JSON, they serve different purposes.

The table below summarizes the key differences between a JSON field and the dynamic field:

Feature	JSON Field	Dynamic Field
Schema definition	A scalar field that must be explicitly declared in the collection schema with the `DataType.JSON` type.	A hidden JSON field (named `#meta`) that automatically stores undeclared fields.
Use case	Stores structured data where the schema is known and consistent.	Stores flexible, evolving, or semi-structured data that doesn't fit a fixed schema.
Control	You control the field name and structure.	System-managed for undefined fields.
Querying	Query using your field name or target key inside the JSON field: `metadata["key"]`.	Query directly using the dynamic field key: `"dynamic_key"` or via `#meta`: `#meta["dynamic_key"]`

Basic operations

The fundamental workflow for using a JSON field involves defining it in your schema, inserting data, and then querying the data using specific filter expressions.

Define a JSON field

To use a JSON field, explicitly define it in your collection schema when creating the collection. The following example demonstrates how to create a collection with a metadata field of type DataType.JSON:

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530") # Replace with your server address 

# Create schema
schema = client.create_schema(auto_id=False, enable_dynamic_field=True)

schema.add_field(field_name="product_id", datatype=DataType.INT64, is_primary=True) # Primary field
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=5) # Vector field
# Define a JSON field that allows null values
schema.add_field(field_name="metadata", datatype=DataType.JSON, nullable=True)

client.create_collection(
    collection_name="product_catalog",
    schema=schema
)

In this example, the JSON field defined in the collection schema allows null values with nullable=True. For details, refer to Nullable & Default.

Insert data

Once the collection is created, insert entities that contain structured JSON objects in your designated JSON field. Your data should be formatted as a list of dictionaries.

entities = [
    {
        "product_id": 1,
        "vector": [0.1, 0.2, 0.3, 0.4, 0.5],
        "metadata": { # JSON field
            "category": "electronics",
            "brand": "BrandA",
            "in_stock": True,
            "price": 99.99,
            "string_price": "99.99",
            "tags": ["clearance", "summer_sale"],
            "supplier": {
                "name": "SupplierX",
                "country": "USA",
                "contact": {
                    "email": "support@supplierx.com",
                    "phone": "+1-800-555-0199"
                }
            }
        }
    }
]

client.insert(collection_name="product_catalog", data=entities)

Filtering operations

Before you can perform filtering operations on JSON fields, make sure:

You have created an index on each vector field.
The collection is loaded into memory.

Show code

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="AUTOINDEX",
    index_name="vector_index",
    metric_type="COSINE"
)

client.create_index(collection_name="product_catalog", index_params=index_params)

client.load_collection(collection_name="product_catalog")

Once these requirements are met, you can use the expressions below to filter on your collection based on the values within the JSON field. These filter expressions leverage specific JSON path syntax and dedicated operators.

Filtering with JSON path syntax

To query a specific key, use bracket notation to access JSON keys: json_field_name["key"]. For nested keys, chain them together: json_field_name["key1"]["key2"].

To filter for entities where the category is "electronics":

# Define filter expression
filter = 'metadata["category"] == "electronics"'

client.search(
    collection_name="product_catalog",  # Collection name
    data=[[0.1, 0.2, 0.3, 0.4, 0.5]],               # Query vector (must match collection's vector dim)
    limit=5,                           # Max. number of results to return
    filter=filter,                    # Filter expression
    output_fields=["product_id", "metadata"]   # Fields to include in the search results
)

To filter for entities where the nested key supplier["country"] is "USA":

# Define filter expression
filter = 'metadata["supplier"]["country"] == "USA"'

res = client.search(
    collection_name="product_catalog",  # Collection name
    data=[[0.1, 0.2, 0.3, 0.4, 0.5]],               # Query vector (must match collection's vector dim)
    limit=5,                           # Max. number of results to return
    filter=filter,                    # Filter expression
    output_fields=["product_id", "metadata"]   # Fields to include in the search results
)

print(res)

Filtering with JSON-specific operators

Milvus also provides special operators for querying array values on specific JSON field keys. For example:

json_contains(identifier, expr): Checks if a specific element or sub-array exists within a JSON array
json_contains_all(identifier, expr): Ensures that all elements of the specified JSON expression are present in the field
json_contains_any(identifier, expr): Filters entities where at least one member of the JSON expression exists within the field

To find a product that has the "summer_sale" value under the tags key:

# Define filter expression
filter = 'json_contains(metadata["tags"], "summer_sale")'

res = client.search(
    collection_name="product_catalog",  # Collection name
    data=[[0.1, 0.2, 0.3, 0.4, 0.5]],               # Query vector (must match collection's vector dim)
    limit=5,                           # Max. number of results to return
    filter=filter,                    # Filter expression
    output_fields=["product_id", "metadata"]   # Fields to include in the search results
)

print(res)

To find a product that has at least one of the "electronics", "new", or "clearance" values under the tags key:

# Define filter expression
filter = 'json_contains_any(metadata["tags"], ["electronics", "new", "clearance"])'

res = client.search(
    collection_name="product_catalog",  # Collection name
    data=[[0.1, 0.2, 0.3, 0.4, 0.5]],               # Query vector (must match collection's vector dim)
    limit=5,                           # Max. number of results to return
    filter=filter,                    # Filter expression
    output_fields=["product_id", "metadata"]   # Fields to include in the search results
)

print(res)

For more information about JSON-specific operators, refer to JSON Operators.

Next: Accelerate JSON queries

By default, queries on JSON fields without acceleration will perform a full scan of all rows, which can be slow on large datasets. To speed up JSON queries, Milvus provides advanced indexing and storage optimization features.

The table below summarizes their differences and best-use scenarios:

Technique	Best For	Arrays Acceleration	Notes
JSON Indexing	Small set of frequently accessed keys, arrays on a specific array key	Yes (on indexed array key)	Must preselect keys, maintenance needed if schema evolves
JSON Shredding	General speed-up across many keys, flexible for varied queries	No (does not accelerate values inside arrays)	Extra storage config, arrays still need per-key index
NGRAM Index	Wildcard searches, substring matching in text fields	N/A	Not for numeric/range filters

Tip: You can combine these approaches—for example, use JSON shredding for broad query acceleration, JSON indexing for high-frequency array keys, and NGRAM indexing for flexible text search.

For implementation details, refer to:

FAQ

Are there any limitations on the size of a JSON field?

Yes. Each JSON field is limited to 65,536 bytes.

Does a JSON field support setting a default value?

No, JSON fields do not support default values. However, you can set nullable=True when defining the field to allow empty entries.

Refer to Nullable & Default for details.

Are there any naming conventions for JSON field keys?

Yes, to ensure compatibility with queries and indexing:

Use only letters, numbers, and underscores in JSON keys.
Avoid using special characters, spaces, or dots (., /, etc.).
Incompatible keys may cause parsing issues in filter expressions.

How does Milvus handle string values in JSON fields?

Milvus stores string values exactly as they appear in the JSON input—without semantic transformation. Improperly quoted strings may result in errors during parsing.

Examples of valid strings:

"a\"b", "a'b", "a\\b"

Examples of invalid strings:

'a"b', 'a\'b'

JSON Field Overview
What is a JSON field?
JSON field vs. dynamic field
Basic operations
Define a JSON field
Insert data
Filtering operations
Next: Accelerate JSON queries
FAQ
Are there any limitations on the size of a JSON field?
Does a JSON field support setting a default value?
Are there any naming conventions for JSON field keys?
How does Milvus handle string values in JSON fields?

Try Managed Milvus for Free

Zilliz Cloud is hassle-free, powered by Milvus and 10x faster.

Get Started

Feedback

Was this page helpful?