JSON Indexing

JSON fields provide a flexible way to store structured metadata in Milvus. Without indexing, queries on JSON fields require full collection scans, which become slow as your dataset grows. JSON indexing enables fast lookups by creating indexes on specific paths within your JSON data.

JSON indexing is ideal for:

  • Structured schemas with consistent, known keys

  • Equality and range queries on specific JSON paths

  • Scenarios where you need precise control over which keys are indexed

  • Storage-efficient acceleration of targeted queries

For complex JSON documents with diverse query patterns, consider JSON Shredding as an alternative.

JSON indexing syntax

When you create a JSON index, you specify:

  • JSON path: The exact location of the data you want to index

  • Data cast type: How to interpret and store the indexed values

  • Optional type conversion: Transform data during indexing if needed

Here’s the syntax to index a JSON field:

from pymilvus import MilvusClient

# Prepare index params
index_params = MilvusClient.prepare_index_params()

index_params.add_index(
    field_name="<json_field_name>",  # Name of the JSON field
    index_type="AUTOINDEX",  # Must be AUTOINDEX or INVERTED
    index_name="<unique_index_name>",  # Index name
    params={
        "json_path": "<path_to_json_key>",  # Specific key to be indexed within JSON data
        "json_cast_type": "<data_type>",  # Data type to use when interpreting and indexing the value
        # "json_cast_function": "<cast_function>"  # Optional: convert key values into a target type at index time
    }
)

| Parameter | Description | Value / Example |
|---|---|---|
| field_name | The name of your JSON field in the collection schema. | "metadata" |
| index_type | Must be "AUTOINDEX" or "INVERTED" for JSON indexing. | "AUTOINDEX" |
| index_name | Unique identifier for this index. | "category_index" |
| json_path | The path to the key you want to index within your JSON object. | Top-level key: 'metadata["category"]'; Nested key: 'metadata["supplier"]["contact"]["email"]'; Entire JSON object: "metadata"; Sub-object: 'metadata["supplier"]' |
| json_cast_type | The data type to use when interpreting and indexing the value. Must match the actual data type of the key. For a list of available cast types, see Supported cast types below. | "VARCHAR" |
| json_cast_function | (Optional) Converts original key values to a target type at index time. This config is required only when key values are stored in a wrong format and you want to convert the data type during indexing. For a list of available cast functions, see Supported cast functions below. | "STRING_TO_DOUBLE" |

Supported cast types

Milvus supports the following data types for casting at index time. These types ensure that your data is interpreted correctly for efficient filtering.

| Cast Type | Description | Example JSON Value |
|---|---|---|
| BOOL / bool | Used to index boolean values, enabling queries that filter on true/false conditions. | true, false |
| DOUBLE / double | Used for numeric values, including both integers and floating-point numbers. It enables filtering based on ranges or equality (e.g., >, <, ==). | 42, 99.99 |
| VARCHAR / varchar | Used to index string values, which is common for text-based data like names, categories, or IDs. | "electronics", "BrandA" |
| ARRAY_BOOL / array_bool | Used to index an array of boolean values. | [true, false, true] |
| ARRAY_DOUBLE / array_double | Used to index an array of numeric values. | [1.2, 3.14, 42] |
| ARRAY_VARCHAR / array_varchar | Used to index an array of strings, which is ideal for a list of tags or keywords. | ["tag1", "tag2", "tag3"] |
| JSON / json | Entire JSON objects or sub-objects with automatic type inference and flattening. Indexing entire JSON objects increases index size. For many-key scenarios, consider JSON Shredding. | Any JSON object |

Arrays should contain elements of the same type for optimal indexing. For more information, refer to Array Field.

Supported cast functions

If your JSON field key contains values in an incorrect format (e.g., numbers stored as strings), you can pass a cast function to the json_cast_function argument to convert these values at index time.

Cast functions are case-insensitive. The following functions are supported:

| Cast Function | Converts From → To | Use Case |
|---|---|---|
| STRING_TO_DOUBLE / string_to_double | String → Numeric (double) | Convert "99.99" to 99.99 |

If conversion fails (e.g., non-numeric string), the value is skipped and not indexed.
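The skip-on-failure behavior described above can be previewed locally before you build an index. The following is a minimal sketch in plain Python, not Milvus code: the helper name `simulate_string_to_double` is hypothetical, and it assumes Python's `float()` parsing is a reasonable stand-in for the server-side conversion, which may differ on edge cases such as "nan" or "inf".

```python
def simulate_string_to_double(values):
    """Split values into those that convert to a double and those that would be skipped.

    Hypothetical local helper; it only approximates STRING_TO_DOUBLE with float().
    """
    indexed, skipped = [], []
    for value in values:
        try:
            indexed.append(float(value))
        except (TypeError, ValueError):
            # Non-numeric input: treated as skipped, i.e. left out of the index.
            skipped.append(value)
    return indexed, skipped


indexed, skipped = simulate_string_to_double(["99.99", "42", "invalid"])
print(indexed)
print(skipped)
```

Running a check like this over a sample of your data gives a rough idea of how many entities would be missing from the index.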

Create JSON indexes

This section demonstrates how to create indexes on different types of JSON data using practical examples. All examples use the sample JSON structure shown below and assume you’ve already established a connection to MilvusClient with a properly defined collection schema.

Sample JSON structure

{
  "metadata": { 
    "category": "electronics",
    "brand": "BrandA",
    "in_stock": true,
    "price": 99.99,
    "string_price": "99.99",
    "tags": ["clearance", "summer_sale"],
    "supplier": {
      "name": "SupplierX",
      "country": "USA",
      "contact": {
        "email": "support@supplierx.com",
        "phone": "+1-800-555-0199"
      }
    }
  }
}

Basic setup

Before creating any JSON indexes, prepare your index parameters:

# Prepare index params
index_params = MilvusClient.prepare_index_params()

Example 1: Index a simple JSON key

Create an index on the category field to enable fast filtering by product category:

index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", # Must be set to AUTOINDEX or INVERTED for JSON path indexing
    index_name="category_index",  # Unique index name
    params={
        "json_path": 'metadata["category"]', # Path to the JSON key
        "json_cast_type": "varchar" # Data cast type
    }
)

Example 2: Index a nested key

Create an index on the deeply nested email field for supplier contact searches:

# Index the nested key
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", # Must be set to AUTOINDEX or INVERTED for JSON path indexing
    index_name="email_index", # Unique index name
    params={
        "json_path": 'metadata["supplier"]["contact"]["email"]', # Path to the nested JSON key
        "json_cast_type": "varchar" # Data cast type
    }
)
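Path strings such as the nested one above are easy to mistype by hand. As a convenience, they can be assembled from a list of keys. This is a minimal sketch; `build_json_path` is a hypothetical helper and not part of pymilvus.

```python
import json


def build_json_path(field_name, *keys):
    """Build a json_path string from a JSON field name and a chain of keys.

    Hypothetical helper: each key is quoted with json.dumps so that it is
    wrapped in double quotes, matching the path format used in this guide.
    """
    return field_name + "".join(
        f"[{json.dumps(key, ensure_ascii=False)}]" for key in keys
    )


print(build_json_path("metadata"))
print(build_json_path("metadata", "category"))
print(build_json_path("metadata", "supplier", "contact", "email"))
```

The resulting string can be passed as the "json_path" value in params.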

Example 3: Convert data type at index time

Sometimes numeric data is mistakenly stored as strings. Use the STRING_TO_DOUBLE cast function to convert and index it properly:

# Convert string numbers to double for indexing
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", # Must be set to AUTOINDEX or INVERTED for JSON path indexing
    index_name="string_to_double_index", # Unique index name
    params={
        "json_path": 'metadata["string_price"]', # Path to the JSON key to be indexed
        "json_cast_type": "double", # Data cast type
        "json_cast_function": "STRING_TO_DOUBLE" # Cast function; case insensitive
    }
)

Important: If conversion fails for any document (e.g., a non-numeric string like "invalid"), that document’s value will be excluded from the index and won’t appear in filtered results.

Example 4: Index entire objects

Index the complete JSON object to enable queries on any field within it. When you use json_cast_type="JSON", the system automatically:

  • Flattens the JSON structure: Nested objects are converted into flat paths for efficient indexing

  • Infers data types: Each value is automatically categorized as numeric, string, boolean, or date based on its content

  • Creates comprehensive coverage: All keys and nested paths within the object become searchable
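To make the flattening step concrete, here is an illustrative sketch in plain Python. It is not Milvus's implementation: the function name `flatten_json_paths` is hypothetical, and treating arrays as single leaf values is an assumption made for simplicity.

```python
import json


def flatten_json_paths(value, prefix):
    """Recursively map a nested dict to {path: leaf_value} pairs.

    Illustrative only; arrays and scalars are both kept as leaf values.
    """
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(flatten_json_paths(child, f"{prefix}[{json.dumps(key)}]"))
        return flat
    return {prefix: value}


sample = {
    "category": "electronics",
    "price": 99.99,
    "supplier": {"name": "SupplierX", "contact": {"email": "support@supplierx.com"}},
}

flat = flatten_json_paths(sample, "metadata")
for path, leaf in flat.items():
    # Show each flat path alongside the Python type of its leaf value.
    print(path, type(leaf).__name__)
```

Each printed path corresponds to a key that becomes filterable once the whole object is indexed.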

For the sample JSON structure above, index the entire metadata object:

# Index the entire JSON object
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX",
    index_name="metadata_full_index",
    params={
        "json_path": "metadata",
        "json_cast_type": "JSON"
    }
)

You can also index only a portion of the JSON structure, such as all supplier information:

# Index a sub-object
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", 
    index_name="supplier_index",
    params={
        "json_path": 'metadata["supplier"]',
        "json_cast_type": "JSON"
    }
)

Apply index configuration

After defining all your index parameters, apply them to your collection:

# Apply all index configurations to the collection.
# `client` is assumed to be your connected MilvusClient instance.
client.create_index(
    collection_name="your_collection_name",
    index_params=index_params
)

Once indexing completes, your JSON field queries will automatically use these indexes for faster performance.

FAQ

What happens if a query’s filter expression uses a different type than the indexed cast type?

If your filter expression uses a different type than the index’s json_cast_type, Milvus will not use the index and may fall back to a slower brute-force scan if the data allows. For best performance, always align your filter expression with the cast type of the index. For example, if a numeric index is created with json_cast_type="double", only numeric filter conditions will leverage the index.
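One way to keep filters aligned with an index is to check the literal's type before building the expression. The sketch below is a hypothetical client-side guard, not a Milvus feature: the mapping `CAST_TYPE_TO_PYTHON` and both helper functions are assumptions introduced for illustration, and only scalar cast types are covered.

```python
import json

# Hypothetical mapping from scalar json_cast_type values to Python literal types.
CAST_TYPE_TO_PYTHON = {
    "bool": (bool,),
    "double": (int, float),
    "varchar": (str,),
}


def literal_matches_cast_type(literal, json_cast_type):
    """Return True if the filter literal's type lines up with the index cast type."""
    expected = CAST_TYPE_TO_PYTHON[json_cast_type.lower()]
    if isinstance(literal, bool):
        # bool is a subclass of int in Python, so handle it explicitly.
        return bool in expected
    return isinstance(literal, expected)


def build_comparison_filter(json_path, operator, literal):
    """Render a simple comparison filter; string literals are double-quoted."""
    return f"{json_path} {operator} {json.dumps(literal)}"


price_path = 'metadata["price"]'
numeric_filter = build_comparison_filter(price_path, ">", 50)
string_filter = build_comparison_filter(price_path, "==", "99.99")

print(numeric_filter, literal_matches_cast_type(50, "double"))
print(string_filter, literal_matches_cast_type("99.99", "double"))
```

With an index created using json_cast_type="double", only the first filter compares against a numeric literal and is therefore a candidate for index use.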

When creating a JSON index, what if a JSON key has inconsistent data types across different entities?

Inconsistent types can lead to partial indexing. For example, if a metadata["price"] field is stored as both a number (99.99) and a string ("99.99") and you create an index with json_cast_type="double", only the numeric values will be indexed. The string-form entries will be skipped and will not appear in filter results.
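Before choosing a cast type, it can help to audit the types actually stored under a key. This is a minimal sketch over in-memory sample data; `audit_key_types` is a hypothetical helper, and the `entities` list stands in for JSON values you would read from your own source data.

```python
from collections import Counter


def audit_key_types(entities, key):
    """Count the Python type names observed under a key across JSON objects."""
    return Counter(type(entity[key]).__name__ for entity in entities if key in entity)


# Sample metadata objects with an inconsistently typed "price" key.
entities = [
    {"price": 99.99},
    {"price": "99.99"},
    {"price": 42},
    {"category": "electronics"},  # no "price" key; ignored by the audit
]

type_counts = audit_key_types(entities, "price")
print(type_counts)
```

If the counts show both numeric and string types, either normalize the data before insertion or expect the string-form entries to be left out of a double index.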

Can I create multiple indexes on the same JSON key?

No, each JSON key supports only one index. You must choose a single json_cast_type that matches your data. However, you can create an index on the entire JSON object and an index on a nested key within that object.

Does a JSON field support setting a default value?

No, JSON fields do not support default values. However, you can set nullable=True when defining the field to allow for empty entries. For more information, refer to Nullable & Default.