About Milvus
Get Started
Concepts
User Guide
Data Import
Administration Guide
Tools
Integrations
Tutorials
FAQs
API Reference

Home
Docs
User Guide
Schema & Data Fields
JSON Field
JSON Shredding

JSON ShreddingCompatible with Milvus 2.6.2+

JSON shredding accelerates JSON queries by converting traditional row-based storage into optimized columnar storage. While maintaining JSON’s flexibility for data modeling, Milvus performs behind-the-scenes columnar optimization that dramatically improves access and query efficiency.

JSON shredding is effective for most JSON query scenarios. The performance benefits become more pronounced with:

Larger, more complex JSON documents - Greater performance gains as document size increases
Read-heavy workloads - Frequent filtering, sorting, or searching on JSON keys
Mixed query patterns - Queries across different JSON keys benefit from the hybrid storage approach

How it works

The JSON shredding process happens in three distinct phases to optimize data for fast retrieval.

Phase 1: Ingestion & key classification

As new JSON documents are written, Milvus continuously samples and analyzes them to build statistics for each JSON key. This analysis includes the key’s occurrence ratio and type stability (whether its data type is consistent across documents).

Based on these statistics, JSON keys are categorized into the following for optimal storage.

Categories of JSON keys

Key Type	Description
Typed keys	Keys that exist in most documents and always have the same data type (e.g., all integers or all strings).
Dynamic keys	Keys that appear frequently but have a mixed data type (e.g., sometimes a string, sometimes an integer).
Shared keys	Infrequently appearing or nested keys that fall below a configurable frequency threshold.

Example classification

Consider the sample JSON data containing the following JSON keys:

{"a": 10, "b": "str1", "f": 1}
{"a": 20, "b": "str2", "f": 2}  
{"a": 30, "b": "str3", "f": 3}
{"a": 40, "b": 1, "f": 4}       // b becomes mixed type
{"a": 50, "b": 2, "e": "rare"}  // e appears infrequently

Based on this data, the keys would be classified as follows:

Typed keys: a and f (always an integer)
Dynamic keys: b (mixed string/integer)
Shared keys: e (infrequently appearing key)

Phase 2: Storage optimization

The classification from Phase 1 dictates the storage layout. Milvus uses a columnar format optimized for queries.

Json Shredding Flow

Shredded columns: For typed and dynamic keys, data is written to dedicated columns. This columnar storage allows for fast, direct scans during queries, as Milvus can read only the required data for a given key without processing the entire document.
Shared column: All shared keys are stored together in a single, compact binary JSON column. A shared-key inverted index is built on this column. This index is crucial for accelerating queries on low-frequency keys by allowing Milvus to quickly prune the data, effectively narrowing down the search space to only those rows that contain the specified key.

Phase 3: Query execution

The final phase leverages the optimized storage layout to intelligently select the fastest path for each query predicate.

Fast path: Queries on typed/dynamic keys (e.g., json['a'] < 100) access dedicated columns directly
Optimized path: Queries on shared keys (e.g., json['e'] = 'rare') use inverted index to quickly locate relevant documents

Enable JSON shredding

To activate the feature, set common.enabledJSONKeyStats to true in your milvus.yaml configuration file. New data will automatically trigger the shredding process.

# milvus.yaml
...
common:
  enabledJSONKeyStats: true # Indicates whether to enable JSON key stats build and load processes
...

Once enabled, Milvus will begin analyzing and restructuring your JSON data upon ingestion without any further manual intervention.

Parameter tuning

For most users, once JSON shredding is enabled, the default settings for other parameters are sufficient. However, you can fine-tune the behavior of JSON shredding using these parameters in milvus.yaml.

Parameter Name	Description	Default Value	Tuning Advice
`common.enabledJSONKeyStats`	Controls whether the JSON shredding build and load processes are enabled.	false	Must be set to true to activate the feature.
`common.usingJsonStatsForQuery`	Controls whether Milvus uses shredded data for acceleration.	true	Set to false as a recovery measure if queries fail, reverting to the original query path.
`queryNode.mmap.jsonStats`	Determines whether Milvus uses mmap when loading shredding data. For details, refer to Use mmap.	true	This setting is generally optimized for performance. Only adjust it if you have specific memory management needs or constraints on your system.
`dataCoord.jsonStatsMaxShreddingColumns`	The maximum number of JSON keys that will be stored in shredded columns. If the number of frequently appearing keys exceeds this limit, Milvus will prioritize the most frequent ones for shredding, and the remaining keys will be stored in the shared column.	1024	This is sufficient for most scenarios. For JSON with thousands of frequently appearing keys, you may need to increase this, but monitor storage usage.
`dataCoord.jsonStatsShreddingRatioThreshold`	The minimum occurrence ratio a JSON key must have to be considered for shredding into a shredded column. A key is considered frequently appearing if its ratio is above this threshold.	0.3	Increase (e.g., to 0.5) if the number of keys that meet the shredding criteria exceeds the `dataCoord.jsonStatsMaxShreddingColumns` limit. This makes the threshold stricter, reducing the number of keys that qualify for shredding. Decrease (e.g., to 0.1) if you want to shred more keys that appear less frequently than the default 30% threshold.

Performance benchmarks

Our testing demonstrates significant performance improvements across different JSON key types and query patterns.

Test environment and methodology

Hardware: 1 core/8GB cluster
Dataset: 1 million documents from JSONBench
Average document size: 478.89 bytes
Test duration: 100 seconds measuring QPS and latency

Results: typed keys

This test measured performance when querying a key present in most documents.

Query Expression	Key Value Type	QPS (without shredding)	QPS (with shredding)	Performance Boost
`json['time_us'] > 0`	Integer	8.69	287.50	33x
`json['kind'] == 'commit'`	String	8.42	126.1	14.9x

Results: shared keys

This test focused on querying sparse, nested keys that fall into the “shared” category.

Query Expression	Key Value Type	QPS (without shredding)	QPS (with shredding)	Performance Boost
`json['identity']['seq'] > 0`	Nested Integer	4.33	385	88.9x
`json['identity']['did'] == 'xxxxx'`	Nested String	7.6	352	46.3x

Key insights

Shared key queries show the most dramatic improvements (up to 89x faster)
Typed key queries provide consistent 15-30x performance gains
All query types benefit from JSON Shredding with no performance regressions

FAQ

How do I verify if JSON shredding works properly?
1. First, check if the data has been built by using the show segment --format table command in the Birdwatcher tool. If successful, the output will contain shredding_data/ and shared_key_index/ under the Json Key Stats field.
  
  Birdwatcher Output
2. Next, verify that the data has been loaded by running show loaded-json-stats on the query node. The output will display details about the loaded shredded data for each query node.
How do I select between JSON shredding and JSON indexing?
- JSON shredding is ideal for keys that appear frequently in your documents, especially for complex JSON structures. It combines the benefits of columnar storage and inverted indexing, making it well-suited for read-heavy scenarios where you query many different keys. However, it is not recommended for very small JSON documents as the performance gain is minimal. The smaller the proportion of the key’s value to the total size of the JSON document, the better the performance optimization from shredding.
- JSON indexing is better for targeted optimization of specific key-based queries and has lower storage overhead. It’s suitable for simpler JSON structures. Note that JSON shredding does not cover queries on keys inside arrays, so you need a JSON index to accelerate those.
What if I encounter an error?

If the build or load process fails, you can quickly disable the feature by setting common.enabledJSONKeyStats=false. To clear any remaining tasks, use the remove stats-task <task_id> command in Birdwatcher. If a query fails, set common.usingJsonStatsForQuery=false to revert to the original query path, bypassing the shredded data.

JSON Shredding
How it works
Phase 1: Ingestion & key classification
Phase 2: Storage optimization
Phase 3: Query execution
Enable JSON shredding
Parameter tuning
Performance benchmarks
Test environment and methodology
Results: typed keys
Results: shared keys
Key insights
FAQ

Try Managed Milvus for Free

Zilliz Cloud is hassle-free, powered by Milvus and 10x faster.

Get Started

Feedback

Was this page helpful?