Best Practices for Tiered Storage

Compatible with Milvus 2.6.4+

Milvus provides Tiered Storage to help you efficiently handle large-scale data while balancing query latency, capacity, and resource usage. This guide summarizes recommended configurations for typical workloads and explains the reasoning behind each tuning strategy.

Before you start

Milvus v2.6.4 or later
QueryNodes must have dedicated local resources (memory and disk). In shared environments, resource usage by other processes can distort cache usage estimates and lead to incorrect eviction decisions.

Choose the right strategy

Tiered Storage offers flexible loading and caching strategies that can be combined to fit your workload.

Goal	Recommended focus	Key mechanism
Minimize first-query latency	Preload critical fields	Warm Up
Handle large-scale data efficiently	Load on demand	Lazy Load + Partial Load
Maintain long-term stability	Prevent cache overflow	Eviction
Balance performance and capacity	Combine preload and dynamic caching	Hybrid configuration

Scenario 1: real-time, low-latency retrieval

When to use

Query latency is critical (e.g., real-time recommendation or search ranking)
Core vector indexes and scalar filters are accessed frequently
Consistent performance matters more than startup speed

Recommended configuration

# milvus.yaml
queryNode:
  segcore:
    tieredStorage:
      warmup:
        # scalar field/index warm-up to eliminate first-time latency
        scalarField: sync
        scalarIndex: sync
        # disable warm-up of vector fields (if the original vectors are not required)
        vectorField: disable
        # vector index warm-up to eliminate first-time latency
        vectorIndex: sync
      # enable cache eviction, and also turn on background asynchronous eviction
      # so that synchronous eviction is triggered less often.
      evictionEnabled: true
      backgroundEvictionEnabled: true
      memoryLowWatermarkRatio: 0.75
      memoryHighWatermarkRatio: 0.8
      diskLowWatermarkRatio: 0.75
      diskHighWatermarkRatio: 0.8
      # no expiration time, which avoids frequent reloading
      cacheTtl: 0

Rationale

Warm-up eliminates first-hit latency for frequently accessed scalar fields, scalar indexes, and vector indexes.
Background eviction maintains stable cache pressure without blocking queries.
Disabling cache TTL avoids unnecessary reloads for hot data.
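
Before rolling out a configuration like the one above, it can help to sanity-check the watermark pairs. The following is a minimal sketch, not part of Milvus: `check_watermarks` is a hypothetical helper, and the rules it enforces (each ratio lies in (0, 1], and the low watermark is strictly below the high watermark) are assumptions inferred from the parameter names.

```python
def check_watermarks(tiered_storage: dict) -> list:
    """Return a list of problems found in the watermark settings (empty if none).

    Hypothetical helper: the validity rules below are assumptions, not Milvus behavior.
    """
    problems = []
    for resource in ("memory", "disk"):
        low = tiered_storage[f"{resource}LowWatermarkRatio"]
        high = tiered_storage[f"{resource}HighWatermarkRatio"]
        if not (0.0 < low <= 1.0 and 0.0 < high <= 1.0):
            problems.append(f"{resource}: ratios must be in (0, 1]")
        elif low >= high:
            problems.append(f"{resource}: low watermark must be below high watermark")
    return problems


# Watermark values copied from the Scenario 1 configuration.
scenario_1 = {
    "memoryLowWatermarkRatio": 0.75,
    "memoryHighWatermarkRatio": 0.8,
    "diskLowWatermarkRatio": 0.75,
    "diskHighWatermarkRatio": 0.8,
}

print(check_watermarks(scenario_1))  # prints []
```

A swapped pair, such as a memory low watermark of 0.9 with a high watermark of 0.8, would be reported as a single problem for `memory`.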

Scenario 2: offline, batch analysis

When to use

Query latency tolerance is high
Workloads involve massive datasets or many segments
Capacity and throughput are prioritized over responsiveness

Recommended configuration

# milvus.yaml
queryNode:
  segcore:
    tieredStorage:
      enabled: true
      warmup:
        # disable scalar field/index warm-up to speed up loading
        scalarField: disable
        scalarIndex: disable
        # disable vector field/index warm-up to speed up loading
        vectorField: disable
        vectorIndex: disable
      # enable cache eviction, and also turn on background asynchronous eviction
      # so that synchronous eviction is triggered less often.
      evictionEnabled: true
      backgroundEvictionEnabled: true
      memoryLowWatermarkRatio: 0.7
      memoryHighWatermarkRatio: 0.85
      diskLowWatermarkRatio: 0.7
      diskHighWatermarkRatio: 0.85
      # expire cache entries after 1 day (86400 seconds) to clean up unused data
      cacheTtl: 86400

Rationale

Disabling warm-up accelerates startup when initializing many segments.
Higher watermarks allow denser cache usage, improving total load capacity.
Cache TTL automatically cleans unused data to free local space.
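
To make the watermark ratios concrete, the sketch below converts them into absolute thresholds. It rests on two assumptions that are not stated in this guide: that the ratios are applied to the capacity available to a QueryNode, and that eviction starts once usage exceeds the high watermark and continues until usage falls below the low watermark. The 64 GiB capacity and the `eviction_thresholds` helper are illustrative only.

```python
GIB = 1024 ** 3


def eviction_thresholds(capacity_bytes: int, low_ratio: float, high_ratio: float) -> tuple:
    """Return the (low, high) watermark thresholds in bytes for a given capacity."""
    return int(capacity_bytes * low_ratio), int(capacity_bytes * high_ratio)


# Hypothetical QueryNode with 64 GiB of memory and the Scenario 2 watermarks.
low, high = eviction_thresholds(64 * GIB, 0.7, 0.85)

print(f"eviction starts above {high / GIB:.1f} GiB")  # 54.4 GiB
print(f"eviction stops below {low / GIB:.1f} GiB")    # 44.8 GiB

# cacheTtl is expressed in seconds: 86400 s is 24 hours.
print(f"cacheTtl = {86400 // 3600} hours")
```

Under these assumptions, the wider 0.7 to 0.85 band leaves roughly 9.6 GiB between the two thresholds, compared with 3.2 GiB for the 0.75 to 0.8 band used in the real-time scenario.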

Scenario 3: hybrid deployment (mixed online + offline)

When to use

A single cluster serves both online and analytical workloads
Some collections require low latency, others prioritize capacity

Recommended strategy

Apply real-time configuration to latency-sensitive collections
Apply offline configuration to analytical or archival collections
Adjust evictableMemoryCacheRatio, cacheTtl, and watermark ratios independently for each workload type

Rationale

Combining configurations allows fine-grained control of resource allocation.

Critical collections maintain low-latency guarantees, while secondary collections can handle more segments and data volume.
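
When maintaining two profiles side by side, it is useful to see exactly which settings differ. The sketch below holds the warm-up, eviction, watermark, and TTL values from Scenario 1 and Scenario 2 as plain Python dictionaries and lists the keys whose values differ. The `flatten` and `diff_profiles` helpers are hypothetical, and how each profile is applied to a given workload is outside the scope of this sketch.

```python
def flatten(settings: dict, prefix: str = "") -> dict:
    """Flatten nested settings into dotted keys, e.g. 'warmup.scalarField'."""
    flat = {}
    for key, value in settings.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, full_key + "."))
        else:
            flat[full_key] = value
    return flat


def diff_profiles(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


# Values copied from the Scenario 1 (real-time) configuration.
REALTIME = {
    "warmup": {"scalarField": "sync", "scalarIndex": "sync",
               "vectorField": "disable", "vectorIndex": "sync"},
    "evictionEnabled": True,
    "backgroundEvictionEnabled": True,
    "memoryLowWatermarkRatio": 0.75, "memoryHighWatermarkRatio": 0.8,
    "diskLowWatermarkRatio": 0.75, "diskHighWatermarkRatio": 0.8,
    "cacheTtl": 0,
}

# Values copied from the Scenario 2 (offline) configuration.
OFFLINE = {
    "warmup": {"scalarField": "disable", "scalarIndex": "disable",
               "vectorField": "disable", "vectorIndex": "disable"},
    "evictionEnabled": True,
    "backgroundEvictionEnabled": True,
    "memoryLowWatermarkRatio": 0.7, "memoryHighWatermarkRatio": 0.85,
    "diskLowWatermarkRatio": 0.7, "diskHighWatermarkRatio": 0.85,
    "cacheTtl": 86400,
}

for key, (realtime_value, offline_value) in diff_profiles(REALTIME, OFFLINE).items():
    print(f"{key}: real-time={realtime_value}, offline={offline_value}")
```

The output covers the three warm-up switches, the four watermark ratios, and `cacheTtl`; the eviction switches and `warmup.vectorField` are identical in both profiles.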

Additional tuning tips

Aspect	Recommendation	Explanation
Warm Up scope	Only preload fields or indexes with high query frequency.	Unnecessary preloading increases load time and resource use.
Eviction tuning	Start with default watermarks (75–80%) and adjust gradually.	A small gap causes frequent eviction; a large gap delays resource release.
Cache TTL	Disable for stable hot datasets; enable (e.g., 1–3 days) for dynamic data.	Prevents stale cache buildup while balancing cleanup overhead.
Overcommit ratio	Avoid values > 0.7 unless resource headroom is large.	Excessive overcommit may cause cache thrashing and unstable latency.
Monitoring	Track cache hit ratio, resource utilization, and eviction frequency.	Frequent cold loads may indicate that warm-up or watermarks need adjustment.
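
The monitoring recommendation can be turned into a simple check. The sketch below is illustrative only: the counter inputs (`hits`, `misses`, `evictions_per_min`) and both thresholds are hypothetical and would need to be mapped to whatever metrics your monitoring stack actually exposes.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups served locally; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def needs_tuning(hits: int, misses: int, evictions_per_min: float,
                 min_hit_ratio: float = 0.9,
                 max_evictions_per_min: float = 10.0) -> bool:
    """Flag a node whose hit ratio is low or whose eviction rate is high.

    Both thresholds are placeholder values, not Milvus recommendations.
    """
    low_hit_ratio = cache_hit_ratio(hits, misses) < min_hit_ratio
    frequent_eviction = evictions_per_min > max_evictions_per_min
    return low_hit_ratio or frequent_eviction


print(cache_hit_ratio(950, 50))                      # 0.95
print(needs_tuning(950, 50, evictions_per_min=2))    # False
print(needs_tuning(600, 400, evictions_per_min=2))   # True
```

A node flagged by a check like this is a candidate for the adjustments in the table above: widening the warm-up scope, or revisiting the watermark gap.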
