Milvus
Zilliz
Home

Best Practices for Tiered StorageCompatible with Milvus 2.6.4+

Milvus provides Tiered Storage to help you efficiently handle large-scale data while balancing query latency, capacity, and resource usage. This guide summarizes recommended configurations for typical workloads and explains the reasoning behind each tuning strategy.

Before you start

  • Milvus v2.6.4 or later

  • QueryNodes must have dedicated local resources (memory and disk). Shared environments may distort cache estimation and lead to eviction misjudgment.

Choose the right strategy

Tiered Storage offers flexible loading and caching strategies that can be combined to fit your workload.

Goal

Recommended focus

Key mechanism

Minimize first-query latency

Preload critical fields

Warm Up

Handle large-scale data efficiently

Load on demand

Lazy Load + Partial Load

Maintain long-term stability

Prevent cache overflow

Eviction

Balance performance and capacity

Combine preload and dynamic caching

Hybrid configuration

Scenario 1: real-time, low latency retrieval

When to use

  • Query latency is critical (e.g., real-time recommendation or search ranking)

  • Core vector indexes and scalar filters are accessed frequently

  • Consistent performance matters more than startup speed

Recommended configuration

# milvus.yaml
queryNode:
  segcore:
    tieredStorage:
      warmup:
        # scalar field/index warm-up to eliminate first-time latency
        scalarField: sync
        scalarIndex: sync
        # warm-up of vector fields is disabled (if the original vector is not required)
        vectorField: disable
        # vector indexes warm-up to elminate first-time latenct
        vectorIndex: sync
      # enable cache eviction, and also turn on background asynchronous eviction
      # to reduce the triggering of synchronous eviction.
      evictionEnabled: true
      backgroundEvictionEnabled: true
      memoryLowWatermarkRatio: 0.75
      memoryHighWatermarkRatio: 0.8
      diskLowWatermarkRatio: 0.75
      diskHighWatermarkRatio: 0.8
      # no expiration time, which avoids frequent reloading
      cacheTtl: 0

Rationale

  • Warmup eliminates first-hit latency for high-frequency scalar and vector indexes.

  • Background eviction maintains stable cache pressure without blocking queries.

  • Disabling cache TTL avoids unnecessary reloads for hot data.

Scenario 2: offline, batch analysis

When to use

  • Query latency tolerance is high

  • Workloads involve massive datasets or many segments

  • Capacity and throughput are prioritized over responsiveness

Recommended configuration

# milvus.yaml
queryNode:
  segcore:
    tieredStorage:
      enabled: true
      warmup:
        # disable scalar field/index warm-up to speed up loading
        scalarField: disable
        scalarIndex: disable
        # disable vector field/index warm-up to speed up loading
        vectorField: disable
        vectorIndex: disable
      # enable cache eviction, and also turn on background asynchronous eviction
      # to reduce the triggering of synchronous eviction.
      evictionEnabled: true
      backgroundEvictionEnabled: true
      memoryLowWatermarkRatio: 0.7
      memoryHighWatermarkRatio: 0.85
      diskLowWatermarkRatio: 0.7
      diskHighWatermarkRatio: 0.85
      # use 1 day expiration to clean unused cache
      cacheTtl: 86400

Rationale

  • Disabling warm-up accelerates startup when initializing many segments.

  • Higher watermarks allow denser cache usage, improving total load capacity.

  • Cache TTL automatically cleans unused data to free local space.

Scenario 3: hybrid deployment (mixed online + offline)

When to use

  • A single cluster serves both online and analytical workloads

  • Some collections require low latency, others prioritize capacity

Recommended strategy

  • Apply real-time configuration to latency-sensitive collections

  • Apply offline configuration to analytical or archival collections

  • Adjust evictableMemoryCacheRatio, cacheTtl, and watermark ratios independently for each workload type

Rationale

Combining configurations allows fine-grained control of resource allocation.

Critical collections maintain low-latency guarantees, while secondary collections can handle more segments and data volume.

Additional tuning tips

Aspect

Recommendation

Explanation

Warm Up scope

Only preload fields or indexes with high query frequency.

Unnecessary preloading increases load time and resource use.

Eviction tuning

Start with default watermarks (75–80%) and adjust gradually.

A small gap causes frequent eviction; a large gap delays resource release.

Cache TTL

Disable for stable hot datasets; enable (e.g., 1–3 days) for dynamic data.

Prevents stale cache buildup while balancing cleanup overhead.

Overcommit ratio

Avoid values > 0.7 unless resource headroom is large.

Excessive overcommit may cause cache thrashing and unstable latency.

Monitoring

Track cache hit ratio, resource utilization, and eviction frequency.

Frequent cold loads may indicate that warm-up or watermarks need adjustment.

Try Managed Milvus for Free

Zilliz Cloud is hassle-free, powered by Milvus and 10x faster.

Get Started
Feedback

Was this page helpful?