Eviction

Compatible with Milvus 2.6.4+

Eviction manages the cache resources of each QueryNode in Milvus. When enabled, it automatically removes cached data once resource thresholds are reached, ensuring stable performance and preventing memory or disk exhaustion.

Eviction uses a Least Recently Used (LRU) policy to reclaim cache space. Metadata is always cached and never evicted, as it is essential for query planning and typically small.
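The LRU policy with non-evictable metadata can be pictured with a small sketch. This is an illustrative toy, not Milvus code: the `LruCache` class, its method names, and the `pinned` flag are all hypothetical and only mirror the behavior described above.

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache: pinned entries (e.g. metadata) are never evicted."""

    def __init__(self):
        self._entries = OrderedDict()  # key -> (size_bytes, pinned), oldest first
        self.used_bytes = 0

    def put(self, key, size_bytes, pinned=False):
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)[0]
        self._entries[key] = (size_bytes, pinned)
        self.used_bytes += size_bytes

    def touch(self, key):
        """Mark an entry as most recently used."""
        self._entries.move_to_end(key)

    def evict_until(self, target_bytes):
        """Evict least recently used, unpinned entries until usage <= target."""
        evicted = []
        for key in list(self._entries):  # iterates from least recently used
            if self.used_bytes <= target_bytes:
                break
            size_bytes, pinned = self._entries[key]
            if pinned:
                continue  # metadata stays cached
            del self._entries[key]
            self.used_bytes -= size_bytes
            evicted.append(key)
        return evicted


cache = LruCache()
cache.put("segment-meta", 1, pinned=True)
cache.put("chunk-a", 40)
cache.put("chunk-b", 40)
cache.touch("chunk-a")          # chunk-b is now the least recently used
print(cache.evict_until(50))    # ['chunk-b']
```

The pinned entry is skipped even though it is the oldest, which is the property that keeps query planning metadata resident.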

Eviction must be explicitly enabled. Without configuration, cached data will continue to accumulate until resources are depleted.

Eviction types

Milvus supports two complementary eviction modes (sync and async) that work together for optimal resource management:

| Aspect | Sync Eviction | Async Eviction |
| --- | --- | --- |
| Trigger | Occurs during query or search when memory or disk usage exceeds internal limits. | Triggered by a background thread when usage exceeds the high watermark or when cached data reaches its time-to-live (TTL). |
| Behavior | Query or search operations pause temporarily while the QueryNode reclaims cache space. Eviction continues until usage drops below the low watermark or a timeout occurs. If timeout is reached and insufficient data can be reclaimed, the query or search may fail. | Runs periodically in the background, proactively evicting cached data when usage exceeds the high watermark or when data expires based on TTL. Eviction continues until usage drops below the low watermark. Queries are not blocked. |
| Best For | Workloads that can tolerate brief latency spikes or temporary pauses during peak usage. Useful when async eviction cannot reclaim space fast enough. | Latency-sensitive workloads that require smooth and predictable query performance. Ideal for proactive resource management. |
| Cautions | Can cause short query delays or timeouts if insufficient evictable data is available. | Requires properly tuned high/low watermarks and TTL settings. Slight overhead from the background thread. |
| Configuration | Enabled via evictionEnabled: true | Enabled via backgroundEvictionEnabled: true (requires evictionEnabled: true at the same time) |

Recommended setup:

  • Both eviction modes can be enabled together for optimal balance, provided your workload benefits from Tiered Storage and can tolerate eviction-related fetch latency.

  • For performance testing or latency-critical scenarios, consider disabling eviction entirely to avoid network fetch overhead after eviction.

For evictable fields and indexes, the eviction unit matches the loading granularity: scalar and vector fields are evicted by chunk, and scalar and vector indexes are evicted by segment.
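The sync eviction behavior described above (pause, reclaim, then either proceed or fail on timeout) can be sketched roughly as follows. Everything here is hypothetical: `sync_evict`, `EvictionTimeout`, and the parameters are illustrative names, and the sketch simplifies by failing immediately when nothing evictable remains instead of waiting out the full timeout.

```python
import time


class EvictionTimeout(Exception):
    """Raised when not enough cache space could be reclaimed in time."""


def sync_evict(used_bytes, target_bytes, evictable_sizes, timeout_s=1.0):
    """Reclaim space inline and return the usage after eviction.

    evictable_sizes lists the sizes of evictable entries, least recently
    used first. The caller (a query or search) is blocked while this runs.
    """
    deadline = time.monotonic() + timeout_s
    pending = list(evictable_sizes)
    while used_bytes > target_bytes:
        if not pending or time.monotonic() >= deadline:
            raise EvictionTimeout("insufficient evictable data; request may fail")
        used_bytes -= pending.pop(0)  # evict the least recently used entry
    return used_bytes


print(sync_evict(90, 75, [10, 10, 10]))  # 70
```

With only 5 evictable bytes available, the same call would raise `EvictionTimeout`, which corresponds to the query failure case in the table.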

Enable eviction

Configure eviction under queryNode.segcore.tieredStorage in milvus.yaml:

queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true             # Enables synchronous eviction
      backgroundEvictionEnabled: true   # Enables background (asynchronous) eviction

| Parameter | Type | Values | Description | Recommended use case |
| --- | --- | --- | --- | --- |
| evictionEnabled | bool | true/false | Master switch for eviction strategy. Defaults to false. Enables sync eviction mode. | Always set to true in Tiered Storage. |
| backgroundEvictionEnabled | bool | true/false | Run eviction asynchronously in the background. Requires evictionEnabled: true. Defaults to false. | Use true for smoother query performance; it reduces sync eviction frequency. |
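The dependency between the two switches can be checked before deploying a config. The helper below is a hypothetical sketch that operates on an already-parsed `tieredStorage` mapping; it treats `backgroundEvictionEnabled: true` without `evictionEnabled: true` as an error, which is an assumption made for illustration and may not match how Milvus itself handles that combination.

```python
def eviction_modes(tiered_storage):
    """Return the eviction modes a parsed tieredStorage mapping would enable."""
    sync = bool(tiered_storage.get("evictionEnabled", False))          # default false
    background = bool(tiered_storage.get("backgroundEvictionEnabled", False))
    if background and not sync:
        raise ValueError("backgroundEvictionEnabled requires evictionEnabled: true")
    modes = []
    if sync:
        modes.append("sync")
    if background:
        modes.append("async")
    return modes


print(eviction_modes({"evictionEnabled": True, "backgroundEvictionEnabled": True}))
# ['sync', 'async']
```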

Configure watermarks

Watermarks define when cache eviction begins and ends for both memory and disk. Each resource type has two thresholds:

  • High watermark: Eviction starts when usage exceeds this value.

  • Low watermark: Eviction continues until usage falls below this value.

This configuration takes effect only when eviction is enabled.
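The two thresholds form a hysteresis band: nothing happens until usage crosses the high watermark, and once it does, eviction aims for the low watermark. A minimal sketch of that arithmetic, assuming a hypothetical `bytes_to_evict` helper and ratios applied to a single capacity figure:

```python
def bytes_to_evict(used_bytes, capacity_bytes, low_ratio=0.75, high_ratio=0.8):
    """Return how many bytes a background pass would try to reclaim.

    Returns 0 while usage is at or below the high watermark. Above it,
    the pass reclaims enough to bring usage down to the low watermark.
    """
    high_mark = capacity_bytes * high_ratio
    low_mark = capacity_bytes * low_ratio
    if used_bytes <= high_mark:
        return 0
    return int(used_bytes - low_mark)


print(bytes_to_evict(780, 1000))  # 0   (inside the band, no eviction)
print(bytes_to_evict(850, 1000))  # 100 (above 800, evict down to 750)
```

Usage of 780 sits between the two marks and triggers nothing, which is what keeps eviction from firing on every small fluctuation.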

Example YAML:

queryNode:
  segcore:
    tieredStorage:
      # Memory watermarks
      memoryLowWatermarkRatio: 0.75    # Eviction stops below 75% memory usage
      memoryHighWatermarkRatio: 0.8    # Eviction starts above 80% memory usage

      # Disk watermarks
      diskLowWatermarkRatio: 0.75      # Eviction stops below 75% disk usage
      diskHighWatermarkRatio: 0.8      # Eviction starts above 80% disk usage

| Parameter | Type | Range | Description | Recommended use case |
| --- | --- | --- | --- | --- |
| memoryLowWatermarkRatio | float | (0.0, 1.0] | Memory usage level where eviction stops. | Start at 0.75. Lower slightly if QueryNode memory is limited. |
| memoryHighWatermarkRatio | float | (0.0, 1.0] | Memory usage level where async eviction starts. | Start at 0.8. Keep a sensible gap from low watermark (e.g., 0.05–0.10) to prevent frequent triggers. |
| diskLowWatermarkRatio | float | (0.0, 1.0] | Disk usage level where eviction stops. | Start at 0.75. Adjust lower if disk I/O is limited. |
| diskHighWatermarkRatio | float | (0.0, 1.0] | Disk usage level where async eviction starts. | Start at 0.8. Keep a sensible gap from low watermark (e.g., 0.05–0.10) to prevent frequent triggers. |

Best practices:

  • Keep both high and low watermarks at or below ~0.80 to leave headroom for QueryNode static usage and query-time bursts.

  • Avoid large gaps between high and low watermarks; big gaps prolong each eviction cycle and can add latency.
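These guidelines can be turned into a quick pre-deployment check. The function below is a hypothetical sketch: the 0.80 cap and the 0.05 to 0.10 gap come from the recommendations on this page, not from limits that Milvus enforces, and the rule that the low watermark sits below the high one is an assumption implied by their roles.

```python
def check_watermarks(low, high, cap=0.80, min_gap=0.05, max_gap=0.10):
    """Return a list of warnings for one low/high watermark pair."""
    warnings = []
    if not (0.0 < low <= 1.0 and 0.0 < high <= 1.0):
        warnings.append("ratios must be in (0.0, 1.0]")
    if low >= high:
        warnings.append("low watermark should be below the high watermark")
    if max(low, high) > cap:
        warnings.append("watermarks above ~0.80 leave little headroom")
    gap = round(high - low, 6)  # rounding sidesteps float noise such as 0.8 - 0.7
    if 0 < gap < min_gap:
        warnings.append("gap is small; eviction may trigger frequently")
    elif gap > max_gap:
        warnings.append("gap is large; each eviction cycle may run long")
    return warnings


print(check_watermarks(0.75, 0.8))  # []
print(check_watermarks(0.5, 0.8))   # ['gap is large; each eviction cycle may run long']
```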

Configure cache TTL

Cache Time-to-Live (TTL) automatically removes cached data after a set duration, even if resource thresholds are not reached. It works alongside LRU eviction to prevent stale data from occupying cache indefinitely.

Cache TTL requires backgroundEvictionEnabled: true, as it runs on the same background thread.

Example YAML:

queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      backgroundEvictionEnabled: true
      # Expire cached data after 604,800 seconds (7 days);
      # a background thread cleans up expired entries.
      cacheTtl: 604800

| Parameter | Type | Unit | Description | Recommended use case |
| --- | --- | --- | --- | --- |
| cacheTtl | integer | seconds | Duration before cached data expires. Expired items are removed in the background. | Use a short TTL (hours) for highly dynamic data; use a long TTL (days) for stable datasets. Set 0 to disable time-based expiration. |
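As a rough illustration of time-based expiration, the sketch below selects entries older than `cacheTtl` seconds and treats 0 as "never expire". The `expired_keys` helper is hypothetical, and measuring age from the last access time is an assumption; this page does not say whether Milvus counts from load time or last access.

```python
SEVEN_DAYS = 7 * 24 * 60 * 60  # 604800 seconds, the value used in the YAML example


def expired_keys(last_access, now, cache_ttl):
    """Return keys whose age exceeds cache_ttl seconds; cache_ttl == 0 disables expiry.

    last_access maps each cache key to a timestamp in seconds (assumed: last access).
    """
    if cache_ttl == 0:
        return []
    return [key for key, ts in last_access.items() if now - ts > cache_ttl]


timestamps = {"chunk-a": 0, "chunk-b": 600_000}
print(expired_keys(timestamps, now=604_801, cache_ttl=SEVEN_DAYS))  # ['chunk-a']
print(expired_keys(timestamps, now=604_801, cache_ttl=0))           # []
```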
