Eviction

Compatible with Milvus 2.6.4+

Eviction manages the cache resources of each QueryNode in Milvus. When enabled, it automatically removes cached data once resource thresholds are reached, ensuring stable performance and preventing memory or disk exhaustion.

Eviction uses a Least Recently Used (LRU) policy to reclaim cache space. Metadata is always cached and never evicted, as it is essential for query planning and typically small.
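The LRU behavior described above can be pictured with a small Python model. This is only an illustrative sketch, not the QueryNode implementation: the `LruCache` class, its method names, and the entry sizes are all hypothetical, and the only properties taken from the text are "evict least recently used first" and "never evict metadata".

```python
from collections import OrderedDict


class LruCache:
    """Toy LRU cache (hypothetical); metadata entries are never evicted."""

    def __init__(self):
        # key -> (size, is_metadata), ordered from least to most recently used
        self._entries = OrderedDict()

    def touch(self, key, size, is_metadata=False):
        """Insert or access an entry and mark it as most recently used."""
        self._entries[key] = (size, is_metadata)
        self._entries.move_to_end(key)

    def used(self):
        """Total size of everything currently cached."""
        return sum(size for size, _ in self._entries.values())

    def evict_until(self, target):
        """Evict least recently used, non-metadata entries until used() <= target."""
        evicted = []
        for key in list(self._entries):  # oldest entries come first
            if self.used() <= target:
                break
            _, is_metadata = self._entries[key]
            if is_metadata:
                continue  # metadata stays cached
            del self._entries[key]
            evicted.append(key)
        return evicted


cache = LruCache()
cache.touch("meta", 1, is_metadata=True)
cache.touch("chunk_a", 40)
cache.touch("chunk_b", 40)
cache.touch("chunk_a", 40)  # chunk_a is now the most recently used entry
evicted = cache.evict_until(50)
print(evicted)
```

In this model `chunk_b` is reclaimed first because it was used least recently, while the metadata entry is skipped even though it is the oldest item.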

Eviction is disabled by default and must be explicitly enabled. If it is not configured, cached data keeps accumulating until memory or disk is exhausted.

Eviction types

Milvus supports two complementary eviction modes (sync and async) that work together for optimal resource management:

Aspect	Sync Eviction	Async Eviction
Trigger	Occurs during query or search when memory or disk usage exceeds internal limits.	Triggered by a background thread when usage exceeds the high watermark or when cached data reaches its time-to-live (TTL).
Behavior	Query or search operations pause temporarily while the QueryNode reclaims cache space. Eviction continues until usage drops below the low watermark or a timeout occurs. If timeout is reached and insufficient data can be reclaimed, the query or search may fail.	Runs periodically in the background, proactively evicting cached data when usage exceeds the high watermark or when data expires based on TTL. Eviction continues until usage drops below the low watermark. Queries are not blocked.
Best For	Workloads that can tolerate brief latency spikes or temporary pauses during peak usage. Useful when async eviction cannot reclaim space fast enough.	Latency-sensitive workloads that require smooth and predictable query performance. Ideal for proactive resource management.
Cautions	Can cause short query delays or timeouts if insufficient evictable data is available.	Requires properly tuned high/low watermarks and TTL settings. Slight overhead from the background thread.
Configuration	Enabled via `evictionEnabled: true`	Enabled via `backgroundEvictionEnabled: true` (requires `evictionEnabled: true` at the same time)

Recommended setup:

Both eviction modes can be enabled together for optimal balance, provided your workload benefits from Tiered Storage and can tolerate eviction-related fetch latency.
For performance testing or latency-critical scenarios, consider disabling eviction entirely to avoid network fetch overhead after eviction.

For evictable fields and indexes, the eviction unit matches the loading granularity—scalar/vector fields are evicted by chunk, and scalar/vector indexes are evicted by segment.

Enable eviction

Configure eviction under queryNode.segcore.tieredStorage in milvus.yaml:

queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true             # Enables synchronous eviction
      backgroundEvictionEnabled: true   # Enables background (asynchronous) eviction

Parameter	Type	Values	Description	Recommended use case
`evictionEnabled`	bool	`true`/`false`	Master switch for eviction. Setting it to `true` also enables sync eviction. Defaults to `false`.	Always set to `true` in Tiered Storage.
`backgroundEvictionEnabled`	bool	`true`/`false`	Run eviction asynchronously in the background. Requires `evictionEnabled: true`. Defaults to `false`.	Use `true` for smoother query performance; it reduces sync eviction frequency.
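The dependency between the two switches can be checked before a configuration is deployed. The helper below is a minimal sketch that takes the `tieredStorage` section as a plain Python dict; the function name `eviction_modes` is hypothetical, and raising an error for the invalid combination is a choice made by this sketch, not documented Milvus behavior.

```python
def eviction_modes(tiered_storage):
    """Return the eviction modes a tieredStorage dict would enable.

    Hypothetical pre-deployment check: it rejects backgroundEvictionEnabled
    without evictionEnabled, since the former requires the latter.
    """
    eviction = bool(tiered_storage.get("evictionEnabled", False))
    background = bool(tiered_storage.get("backgroundEvictionEnabled", False))

    if background and not eviction:
        raise ValueError(
            "backgroundEvictionEnabled requires evictionEnabled: true"
        )

    modes = []
    if eviction:
        modes.append("sync")
    if background:
        modes.append("async")
    return modes


both = eviction_modes(
    {"evictionEnabled": True, "backgroundEvictionEnabled": True}
)
print(both)
```

Both keys default to `false` here, mirroring the defaults listed in the table above.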

Configure watermarks

Watermarks define when cache eviction begins and ends for both memory and disk. Each resource type has two thresholds:

High watermark: Eviction starts when usage exceeds this value.
Low watermark: Eviction continues until usage falls below this value.

This configuration takes effect only when eviction is enabled.
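To make the two thresholds concrete, the following Python sketch models a single background eviction pass. It is a simplification under stated assumptions: the function name `amount_to_evict` and the abstract usage units are hypothetical, and the model reclaims exactly down to the low watermark, whereas a real pass works in whole chunks or segments.

```python
def amount_to_evict(used, capacity, low_ratio, high_ratio):
    """Return how much a simplified eviction pass would reclaim.

    Nothing is evicted until usage exceeds the high watermark; once it
    does, eviction continues down to the low watermark.
    """
    high_mark = capacity * high_ratio
    low_mark = capacity * low_ratio
    if used <= high_mark:
        return 0  # at or below the high watermark: no eviction
    return used - int(low_mark)


# Capacity of 1000 units with watermarks at 75% and 80%.
idle = amount_to_evict(780, 1000, 0.75, 0.8)
busy = amount_to_evict(850, 1000, 0.75, 0.8)
print(idle, busy)
```

Usage of 780 sits between the two watermarks, so nothing is evicted; usage of 850 exceeds the high watermark, so the pass reclaims 100 units to get back to 750.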

Example YAML:

queryNode:
  segcore:
    tieredStorage:
      # Memory watermarks
      memoryLowWatermarkRatio: 0.75    # Eviction stops below 75% memory usage
      memoryHighWatermarkRatio: 0.8    # Eviction starts above 80% memory usage

      # Disk watermarks
      diskLowWatermarkRatio: 0.75      # Eviction stops below 75% disk usage
      diskHighWatermarkRatio: 0.8      # Eviction starts above 80% disk usage

Parameter	Type	Range	Description	Recommended use case
`memoryLowWatermarkRatio`	float	(0.0, 1.0]	Memory usage level where eviction stops.	Start at `0.75`. Lower slightly if QueryNode memory is limited.
`memoryHighWatermarkRatio`	float	(0.0, 1.0]	Memory usage level where async eviction starts.	Start at `0.8`. Keep a sensible gap from low watermark (e.g., 0.05–0.10) to prevent frequent triggers.
`diskLowWatermarkRatio`	float	(0.0, 1.0]	Disk usage level where eviction stops.	Start at `0.75`. Adjust lower if disk I/O is limited.
`diskHighWatermarkRatio`	float	(0.0, 1.0]	Disk usage level where async eviction starts.	Start at `0.8`. Keep a sensible gap from low watermark (e.g., 0.05–0.10) to prevent frequent triggers.

Best practices:

Do not set high or low watermarks above ~0.80 to leave headroom for QueryNode static usage and query-time bursts.
Avoid large gaps between high and low watermarks; big gaps prolong each eviction cycle and can add latency.
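These guidelines can be turned into a quick sanity check. The sketch below is not part of Milvus: `check_watermarks` is a hypothetical helper, the `0.80` ceiling and the `0.05` to `0.10` gap come from the recommendations on this page, and the requirement that the low watermark be strictly below the high watermark is an assumption.

```python
def check_watermarks(low, high, ceiling=0.80, min_gap=0.05, max_gap=0.10):
    """Return a list of guideline violations for one low/high watermark pair."""
    eps = 1e-9  # tolerance for floating-point rounding
    problems = []

    if not (0.0 < low <= 1.0 and 0.0 < high <= 1.0):
        problems.append("ratios must be in (0.0, 1.0]")
    if low >= high:
        # Assumption: the low watermark must sit below the high watermark.
        problems.append("low watermark must be below high watermark")
    elif high - low < min_gap - eps:
        problems.append("gap is small; eviction may trigger frequently")
    elif high - low > max_gap + eps:
        problems.append("gap is large; each eviction cycle may run long")
    if high > ceiling + eps:
        problems.append("high watermark leaves little headroom")

    return problems


recommended = check_watermarks(0.75, 0.8)
risky = check_watermarks(0.5, 0.9)
print(recommended)
print(risky)
```

The recommended pair of `0.75` and `0.8` passes every check, while `0.5` and `0.9` is flagged both for the wide gap and for the high watermark exceeding the ceiling.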

Configure cache TTL

Cache Time-to-Live (TTL) automatically removes cached data after a set duration, even if resource thresholds are not reached. It works alongside LRU eviction to prevent stale data from occupying cache indefinitely.

Cache TTL requires backgroundEvictionEnabled: true, as it runs on the same background thread.

Example YAML:

queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      backgroundEvictionEnabled: true
      # Set the cache expiration time to 604,800 seconds (7 days),
      # and expired caches will be cleaned up by a background thread.
      cacheTtl: 604800

Parameter	Type	Unit	Description	Recommended use case
`cacheTtl`	integer	seconds	Duration before cached data expires. Expired items are removed in the background.	Use a short TTL (hours) for highly dynamic data; use a long TTL (days) for stable datasets. Set to `0` to disable time-based expiration.
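Because `cacheTtl` is expressed in seconds, it can help to derive the value from a human-readable duration. The Python sketch below does that and also models the expiry test; the helper names are hypothetical, and measuring age from the moment an item was cached is an assumption, since this page does not state which timestamp the TTL is measured from.

```python
from datetime import timedelta


def ttl_seconds(**duration):
    """Convert a duration such as days=7 or hours=6 into whole seconds."""
    return int(timedelta(**duration).total_seconds())


def is_expired(cached_at, now, cache_ttl):
    """Return True if an item cached at `cached_at` has outlived the TTL.

    Timestamps are in seconds. A TTL of 0 disables time-based expiration.
    Assumption: age is measured from the time the item was cached.
    """
    if cache_ttl == 0:
        return False
    return now - cached_at >= cache_ttl


week = ttl_seconds(days=7)
six_hours = ttl_seconds(hours=6)
print(week, six_hours)
```

Here `ttl_seconds(days=7)` yields 604800, the value used in the example YAML above, and `ttl_seconds(hours=6)` yields 21600 for a more dynamic dataset.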
