Eviction
Compatible with Milvus 2.6.4+
Eviction manages the cache resources of each QueryNode in Milvus. When enabled, it automatically removes cached data once resource thresholds are reached, ensuring stable performance and preventing memory or disk exhaustion.
Eviction uses a Least Recently Used (LRU) policy to reclaim cache space. Metadata is always cached and never evicted, as it is essential for query planning and typically small.
Eviction must be explicitly enabled. Without configuration, cached data will continue to accumulate until resources are depleted.
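The LRU behavior described above can be illustrated with a toy model. This is a minimal sketch, not Milvus's implementation: the class `ToyLruCache`, its methods, and the byte-based capacity are all hypothetical names chosen for illustration. It shows only the two ideas from the text: evictable entries are dropped least recently used first, and metadata is pinned and never evicted.

```python
from collections import OrderedDict


class ToyLruCache:
    """Toy LRU cache (hypothetical, for illustration only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.metadata = {}          # pinned entries, never evicted
        self.cells = OrderedDict()  # key -> size, least recently used first
        self.used_bytes = 0

    def put_metadata(self, key, size):
        """Store metadata; it counts toward usage but is never evicted."""
        self.used_bytes += size - self.metadata.get(key, 0)
        self.metadata[key] = size

    def get(self, key):
        """Mark a cached cell as recently used; return True on a cache hit."""
        if key not in self.cells:
            return False
        self.cells.move_to_end(key)
        return True

    def put(self, key, size):
        """Cache a cell and return the keys evicted to make room for it."""
        if key in self.cells:
            self.used_bytes -= self.cells.pop(key)
        self.cells[key] = size
        self.used_bytes += size
        evicted = []
        # Drop the least recently used cells first; keep the newest one.
        while self.used_bytes > self.capacity_bytes and len(self.cells) > 1:
            old_key, old_size = self.cells.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_key)
        return evicted
```

In this sketch, reading a cell with `get` before inserting a new one protects it from eviction, because the read moves it to the recently used end of the queue.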
Eviction types
Milvus supports two complementary eviction modes (sync and async) that work together for optimal resource management:
| Aspect | Sync Eviction | Async Eviction |
|---|---|---|
| Trigger | Occurs during query or search when memory or disk usage exceeds internal limits. | Triggered by a background thread when usage exceeds the high watermark or when cached data reaches its time-to-live (TTL). |
| Behavior | Query or search operations pause temporarily while the QueryNode reclaims cache space. Eviction continues until usage drops below the low watermark or a timeout occurs. If the timeout is reached before enough data has been reclaimed, the query or search may fail. | Runs periodically in the background, proactively evicting cached data when usage exceeds the high watermark or when data expires based on TTL. Eviction continues until usage drops below the low watermark. Queries are not blocked. |
| Best For | Workloads that can tolerate brief latency spikes or temporary pauses during peak usage. Useful when async eviction cannot reclaim space fast enough. | Latency-sensitive workloads that require smooth and predictable query performance. Ideal for proactive resource management. |
| Cautions | Can cause short query delays or timeouts if insufficient evictable data is available. | Requires properly tuned high/low watermarks and TTL settings. Slight overhead from the background thread. |
| Configuration | Enabled via evictionEnabled: true | Enabled via backgroundEvictionEnabled: true |
Recommended setup:
Both eviction modes can be enabled together for optimal balance, provided your workload benefits from Tiered Storage and can tolerate eviction-related fetch latency.
For performance testing or latency-critical scenarios, consider disabling eviction entirely to avoid network fetch overhead after eviction.
For evictable fields and indexes, the eviction unit matches the loading granularity—scalar/vector fields are evicted by chunk, and scalar/vector indexes are evicted by segment.
Enable eviction
Configure eviction under queryNode.segcore.tieredStorage in milvus.yaml:
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true            # Enables synchronous eviction
      backgroundEvictionEnabled: true  # Enables background (asynchronous) eviction
| Parameter | Type | Values | Description | Recommended use case |
|---|---|---|---|---|
| evictionEnabled | bool | true / false | Master switch for eviction strategy. Defaults to | Always set to |
| backgroundEvictionEnabled | bool | true / false | Run eviction asynchronously in the background. Requires | Use |
Configure watermarks
Watermarks define when cache eviction begins and ends for both memory and disk. Each resource type has two thresholds:
High watermark: Eviction starts when usage exceeds this value.
Low watermark: Eviction continues until usage falls below this value.
This configuration takes effect only when eviction is enabled.
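The interaction between the two thresholds can be sketched as a small function. This is an illustrative model under stated assumptions, not the QueryNode's actual logic: the function name `evict_to_low_watermark`, the `(key, size)` representation, and the default ratios (taken from the example values on this page) are hypothetical. It shows that nothing is evicted until usage exceeds the high watermark, and that once eviction starts it continues until usage falls below the low watermark.

```python
def evict_to_low_watermark(cells, capacity, low_ratio=0.75, high_ratio=0.8):
    """Simulate one watermark-driven eviction pass (hypothetical sketch).

    cells: list of (key, size) tuples ordered least recently used first.
    Returns (kept_cells, evicted_keys).
    """
    used = sum(size for _, size in cells)
    # Below or at the high watermark: eviction does not start.
    if used <= capacity * high_ratio:
        return list(cells), []

    kept = list(cells)
    evicted = []
    # Once started, keep evicting until usage falls below the low watermark.
    while kept and used >= capacity * low_ratio:
        key, size = kept.pop(0)
        used -= size
        evicted.append(key)
    return kept, evicted
```

With a capacity of 100, a cache at 78 units sits above the low watermark but below the high one, so this sketch evicts nothing; a cache at 85 units crosses the high watermark and is trimmed until it is under 75.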
Example YAML:
queryNode:
  segcore:
    tieredStorage:
      # Memory watermarks
      memoryLowWatermarkRatio: 0.75   # Eviction stops below 75% memory usage
      memoryHighWatermarkRatio: 0.8   # Eviction starts above 80% memory usage
      # Disk watermarks
      diskLowWatermarkRatio: 0.75     # Eviction stops below 75% disk usage
      diskHighWatermarkRatio: 0.8     # Eviction starts above 80% disk usage
| Parameter | Type | Range | Description | Recommended use case |
|---|---|---|---|---|
| memoryLowWatermarkRatio | float | (0.0, 1.0] | Memory usage level where eviction stops. | Start at |
| memoryHighWatermarkRatio | float | (0.0, 1.0] | Memory usage level where async eviction starts. | Start at |
| diskLowWatermarkRatio | float | (0.0, 1.0] | Disk usage level where eviction stops. | Start at |
| diskHighWatermarkRatio | float | (0.0, 1.0] | Disk usage level where async eviction starts. | Start at |
Best practices:
Keep both the high and low watermarks at or below ~0.80. This leaves headroom for QueryNode static usage and query-time bursts.
Avoid large gaps between high and low watermarks; big gaps prolong each eviction cycle and can add latency.
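These guidelines can be turned into a quick sanity check before editing milvus.yaml. The sketch below is not part of Milvus: `check_watermarks` is a hypothetical helper, the `max_gap` threshold of 0.10 is an assumed value (this page gives no number for a "large" gap), and the rule that the low watermark must sit below the high watermark is inferred from their definitions rather than stated explicitly.

```python
def check_watermarks(low, high, ceiling=0.80, max_gap=0.10):
    """Return a list of warnings for one low/high watermark pair.

    ceiling: best-practice upper bound from this page (~0.80).
    max_gap: ASSUMED threshold for a "large" gap; tune for your workload.
    """
    problems = []
    for name, value in (("low", low), ("high", high)):
        if not 0.0 < value <= 1.0:
            problems.append(f"{name} watermark {value} is outside (0.0, 1.0]")
        elif value > ceiling:
            problems.append(f"{name} watermark {value} leaves little headroom")

    if low >= high:
        # Inferred rule: eviction must stop at a lower level than it starts.
        problems.append("low watermark should be below the high watermark")
    elif high - low > max_gap:
        problems.append("large gap between watermarks prolongs eviction cycles")
    return problems
```

For example, the pair used in the YAML example on this page (0.75 and 0.8) produces no warnings, while a pair such as 0.5 and 0.9 is flagged both for the high value and for the wide gap.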
Configure cache TTL
Cache Time-to-Live (TTL) automatically removes cached data after a set duration, even if resource thresholds are not reached. It works alongside LRU eviction to prevent stale data from occupying cache indefinitely.
Cache TTL requires backgroundEvictionEnabled: true, as it runs on the same background thread.
Example YAML:
queryNode:
  segcore:
    tieredStorage:
      evictionEnabled: true
      backgroundEvictionEnabled: true
      # Set the cache expiration time to 604,800 seconds (7 days).
      # Expired cache entries are cleaned up by a background thread.
      cacheTtl: 604800
| Parameter | Type | Unit | Description | Recommended use case |
|---|---|---|---|---|
| cacheTtl | integer | seconds | Duration before cached data expires. Expired items are removed in the background. | Use a short TTL (hours) for highly dynamic data; use a long TTL (days) for stable datasets. Set 0 to disable time-based expiration. |
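The expiration rule can be sketched as a simple filter. This is an illustrative model, not Milvus's background thread: `expired_keys` is a hypothetical function, and the assumption that the TTL is counted from a per-entry timestamp (here called `timestamps`) is ours, since this page does not say which event starts the clock. The only behaviors taken from the text are that entries expire after `cacheTtl` seconds and that a value of 0 disables expiration.

```python
import time


def expired_keys(timestamps, cache_ttl, now=None):
    """Return keys whose age has reached cache_ttl seconds (hypothetical sketch).

    timestamps: dict mapping key -> time in seconds from which the TTL is
                counted (ASSUMED; the reference event is not specified here).
    cache_ttl:  TTL in seconds; 0 disables time-based expiration.
    """
    if cache_ttl <= 0:
        return []
    now = time.time() if now is None else now
    return [key for key, ts in timestamps.items() if now - ts >= cache_ttl]
```

Passing `now` explicitly keeps the function deterministic for testing; a background loop would call it periodically with the current time and evict whatever it returns.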