Home
Blog
How Milvus Balances Query Load across Nodes?

How Milvus Balances Query Load across Nodes?

Engineering

February 28, 2022

Xi Ge

Binlog Cover Image

In previous blog articles, we have successively introduced the Deletion, Bitset, and Compaction functions in Milvus 2.0. To culminate this series, we would like to share the design behind Load Balance, a vital function in the distributed cluster of Milvus.

Implementation

Whereas the number and size of segments buffered in query nodes differ, the search performance across the query nodes may also vary. The worst case could happen when a few query nodes are exhausted searching on a large amount of data, but newly created query nodes remain idle because no segment is distributed to them, causing a massive waste of CPU resources and a huge drop in search performance.

To avoid such circumstances, the query coordinator (query coord) is programmed to distribute segments evenly to each query node according to the RAM usage of the nodes. Therefore, CPU resources are consumed equally across the nodes, thereby significantly improving search performance.

Trigger automatic load balance

According to the default value of the configuration queryCoord.balanceIntervalSeconds, the query coord checks the RAM usage (in percentage) of all query nodes every 60 seconds. If either of the following conditions is satisfied, the query coord starts to balance the query load across the query node:

RAM usage of any query node in the cluster is larger than queryCoord.overloadedMemoryThresholdPercentage (default: 90);
Or the absolute value of any two query nodes’ RAM usage difference is larger than queryCoord.memoryUsageMaxDifferencePercentage (default: 30).

After the segments are transferred from the source query node to the destination query node, they should also satisfy both the following conditions:

RAM usage of the destination query node is no larger than queryCoord.overloadedMemoryThresholdPercentage (default: 90);
The absolute value of the source and destination query nodes’ RAM usage difference after load balancing is less than that before load balancing.

With the above conditions satisfied, the query coord proceeds to balance the query load across the nodes.

Load balance

When load balance is triggered, the query coord first loads the target segment(s) to the destination query node. Both query nodes return search results from the target segment(s) at any search request at this point to guarantee the completeness of the result.

After the destination query node successfully loads the target segment, the query coord publishes a sealedSegmentChangeInfo to the Query Channel. As shown below, onlineNodeID and onlineSegmentIDs indicate the query node that loads the segment and the segment loaded respectively, and offlineNodeID and offlineSegmentIDs indicate the query node that needs to release the segment and the segment to release respectively.

sealedSegmentChangeInfo

Having received the sealedSegmentChangeInfo, the source query node then releases the target segment.

Load Balance Workflow

The whole process succeeds when the source query node releases the target segment. By completing that, the query load is set balanced across the query nodes, meaning the RAM usage of all query nodes is no larger than queryCoord.overloadedMemoryThresholdPercentage, and the absolute value of the source and destination query nodes’ RAM usage difference after load balancing is less than that before load balancing.

What’s next?

In the 2.0 new feature series blog, we aim to explain the design of the new features. Read more in this blog series!

This is the finale of the Milvus 2.0 new feature blog series. Following this series, we are planning a new series of Milvus Deep Dive, which introduces the basic architecture of Milvus 2.0. Please stay tuned.