Switchover
Switchover changes the primary-standby direction without data loss. Use it when the current primary cluster is still reachable, for example when you need to move traffic away for planned maintenance.
This guide assumes the current topology is:
cluster-a (primary) -> cluster-b (standby)
After switchover, the topology becomes:
cluster-b (primary) -> cluster-a (standby)
When to Use Switchover
Use switchover when:
- You are doing maintenance on the current primary.
- The primary is partially degraded but can still respond to requests.
- You need RPO = 0 and cannot accept data loss.
Do not use switchover if the primary is completely unavailable. In that case, use Failover.
Before You Begin
Check the following before starting:
- Both clusters are reachable.
- CDC replication is healthy.
- CDC lag is low enough for your recovery time target.
- Application writes can be paused or retried during the role change.
- You have prepared the new topology configuration.
Switchover guarantees no data loss, but the operation time depends on how much data remains to be replicated.
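The reachability check can be scripted. The following is a minimal sketch, not an official health check: the helper name `check_reachable` is hypothetical, and it assumes that a successful `list_collections()` call is a good enough signal that a cluster responds. It does not measure CDC lag, which you still need to confirm through your own monitoring.

```python
from typing import Any, Dict


def check_reachable(clients: Dict[str, Any]) -> Dict[str, bool]:
    """Return {cluster_name: True/False} depending on whether a cheap call succeeds.

    `clients` maps a label such as "cluster-a" to a connected client object
    (for example a pymilvus MilvusClient). Any exception counts as unreachable.
    """
    status = {}
    for name, client in clients.items():
        try:
            client.list_collections()  # lightweight metadata request
            status[name] = True
        except Exception:
            # Assumption: any failure here means the cluster cannot serve requests.
            status[name] = False
    return status
```

Pass it one connected `MilvusClient` per cluster, keyed by cluster name, and continue only if every entry is `True`.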
Build the New Topology
Create a full replacement configuration where cluster-b becomes the source and cluster-a becomes the target.
# If you followed Set Up CDC Replication, cluster A is the original source cluster,
# and cluster B is the original target cluster.
cluster_a_id = source_cluster_id
cluster_a_addr = source_cluster_addr
cluster_a_client_addr = source_client_addr
cluster_a_token = source_cluster_token
cluster_a_pchannels = source_cluster_pchannels
cluster_b_id = target_cluster_id
cluster_b_addr = target_cluster_addr
cluster_b_client_addr = target_client_addr
cluster_b_token = target_cluster_token
cluster_b_pchannels = target_cluster_pchannels
switchover_config = {
    "clusters": [
        {
            "cluster_id": cluster_a_id,
            "connection_param": {
                "uri": cluster_a_addr,
                "token": cluster_a_token,
            },
            "pchannels": cluster_a_pchannels,
        },
        {
            "cluster_id": cluster_b_id,
            "connection_param": {
                "uri": cluster_b_addr,
                "token": cluster_b_token,
            },
            "pchannels": cluster_b_pchannels,
        },
    ],
    "cross_cluster_topology": [
        {
            "source_cluster_id": cluster_b_id,
            "target_cluster_id": cluster_a_id,
        }
    ],
}
Apply the New Topology
Apply the same configuration to both clusters. Send the request to the current primary (cluster-a) first, and then send it to the standby (cluster-b). If you later switch back, reverse the order, because cluster-b will then be the current primary.
from pymilvus import MilvusClient

client_a = MilvusClient(uri=cluster_a_client_addr, token=cluster_a_token)
client_b = MilvusClient(uri=cluster_b_client_addr, token=cluster_b_token)

try:
    # Current primary (cluster-a) first, then the current standby (cluster-b).
    client_a.update_replicate_configuration(**switchover_config)
    client_b.update_replicate_configuration(**switchover_config)
finally:
    # Release both connections whether or not the update succeeded.
    client_a.close()
    client_b.close()
The old primary demotes to standby and rejects new writes. The old standby waits for remaining replicated data, promotes itself to primary, and then accepts writes.
If the request fails because of a transient network or service error, retry with the same configuration.
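The ordering and retry rules above can be wrapped in a small helper. This is a sketch under stated assumptions: `apply_with_retry` and `apply_switchover` are hypothetical names, the attempt count and delay are arbitrary, and every exception is treated as transient, which is broader than you may want in production. The only client call used is `update_replicate_configuration`, as shown earlier in this guide.

```python
import time
from typing import Any, Dict


def apply_with_retry(client: Any, config: Dict[str, Any],
                     attempts: int = 3, delay_seconds: float = 2.0) -> None:
    """Send the same replicate configuration until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            client.update_replicate_configuration(**config)
            return
        except Exception:
            # Assumption: every exception is treated as transient here.
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay_seconds)


def apply_switchover(primary_client: Any, standby_client: Any,
                     config: Dict[str, Any], **retry_options: Any) -> None:
    """Apply the configuration to the current primary first, then the standby."""
    apply_with_retry(primary_client, config, **retry_options)
    apply_with_retry(standby_client, config, **retry_options)
```

For the switchover in this guide, the call would be `apply_switchover(client_a, client_b, switchover_config)`. When switching back, swap the first two arguments.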
Redirect Application Traffic
After cluster-b becomes primary:
- Point write traffic to cluster-b.
- Confirm reads and writes succeed on cluster-b.
- Confirm cluster-a is no longer receiving application writes.
- Keep monitoring replication from cluster-b back to cluster-a.
Verify the Result
Verify that cluster-b is serving as the new primary and that data remains consistent. Common checks include:
- Compare row counts for important collections.
- Query known primary keys from both clusters.
- Run a representative search on the new primary (cluster-b) and the new standby (cluster-a), and compare the results.
- Run a small write on cluster-b and confirm it is replicated to cluster-a.
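The row-count comparison from the list above might look like the following. Treat it as a sketch: `compare_row_counts` is a hypothetical helper, and it assumes `get_collection_stats` returns a dictionary with a `row_count` entry, as `MilvusClient` does in the pymilvus versions we are aware of. Counts may differ briefly while replication is still catching up, so a mismatch is a prompt to recheck rather than proof of data loss.

```python
from typing import Any, Dict, Iterable, Tuple


def compare_row_counts(primary: Any, standby: Any,
                       collections: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {collection: (primary_count, standby_count)} for mismatches only."""
    mismatches = {}
    for name in collections:
        # Assumption: the stats dict exposes the count under the "row_count" key.
        primary_count = int(primary.get_collection_stats(collection_name=name)["row_count"])
        standby_count = int(standby.get_collection_stats(collection_name=name)["row_count"])
        if primary_count != standby_count:
            mismatches[name] = (primary_count, standby_count)
    return mismatches
```

An empty result means every listed collection reports the same count on both clusters. Follow up with primary-key lookups and a representative search for a stronger check.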
Switch Back
To switch back later, apply the original topology again:
cluster-a -> cluster-b
Use the same switchover flow. Make sure the current primary is reachable and replication is healthy before switching back.
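Because the switch-back configuration differs from the switchover configuration only in the direction of `cross_cluster_topology`, both can be generated from one function. The sketch below is illustrative: `build_topology` is a hypothetical helper, and it assumes the `clusters` list has the same shape as in the switchover configuration shown earlier.

```python
from typing import Any, Dict, List


def build_topology(clusters: List[Dict[str, Any]],
                   source_id: str, target_id: str) -> Dict[str, Any]:
    """Build a full replacement configuration replicating source_id -> target_id."""
    if source_id == target_id:
        raise ValueError("source and target must be different clusters")
    known_ids = {cluster["cluster_id"] for cluster in clusters}
    missing = {source_id, target_id} - known_ids
    if missing:
        raise ValueError(f"unknown cluster ids: {sorted(missing)}")
    return {
        "clusters": clusters,
        "cross_cluster_topology": [
            {"source_cluster_id": source_id, "target_cluster_id": target_id}
        ],
    }
```

With the `clusters` list from the switchover configuration, `build_topology(clusters, cluster_a_id, cluster_b_id)` yields the switch-back configuration. Apply it to cluster-b first, then to cluster-a.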
FAQ
Does switchover lose data?
No. Switchover waits for remaining data to be replicated before the standby becomes primary.
Do I need to stop application writes?
You should pause writes or make writes retryable during the role change. Writes sent to the old primary after it demotes are rejected.
Why does switchover take longer than expected?
The most common reason is CDC lag. The new primary must receive remaining data before it can safely take over with RPO = 0.
Can I retry a failed switchover request?
Yes. Retry with the same target topology.
What happens to the old primary?
The old primary becomes a standby. It should no longer receive application writes.