🚀 Try Zilliz Cloud, the fully managed Milvus, for free—experience 10x faster performance! Try Now>>

Milvus
Zilliz

What is the three-phase commit protocol?

The three-phase commit (3PC) protocol is a distributed consensus algorithm designed to ensure all nodes in a system agree on committing or aborting a transaction, even in the presence of partial failures. It extends the two-phase commit (2PC) protocol by adding a third phase to reduce the risk of indefinite blocking when the coordinator fails. The three phases are CanCommit, PreCommit, and DoCommit. In CanCommit, the coordinator asks participants if they can commit; if all agree, the coordinator proceeds. In PreCommit, the coordinator instructs participants to prepare to commit, and they acknowledge. Finally, in DoCommit, the coordinator confirms the commit, and participants finalize it. This structure allows the system to recover from coordinator failures without leaving participants stuck in an uncertain state.

Each phase serves a specific purpose. During CanCommit, the coordinator checks if all participants are ready to commit. For example, in a distributed database update, each node verifies resource availability and locks data. If any node votes “no,” the transaction aborts immediately. If all vote “yes,” the coordinator moves to PreCommit, where participants persist enough information to commit or roll back. This step ensures that even if a failure occurs, the system can recover. In DoCommit, the coordinator sends the final commit command, and participants apply changes. If the coordinator fails after PreCommit, participants can autonomously decide to commit after a timeout, as they know all nodes previously agreed. This avoids the “blocking” problem in 2PC, where participants wait indefinitely for a failed coordinator.

While 3PC improves fault tolerance, it introduces complexity and latency. For instance, systems using 3PC must handle additional network round trips and manage timeouts for each phase. It’s often used in scenarios requiring high consistency, such as financial systems coordinating transfers across multiple banks. However, modern systems frequently opt for alternatives like Paxos or Raft due to 3PC’s overhead and remaining vulnerabilities (e.g., network partitions). Despite these trade-offs, 3PC remains a foundational example of how adding phases can enhance fault tolerance in distributed transactions.

Like the article? Spread the word