Raft is a consensus algorithm that lets a cluster of servers agree on a sequence of commands (a replicated log) by electing a leader to replicate those commands, so they all see the same order of operations even if some servers fail. Compared to 2PC, Raft focuses on data consistency among data replicas.
Introduction of Raft
A typical scenario for a cluster running Raft is that each server contains a consensus module, a log (replicated). Unlike Byzantine, the failure cases of Raft only include delayed/lost messages and stop-on-failure, and do not include subjective malice.
Leader-less or Leader-based Consensus
General approaches for consensus protocol include leader-less (symmetric) and leader-based (asymmetric).
For leader-less frameworks,
- all servers have equal roles and
- external clients can contact any server.
For leader-based frameworks,
- at any given time, only 1 server is in charge while other servers accept its decisions
- external clients communicate with the leader
Raft uses a leader-based approach. We’ll dive into Raft.

Roles in Raft
At any given time, each server is either
- leader that handles all client connections and log replication,
- candidate that is used to elect a new leader,
- follower that does not issue RPC and only responds to only RPCs.
Role Transittion
The state diagram of role transittion can be characterized as
graph TB A[Follower] B[Candidate] C[Leader] D((start)) --> A A -->|timeout, start election| B B -->|timeout, new election| B B -->|receive majority votes| C B -->|step down, if a new leader is elected| A C -->|discover a server with higher term| A
What is "Term"?
Time is divided into terms, where each term is either election period or normal operation period.
It’s guaranteed that there’s only 1 leader in each term. Some terms may not have a leader (like a failed election period).
It’s worth noting that each server should maintain current term value. Change of term value means information update, thus the role of “term” is to identify obsolete information.
Log Replication
Log is an ordered sequence of entries on each server. Each entry contains
- a command to execute
- the term number when the command is created
- an index where the command is positioned.
Logs are stored on stable storage to disk. Entries are committed if these entries are known to be stored on majority of servers.
Heartbeat
Servers start as followers. Followers expect to receive RPCs from leaders or candidates, and leaders must send heartbeats to maintain authority.
If electionTimeout elapses with no RPCs, follower assumes leader has crashed and starts a new election.
Election Basics
If a election is started, follower first increment current term, change to candidate state, vote for self, and send RequestVote RPCs to all other servers. Retry until either:
- if receive votes from majority of servers:
- this server becomes the leader
- send
AppendEntriesheartbeats to all other servers
- if receive RPC from valid leader
- then this server returns to normal follower state
- if no one wins the election,
- then this server increments the term value again and starts another new election.
Safety. This procedure allows at most one winner per term. Each server gives out only one vote per term, which is persisted on disk.
Liveness. This procedure ensures that some candidate must eventually win. If the election timesout randomly in when broadcast time , then it usually works well.
Normal Operation
- External clients send commands to leader
- leader appends command to it log
- leader sends
AppendEntriesRPCs to followers - Once new entry committed:
- leader passes command to its state machine, returns result to client
- leader notifies followers of committed entries in subsequent
AppendEntriesRPCs. - followers pass committed commands to their state machine.
If there’re crashed or slow followers, leader retries RPCs until RPCs succeed.
Safety Requirement
Once a log entry has been applied to a state machine, no other state machine should apply a different value for that log entry.
If a leader has decided that a log entry is committed, that entry will be present in the logs of all future leaders. This guarantees the safety requirement:
- leader never overwrite entries in their logs
- only entries in the leader’s log can be committed.
- entries must be committed before applied to state machine
External Client Protocol
External client sends commands to leader, if leader is unknown then contact any server. If the contacted server is not leader, the contacted server will redirect it to leader.
The leader does not respond until command has beem logged -> commited -> executed by leader’s state machine.
If request times out (e.g. leader crash),
- client re-issues command to some other server
- eventually redirected to new leader
- retry request with new leader
What if leader crashes after executing command, but before responding? Since we must not execute command twice, client embeds a unique ID in each command. The server includes this ID in log entry. Before accepting command, leader check its log for entry with that ID. If ID is found in log, ignore new command and return response from old command.
Raft Configuration
System config of Raft must be safe too, e.g., including but not limited to
- the number of servers should be an odd number
- ID & address for each server
- determines what constituts a majority
And consensus mechanism must support changes in the configuration, so as to replace failed machine and change degree of replication. In Raft, system config change is treated as an operation that must reach consensus in the replicated log.