Combining Paxos and PaxosLease
Marton Trencseni - Mon 06 April 2026 - Distributed
Introduction
In the previous articles, I built the two halves of a useful distributed system separately:
First, there was Paxos: nodes agree on a value for each round, and if you treat those values as commands, you get a replicated log. Then I added durability, so acceptors and learners could crash, restart, and still remember what they had already promised, accepted, or learned. Separately, I also built PaxosLease: a way for a node to temporarily become the leader, or more precisely, the lease owner.
This article combines the two. The resulting system is still small, still toy-sized, and still full of shortcuts. But conceptually it is a big step forward. We now have a replicated state machine with a master. Paxos still decides what goes into the log. PaxosLease decides who is currently allowed to drive that process. The code for this article is on GitHub.

The main change
The biggest behavioral change is simple: only the current master accepts client writes. If a client sends a command to a follower, that node refuses the request and tells the client who it believes the master is. In this setup, clients must discover the master and then connect to that node directly. Non-master nodes do not proxy writes.
That logic is right in the /command endpoint:
@app.route("/command", methods=["POST"])
def endpoint_command():
    data = request.get_json(force=True, silent=True) or {}
    if "command" not in data:
        return jsonify({"error": "Missing 'command' in JSON body"}), 400
    if not paxoslease_proposer.is_master():
        master = find_master()
        return jsonify({
            "status": "not_master",
            "reason": "This node is not the master; send the command to the master node.",
            "master": master,
        })
    round_id = get_current_round()
    result = proposer.paxos_round(round_id, data["command"])
    if result.get("status") == "success":
        advance_round(round_id + 1)
    return jsonify(result)
This is one of those changes that is small in code and large in effect.
Plain Paxos is happy to let multiple proposers compete. That is part of what makes it robust, but it also makes the common case noisier than it needs to be. Once we add leases, we get a much nicer operational model: normally there is one active master, and that master is the only node proposing new commands. Paxos is still underneath, but the front door is now a master-based system.
The three pieces
The combined system is spread across three files.
paxos.py contains the consensus machinery: acceptors, learners, proposers, durability, and the log replay logic. The learner persists chosen values round by round, and on restart it rebuilds the in-memory database by replaying the learned commands in order.
paxoslease.py contains the lease protocol: acceptors that remember lease promises and accepted leases, and proposers that acquire, extend, and release leases. A successful proposer becomes the current master for a bounded amount of time.
node.py is the glue. It wires the two protocols together, exports HTTP endpoints, tracks the current Paxos round, runs a background catch-up thread, and runs a background lease-acquisition loop that keeps trying to become master when nobody else is.
That separation is nice because the architecture stays easy to state:
- Paxos decides the replicated log.
- PaxosLease elects a temporary master.
- The master appends commands to the log.
The replicated log is still the core
Even though this article is about combining Paxos and PaxosLease, the actual state machine is still the same old trick: each chosen Paxos value is a command string, and learning that value means persisting it and executing it against a local Python dictionary. On restart, the learner reloads the persisted rounds and replays them into a fresh in-memory database.
That is obviously not how you would build a serious production system. Using exec as the state machine interface is hilariously unsafe. But as a teaching device, it works very well. The whole point is that a replicated state machine is just:
- a sequence of agreed commands
- applied in the same order everywhere
And that is exactly what this code does.
Catch-up still matters
Since nodes can fall behind, the combined node keeps a background catch-up loop. It periodically polls peers for their current Paxos round, and if some peer is ahead, it fetches the missing rounds one by one and feeds them into the learner. After that, it advances its own current_round.
The code is straightforward:
def try_catchup():
    while True:
        time.sleep(1.0)
        local_round = get_current_round()
        for peer in peers:
            if peer.endswith(str(port)):
                continue
            try:
                r = requests.get(f"{peer}/paxos/current", timeout=1.0)
                if r.status_code != 200:
                    continue
                peer_round = r.json().get("round_id", 0)
            except Exception:
                continue
            if peer_round > local_round:
                for rid in range(local_round, peer_round):
                    try:
                        resp = requests.get(
                            f"{peer}/paxos/fetch",
                            params={"round_id": rid},
                            timeout=1.0
                        )
                        if resp.status_code != 200:
                            continue
                        data = resp.json()
                        if not data.get("success"):
                            continue
                        value = data.get("value")
                        learner.learn(rid, value)
                    except Exception:
                        continue
                advance_round(peer_round)
                local_round = peer_round
This is not new in spirit, but it becomes more important once leases enter the picture. A future master should not be stale. If leadership can move around, lagging nodes need a path to catch up before they start driving the log again.
PaxosLease now acts as master election
The lease layer now has a very clear job: decide who the master is.
A node tries to acquire a lease from a majority of PaxosLease acceptors. If successful, it becomes the current master for LEASE_SECONDS. While the lease is live, it keeps renewing it. If it fails to renew, or if the lease expires, it is no longer master.
The renewal logic is one of the most practical parts of the implementation. When a node acquires a lease, it starts one timer for local expiry and another timer that attempts an extension halfway through the lease:
def _start_local_lease_timer(self, lease_seconds):
    self._cancel_local_lease_timer()
    with self._lock:
        self.state.lease_expires_at = time.time() + lease_seconds
        expires_at = self.state.lease_expires_at
    self._lease_timer = threading.Timer(
        lease_seconds,
        self._on_local_lease_timeout
    )
    self._lease_timer.daemon = True
    self._lease_timer.start()
    self._cancel_extend_timer()
    remaining = expires_at - time.time()
    if remaining > 0:
        extend_after = remaining / 2.0
        self._extend_timer = threading.Timer(
            extend_after,
            self._on_extend_timer
        )
        self._extend_timer.daemon = True
        self._extend_timer.start()
This gives the system a familiar shape: in the healthy case, one node quietly remains master by repeatedly extending its lease. If that node dies or gets partitioned badly enough that it cannot renew, the lease expires and somebody else can take over.
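The renewal arithmetic is worth seeing with numbers. A small sketch, using a hypothetical LEASE_SECONDS of 10 (the constant name is from the article; the value here is illustrative): scheduling the extension halfway through means one missed extension still leaves half a lease interval of slack before followers may take over.

```python
# Sketch of the renewal timeline with illustrative numbers.
LEASE_SECONDS = 10.0

def renewal_schedule(acquired_at, lease_seconds=LEASE_SECONDS):
    # Local expiry timer fires at expires_at; the extend timer fires
    # halfway through, so there is slack to absorb one failed renewal.
    expires_at = acquired_at + lease_seconds
    extend_at = acquired_at + lease_seconds / 2.0
    return extend_at, expires_at

extend_at, expires_at = renewal_schedule(acquired_at=100.0)
# extend attempt at t=105.0, local expiry at t=110.0
```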
The important new idea: tie the lease to the Paxos round
The most interesting change in the combined code is not that there is a lease. We already had that. The interesting part is that the lease protocol is now tied to Paxos progress.
Each node tracks a current_round, meaning the next log slot it wants to propose into. Whenever the node advances that round, it also updates the PaxosLease acceptor’s local view of the round.
That synchronization happens here:
def advance_round(new_round):
    global current_round
    with round_lock:
        if new_round > current_round:
            current_round = new_round
    # keep PaxosLease acceptor's local round in sync
    paxoslease_acceptor.set_local_round_id(get_current_round())
And on the PaxosLease side, lease messages now carry a round_id, and acceptors reject lease messages whose round is stale:
def _check_round_id(self, msg_round_id):
    if msg_round_id is None:
        return False
    return msg_round_id >= self.state.local_round_id

def on_prepare(self, proposal_id, msg_round_id):
    with self._lock:
        if not self._check_round_id(msg_round_id):
            return False, self.state, "stale_round_id"
        if self.state.promised_n is None or proposal_id > self.state.promised_n:
            self.state.promised_n = proposal_id
            success = True
        else:
            success = False
        return success, self.state, None
The same stale-round check also appears in on_propose() and on_release().
Why does this matter?
Because without some coupling between the lease layer and the consensus layer, an out-of-date node could become master too easily. In this combined design, the lease protocol is no longer floating above Paxos as a separate abstraction. It is constrained by the node’s view of the replicated log. That is a useful bit of hygiene: before a node takes over, it should at least not be obviously behind.
The client path
Once you know there is a master, the client path is simple.
A client sends a command to some node. If that node is not master, it refuses the request and returns the node it believes currently owns the lease. The client then has to connect to that node directly. There is no proxying here, and no stable front-end router. This is intentionally simple, but it also means leader discovery is a client concern in this toy system.
The master discovery helper is best-effort:
def find_master():
    # Best-effort: ask peers who *they* think is master; pick a live one with a non-expired lease.
    now = time.time()
    for peer in peers:
        try:
            r = requests.get(f"{peer}/paxoslease/status", timeout=1.0)
            if r.status_code != 200:
                continue
            data = r.json()
            ps = (data.get("proposer_state") or {})
            if (ps.get("lease_owner")
                    and ps.get("lease_expires_at") is not None
                    and ps.get("lease_expires_at") > now):
                return peer
        except Exception:
            continue
    return None
If the client does reach the master, the rest is just a normal Paxos round for the current slot. The master increments its proposal number, runs phase 1 prepare, runs phase 2 propose, and then broadcasts the learned value. If successful, node.py advances current_round by one.
This is the key thing to notice: even though the system now looks master-based from the outside, under the hood each client write still goes through a full Paxos round.
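A back-of-envelope sketch makes the cost concrete. Assuming every phase contacts all n acceptors and the chosen value is broadcast to all n nodes (a simplification; real deployments only need majority replies), one write costs:

```python
# Rough message count for one client write in this design (assumptions:
# all n nodes contacted each phase, request + reply counted separately).
def messages_per_write(n):
    prepare = 2 * n   # phase 1: prepare requests + promise replies
    propose = 2 * n   # phase 2: propose requests + accept replies
    learn = n         # one-way broadcast of the chosen value
    return prepare + propose + learn

print(messages_per_write(3))  # 15 messages for a 3-node cluster
```

The prepare half of that bill is what Multi-Paxos will eliminate in the steady state, which is exactly where the next article picks up.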
Startup behavior
There is another subtle but important integration point on startup.
The node first reloads durable Paxos state, replays the log into the local database, and sets the next round accordingly. Then, before attempting to become master, it waits for one full lease interval.
The startup logic looks like this:
acceptor.load_persisted()
next_round = learner.load_persisted_and_replay_db()
advance_round(next_round)

if __name__ == "__main__":
    # Respect PaxosLease startup rule: wait LEASE_SECONDS before attempting to acquire.
    print(f"Node {node_id} starting, waiting for {LEASE_SECONDS} seconds to respect PaxosLease protocol...")
    time.sleep(LEASE_SECONDS)
That sleep is there for a good reason. If a node restarts, it should not immediately assume there is no valid lease in the system. Waiting out the maximum lease duration is the conservative move. This is one of those classic distributed systems details: correctness often comes from respecting time windows you would rather ignore.
What this design buys us
This combined design is much easier to reason about than unconstrained multi-proposer Paxos.
In the normal case, there is one active master. Clients send writes to that master. The master drives the next log slot. Followers still participate in consensus and still learn the log, but they are not competing to serve writes. The lease layer gives us failover. The durable log gives us recovery. The catch-up path gives lagging nodes a way back into the cluster.
It also has a pleasing separation of concerns:
- Paxos decides what the cluster agreed on.
- PaxosLease decides who is currently allowed to lead.
- node.py turns that into a master-based replicated state machine.
That is already a pretty useful shape.
What is still simplified
It is also very much still a toy.
The command format is raw Python executed with exec. Client-side master discovery is best-effort. Non-master nodes refuse writes instead of forwarding them. There is no read protocol here, so I am not discussing linearizable reads. There is no batching, no snapshots, no log compaction, and no careful treatment of longer-term membership or reconfiguration. Lease state is timer-driven and simple.
Most importantly, this is still “one full Paxos round per command”.
That is the right place to stop for this article, because it sets up the next one perfectly.
Conclusion
Once you have a master-based system, the obvious inefficiency jumps out at you. The master is stable for a while, but each command still does both phases of Paxos: prepare and propose. In other words, even in the healthy steady state, each new log slot still pays for a full two-phase round.
That is exactly what Multi-Paxos optimizes.
In the next step, the master will stop re-running the full prepare phase for every slot in the steady state. Once leadership is established, the system can usually go straight to the propose step for new log entries. In a master-based setup, that reduces the steady-state round-trip cost from two message delays to one.
This article built the master.
The next one will make the master fast.