Tower BFT

In the previous article, we explored Proof of History, Solana's network-wide clock: an unbreakable chain of hashes, marked off in "ticks", that the whole network coordinates around. In this article, we will delve into Tower BFT, Solana's Proof-of-Stake consensus mechanism.

Although the two work in tandem, they serve different purposes. Proof of History measures elapsed time on the network and tells validators how executed transactions should be ordered. Tower BFT decides which blocks get confirmed, and when they become finalized: canonical and recorded on the blockchain forever. Proof of History says "tick tock", Tower BFT says "this block is final, lock it in" (this sentence had to be here, sorry).

Introduction

The idea behind Tower BFT is heavily based on pBFT (Practical Byzantine Fault Tolerance), a replication algorithm invented in 1999, a decade before the first Bitcoin block was mined. pBFT was the first implementation of the Byzantine-quorum logic described in the Byzantine Generals Problem* paper (1982) that was fast enough for real-world applications.

*The smallest unit of value on the Solana network (0.000000001 SOL) is called a lamport, a nod to one of the paper's original authors, Leslie Lamport.

In pBFT, nodes reach a final decision after three stages of voting and rely heavily on timeouts (deadlines by which something has to happen). Those timeouts depend on each node's own hardware clock, which is not synchronized with the other nodes and may run slightly differently; this lack of a shared clock is where the need for multiple voting rounds to ensure network safety comes from.

Tower BFT can reach confirmation after just one round of voting because the Proof of History hash chain gives every node a synchronized clock. This massively reduces consensus traffic on the network compared to pBFT (the number of vote messages needed to reach a decision drops from n² to n, where n is the number of active nodes) and, as a direct result, improves processing speed.
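To get a feel for the difference between n² and n, here is a back-of-the-envelope count under a deliberately simplified model. This model is our assumption and not a precise accounting of either protocol. One all-to-all round costs n·(n−1) messages, and pBFT's prepare and commit phases are treated as two such rounds. Single-round voting costs one vote per node. Leader broadcasts and gossip overhead are ignored, and the function names are illustrative.

```python
def all_to_all_messages(n: int, rounds: int = 1) -> int:
    """Messages when every node sends its vote directly to every other node, per round."""
    return rounds * n * (n - 1)


def single_round_votes(n: int) -> int:
    """Messages when every node casts exactly one vote (simplified single-round view)."""
    return n


for n in (100, 1_000, 2_000):
    # Model pBFT's prepare and commit phases as two all-to-all rounds.
    quadratic = all_to_all_messages(n, rounds=2)
    linear = single_round_votes(n)
    print(f"n={n:>5}: all-to-all x2 = {quadratic:>9,}  one vote each = {linear:,}")
```

Even at a few thousand nodes, the quadratic term outgrows the linear one by roughly three orders of magnitude.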

Tower BFT isn't the first Byzantine-fault-tolerant consensus algorithm used in a production blockchain, but it is the first to successfully implement single-round voting. Earlier algorithms simply adopted pBFT's three-stage design for use in a blockchain. A notable example is the Tendermint BFT consensus mechanism from 2014, used by Cosmos and by the BNB Beacon Chain.

Let's now have a look at how Tower BFT works in practice.

Voting for Block Confirmation

As you already know, Solana measures time in Proof of History ticks, and 64 ticks make up one block slot. Active validators send one vote per slot. In theory, a validator could cast multiple votes in a single slot. That would be considered equivocation, however, and the validator would risk punishment for it. A validator can also choose not to vote on a slot; this usually happens when it doesn't receive enough block data from the leader in time. To receive staking rewards, a validator has to vote on at least 80% of all slots during one epoch (432,000 slots, or roughly 2 days).
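The slot and epoch figures above are easy to sanity-check. The sketch below takes 64 ticks per slot and 432,000 slots per epoch from the text. It also assumes a 400 ms target slot time, since real slots can run slower. The constant and function names are illustrative.

```python
TICKS_PER_SLOT = 64        # from the text
SLOTS_PER_EPOCH = 432_000  # from the text
SLOT_TIME_MS = 400         # assumption: target slot time; real slots can take longer


def tick_duration_ms(slot_ms: int = SLOT_TIME_MS) -> float:
    """Average wall-clock time between two ticks."""
    return slot_ms / TICKS_PER_SLOT


def epoch_duration_hours(slot_ms: int = SLOT_TIME_MS) -> float:
    """Length of one epoch if every slot took exactly slot_ms."""
    return SLOTS_PER_EPOCH * slot_ms / 1000 / 3600


def ticks_per_epoch() -> int:
    """Total ticks produced over one epoch."""
    return SLOTS_PER_EPOCH * TICKS_PER_SLOT


print(tick_duration_ms())      # 6.25
print(epoch_duration_hours())  # 48.0, the "2 days or so"
print(ticks_per_epoch())       # 27648000
```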

But! There's a catch. Not every vote carries the same weight: each validator's voting power is proportional to the amount of SOL actively staked with it (its own stake plus any stake delegated to it). This is called stake-weighted voting.

To confirm a block, the network needs a supermajority. On Solana, that means validators holding at least ⅔ of the total effective staked SOL must vote for it. The ⅔ threshold traces back to the Byzantine Generals Problem paper, which showed that agreement can only be guaranteed when fewer than ⅓ of the participants are faulty, and it is a common choice in this context.
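Stake-weighted voting compares stake, not headcount, against the ⅔ threshold. Here is a minimal sketch with made-up validator names and stake amounts. It follows the text's "at least ⅔" rule. It uses `Fraction` so that a result sitting exactly on the boundary is not distorted by floating-point rounding.

```python
from fractions import Fraction

SUPERMAJORITY = Fraction(2, 3)  # "at least 2/3 of total effective stake", per the text


def is_confirmed(votes_for_block: set[str], stake: dict[str, int]) -> bool:
    """True if the validators that voted for a block hold at least 2/3 of total stake.

    stake maps a validator identity to its active stake (own plus delegated).
    """
    total = sum(stake.values())
    voted = sum(stake[v] for v in votes_for_block)
    return Fraction(voted, total) >= SUPERMAJORITY


stake = {"A": 40, "B": 30, "C": 20, "D": 10}  # hypothetical validators and stake
print(is_confirmed({"A", "B"}, stake))        # True: 70% of stake
print(is_confirmed({"B", "C", "D"}, stake))   # False: 3 of 4 validators, but only 60% of stake
```

The second case shows the point of stake weighting. Three of the four validators voted, but together they hold only 60% of the stake, so the block stays unconfirmed.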

Votes are standard Solana transactions that travel across the network the same way regular transactions do (in fact, vote transactions make up around 75% of Solana's traffic). When the ledger shows that validators holding ⅔+ of the stake have voted to add a block to the blockchain, the block is confirmed, but not yet finalized. Under normal conditions, a block is confirmed within the next slot.

If a block doesn't receive votes from ⅔ of the stake immediately, it remains unconfirmed. It can still gather enough votes in subsequent slots and reach confirmation retroactively. Meanwhile, the chain keeps producing unconfirmed slots until the supermajority returns. Because of this, apps that rely on the "confirmed" status to process user transactions may appear stalled to their users until those blocks are confirmed.
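Retroactive confirmation can be modeled as a running stake total that is re-checked as late votes for the same block arrive in later slots. This is a simplified sketch that tracks a single block and ignores forks. The function name, the `(slot, validator)` input format, and the stake numbers are all hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Optional


def confirmation_slot(
    arrivals: Iterable[tuple[int, str]],
    stake: dict[str, int],
    threshold: Fraction = Fraction(2, 3),
) -> Optional[int]:
    """Return the first slot at which votes for one block reach the threshold, else None.

    arrivals: (slot, validator) pairs, in slot order, all voting for the same block.
    Each validator's stake is counted once, however many times its vote shows up.
    """
    total = sum(stake.values())
    seen: set[str] = set()
    voted = 0
    for slot, validator in arrivals:
        if validator not in seen:
            seen.add(validator)
            voted += stake[validator]
        if Fraction(voted, total) >= threshold:
            return slot
    return None


stake = {"A": 40, "B": 30, "C": 20, "D": 10}  # hypothetical
print(confirmation_slot([(101, "A"), (102, "C"), (105, "B")], stake))  # 105
print(confirmation_slot([(101, "A"), (102, "D")], stake))              # None: only 50%
```

In the first case, the block sits unconfirmed through slots 101 and 102 (40%, then 60% of stake) and only becomes confirmed once B's vote lands in slot 105.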

Should more than ⅓ of the stake go offline or support a different fork, confirmation halts, as past outages have shown. But with the largest validator on Solana currently holding only around 5% of stake (it's Binance), a deliberate ⅓-stake attack is fairly unrealistic. Let's now have a look at how blocks reach finality.
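The ⅓ halting condition turns into a question we can ask of any stake distribution: what is the smallest group of validators whose combined stake exceeds ⅓, and which could therefore stall confirmation by going offline together? That count is often called the Nakamoto coefficient. The stake shares in this sketch are invented for illustration and do not reflect Solana's real distribution.

```python
from fractions import Fraction


def can_confirm(online_stake: int, total_stake: int) -> bool:
    """Confirmation needs at least 2/3 of stake online, so >1/3 offline halts it."""
    return Fraction(online_stake, total_stake) >= Fraction(2, 3)


def min_validators_to_halt(stakes: list[int]) -> int:
    """Smallest number of validators whose combined stake exceeds 1/3 of the total.

    Taking the largest stakes first minimizes the count. Assumes positive stakes.
    """
    total = sum(stakes)
    acc = 0
    for count, s in enumerate(sorted(stakes, reverse=True), start=1):
        acc += s
        if Fraction(acc, total) > Fraction(1, 3):
            return count
    return len(stakes)  # only reached if stakes is empty


# Hypothetical stake shares in percent (largest 5%), summing to 100.
stakes = [5, 4, 4, 3, 3, 3, 3, 3, 2, 2, 2] + [1] * 66
print(min_validators_to_halt(stakes))  # 11
print(can_confirm(66, 100))            # False: 34% offline
print(can_confirm(67, 100))            # True: 33% offline
```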

Building the Tower to Reach Block Finality

In Tower BFT, each validator maintains a vote tower (hence the name Tower BFT). The tower is simply a stack of the validator's previous votes, each with a lockout counter attached.

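A lockout is a period, measured in slots, during which a validator cannot vote for a fork that conflicts with a block it has already voted for. A fresh vote starts with a lockout of 2 slots. That lockout doubles every time the validator stacks another vote on top of it on the same fork: 2, 4, 8, 16 and so on. A tower holds at most 32 votes. By the time a vote sits at the bottom of a full tower, its lockout has grown to 2³² slots (about 54 years at 400 ms per slot), so it is effectively permanent. The next vote then pops it off the tower as the validator's root.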
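The lockout rules can be simulated in a few lines. This is a simplified model that loosely follows the logic of Solana's vote program without copying it. It assumes a single chain with no forks and an initial lockout of 2 slots that doubles as votes stack on top. It also assumes a maximum depth of 32 votes, after which the oldest vote is popped off and becomes the validator's root. The class names and the `credits` counter (one credit per rooted vote, standing in for rewards) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INITIAL_LOCKOUT = 2  # assumption: lockout of a fresh vote, in slots
MAX_DEPTH = 32       # assumption: maximum number of votes a tower holds


@dataclass
class Vote:
    slot: int
    confirmation_count: int = 1  # grows as more votes are stacked on top of this one

    def lockout(self) -> int:
        """Lockout in slots; doubles every time confirmation_count grows by one."""
        return INITIAL_LOCKOUT ** self.confirmation_count

    def expiration_slot(self) -> int:
        """Last slot at which this vote still locks the validator to its fork."""
        return self.slot + self.lockout()


@dataclass
class Tower:
    votes: List[Vote] = field(default_factory=list)  # votes[0] is the bottom
    root: Optional[int] = None  # most recent slot this validator treats as final
    credits: int = 0            # hypothetical reward counter: +1 per rooted vote

    def vote(self, slot: int) -> None:
        # 1. Pop votes from the top whose lockout expired before this slot.
        while self.votes and self.votes[-1].expiration_slot() < slot:
            self.votes.pop()
        # 2. A full tower roots its oldest vote and earns a credit for it.
        if len(self.votes) == MAX_DEPTH:
            self.root = self.votes.pop(0).slot
            self.credits += 1
        # 3. Push the new vote, then double lockouts of votes with enough votes above them.
        self.votes.append(Vote(slot))
        depth = len(self.votes)
        for i, v in enumerate(self.votes):
            if depth > i + v.confirmation_count:
                v.confirmation_count += 1


tower = Tower()
for slot in range(1, 5):  # four consecutive votes on the same fork
    tower.vote(slot)
print([(v.slot, v.lockout()) for v in tower.votes])  # [(1, 16), (2, 8), (3, 4), (4, 2)]

for slot in range(5, 34):  # keep voting until slot 1 is pushed out as the root
    tower.vote(slot)
print(tower.root, tower.credits, tower.votes[0].lockout())  # 1 1 4294967296
```

Each consecutive vote raises the older votes' lockouts by one power of two. A gap in voting that outlasts the top votes' lockouts pops those votes off again, which is what lets a validator eventually switch forks without breaking a lockout.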

Each new vote backs the block in the current slot and also extends the window during which the validator is prohibited from voting for a conflicting fork. If the validator did vote for a conflicting fork during this period, it would create on-chain proof of equivocation and could face punishment (slashing* is not fully active on Solana yet, but it has been proposed and partially implemented).

*Slashing is a Proof-of-Stake mechanism for punishing malicious nodes. If a node keeps voting for conflicting forks, deliberately stays offline, or misbehaves in a similar way, part or all of its stake can be confiscated by the network (that is, slashed).

Because every vote binds the validator to its fork for a growing number of slots, the vote tower's main purpose is to create an economic incentive to vote honestly. Validators receive full rewards for their participation in the network only once a vote sits 32 votes deep in their tower. Voting for a conflicting fork during the lockout period invalidates the current vote tower. No rewards are earned for those slots, and the validator may also be exposed to slashing (once enabled).
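In a simplified model, the rule against voting on a conflicting fork during lockout becomes a check against each vote in the tower. A vote blocks a new vote only if it sits on a different fork and its lockout has not yet expired. The function name, the `(slot, lockout)` representation, and passing fork ancestry in as a set are illustrative assumptions. A real validator derives ancestry from its own view of the block tree.

```python
from typing import List, Set, Tuple


def violates_lockout(
    tower: List[Tuple[int, int]],
    new_slot: int,
    ancestors_of_new_slot: Set[int],
) -> bool:
    """Return True if voting for new_slot would break a lockout in the tower.

    tower: (voted_slot, lockout_in_slots) pairs the validator has already cast.
    ancestors_of_new_slot: slots on the fork that new_slot builds on.
    """
    for voted_slot, lockout in tower:
        on_same_fork = voted_slot in ancestors_of_new_slot
        still_locked = voted_slot + lockout >= new_slot
        # Only a still-locked vote on a *different* fork makes the new vote a violation.
        if still_locked and not on_same_fork:
            return True
    return False


# Hypothetical fork A: 10 -> 11 -> 12. Fork B branches off slot 10 and continues at 13.
tower = [(10, 8), (11, 4), (12, 2)]
print(violates_lockout(tower, 13, {10}))          # True: vote on 11 is locked through slot 15
print(violates_lockout(tower, 14, {10, 11, 12}))  # False: 14 extends fork A
print(violates_lockout(tower, 20, {10, 13, 19}))  # False: lockouts on 11 and 12 have expired
```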

The 32-slot period is considered the time needed to reach block finality: after 32 slots, the cost of reversing a block becomes so high that no rational validator would risk it. In practice, a block can be confirmed in 400-600 ms and finalized about 12-15 seconds after confirmation. Most dApps consider confirmation good enough for their services. Finality matters in cases such as exchanges moving large amounts of funds, where the funds aren't considered moved until finality is reached, for safety reasons.
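The 12-15 second figure follows from simple arithmetic: 32 slots multiplied by the slot time. The slot times of 400 and 450 ms used below are assumptions chosen for illustration, because actual slot times depend on network conditions.

```python
FINALITY_DEPTH_SLOTS = 32  # from the text


def time_to_finality_ms(slot_ms: int, depth: int = FINALITY_DEPTH_SLOTS) -> int:
    """Best-case time for a block to be buried `depth` slots deep, in milliseconds."""
    return depth * slot_ms


# Assumed slot times for illustration; real slot times vary with network conditions.
for slot_ms in (400, 450):
    print(f"{slot_ms} ms slots -> {time_to_finality_ms(slot_ms) / 1000} s to finality")
# 400 ms slots -> 12.8 s to finality
# 450 ms slots -> 14.4 s to finality
```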

Next, let's have a look at how block data travels within the Solana network. We will explore Turbine, Solana's block data propagation protocol, which plays an important role in the network's speed, along with a couple of closely related topics.
