Untitled

The Stellar Consensus Protocol: A Federated Model for Internet-level Consensus DAVID MAZI`ERES, Stellar Development Foundation
ThispaperintroducesanewmodelforconsensuscalledfederatedByzantineagreement(FBA).FBAachieves robustness through quorum slices—individual trust decisions made by each node that together determine system-levelquorums.Slicesbindthesystemtogethermuchthewayindividualnetworks’peeringandtransit decisions now unify the Internet. We also present the Stellar Consensus Protocol (SCP), a construction for FBA. Like all Byzantine agreementprotocols,SCPmakesnoassumptionsabouttherationalbehaviorofattackers.UnlikepriorByzantine agreement models, which presuppose a unanimously accepted membership list, SCP enjoys open membership that promotes organic network growth. Compared to decentralized proof of-work and proof-of-stake schemes, SCP has modest computing and ﬁnancial requirements, lowering the barrier to entry and potentially opening up ﬁnancial systems to new participants. CCS Concepts:•Security and privacy → Distributed systems security; Security protocols; Additional Key Words and Phrases: Byzantine fault tolerance, asynchronous systems
1. INTRODUCTION Financial infrastructure is currently a mess of closed systems. Gaps between these systems mean that transaction costs are high [Provost 2013] and money moves slowly across political and geographic boundaries [Banning-Lover 2015; CGAP 2008]. This friction has curtailed the growth of ﬁnancial services, leaving billions of people underserved ﬁnancially [Demirguc-Kunt et al. 2015]. To solve these problems, we need ﬁnancial infrastructure that supports the kind of organic growth and innovation we’ve seen from the Internet, yet still ensures the integrityofﬁnancialtransactions.Historically,wehavereliedonhighbarrierstoentryto ensureintegrity.Wetrustestablishedﬁnancialinstitutionsanddoourbesttoregulate them. But this exclusivity conﬂicts with the goal of organic growth. Growth demands new, innovative participants, who may possess only modest ﬁnancial and computing resources. We need a worldwide ﬁnancial network open to anyone, so that new organizations can join and extend ﬁnancial access to unserved communities. The challenge for such a network is ensuring participants record transactions correctly. With a low barrier to entry, users won’t trust providers to police themselves. With worldwide reach, providers won’t all trust a single entity to operate the network. A compelling alternative is a decentralized system in which participants together ensure integrity by agreeing on the validity of one another’s transactions. Such agreement hinges on a mechanism for worldwide consensus. This paper presents federated Byzantine agreement (FBA), a model suitable for worldwide consensus. In FBA, each participant knows of others it considers important. It waits for the vast majority of those others to agree on any transaction before considering the transaction settled. In turn, those important participants do not agree to the transaction until the participants they consider important agree as well, and so on. Eventually, enough of the network accepts a transaction that it becomes infeasible for an attacker to roll it back. Only then do any participants consider the transaction settled. FBA’s consensus can ensure the integrity of a ﬁnancial network. Its decentralized control can spur organic growth. This paper further presents the Stellar consensus protocol (SCP), a construction for FBA. We prove that SCP’s safety is optimal for an asynchronous protocol, in that it guarantees agreement under any node-failure scenario that admits such a guarantee.
Draft of February 25, 2016
2 D. Mazi`eres
We also show that SCP is free from blocked states—in which consensus is no longer possible—unless participant failures make it impossible to satisfy trust dependencies. SCPistheﬁrstprovablysafeconsensusmechanismtoenjoyfourkeypropertiessimultaneously: — Decentralized control. Anyone is able to participate and no central authority dictates whose approval is required for consensus. — Low latency. In practice, nodes can reach consensus at timescales humans expect for web or payment transactions—i.e., a few seconds at most. — Flexible trust. Users have the freedom to trust any combination of parties they see ﬁt. For example, a small non-proﬁt may play a key role in keeping much larger institutions honest. — Asymptotic security. Safety rests on digital signatures and hash families whose parameters can realistically be tuned to protect against adversaries with unimaginably vast computing power. SCP has applications beyond ﬁnancial markets for ensuring organizations perform importantfunctionshonestly.Anexampleiscertiﬁcateauthorities(CAs),wholiterally holdthekeystotheweb.ExperienceshowsthatCAssignincorrectcertiﬁcatesthatget used in the wild [Microsoft 2013; Langley 2015]. Several proposals address this problem through certiﬁcate transparency [Kim et al. 2013; Laurie et al. 2013; Basin et al. 2014;Melaraetal.2014].Certiﬁcatetransparencyallowsuserstoexaminethehistory of certiﬁcates issued for any given entity and detect attempts by CAs to change an entity’s public key without the endorsement of the previous key. SCP holds the potential to strengthen the indelible certiﬁcate history at the core of certiﬁcate transparency. Demanding global consensus on certiﬁcate history among a decentralized group of auditors would make it harder to backpedal and override previously issued certiﬁcates. The next section discusses previous approaches to consensus. Section 3 deﬁnes federated Byzantine agreement (FBA) and lays out notions of safety and liveness applicable in the FBA model. Section 4 discusses optimal failure resilience in an FBA system, thereby establishing the security goals for SCP. Section 5 develops federated voting, a key building block of the SCP protocol. Section 6 presents SCP itself, proving safetyandfreedomfromblockedstates.Section7discusseslimitationsofSCP.Finally, Section 8 summarizes results. For readers less familiar with mathematical notation, Appendix A deﬁnes some symbols used throughout the paper.
2. RELATED WORK Figure1summarizeshowSCPdiffersfrompreviousconsensusmechanisms.Themost famous decentralized consensus mechanism is the proof-of-work scheme advanced by Bitcoin [Nakamoto 2008]. Bitcoin takes a two-pronged approach to consensus. First, it provides incentives for rational actors to behave well. Second, it settles transactions through a proof-of-work [Dwork and Naor 1992] algorithm designed to protect against ill-behaved actors who do not possess the majority of the system’s computing power. Bitcoinhasoverwhelminglydemonstratedtheappealofdecentralizedconsensus[Bonneau et al. 2015]. Proof of work has limitations, however. First, it wastes resources: by one estimate from 2014, Bitcoin might consume as much electric power as the entire country of Ireland [O’Dwyer and Malone 2014]. Second, secure transaction settlement suffers from expected latencies in the minutes or tens of minutes [Karame et al. 2012]. Finally, in contrast to traditional cryptographic protocols, proof of work offers no asymptotic security. Given non-rational attackers—or ones with extrinsic incentives to sabotage
The Stellar Consensus Protocol 3
mechanism
decentralized control
low latency
ﬂexible trust
asymptotic security
proof of work ! proof of stake ! maybe maybe Byzantine agreement ! ! ! Tendermint ! ! ! SCP (this work) ! ! ! !
Fig. 1. Properties of different consensus mechanisms
consensus—small computational advantages can invalidate the security assumption, allowing history to be re-written in so-called “51% attacks.” Worse, attackers initially controlling less than 50% of computation can game the system to provide disproportionate rewards for those who join them [Eyal and Sirer 2013], thereby potentially gaining majority control. As the leading digital currency backed by the most computational power, Bitcoin enjoys a measure of protection against 51% attacks. Smaller systemshavefallenvictim[crazyearner2013;Bradbury2013],however,posingaproblem for any proof-of-work system not built on the Bitcoin block chain. An alternative to proof of work is proof of stake [King and Nadal 2012], in which consensus depends on parties that have posted collateral. Like proof of work, rewards encourage rational participants to obey the protocol; some designs additionally penalize bad behavior [Buterin 2014; Davarpanah et al. 2015]. Proof of stake opens the possibility of so-called “nothing at stake” attacks, in which parties that previously posted collateral but later cashed it in and spent the money can go back and rewrite history from a point where they still had stake. To mitigate such attacks, systems effectively combine proof of stake with proof of work—scaling down the required work in proportion to stake—or delay refunding collateral long enough for some other (sometimes informal) consensus mechanism to establish an irreversible checkpoint. StillanotherapproachtoconsensusisByzantineagreement[Peaseetal.1980;Lamport et al. 1982], the best known variant of which is PBFT [Castro and Liskov 1999]. Byzantineagreementensuresconsensusdespitearbitrary(includingnon-rational)behavior on the part of some fraction of participants. This approach has two appealing properties. First, consensus can be fast and efﬁcient. Second, trust is entirely decoupled from resource ownership, which makes it possible for a small non-proﬁt to help keep more powerful organizations, such as banks or CAs, honest. Complicating matters, however, all parties must agree on the the exact list of participants. Moreover, attackers must be prevented from joining multiple times and exceeding the system’s failure tolerance, a so-called Sybil attack [Douceur 2002]. BFT-CUP [Alchieri et al. 2008] accommodates unknown participants, but still presupposes a Sybil-proof centralized admission-control mechanism. Generally,membershipinByzantineagreementsystemsissetbyacentralauthority or closed negotiation. Prior attempts to decentralize admission have given up some of the beneﬁts. One approach, taken by Ripple, is to publish a “starter” membership list thatparticipantscaneditforthemselves,hopingpeople’seditsareeitherinconsequentialorreproducedbyanoverwhelmingfractionofparticipants.Unfortunately,because divergent lists invalidate safety guarantees [Schwartz et al. 2014], users are reluctant to edit the list in practice and a great deal of power ends up concentrated in the maintainer of the starter list. Another approach, taken by Tendermint [Kwon 2014], is to basemembershiponproofofstake.However,doingsoonceagaintiestrusttoresource
4 D. Mazi`eres
ownership.SCPistheﬁrstByzantineagreementprotocoltogiveeachparticipantmaximum freedom in chosing which combinations of other participants to trust.
3. FEDERATED BYZANTINE AGREEMENT SYSTEMS This section introduces the federated Byzantine agreement (FBA) model. Like nonfederated Byzantine agreement, FBA addresses the problem of updating replicated state, such as a transaction ledger or certiﬁcate tree. By agreeing on what updates to apply, nodes avoid contradictory, irreconcilable states. We identify each update by a unique slot from which inter-update dependencies can be inferred. For instance, slots may be consecutively numbered positions in a sequentially applied log. An FBA system runs a consensus protocol that ensures nodes agree on slot contents. A node 𝑣 can safely apply update 𝑥 in slot 𝑖 when it has safely applied updates in all slotsuponwhich𝑖dependsand,additionally,itbelievesallcorrectlyfunctioningnodes will eventually agree on 𝑥 for slot 𝑖. At this point, we say 𝑣 has externalized 𝑥 for slot 𝑖. The outside world may react to externalized values in irreversible ways, so a node cannot later change its mind about them. A challenge for FBA is that malicious parties can join many times and outnumber honest nodes. Hence, traditional majority-based quorums do not work. Instead, FBA determines quorums in a decentralized way, by each node selecting what we call quorumslices.Thenextsubsectiondeﬁnesquorumsbasedonslices.Thefollowingsubsection provides some examples and discussion. Finally, we deﬁne the key properties of safety and liveness that a consensus protocol should hope to achieve.
3.1. Quorum slices In a consensus protocol, nodes exchange messages asserting statements about slots. We assume such assertions cannot be forged, which can be guaranteed if nodes are named by public key and they digitally sign messages. When a node hears a sufﬁcient set of nodes assert a statement, it assumes no functioning node will ever contradict that statement. We call such a sufﬁcient set a quorum slice, or, more concisely, just a slice. To permit progress in the face of node failures, a node may have multiple slices, any one of which is sufﬁcient to convince it of a statement. At a high level, then, an FBA system consists of a loose confederation of nodes each of which has chosen one or more slices. More formally: Deﬁnition (FBAS). A federated Byzantine agreement system, or FBAS, is a pair ⟨𝐕,𝐐⟩comprising a set of nodes 𝐕 and a quorum function 𝐐 ∶ 𝐕 → 22𝐕 ⧵ {∅} specifying one or more quorum slices for each node, where a node belongs to all of its own quorum slices—i.e., ∀𝑣 ∈ 𝐕,∀𝑞 ∈ 𝐐(𝑣),𝑣 ∈ 𝑞. (Note 2𝑋 denotes the powerset of 𝑋.) Deﬁnition (quorum). A set of nodes 𝑈 ⊆ 𝐕 in FBAS⟨𝐕,𝐐⟩is a quorum iff 𝑈 ≠∅ and𝑈 contains a slice for each member—i.e., ∀𝑣 ∈ 𝑈,∃𝑞 ∈ 𝐐(𝑣) such that 𝑞 ⊆ 𝑈. A quorum is a set of nodes sufﬁcient to reach agreement. A quorum slice is the subset of a quorum convincing one particular node of agreement. A quorum slice may besmallerthanaquorum.Considerthefour-nodesysteminFigure2,whereeachnode has a single slice and arrows point to the other members of that slice. Node 𝑣1’s slice {𝑣1,𝑣2,𝑣3} is sufﬁcient to convince 𝑣1 of a statement. But 𝑣2’s and 𝑣3’s slices include 𝑣4, meaning neither 𝑣2 nor 𝑣3 can assert a statement without 𝑣4’s agreement. Hence, no agreement is possible without 𝑣4’s participation, and the only quorum including 𝑣1 is the set of all nodes {𝑣1,𝑣2,𝑣3,𝑣4}. Traditional, non-federated Byzantine agreement requires all nodes to accept the same slices, meaning ∀𝑣1,𝑣2,𝐐(𝑣1) = 𝐐(𝑣2). Because every member accepts every slice, traditional systems do not distinguish between slices and quorums. The downside is
The Stellar Consensus Protocol 5
𝑣1
𝑣2 𝑣3
𝑣4
𝐐(𝑣1) = {{𝑣1,𝑣2,𝑣3}} 𝐐(𝑣2) = 𝐐(𝑣3) = 𝐐(𝑣4) = {{𝑣2,𝑣3,𝑣4}}
Fig. 2. 𝑣1’s quorum slice is not a quorum without 𝑣4.
𝑣1 𝑣2 𝑣3 𝑣4 Top tier: slice is 3 out of {𝑣1,𝑣2,𝑣3,𝑣4}, including self
𝑣5 𝑣6 𝑣7 𝑣8 Middle tier: slice is self + any 2 top tier nodes
𝑣9 𝑣10 Leaf tier: slice is self + any 2 middle tier nodes
2/4
2/4
3/4
Fig. 3. Tiered quorum structure example
thatmembershipandquorumsmustsomehowbepre-ordained,precludingopenmembership and decentralized control. A traditional system, such as PBFT [Castro and Liskov1999],typicallyhas3𝑓+1nodes,any2𝑓+1ofwhichconstituteaquorum.Here 𝑓 is the maximum number of Byzantine failures—meaning nodes acting arbitrarily— the system can survive. FBA, introduced by this paper, generalizes Byzantine agreement to accommodate a greaterrangeofsettings.FBA’skeyinnovationisenablingeachnode𝑣tochoseitsown quorumsliceset𝐐(𝑣).System-widequorumsthusarisefromindividualdecisionsmade by each node. Nodes may select slices based on arbitrary criteria such as reputation or ﬁnancial arrangements. In some settings, no individual node may have complete knowledge of all nodes in the system, yet consensus should still be possible.
3.2. Examples and discussion Figure 3 shows an example of a tiered system in which different nodes have different slice sets, something possible only with FBA. A top tier, comprising 𝑣1,…,𝑣4, is structured like a PBFT system with 𝑓 = 1, meaning it can tolerate one Byzantine failure so long as the other three nodes are reachable and well-behaved. Nodes 𝑣5,…,𝑣8 constitute a middle tier and depend not on each other, but rather on the top tier. Only two top tier nodes are required to form a slice for a middle tier node. (The top tier assumes at most one Byzantine failure, so two top tier nodes cannot both fail unless the whole system has failed.) Nodes 𝑣9 and 𝑣10 are in a leaf tier for which a slice consists of any
6 D. Mazi`eres
𝑣1
𝑣2
𝑣3
𝑣4
𝑣5
𝑣6
𝐐(𝑣𝑖) ={{𝑣𝑖,𝑣(𝑖 mod 6)+1}}
Fig. 4. Cyclic quorum structure example
two middle tier nodes. Note that𝑣9 and𝑣10 may pick disjoint slices such as{𝑣5,𝑣6}and {𝑣7,𝑣8}; nonetheless, both will indirectly depend on the top tier. In practice, the top tier could consist of anywhere from four to dozens of widely known and trusted ﬁnancial institutions. As the size of the top tier grows, there may not be exact agreement on its membership, but there will be signiﬁcant overlap between most parties’ notions of top tier. Additionally, one can imagine multiple middle tiers, for instance one for each country or geographic region. This tiered structure resembles inter-domain network routing. The Internet today is held together by individual peering and transit relationships between pairs of networks. No central authority dictates or arbitrates these arrangements. Yet these pairwise relationships have sufﬁced to create a notion of de facto tier one ISPs [Norton 2010]. Though Internet reachability does suffer from ﬁrewalls, transitive reachability is nearly complete—e.g., a ﬁrewall might block The New York Times, but if it allows Google, and Google can reach The New York Times, then The New York Times is transitively reachable. Transitive reachability may be of limited utility for web sites, but it iscrucialforconsensus;theequivalentexamplewouldbeGoogleacceptingstatements only if The New York Times does. If we think of quorum slices as analogous to network reachability and quorums as analogoustotransitivereachability,thentheInternet’snearcompletetransitivereachabilitysuggestswecanlikewiseensureworldwideconsensuswithFBA.Inmanyways, consensus is an easier problem than inter-domain routing. While transit consumes resources and costs money, slice inclusion merely requires checking digital signatures. Hence,FBAnodescanerronthesideofinclusiveness,constructingconservativeslices withgreaterinterdependenceandredundancythantypicallyseeninpeeringandtransit arrangements. Anotherexamplenotpossiblewithcentralizedconsensusiscyclicdependencystructures, such as the one depicted in Figure 4. Such a cycle is unlikely to arise intentionally, but when individual nodes choose their own slices, it is possible for the overall system to end up embedding dependency cycles. The bigger point is that, compared to traditional Byzantine agreement, an FBA protocol must cope with a far wider variety of quorum structures.
3.3. Safety and liveness Wecategorizenodesaseitherwell-behavedorill-behaved.Awell-behavednodechooses sensiblequorumslices(discussedfurtherinSection4.1)andobeystheprotocol,including eventually responding to all requests. An ill-behaved node does not. Ill-behaved nodes suffer Byzantine failure, meaning they behave arbitrarily. For instance, an ill
The Stellar Consensus Protocol 7
Byzantine, including crashed
ill-behaved well-behaved
blocked divergent correct
failed correct
Fig. 5. Venn diagram of node failures behaved node may be compromised, its owner may have maliciously modiﬁed the software, or it may have crashed. The goal of Byzantine agreement is to ensure that well-behaved nodes externalize the same values despite the presence of such ill-behaved nodes. There are two parts to this goal. First, we would like to prevent nodes from diverging and externalizing different values for the same slot. Second, we would like to ensure nodes can actually externalize values, as opposed to getting blocked in some dead-end state from which consensus is no longer possible. We introduce the following two terms for these properties: Deﬁnition (safety). A set of nodes in an FBAS enjoy safety if no two of them ever externalize different values for the same slot. Deﬁnition (liveness). A node in an FBAS enjoys liveness if it can externalize new values without the participation of any failed (including ill-behaved) nodes. We call well-behaved nodes that enjoy both safety and liveness correct. Nodes that are not correct have failed. All ill-behaved nodes have failed, but a well-behaved node can fail, too, by waiting indeﬁnitely for messages from ill-behaved nodes, or, worse, by having its state poisoned by incorrect messages from ill-behaved nodes. Figure 5 illustrates the possible kinds of node failure. To the left are Byzantine failures, meaning the ill-behaved nodes. To the right are two kinds of well-behaved but failed nodes. Nodes that lack liveness are termed blocked, while those that lack safety are termed divergent. An attack violating safety is strictly more powerful than one violating only liveness, so we classify divergent nodes as a subset of blocked ones. Our deﬁnition of liveness is weak in that it says a node can externalize new values, not that it will. Hence, it admits a state of perpetual preemption in which consensus remains forever possible, yet the network continually thwarts it by delaying or reordering critical messages in just the wrong way. Perpetual preemption is inevitable in a purely asynchronous, deterministic system that survives node failure [Fischer et al. 1985]. Fortunately, preemption is transient. It does not indicate node failure, because the system can recover at any time. Protocols can mitigate the problem through randomness [Ben-Or 1983; Bracha and Toueg 1985] or through realistic assumptions about message latency [Dwork et al. 1988]. Latency assumptions are more practical when one would like to limit execution time or avoid the trusted dealers often required by more efﬁcient Randomized algorithms [?]. Of course, only termination and not safety should depend upon message timing.
4. OPTIMAL RESILIENCE Whether or not nodes enjoy safety and liveness depends on several factors: what quorum slices they have chosen, which nodes are ill-behaved, and of course the concrete consensus protocol and network behavior. As is common for asynchronous systems, we assume the network eventually delivers messages between well-behaved nodes, but can otherwise arbitrarily delay or reorder messages.
8 D. Mazi`eres
𝑣2
𝑣1
𝑣3
𝐐(𝑣1) = 𝐐(𝑣2) = 𝐐(𝑣3) = {{𝑣1,𝑣2,𝑣3}} 𝑣5
𝑣4
𝑣6
𝐐(𝑣4) = 𝐐(𝑣5) = 𝐐(𝑣6) = {{𝑣4,𝑣5,𝑣6}}
Fig. 6. FBAS lacking quorum intersection
𝑣2
𝑣1
𝑣3
𝐐(𝑣1) = 𝐐(𝑣2) = 𝐐(𝑣3) = {{𝑣1,𝑣2,𝑣3,𝑣7}} 𝑣5
𝑣4
𝑣6
𝐐(𝑣4) = 𝐐(𝑣5) = 𝐐(𝑣6) = {{𝑣4,𝑣5,𝑣6,𝑣7}}
𝑣7 𝐐(𝑣7) = {{𝑣7}}
Fig. 7. Ill-behaved node 𝑣7 can undermine quorum intersection. This section answers the following question: given a speciﬁc⟨𝐕,𝐐⟩and particular subset of 𝐕 that is ill-behaved, what are the best safety and liveness that any federated Byzantine agreement protocol can guarantee regardless of the network? We ﬁrst discuss quorum intersection, a property without which safety is impossible to guarantee. We then introduce a notion of dispensable sets—sets of failed nodes in spite of which it is possible to guarantee both safety and liveness.
4.1. Quorum intersection A protocol can guarantee agreement only if the quorum slices represented by function 𝐐 satisfy a validity property we call quorum intersection. Deﬁnition (quorum intersection). AnFBASenjoysquorumintersectioniffanytwoof its quorums share a node—i.e., for all quorums 𝑈1 and 𝑈2, 𝑈1 ∩𝑈2≠∅. Figure 6 illustrates a system lacking quorum intersection, where 𝐐 permits two quorums, {𝑣1,𝑣2,𝑣3} and {𝑣4,𝑣5,𝑣6}, that do not intersect. Disjoint quorums can independentlyagreeoncontradictorystatements,underminingsystem-wideagreement.When many quorums exist, quorum intersection fails if any two do not intersect. For example,thesetofallnodes{𝑣1,…,𝑣6}inFigure6isaquorumthatintersectstheothertwo, but the system still lacks quorum intersection because the other two do not intersect each other. No protocol can guarantee safety in the absence of quorum intersection, since such a conﬁguration can operate as two different FBAS systems that do not exchange any messages. However, even with quorum intersection, safety may be impossible to guarantee in the presence of ill-behaved nodes. Compare Figure 6, in which there are two disjoint quorums, to Figure 7, in which two quorums intersect at a single node 𝑣7, and 𝑣7 is ill-behaved. If 𝑣7 makes inconsistent statements to the left and right quorums, the effect is equivalent to disjoint quorums. In fact, since ill-behaved nodes contribute nothing to safety, no protocol can guarantee safety without the well-behaved nodes enjoying quorum intersection on their own. After all, in a worst-case scenario for safety, ill-behaved nodes can just always make any possible (contradictory) statement that completes a quorum. Two quorums overlapping only at ill-behaved nodes will again be able to operate like two different FBAS
The Stellar Consensus Protocol 9 systems thanks to the duplicity of the ill-behaved nodes. In short, FBAS⟨𝐕,𝐐⟩can survive Byzantine failure by a set of nodes 𝐵 ⊆ 𝐕 iff⟨𝐕,𝐐⟩enjoys quorum intersection after deleting the nodes in 𝐵 from 𝐕 and from all slices in 𝐐. More formally: Deﬁnition (delete). If⟨𝐕,𝐐⟩is an FBAS and 𝐵 ⊆ 𝐕 is a set of nodes, then to delete 𝐵 from⟨𝐕,𝐐⟩, written⟨𝐕,𝐐⟩𝐵, means to compute the modiﬁed FBAS⟨𝐕⧵𝐵,𝐐𝐵⟩where 𝐐𝐵(𝑣) = {𝑞 ⧵𝐵 ∣ 𝑞 ∈ 𝐐(𝑣)}. It is the responsibility of each node 𝑣 to ensure 𝐐(𝑣) does not violate quorum intersection. One way to do so is to pick conservative slices that lead to large quorums. Of course, a malicious 𝑣 may intentionally pick 𝐐(𝑣) to violate quorum intersection. But a malicious 𝑣 can also lie about the value of 𝐐(𝑣) or ignore 𝐐(𝑣) to make arbitrary assertions. In short, 𝐐(𝑣)’s value is not meaningful when 𝑣 is ill-behaved. This is why the necessary property for safety—quorum intersection of well-behaved nodes after deleting ill-behaved nodes—is unaffected by the slices of ill-behaved nodes. SupposeFigure6evolvedfromathree-nodeFBAS𝑣1,𝑣2,𝑣3 withquorumintersection to a six-node FBAS without. When 𝑣4,𝑣5,𝑣6 join, they maliciously choose slices that violate quorum intersection and no protocol can guarantee safety for 𝐕. Fortunately, deleting the bad nodes to yield⟨𝐕,𝐐⟩{𝑣4,𝑣5,𝑣6} restores quorum intersection, meaning at least {𝑣1,𝑣2,𝑣3} can enjoy safety. Note that deletion is conceptual, for the sake of describingoptimalsafety.Aprotocolshouldguaranteesafetyfor𝑣1,𝑣2,𝑣3 withouttheir needing to know that 𝑣4,𝑣5,𝑣6 are ill-behaved. 4.2. Dispensable sets (DSets) We capture the fault tolerance of nodes’ slice selections through the notion of a dispensible set or DSet. Informally, the safety and liveness of nodes outside a DSet can be guaranteedregardlessofthebehaviorofnodesinsidetheDSet.Putanotherway,inan optimally resilient FBAS, if a single DSet encompasses every ill-behaved node, it also contains every failed node, and conversely all nodes outside the DSet are correct. As an example, in a centralized PBFT system with 3𝑓 +1 nodes and quorum size 2𝑓 +1, any𝑓 or fewer nodes constitute a DSet. Since PBFT in fact survives up to𝑓 Byzantine failures, its robustness is optimal. In the less regular example of Figure 3, {𝑣1} is a DSet, since one top tier node can fail without affecting the rest of the system. {𝑣9} is also a DSet because no other node dependson𝑣9 forcorrectness.{𝑣6,…,𝑣10}isaDSet,becauseneither𝑣5 northetoptier depend on any of those ﬁve nodes. {𝑣5,𝑣6} is not a DSet, as it is a slice for 𝑣9 and 𝑣10 and hence, if entirely malicious, can lie to 𝑣9 and 𝑣10 and convince them of assertions inconsistent with each other or the rest of the system. To prevent a misbehaving DSet from affecting the correctness of other nodes, two properties must hold. For safety, deleting the DSet cannot undermine quorum intersection. For liveness, the DSet cannot deny other nodes a functioning quorum. This leads to the following deﬁnition: Deﬁnition (DSet). Let⟨𝐕,𝐐⟩be an FBAS and 𝐵 ⊆ 𝐕 be a set of nodes. We say 𝐵 is a dispensible set, or DSet, iff: (1) (quorum intersection despite 𝐵) ⟨𝐕,𝐐⟩𝐵 enjoys quorum intersection, and (2) (quorum availability despite 𝐵) Either 𝐕⧵𝐵 is a quorum in⟨𝐕,𝐐⟩or 𝐵 = 𝐕. Quorum availability despite 𝐵 protects against nodes in 𝐵 refusing to answer requests and blocking other nodes’ progress. Quorum intersection despite 𝐵 protects against the opposite—nodes in 𝐵 making contradictory assertions that enable other nodes to externalize inconsistent values for the same slot. Nodes must balance the two threats in slice selection. All else equal, bigger slices lead to bigger quorums with
10 D. Mazi`eres
well-behaved / ill-behaved
Local property of nodes, independent of other nodes (except for the validity of slice selection).
intact / befouled
Property of nodes given their quorum slices and a particular set of ill-behaved nodes. Befouled nodes are ill-behaved or depend, possibly indirectly, on too many ill-behaved nodes.
correct / failed
Property of nodes given their quorum slices, a concrete protocol, and actual network behavior. The goal of a consensus protocol is to guarantee correctness for all intact nodes.
Fig. 8. Key properties of FBAS nodes greateroverlap,meaningfewerfailednodesets𝐵 willunderminequorumintersection when deleted. On the other hand, bigger slices are more likely to contain failed nodes, endangering quorum availability. The smallest DSet containing all ill-behaved nodes may encompass well-behaved nodes as well, reﬂecting the fact that a sufﬁciently large set of ill-behaved nodes can cause well-behaved nodes to fail. For instance, in Figure 3, the smallest DSet containing 𝑣5 and 𝑣6 is {𝑣5,𝑣6,𝑣9,𝑣10}. The set of all nodes, 𝐕, is always a DSet, as an FBAS ⟨𝐕,𝐐⟩vacuouslyenjoysquorumintersectiondespite𝐕and,byspecialcase,alsoenjoys quorum availability despite 𝐕. The motivation for the special case is that given sufﬁciently many ill-behaved nodes, 𝐕 may be the smallest DSet to contain all ill-behaved ones, indicating a scenario under which no protocol can guarantee anything better than complete system failure. The DSets in an FBAS are determined a priori by the quorum function 𝐐. Which nodesarewell-andill-behaveddependsonruntimebehavior,suchasmachinesgetting compromised.TheDSetswecareaboutarethosethatencompassallill-behavednodes, as they help us distinguish nodes that should be guaranteed correct from ones for which such a guarantee is impossible. To this end, we introduce the following terms: Deﬁnition (intact). Anode𝑣inanFBASisintactiffthereexistsaDSet𝐵 containing all ill-behaved nodes and such that 𝑣 ∉ 𝐵. Deﬁnition (befouled). A node 𝑣 in an FBAS is befouled iff it is not intact. A befouled node 𝑣 is surrounded by enough failed nodes to block its progress or poisonitsstate,evenif𝑣itselfiswell-behaved.NoFBAScanguaranteethecorrectnessof abefoulednode.However,anoptimalFBASguaranteesthateveryintactnoderemains correct. Figure 8 summarizes the key properties of nodes. The following theorems facilitateanalysisbyshowingthatthesetofbefoulednodesisalwaysaDSetinanFBAS with quorum intersection. THEOREM 1. Let 𝑈 be a quorum in FBAS⟨𝐕,𝐐⟩, let 𝐵 ⊆ 𝐕 be a set of nodes, and let𝑈 ′ = 𝑈 ⧵𝐵. If 𝑈′≠∅ then 𝑈′ is a quorum in⟨𝐕,𝐐⟩𝐵. PROOF. Because 𝑈 is a quorum, every node 𝑣 ∈ 𝑈 has a 𝑞 ∈ 𝐐(𝑣) such that 𝑞 ⊆ 𝑈. Since𝑈′ ⊆ 𝑈,itfollowsthatevery𝑣 ∈ 𝑈′ hasa𝑞 ∈ 𝐐(𝑣)suchthat𝑞⧵𝐵 ⊆ 𝑈′.Rewriting with deletion notation yields ∀𝑣 ∈ 𝑈′,∃𝑞 ∈ 𝐐𝐵(𝑣) such that 𝑞 ⊆ 𝑈′, which, because 𝑈′ ⊆ 𝐕⧵𝐵, means that 𝑈′ is a quorum in⟨𝐕,𝐐⟩𝐵. THEOREM 2. If 𝐵1 and 𝐵2 are DSets in an FBAS⟨𝐕,𝐐⟩enjoying quorum intersection, then 𝐵 = 𝐵1 ∩𝐵2 is a DSet, too. PROOF. Let 𝑈1 = 𝐕⧵𝐵1 and 𝑈2 = 𝐕⧵𝐵2. If 𝑈1 = ∅, then 𝐵1 = 𝐕 and 𝐵 = 𝐵2 (a DSet), so we are done. Similarly, if 𝑈2 = ∅, then 𝐵 = 𝐵1, and we are done. Otherwise, note
The Stellar Consensus Protocol 11 that by quorum availability despite DSets𝐵1 and𝐵2,𝑈1 and𝑈2 are quorums in⟨𝐕,𝐐⟩. It follows from the deﬁnition that the union of two quorums is also a quorum. Hence 𝐕⧵𝐵 = 𝑈1 ∪𝑈2 is a quorum and we have quorum availability despite 𝐵. We must now show quorum intersection despite 𝐵. Let 𝑈𝑎 and 𝑈𝑏 be any two quorums in⟨𝐕,𝐐⟩𝐵. Let 𝑈 = 𝑈1 ∩ 𝑈2 = 𝑈2 ⧵ 𝐵1. By quorum intersection of⟨𝐕,𝐐⟩, 𝑈 = 𝑈1 ∩𝑈2 ≠∅. But then by Theorem 1, 𝑈 = 𝑈2 ⧵𝐵1 must be a quorum in⟨𝐕,𝐐⟩𝐵1. Now consider that 𝑈𝑎 ⧵𝐵1 and 𝑈𝑎 ⧵𝐵2 cannot both be empty, or else 𝑈𝑎 ⧵𝐵 = 𝑈𝑎 would be.Hence,byTheorem1,either𝑈𝑎⧵𝐵1 isaquorumin(⟨𝐕,𝐐⟩𝐵)𝐵1 =⟨𝐕,𝐐⟩𝐵1,or𝑈𝑎⧵𝐵2 isaquorumin(⟨𝐕,𝐐⟩𝐵)𝐵2 =⟨𝐕,𝐐⟩𝐵2,orboth.Intheformercase,notethatif𝑈𝑎⧵𝐵1 is a quorum in⟨𝐕,𝐐⟩𝐵1, then by quorum intersection of⟨𝐕,𝐐⟩𝐵1, (𝑈𝑎 ⧵𝐵1)∩𝑈 ≠∅; since (𝑈𝑎 ⧵𝐵1)∩𝑈 = (𝑈𝑎 ⧵𝐵1)⧵𝐵2, it follows that 𝑈𝑎 ⧵𝐵2 ≠∅, making 𝑈𝑎 ⧵𝐵2 a quorum in ⟨𝐕,𝐐⟩𝐵2. By a similar argument, 𝑈𝑏 ⧵𝐵2 must be a quorum in⟨𝐕,𝐐⟩𝐵2. But then quorum intersection despite 𝐵2 tells us that (𝑈𝑎 ⧵𝐵2)∩(𝑈𝑏 ⧵𝐵2)≠∅, which is only possible if 𝑈𝑎 ∩𝑈𝑏≠∅.
THEOREM 3. In an FBAS with quorum intersection, the set of befouled nodes is a DSet.
PROOF. Let 𝐵min be the intersection of every DSet that contains all ill-behaved nodes. It follows from the deﬁnition of intact that a node 𝑣 is intact iff 𝑣 ∉ 𝐵min. Thus, 𝐵min is precisely the set of befouled nodes. By Theorem 2, DSets are closed under intersection, so 𝐵min is also a DSet.
5. FEDERATED VOTING This section develops a federated voting technique that FBAS nodes can use to agree on a statement. At a high level, the process for agreeing on some statement 𝑎 involves nodes exchanging two sets of messages. First, nodes vote for 𝑎. Then, if the vote was successful, nodes conﬁrm 𝑎, effectively holding a second vote on the fact that the ﬁrst vote succeeded. From each node’s perspective, the two rounds of messages divide agreement on a statement 𝑎 into three phases: unknown, accepted, and conﬁrmed. (This pattern dates backtothree-phasecommit[SkeenandStonebraker1983].)Initially,𝑎’sstatusiscompletelyunknowntoanode𝑣—𝑎couldenduptrue,false,orevenstuckinapermanently indeterminate state. If the ﬁrst vote succeeds, 𝑣 may come to accept 𝑎. No two intact nodeseveracceptcontradictorystatements,soif𝑣isintactandaccepts𝑎,then𝑎cannot be false. For two reasons, however, 𝑣 accepting 𝑎 does not sufﬁce for 𝑣 to act on 𝑎. First, the fact that 𝑣 accepted 𝑎 does not mean all intact nodes can; 𝑎 could be stuck for other nodes. Second, if 𝑣 is befouled, then accepting 𝑎 means nothing—𝑎 may be false at intact nodes. Yet even if 𝑣 is befouled—which 𝑣 does not know—the system may still enjoy quorum intersection of well-behaved nodes, in which case, for optimal safety, 𝑣 needs greater assurance of 𝑎. Holding a second vote addresses both problems. If the second vote succeeds, 𝑣 moves to the conﬁrmed phase in which it can ﬁnally deem 𝑎 true and act on it. The next few subsections detail the federated voting process. Because voting does not rule out the possibility of stuck statements, Section 5.6 discusses how to cope with them. Section 6 will turn federated voting into a consensus protocol that avoids the possibility of stuck slots for intact nodes.
12 D. Mazi`eres
5.1. Voting with open membership A correct node in a Byzantine agreement system acts on a statement 𝑎 only when it knows that other correct nodes will never agree to statements contradicting 𝑎. Most protocols employ voting for this purpose. Well-behaved nodes vote for a statement 𝑎 only if it is valid. Well-behaved nodes also never change their votes. Hence, in centralized Byzantine agreement, it is safe to accept 𝑎 if a quorum comprising a majority of well-behaved nodes has voted for it. We say a statement is ratiﬁed once it has received the necessary votes. Inafederatedsetting,wemustadaptvotingtoaccommodateopenmembership.One difference is that a quorum no longer corresponds to a majority of well-behaved nodes. However, the majority requirement primarily serves to ensure quorum intersection of well-behaved nodes, which Section 4.1 already adapted to FBA. Another implication of open membership is that nodes must discover what constitutes a quorum as part of the voting process. To implement quorum discovery, a protocol should specify 𝐐(𝑣) in all messages from 𝑣. Deﬁnition (vote). A node 𝑣 votes for an (abstract) statement 𝑎 iff (1) 𝑣 asserts 𝑎 is valid and consistent with all statements 𝑣 has accepted, and (2) 𝑣 asserts it has never voted against 𝑎—i.e., voted for a statement that contradicts 𝑎—and 𝑣 promises never to vote against 𝑎 in the future. Deﬁnition (ratify). A quorum 𝑈𝑎 ratiﬁes a statement 𝑎 iff every member of 𝑈𝑎 votes for 𝑎. A node 𝑣 ratiﬁes 𝑎 iff 𝑣 is a member of a quorum 𝑈𝑎 that ratiﬁes 𝑎. THEOREM 4. Two contradictory statements 𝑎 and 𝑎̄ cannot both be ratiﬁed in an FBAS that enjoys quorum intersection and contains no ill-behaved nodes. PROOF. By contradiction. Suppose quorum 𝑈1 ratiﬁes 𝑎 and quorum 𝑈2 ratiﬁes 𝑎̄. By quorum intersection, ∃𝑣 ∈ 𝑈1 ∩ 𝑈2. Such a 𝑣 must have illegally voted for both 𝑎 and 𝑎̄, violating the assumption of no ill-behaved nodes. THEOREM 5. Let⟨𝐕,𝐐⟩be an FBAS enjoying quorum intersection despite 𝐵, and suppose 𝐵 contains all ill-behaved nodes. Let 𝑣1 and 𝑣2 be two nodes not in 𝐵. Let 𝑎 and 𝑎̄ be contradictory statements. If 𝑣1 ratiﬁes 𝑎 then 𝑣2 cannot ratify 𝑎̄. PROOF. By contradiction. Suppose 𝑣1 ratiﬁes 𝑎 and 𝑣2 ratiﬁes 𝑎̄. By deﬁnition, there mustexist aquorum𝑈1 containing𝑣1 thatratiﬁed𝑎andquorum𝑈2 containing𝑣2 that ratiﬁed 𝑎̄. By Theorem 1, since 𝑈1 ⧵ 𝐵 ≠ ∅ and 𝑈2 ⧵ 𝐵 ≠ ∅, both must be quorums in⟨𝐕,𝐐⟩𝐵, meaning they ratiﬁed 𝑎 and 𝑎̄ respectively in⟨𝐕,𝐐⟩𝐵. But⟨𝐕,𝐐⟩𝐵 enjoys quorum intersection and has no ill-behaved nodes, so Theorem 4 tell us 𝑎 and 𝑎̄ cannot both be ratiﬁed. THEOREM 6. Two intact nodes in an FBAS with quorum intersection cannot ratify contradictory statements. PROOF. Let 𝐵 be the set of befouled nodes. By Theorem 3, 𝐵 is a DSet. By the deﬁnition of DSet,⟨𝐕,𝐐⟩enjoys quorum intersection despite 𝐵. By Theorem 5, two nodes not in 𝐵 cannot ratify contradictory statements. 5.2. Blocking sets In centralized consensus, liveness is an all-or-nothing property of the system. Either a unanimously well-behaved quorum exists, or else ill-behaved nodes can prevent the rest of the system from accepting new statements. In FBA, by contrast, liveness may differ across nodes. For instance, in the tiered quorum example of Figure 3, if middle
The Stellar Consensus Protocol 13
𝑣1 vote 𝑎 accept
𝑣2 vote 𝑎 accept
𝑣3 vote 𝑎 accept
𝑣4 vote 𝑎̄
3/4 Slice is 3 nodes, including self
Fig. 9. 𝑣4 voted for 𝑎̄, which contradicts ratiﬁed statement 𝑎. tier nodes 𝑣6,𝑣7,𝑣8 crash, the leaf tier will be blocked while the top tier and node 𝑣5 will continue to enjoy liveness. An FBA protocol can guarantee liveness to a node 𝑣 only if 𝐐(𝑣) contains at least onequorumslicecomprisingonlycorrectnodes.Aset𝐵 offailednodescanviolatethis property if 𝐵 contains at least one member of each of 𝑣’s slices. We term such a set 𝐵 𝑣-blocking, because it has the power to block progress by 𝑣. Deﬁnition (𝑣-blocking). Let𝑣 ∈ 𝐕beanodeinFBAS⟨𝐕,𝐐⟩.Aset𝐵 ⊆ 𝐕is𝑣-blocking iff it overlaps every one of 𝑣’s slices—i.e., ∀𝑞 ∈ 𝐐(𝑣),𝑞∩𝐵≠∅. THEOREM 7. Let 𝐵 ⊆ 𝐕 be a set of nodes in FBAS⟨𝐕,𝐐⟩.⟨𝐕,𝐐⟩enjoys quorum availability despite 𝐵 iff 𝐵 is not 𝑣-blocking for any 𝑣 ∈ 𝐕⧵𝐵. PROOF. “∀𝑣 ∈ 𝐕⧵𝐵,𝐵 is not 𝑣-blocking” is equivalent to “∀𝑣 ∈ 𝐕⧵𝐵,∃𝑞 ∈ 𝐐(𝑣) such that 𝑞 ⊆ 𝐕⧵𝐵.” By the deﬁnition of quorum, the latter holds iff 𝐕⧵𝐵 is a quorum or 𝐵 = 𝐕, the exact deﬁnition of quorum availability despite 𝐵. As a corollary, the DSet of befouled nodes is not 𝑣-blocking for any intact 𝑣.
5.3. Accepting statements When an intact node 𝑣 learns that it has ratiﬁed a statement, Theorem 6 tells 𝑣 that other intact nodes will not ratify contradictory statements. This condition is sufﬁcient for 𝑣 to accept 𝑎, but we cannot make it necessary. Ratifying a statement requires votingforit,andsomenodesmayhavevotedforcontradictorystatements.InFigure9,for example, 𝑣4 votes for 𝑎̄ before learning that the other three nodes ratiﬁed the contradictory statement 𝑎. Though 𝑣4 cannot now vote for 𝑎, we would still like it to accept 𝑎 to be consistent with the other nodes. Akeyinsightisthatifanode𝑣isintact,thenno𝑣-blockingset𝐵 canconsistentirely of befouled nodes. Now suppose 𝐵 is a 𝑣-blocking set and every member of 𝐵 claims to accept statement 𝑎. If 𝑣 is intact, at least one member of 𝐵 must be, too. The intact member will not lie about accepting 𝑎; hence, 𝑎 is true and 𝑣 can accept it. Of course, if 𝑣 is befouled, then 𝑎 might not be true. But a befouled node can accept anything and vacuously not affect the correctness of intact nodes. Deﬁnition (accept). An FBAS node 𝑣 accepts a statement 𝑎 iff it has never accepted a statement contradicting 𝑎 and it determines that either (1) There exists a quorum 𝑈 such that 𝑣 ∈ 𝑈 and each member of 𝑈 either voted for 𝑎 or claims to accept 𝑎, or (2) Each member of a 𝑣-blocking set claims to accept 𝑎. Though a well-behaved node cannot vote for contradictory statements, condition 2 above allows a node to vote for one statement and later accept a contradictory one.
14 D. Mazi`eres
𝑣1 vote 𝑎 accept
𝑣2 vote 𝑎
𝑣4 vote 𝑎̄
𝑣3 vote 𝒂
3/4 Slice is 3 nodes, including self
a)
𝑣2 vote 𝑎
𝑣3 vote ̄ 𝒂 accept
𝑣4 vote 𝑎̄
𝑣1 vote 𝑎 vote ̄ 𝒂
3/4
b)
Fig. 10. Scenarios indistinguishable to 𝑣2 when 𝑣2 does not see bold messages
THEOREM 8. Two intact nodes in an FBAS that enjoys quorum intersection cannot accept contradictory statements. PROOF. Let⟨𝐕,𝐐⟩beanFBASwithquorumintersectionandlet𝐵 beitsDSetofbefoulednodes(whichexistsbyTheorem3).Supposeanintactnodeacceptsstatement𝑎. Let 𝑣 be the ﬁrst intact node to accept 𝑎. At the point 𝑣 accepts 𝑎, only befouled nodes in𝐵 canclaimtoacceptit.SincebythecorollarytoTheorem7,𝐵 cannotbe𝑣-blocking, it must be that 𝑣 accepted 𝑎 through condition 1. Thus, 𝑣 identiﬁed a quorum 𝑈 such thateverynodeclaimedtovotefororaccept𝑎,andsince𝑣istheﬁrstintactnodetoaccept𝑎,itmustmeanallnodesin𝑈⧵𝐵 votedfor𝑎.Inotherwords,𝑣ratiﬁed𝑎in⟨𝐕,𝐐⟩𝐵. Generalizing, any statement accepted by an intact node in⟨𝐕,𝐐⟩must be ratiﬁed in ⟨𝐕,𝐐⟩𝐵. Because 𝐵 is a DSet,⟨𝐕,𝐐⟩𝐵 enjoys quorum intersection. Because additionally 𝐵 contains all ill-behaved nodes, Theorem 4 rules out ratiﬁcation of contradictory statements. 5.4. Accepting is not enough Unfortunately, for nodes to assume the truth of accepted statements would yield suboptimal safety and liveness guarantees in a federated consensus protocol. We discuss the issues with safety and liveness in turn. To provide some context, we then explain why these issues are thornier in FBA than in centralized Byzantine agreement. 5.4.1. Safety. Consider an FBAS ⟨𝐕,𝐐⟩ in which the only quorum is unanimous consent—i.e., ∀𝑣,𝐐(𝑣) = {𝐕}. This ought to be a conservative choice for safety—don’t do anything unless everyone agrees. Yet since every node is 𝑣-blocking for every 𝑣, any node can single-handedly convince any other node to accept arbitrary statements. The problem is that accepted statements are only safe among intact nodes. But as discussed in Section 4.1, the only condition necessary to guarantee safety is quorum intersection of well-behaved nodes, which might hold even in the case that some wellbehavednodesarebefouled.Inparticular,when𝐐(𝑣) = {𝐕},theonlyDSetsare∅and𝐕, meaning any node failure befouls the whole system. By contrast, quorum intersection holds despite every 𝐵 ⊆ 𝐕. 5.4.2. Liveness. Another limitation of accepted statements is that other intact nodes may be unable to accept them. This possibility makes reliance on accepted statements
The Stellar Consensus Protocol 15
problematic for liveness. If a node proceeds to act on a statement because it accepted the statement, other nodes could be unable to proceed in a similar fashion. Consider Figure 10a, in which node 𝑣3 crashes after helping 𝑣1 ratify and accept statement𝑎.Though𝑣1 accepts𝑎,𝑣2 and𝑣4 cannot.Inparticular,from𝑣2’sperspective, thesituationdepictedisindistinguishablefromFigure10b,inwhich𝑣3 votedfor𝑎̄and is well-behaved but slow to respond, while 𝑣1 is ill-behaved and sent 𝑣3 a vote for 𝑎̄ (thereby causing 𝑣3 to accept 𝑎̄) while illegally also sending 𝑣2 a vote for 𝑎. Tosupportaprotocol-levelnotionoflivenessincaseslikeFigure10a,𝑣1 needsaway to ensure every other intact node can eventually accept 𝑎 before 𝑣1 acts on 𝑎. Once this is the case, it makes sense to say the system agrees on 𝑎. Deﬁnition (agree). An FBAS⟨𝐕,𝐐⟩agrees on a statement 𝑎 iff, regardless of what subsequently transpires, once sufﬁcient messages are delivered and processed, every intact node will accept 𝑎. 5.4.3. Comparison to centralized voting. To understand why the above issues arise in federated voting, consider a centralized Byzantine agreement system of 𝑁 nodes with quorum size 𝑇. Such a system enjoys quorum availability with 𝑓𝐿 = 𝑁 − 𝑇 or fewer nodefailures.Sinceanytwoquorumsshareatleast2𝑇−𝑁 nodes,quorumintersection of well-behaved nodes holds up to 𝑓𝑆 = 2𝑇 −𝑁 −1 Byzantine failures. Centralized Byzantine agreement systems typically set 𝑁 = 3𝑓 +1 and 𝑇 = 2𝑓 +1 to yield 𝑓𝐿 = 𝑓𝑆 = 𝑓, the equilibrium point at which safety and liveness have the same fault tolerance. If safety is more important than liveness, some protocols increase 𝑇 so that 𝑓𝑆 > 𝑓𝐿 [Li and Mazi`eres 2007]. In FBA, because quorums arise organically, systems are unlikely to ﬁnd themselves at equilibrium, making it far more important to protect safety in the absence of liveness. Now consider a centralized system in which, because of node failure and contradictory votes, some node 𝑣 cannot ratify statement 𝑎 that was ratiﬁed by other nodes. If 𝑣 hears 𝑓𝑆 +1 nodes claim 𝑎 was ratiﬁed, 𝑣 knows that either one of them is wellbehavedorallsafetyguaranteeshavecollapsed.Eitherway,𝑣canacton𝑎withnoloss of safety. The FBA equivalent would be to hear from a set 𝐵 where 𝐵, if deleted, undermines quorum intersection of well-behaved nodes. Identifying such a 𝐵 is hard for three reasons: one, quorums are discovered dynamically; two, ill-behaved nodes may lie about slices; and three, 𝑣 does not know which nodes are well-behaved. Instead, we deﬁned federated voting to accept 𝑎 when a 𝑣-blocking set does. The 𝑣-blocking property has the advantage of being easily checkable, but is equivalent to hearing from 𝑓𝐿 +1 nodes in a centralized system when we really want 𝑓𝑆 +1. To guarantee agreement among all well-behaved nodes in a centralized system, one merely needs 𝑓𝐿 +𝑓𝑆 +1 nodes to acknowledge that a statement was ratiﬁed. If more than 𝑓𝐿 of them fail, we do not expect liveness anyway. If 𝑓𝐿 or fewer fail, then we know 𝑓𝑆 +1 nodes remain willing to attest to ratiﬁcation, which will in turn convince all other well-behaved nodes. The reliance on 𝑓𝑆 has no easy analogue in the FBA model. Interestingly, however, 𝑓𝐿 +𝑓𝑆 +1 = 𝑇, the quorum size, suggesting a similar approach might work with a more complex justiﬁcation. Put another way, at some point nodes need to believe a statement strongly enough to depend on its truth for safety. A centralized system offers two ways to reach this point for a statement 𝑎: ratify 𝑎 ﬁrst-hand, or reason backwards from 𝑓𝑆 + 1 nodes claiming 𝑎 was ratiﬁed, ﬁguring safety is hopeless if they have all lied. FBA lacks the latter approach; the only tool it has for safety among well-behaved nodes is ﬁrst-hand ratiﬁcation.Sincenodesstillneedawaytoovercomevotesagainstratiﬁedstatements, we introduced a notion of accepting, but it provides a weaker consistency guarantee limited to intact nodes.
16 D. Mazi`eres
5.5. Statement conﬁrmation Both limitations of accepted statements stem from complications when a set of intact nodes 𝑆 votes against a statement 𝑎 that is nonetheless ratiﬁed. Particularly in light of FBA’s non-uniform quorums, 𝑆 may prevent some intact node from ever ratifying 𝑣. Toprovide𝑣ameansofaccepting𝑎despitevotesagainstit,thedeﬁnitionofaccepthas a second criterion based on 𝑣-blocking sets. But the second criterion is weaker than ratiﬁcation, offering no guarantees to befouled nodes that enjoy quorum intersection. Now suppose a statement 𝑎 has the property that no intact node ever votes against it. Then we have no need to accept 𝑎 and can instead insist that nodes directly ratify 𝑎 before acting on it. We call such statements irrefutable. Deﬁnition (irrefutable). Astatement𝑎isirrefutableinanFBASifnointactnodecan ever vote against it. Theorem 8 tells us that two intact nodes cannot accept contradictory statements. Thus, while some intact nodes may vote against a statement 𝑎 that was accepted by an intact node, the statement “an intact node accepted 𝑎” is irrefutable. This suggests holding a second vote to ratify the fact that an intact node accepted 𝑎. Deﬁnition (conﬁrm). A quorum 𝑈𝑎 in an FBAS conﬁrms a statement 𝑎 iff ∀𝑣 ∈ 𝑈𝑎, 𝑣 claims to accept 𝑎. A node conﬁrms 𝑎 iff it is in such a quorum. Nodes express that they have accepted statement 𝑎 by stating “accept(𝑎),” an abbreviation of the statement, “An intact node accepted 𝑎.” To conﬁrm 𝑎 means to ratify accept(𝑎). A well-behaved node 𝑣 can vote for accept(𝑎) only after accepting 𝑎, as 𝑣 cannot assume any particular other nodes are intact. If 𝑣 itself is befouled, accept(𝑎) might be false, in which case voting for it may cost 𝑣 liveness, but a befouled node has no guarantee of liveness anyway. Thenexttheoremshowsthatnodescanrelyonconﬁrmedstatementswithoutlosing optimal safety. Theorem 11 then shows that conﬁrmed statements meet the deﬁnition of agreement from Section 5.4.2, meaning nodes can rely on conﬁrmed statements without endangering the liveness of intact nodes. THEOREM 9. Let⟨𝐕,𝐐⟩be an FBAS enjoying quorum intersection despite 𝐵, and suppose 𝐵 contains all ill-behaved nodes. Let 𝑣1 and 𝑣2 be two nodes not in 𝐵. Let 𝑎 and 𝑎̄ be contradictory statements. If 𝑣1 conﬁrms 𝑎, then 𝑣2 cannot conﬁrm 𝑎̄. PROOF. First note that accept(𝑎) contradicts accept(𝑎̄)—no well-behaved node can vote for both. Note further that 𝑣1 must ratify accept(𝑎) to conﬁrm 𝑎. By Theorem 5, 𝑣2 cannot ratify accept(𝑎̄) and hence cannot conﬁrm 𝑎̄. THEOREM 10. Let 𝐵 be the set of befouled nodes in an FBAS⟨𝐕,𝐐⟩with quorum intersection. Let 𝑈 be a quorum containing an intact node (𝑈 ⊈ 𝐵), and let 𝑆 be any set suchthat𝑈 ⊆ 𝑆 ⊆ 𝐕.Let𝑆+ = 𝑆⧵𝐵 bethesetofintactnodesin𝑆,andlet𝑆− = (𝐕⧵𝑆)⧵𝐵 be the set of intact nodes not in 𝑆. Either 𝑆− = ∅, or ∃𝑣 ∈ 𝑆− such that 𝑆+ is 𝑣-blocking. PROOF. If 𝑆+ is 𝑣-blocking for some 𝑣 ∈ 𝑆−, then we are done. Otherwise, we must show 𝑆− = ∅. If 𝑆+ is not 𝑣-blocking for any 𝑣 ∈ 𝑆−, then, by Theorem 7, either 𝑆− = ∅ or 𝑆− is a quorum in⟨𝐕,𝐐⟩𝐵. In the former case we are done, while in the latter we get a contradiction: By Theorem 1, 𝑈 ⧵𝐵 is a quorum in⟨𝐕,𝐐⟩𝐵. Since 𝐵 is a DSet (by Theorem 3),⟨𝐕,𝐐⟩𝐵 must enjoy quorum intersection, meaning 𝑆−∩(𝑈 ⧵𝐵)≠∅. This is impossible, since (𝑈 ⧵𝐵) ⊆ 𝑆 and 𝑆− ∩𝑆 = ∅. THEOREM 11. If an intact node in an FBAS⟨𝐕,𝐐⟩with quorum intersection conﬁrms a statement 𝑎, then, whatever subsequently transpires, once sufﬁcient messages are delivered and processed, every intact node will accept and conﬁrm 𝑎.
The Stellar Consensus Protocol 17
quorum satisfying 𝑣 each votes or accepts 𝑎
quorum satisfying 𝑣 conﬁrms 𝑎
𝑎 is valid
𝑣-blocking set accepts 𝑎
uncommitted
voted 𝑎 accepted 𝑎 conﬁrmed 𝑎
voted 𝑎̄
Fig. 11. Possible states of an accepted statement 𝑎 at a single node 𝑣
bivalent
𝑎-valent
𝑎̄-valent
stuck
𝑎 agreed
𝑎̄ agreed
Fig. 12. Possible system-wide status of a statement 𝑎 PROOF. Let 𝐵 be the DSet of befouled nodes and let 𝑈 ⊈ 𝐵 be the quorum through whichanintactnodeconﬁrmed𝑎.Letnodesin𝑈⧵𝐵 broadcastaccept(𝑎).Bydeﬁnition, any node 𝑣, regardless of how it has voted, accepts 𝑎 after receiving accept(𝑎) from a 𝑣-blocking set. Hence, these messages may convince additional nodes to accept 𝑎. Let these additional nodes in turn broadcast accept(𝑎) until a point is reached at which, regardless of future communication, no further intact nodes can ever accept 𝑎. At this point let 𝑆 be the set of nodes that claim to accept 𝑎 (where 𝑈 ⊆ 𝑆), let 𝑆+ be the set of intactnodesin𝑆,andlet𝑆− bethesetofintactnodesnotin𝑆.𝑆+ cannotbe𝑣-blocking for any node in 𝑆−, or else more nodes could come to accept 𝑎. By Theorem 10, then, 𝑆− = ∅, meaning every intact node has accepted 𝑎. Figure 11 summarizes the paths an intact node 𝑣 can take to conﬁrm 𝑎. Given no knowledge, 𝑣 might vote for either 𝑎 or the contradictory 𝑎̄. If 𝑣 votes for 𝑎̄, it cannot latervotefor𝑎,butcannonethelessaccept𝑎ifa𝑣-blockingsetacceptsit.Asubsequent quorum of conﬁrmation messages allows 𝑣 to conﬁrm 𝑎, which by Theorem 11 means the system agrees on 𝑎.
5.6. Liveness and neutralization The main challenge of distributed consensus, whether centralized or not, is that a statement can get stuck in a permanently indeterminate state before the system reaches agreement on it. Hence, a protocol must not attempt to ratify externalized values directly. Should the statement “The value of slot 𝑖 is 𝑥” get stuck, the system will be forever unable to agree on slot 𝑖, losing liveness. The solution is to craft the statements in votes carefully. It must be possible to break a stuck statement’s hold on the question we really care about, namely slot contents. We call the process of obsoleting a stuck statement neutralization.
18 D. Mazi`eres
Local state System-wide status of 𝑎 uncommitted unknown (any) voted 𝑎 unknown (any) voted 𝑎̄ unknown (any) accepted 𝑎 stuck, 𝑎-valent, or 𝑎 agreed conﬁrmed 𝑎 𝑎 agreed Fig. 13. What an intact node knows about the status of statement 𝑎 More concretely, Figure 12 depicts the potential status a statement 𝑎 can have system-wide. Initially, the system is bivalent, by which we mean there is one sequence of possible events through which all intact nodes will accept 𝑎, and another sequence through which all intact nodes will reject 𝑎 (i.e., accept a statement 𝑎̄ contradicting 𝑎). At some point, one of these two outcomes may cease to be possible. If no intact node can ever reject 𝑎, we say the system is 𝑎-valent; conversely, if no intact node can ever accept 𝑎, we say the system is 𝑎̄-valent. At the time an FBAS transitions from bivalent to 𝑎-valent, there is a possible outcome in which all intact nodes accept 𝑎. However, this might not remain the case. ConsideraPBFT-likefour-nodesystem{𝑣1,…,𝑣4}inwhichanythreenodesconstitute a quorum. If 𝑣1 and 𝑣2 vote for 𝑎, the system becomes 𝑎-valent; no three nodes can ratify a contradictory statement. However, if 𝑣3 and 𝑣4 subsequently vote for 𝑎̄ contradicting 𝑎, it also becomes impossible to ratify 𝑎. In this case, 𝑎’s state is permanently indeterminate, or stuck. As seen in Figure 10a, even once an intact node accepts 𝑎, the system may still fail to reach system-wide agreement on 𝑎. However, by Theorem 11, once an intact node conﬁrms 𝑎, all intact nodes can eventually come to accept it; hence the system has agreed upon 𝑎. Figure 13 summarizes what intact nodes know about the global state of a statement from their own local state. Topreservethepossibilityofconsensus,aprotocolmustensurethateverystatement is either irrefutable, and hence cannot get stuck, or neutralizable, and hence cannot block progress if stuck. There are two popular approaches to crafting neutralizable statements: the view-based approach, pioneered by viewstamped replication [Oki and Liskov 1988] and favored by PBFT [Castro and Liskov 1999]; and the ballot-based approach, invented by Paxos [Lamport 1998]. The ballot-based approach may be harder to understand [Ongaro and Ousterhout 2014]. Compounding confusion, people often call viewstamped replication “Paxos” or assert that the two algorithms are the same when they are not [van Renesse et al. 2014]. View-based protocols associate the slots in votes with monotonically increasing view numbers. Should consensus get stuck on the 𝑖th slot in view 𝑛, nodes recover by agreeingthatview𝑛hadfewerthan𝑖meaningfulslotsandmovingtoahigherviewnumber. Ballot-based protocols associate the values in votes with monotonically increasing ballot numbers. Should a ballot get stuck, nodes retry the same slot with a higher ballot, taking care never to select values that would contradict prior stuck ballots. Thisworktakesaballot-basedapproach,asdoingsomakesiteasiertodoawaywith the notion of a distinguished primary node or leader. For example, leader behavior can be emulated [Lamport 2011b].
6. SCP: A FEDERATED BYZANTINE AGREEMENT PROTOCOL This section presents the Stellar Consensus Protocol, SCP. At a high level, SCP consists of two sub-protocols: a nomination protocol and a ballot protocol. The nomination
The Stellar Consensus Protocol 19
protocol produces candidate values for a slot. If run long enough, it eventually produces the same set of candidate values at every intact node, which means nodes can combine the candidate values in a deterministic way to produce a single composite value for the slot. There are two huge caveats, however. First, nodes have no way of knowing when the nomination protocol has reached the point of convergence. Second, even after convergence, ill-behaved nodes may be able to reset the nomination process a ﬁnite number of times. When nodes guess that the nomination protocol has converged, they execute the ballot protocol, which employs federated voting to commit and abort ballots associated with composite values. When intact nodes agree to commit a ballot, the value associated with the ballot will be externalized for the slot in question. When they agree to abort a ballot, the ballot’s value becomes irrelevant. If a ballot gets stuck in a state where one or more intact nodes cannot commit or abort it, then nodes try again with a higher ballot; they associate the new ballot with the same value as the stuck one in case any node believes the stuck ballot was committed. Intuitively, safety results from ensuring that all stuck and committed ballots are associated with the same value. Liveness follows from the fact that a stuck ballot can be neutralized by moving to a higher ballot. The remainder of this section presents the nomination and ballot protocols. Each is described ﬁrst in terms of conceptual statements, then as a concrete protocol with messages representing sets of conceptual statements. Finally, Section 6.3 shows the correctness of the protocol. SCP treats each slot completely independently and can be viewed as many separate instances of a single-slot consensus protocol (akin to the “single-decree synod” in Paxos [Lamport 1998]). Concepts such as candidate values and ballots must always be interpreted in the context of a particular slot even if much of the discussion leaves the slot implicit.
6.1. Nomination protocol Because slots need only be partially ordered, some applications of SCP will have only one plausible ballot per slot. For example, in certiﬁcate transparency, each CA may have its own series of slots and sign exactly one certiﬁcate tree per slot. However, other applications admit many plausible values per slot, in which case it is helpful to narrow down the possible input values. Our strategy is to begin with a synchronous nomination protocol that achieves consensus under certain timing assumptions, and feed the output of the nomination protocol into an asynchronous ballot protocol whose safety does not depend on timing [Lamport 2011a]. Such an initial synchronous phase is sometimes called a conciliator [Aspnes 2010]. The nomination protocol works by converging on a set of candidate values for a slot. Nodes then deterministically combine these candidates into a single composite value for the slot. Exactly how to combine values depends on the application. By way of example, the Stellar network uses SCP to choose a set of transactions and a ledger timestamp for each slot. To combine candidate values, Stellar takes the union of their transaction sets and the maximum of their timestamps. (Values with invalid timestamps will not receive enough nominations to become candidates.) Other possible approaches include combining sets by intersection or simply picking the candidate value with the highest hash. Nodes produce a candidate value 𝑥 through federated voting on the statement nominate 𝑥. Deﬁnition (candidate). A node 𝑣 considers a value 𝑥 to be a candidate when 𝑣 has conﬁrmed the statement nominate 𝑥—i.e., 𝑣 has ratiﬁed accept(nominate 𝑥).
20 D. Mazi`eres
So long as node 𝑣 has no candidate values, 𝑣 may vote in favor of nominate 𝑥 for any value 𝑥 that passes application-level validity checks (such as timestamps not being in the future). In fact, 𝑣 should generally re-nominate any values that it sees other nodes nominate, with some rate-limiting discussed below to avoid an explosion of candidates.Assoonas𝑣hasacandidatevalue,however,itmustceasevotingtonominate𝑥 for any new values 𝑥. It should still continue to accept nominate statements for new values (when accepted by a 𝑣-blocking set) and conﬁrm new nominate statements as prescribed by the federated voting procedure. The nomination protocol enjoys several properties when a system has intact nodes (meaning it has avoided complete failure). Speciﬁcally, for each slot: (1) Intact nodes can produce at least one candidate value. (2) At some point, the set of possible candidate values stops growing. (3) If any intact node considers 𝑥 to be a candidate value, then eventually every intact node will consider 𝑥 to be a candidate value. Now consider how the nomination protocol achieves its three properties. Property 1 is achieved because nominate statements are irrefutable. Nodes never vote against nominating a particular value, and until the ﬁrst candidate value is conﬁrmed, intact nodes can vote to nominate any value. So long as any value 𝑥 passes application-level validity checks, intact nodes can vote for and conﬁrm nominate 𝑥. Property 2 is ensuredbecause onceeachintact nodeconﬁrmsat leastone candidate value—whichwill happen in a ﬁnite amount of time—no intact nodes will vote to nominate any new values. Hence, the only values that can become candidates are those that already have votes from intact nodes. Property 3 is a direct consequence of Theorem 11. The nomination process will be more efﬁcient if fewer combinations of values are in play. Hence, we assign nodes a temporary priority and have each node, when possible, nominate the same values as a higher-priority node. More concretely, let 𝐻 be a cryptographic hash function whose range can be interpreted as a set of integers {0,…,ℎmax −1}. (𝐻 might be SHA-256 [National Institute of Standards and Technology 2012], in which case ℎmax = 2256.) Let 𝐺𝑖(𝑚) = 𝐻(𝑖,𝑥𝑖−1,𝑚) be a slot-speciﬁc hash function for slot 𝑖, where 𝑥𝑖−1 is the value chosen for the slot preceding 𝑖 (or the sorted set of values of all immediate dependencies of slot 𝑖 when slots are governed by a partialorder).Givenaslot𝑖andaroundnumber𝑛,eachnode𝑣computesasetofneighbors and a priority for each neighbor as follows: weight(𝑣,𝑣′) =| | |{𝑞 ∣ 𝑞 ∈ 𝐐(𝑣) ∧ 𝑣′ ∈ 𝑞}| | || | |𝐐(𝑣)| | |neighbors (𝑣,𝑛) ={𝑣′ ∣ 𝐺𝑖(N,𝑛,𝑣′) < ℎmax ⋅weight(𝑣,𝑣′)}priority (𝑛,𝑣′) = 𝐺𝑖(P,𝑛,𝑣′) N and P are constants to produce two different hash functions. The function weight(𝑣,𝑣′) returns the fraction of slices in 𝐐(𝑣) containing 𝑣′. By using weight as theprobabilityover𝑛that𝑣′ appearsinneighbors(𝑣,𝑛),wealsoreducethechancethat nodes without a lot of trust will dominate a round. Each node 𝑣 should initially ﬁnd a node 𝑣0 ∈ neighbors(𝑣,0) that maximizes priority(0,𝑣0) among nodes it can communicate with, then vote to nominate the same values as 𝑣0. Only if 𝑣 = 𝑣0 should 𝑣 introduce a new value to nominate. 𝑣 should use timeouts to decide on new nominate statements to vote for. After 𝑛 timeouts, 𝑣 should
The Stellar Consensus Protocol 21
Variable Meaning 𝑋 The set of values node 𝑣 has voted to nominate 𝑌 The set of values node 𝑣 has accepted as nominated 𝑍 The set of values that node 𝑣 considers candidate values 𝑁 The set of the latest NOMINATE message received from each node Fig. 14. Nomination state maintained by node 𝑣 for each slot
NOMINATE 𝑣 𝑖 𝑋 𝑌 𝐷 This is a message from node 𝑣 nominating values for slot 𝑖. 𝐷 is 𝑣’s quorum slice 𝐐(𝑣) or a collision-resistant hash of 𝐐(𝑣). 𝑋 and 𝑌 are from 𝑣’s state. The concrete message encodes the following conceptual messages: — {nominate 𝑥 ∣ 𝑥 ∈ 𝑋} (votes to nominate each value in 𝑋) — {accept(nominate 𝑥) ∣ 𝑥 ∈ 𝑌 } (votes to conﬁrm nominations in 𝑌)
Fig. 15. Message in nomination protocol ﬁnd a node 𝑣𝑛 ∈ neighbors(𝑣,𝑛) maximizing priority(𝑛,𝑣𝑛) and vote to nominate everything 𝑣𝑛 has voted to nominate. THEOREM 12. Eventually, all intact nodes will have the same composite value. PROOF. The theorem follows from the three properties of the nomination protocol. Each intact node will only ever vote to nominate a ﬁnite number of ballots. In the absence of action by ill-behaved nodes, intact nodes will converge on the same set of candidate values, call it 𝑍. To forestall this convergence, ill-behaved nodes may introduce new candidate values, which for a period may be candidates at some but not allintactnodes.Suchvalueswillneedtohavegarneredvotesfromwell-behavednodes, however,whichlimitsthemtoaﬁniteset.Eventually,ill-behavednodeswilleitherstop perturbingthesystemorrunoutofnewcandidatevaluestoinject,inwhichcaseintact nodes will converge on 𝑍. 6.1.1. Concrete nomination protocol. Figure 14 lists the nomination protocol state a node 𝑣 must maintain for each slot. 𝑋 is the set of values 𝑥 for which 𝑣 has voted nominate 𝑥, 𝑌 is the set of values for which 𝑣 has accepted nominate 𝑥, and 𝑍 is the set of candidate values—i.e., all values for which a quorum including 𝑣 has stated accept(nominate 𝑥). Finally, 𝑣 maintains 𝑁, the latest concrete message from each node. (Technically, 𝑋, 𝑌, and 𝑍 can all be recomputed from 𝑁, but it is convenient to be able to reference them directly.) All four ﬁelds are initialized to the empty set. Note that all three of 𝑋, 𝑌, and 𝑍 are growing over time—nodes never remove a value from these sets. Figure 15 shows the concrete message that constitutes the nomination protocol. Because 𝑋 and 𝑌 grow monotonically over time, it is possible to determine which of multiple NOMINATE messages from the same node is the latest, independent of network deliveryorder,solongas𝐷doesnotchangemid-nomination(or𝐷hastobeversioned). Only one remote procedure call (RPC) is needed for nomination—the argument is the sender’s latest NOMINATE message and the return value is the receiver’s. If 𝐷 or the nominated values are cryptographic hashes, a second RPC should permit retrieval of uncached hash preimages as needed. Because nodes cannot tell when the nomination protocol is complete anyway, SCP mustcopewithdifferentcompositevaluesatdifferentnodes.Asanoptimization,then,
22 D. Mazi`eres
nodes can attempt to predict the ﬁnal composite value before they even have a candidate value. To do this, the composite value can be taken as combine(𝑍) when 𝑍 ≠∅, otherwise combine(𝑌) when 𝑌 ≠∅, otherwise combine(𝑋) when 𝑋≠∅. This means the highest-priority node can optimistically initiate balloting at the same time as nomination, piggybacking its ﬁrst ballot message PREPARE (described below) on its ﬁrst NOMINATE message.
6.2. Ballot protocol Once nodes have a composite value, they engage in the ballot protocol, though nomination may continue to update the composite value in parallel. A ballot 𝑏 is a pair of the form 𝑏 =⟨𝑛,𝑥⟩, where 𝑥≠⊥ is a value and 𝑏 is a referendum on externalizing 𝑥 for the slot in question. The value 𝑛≥1 is a counter to ensure higher ballot numbers are always available. We use C-like notation 𝑏.𝑛 and 𝑏.𝑥 to denote the counter and value ﬁelds of ballot 𝑏, so that 𝑏 =⟨𝑏.𝑛,𝑏.𝑥⟩. Ballots are totally ordered, with 𝑏.𝑛 more significant than 𝑏.𝑥. For convenience, a special invalid null ballot 𝟎 =⟨0,⊥⟩is less than all other ballots, and a special counter value ∞ is greater than all other counters. We speak of committing and aborting a ballot 𝑏 as a shorthand for using federated votingtoagreeonthestatementscommit𝑏andabort𝑏,respectively.Foragivenballot, commit and abort are contradictory, so a well-behaved node may vote for at most one of them. In the notation of Section 5, the opposite of commit 𝑏 would be “commit 𝑏,” but abort 𝑏 is a more intuitive notation. Because at most one value can be chosen for a given slot, all committed and stuck ballotsmustcontainthesamevalue.Roughlyspeaking,thismeanscommitstatements are invalid if they conﬂict with lower-numbered unaborted ballots. Deﬁnition (compatible). Two ballots 𝑏1 and 𝑏2 are compatible, written 𝑏1 ∼ 𝑏2, iff 𝑏1.𝑥 = 𝑏2.𝑥 and incompatible, written 𝑏1 ≁ 𝑏2, iff 𝑏1.𝑥 ≠ 𝑏2.𝑥. We also write 𝑏1 ≲ 𝑏2 or 𝑏2 ≳ 𝑏1 iff 𝑏1 ≤ 𝑏2 (or equivalently 𝑏2 ≥ 𝑏1) and 𝑏1 ∼ 𝑏2. Similarly, 𝑏1 ⋦ 𝑏2 or 𝑏2 ⋧ 𝑏1 means 𝑏1≤𝑏2 (or equivalently 𝑏2≥𝑏1) and 𝑏1 ≁ 𝑏2. Deﬁnition (prepared). A ballot 𝑏 is prepared iff every statement in the following set is true: {abort 𝑏old ∣ 𝑏old ⋦ 𝑏}. More precisely, then, commit 𝑏 is valid to vote for only if 𝑏 is conﬁrmed prepared, which nodes ensure through federated voting on the corresponding abort statements. It is convenient to vote on these statements en masse, so wherever we write “𝑏 is prepared,” the surrounding context applies to the whole set of abort statements. In particular, a node votes, accepts, or conﬁrms that 𝑏 is prepared iff it votes for, accepts, or conﬁrms, respectively, all of these aborts. Tocommitaballot𝑏andexternalizeitsvalue𝑏.𝑥,SCPnodesﬁrstacceptandconﬁrm 𝑏 is prepared, then accept and conﬁrm commit 𝑏. Before the ﬁrst intact node votes for commit 𝑏, the prepare step, through federated voting, ensures all intact nodes can eventually conﬁrm 𝑏 is prepared. When an intact node 𝑣 accepts commit 𝑏, it means 𝑏.𝑥 will eventually be chosen. However, as discussed in Section 5.4.1, 𝑣 must conﬁrm commit before acting on it in case 𝑣 is befouled. 6.2.1. Concrete ballot protocol. Figure 16 illustrates the per-slot state maintained by eachnode.Anode𝑣stores:itscurrentphase𝜑;itscurrentballot𝑏;thetwomostrecent incompatible ballots it has prepared (𝑝,𝑝′); the lowest (𝑐) and highest (ℎ) ballot, if any, it has voted to commit and for which it has not subsequently accepted an abort (or for which it has accepted or conﬁrmed a commit in later phases); a next value 𝑧 to try if the current ballot fails; and the latest message received from each node (𝑀). Ballots 𝑏, 𝑝, 𝑝′, and ℎ are non-decreasing within a phase. In addition, if 𝑐 ≠𝟎—meaning 𝑣 may
The Stellar Consensus Protocol 23
Variable Meaning 𝜑 Current phase: one of PREPARE, CONFIRM, or EXTERNALIZE 𝑏 Current ballot that node 𝑣 is attempting to prepare and commit (𝑏≠𝟎) 𝑝′,𝑝 The two highest ballots accepted as prepared such that 𝑝′ ⋦ 𝑝, where 𝑝′ = 𝟎 or 𝑝 = 𝑝′ = 𝟎 if there are no such ballots 𝑐,ℎ In PREPARE: ℎ is the highest ballot conﬁrmed as prepared, or 𝟎 if none; if 𝑐≠𝟎, then 𝑐 is lowest and ℎ the highest ballot for which 𝑣 has voted commit and not accepted abort. In CONFIRM: lowest, highest ballot for which 𝑣 accepted commit In EXTERNALIZE: lowest, highest ballot for which 𝑣 conﬁrmed commit Invariant: if 𝑐≠𝟎, then 𝑐 ≲ ℎ ≲ 𝑏. 𝑧 Value to use in next ballot. If ℎ = 𝟎, then 𝑧 is the composite value (see Section 6.1); otherwise, 𝑧 = ℎ.𝑥. 𝑀 Set of the latest ballot message seen from each node Fig. 16. Ballot state maintained by each node 𝑣 for each slot
PREPARE 𝑣 𝑖 𝑏 𝑝 𝑝′ 𝑐.𝑛 ℎ.𝑛 𝐷 This is a message from node 𝑣 about slot 𝑖. 𝐷 speciﬁes 𝐐(𝑣). The other ﬁelds reﬂect 𝑣’s state. Values 𝑐.𝑥 and ℎ.𝑥 are elided as 𝑐.𝑥 = ℎ.𝑥 = 𝑏.𝑥 when 𝑐.𝑛≠0. This concrete message encodes a host of conceptual statements, as follows: — {abort 𝑏′ ∨ accept(abort 𝑏′) ∣ 𝑏′ ⋦ 𝑏} (a vote to prepare 𝑏) — {accept(abort 𝑏′) ∣ 𝑏′ ⋦ 𝑝} (a vote to conﬁrm 𝑝 is prepared) — {accept(abort 𝑏′) ∣ 𝑏′ ⋦ 𝑝′} (a vote to conﬁrm 𝑝′ is prepared) — {commit 𝑏′ ∣ 𝑐.𝑛≠0 ∧ 𝑐 ≲ 𝑏′ ≲ ℎ} (a vote to commit 𝑐,…,ℎ if 𝑐≠𝟎) CONFIRM 𝑣 𝑖 𝑏 𝑝.𝑛 𝑐.𝑛 ℎ.𝑛 𝐷 Sent by 𝑣 to try to externalize 𝑏.𝑥 for slot 𝑖 after accepting a commit. Implies 𝑝.𝑥 = 𝑐.𝑥 = ℎ.𝑥 = 𝑏.𝑥 in 𝑣’s state. For convenience, we also say 𝑝′ = 𝟎 (𝑝′ is irrelevant after accepting commit). 𝐷 speciﬁes 𝐐(𝑣) as above. Encodes: — Everything implied by PREPARE 𝑣 𝑖⟨∞,𝑏.𝑥⟩𝑝 𝟎 𝑐.𝑛 ∞ 𝐷 — {accept(commit 𝑏′) ∣ 𝑐 ≲ 𝑏′ ≲ ℎ} (a vote to conﬁrm commit 𝑐,…,ℎ) EXTERNALIZE 𝑣 𝑖 𝑥 𝑐.𝑛 ℎ.𝑛 𝐷 After 𝑣 conﬁrms commit⟨𝑐.𝑛,𝑥⟩for slot 𝑖 and externalizes value 𝑥, this message helps other nodes externalize 𝑥. Implies 𝑐 =⟨𝑐.𝑛,𝑥⟩and ℎ =⟨ℎ.𝑛,𝑥⟩. For convenience, we also say 𝑏 = 𝑝 = ℎ =⟨∞,𝑥⟩, and 𝑝′ = 𝟎. Encodes: — Everything implied by CONFIRM 𝑣 𝑖⟨∞,𝑥⟩∞ 𝑐.𝑛 ∞ 𝐷 — Everything implied by CONFIRM 𝑣 𝑖⟨∞,𝑥⟩∞ 𝑐.𝑛 ℎ.𝑛 {{𝑣}} Fig. 17. Messages in SCP’s ballot protocol have participated in ratifying commit 𝑐—code must ensure 𝑐 ≲ ℎ ≲ 𝑏. This invariant guarantees a node can always legally vote to prepare its current ballot 𝑏. Figure 17 shows the three ballot protocol messages, with 𝜑 determining which one of the three a node can send. Ballot messages may overlap with nomination messages, so that, when ℎ = 𝟎, a node may update 𝑧 in response to a NOMINATE message. Note
24 D. Mazi`eres
that “𝑎 ∨ accept(𝑎)” is what each node must assert for a quorum to accept 𝑎 under condition 1 of the deﬁnition of accept. For convenience, when comparing state across nodes, we will identify ﬁelds belonging to particular nodes with subscripts. If 𝑣 is a node, then we write 𝑏𝑣,𝑝𝑣,𝑝′ 𝑣,… to denote the values of 𝑏,𝑝,𝑝′,… in node 𝑣’s state as described in Figure 16. Similarly, we let 𝑣𝑚 denote message 𝑚’s sender, and 𝑏𝑚,𝑝𝑚,𝑝′𝑚,… denote the corresponding values of 𝑏,𝑝,𝑝′,… in 𝑣𝑚’s state as implied by 𝑚. Each node initializes its ballot state for a slot by setting 𝜑 ← PREPARE, 𝑧 ← ⊥, 𝑏 ← ⟨0,𝑧⟩, 𝑀 ← ∅, and all other ﬁelds (𝑝,𝑝′,𝑐,ℎ) to the invalid ballot 𝟎. While 𝑧 = ⊥, a node can receive but not send ballot messages. Once 𝑧 ≠ ⊥, if 𝑏.𝑛 = 0, a node reinitializes 𝑏 ← ⟨1,𝑧⟩to start sending messages. Nodes then repeatedly exchange messageswithpeers,sendingwhicheverballotmessageisindicatedby𝜑.Uponadding a newly received message 𝑚 to 𝑀𝑣, a node 𝑣 updates its state as follows: (1) If 𝜑 = PREPARE and 𝑚 lets 𝑣 accept new ballots as prepared, update 𝑝 and 𝑝′. Afterwards, if either 𝑝 ⋧ ℎ or 𝑝′ ⋧ ℎ, then set 𝑐 ← 𝟎. (2) If 𝜑 = PREPARE and 𝑚 lets 𝑣 conﬁrm new higher ballots prepared, then raise ℎ to the highest such ballot and set 𝑧 ← ℎ.𝑥. (3) If𝜑 = PREPARE,𝑐 = 𝟎,𝑏≤ℎ,andneither𝑝 ⋧ ℎnor𝑝′ ⋧ ℎ,thenset𝑐 tothelowest ballot satisfying 𝑏≤𝑐 ≲ ℎ. (4) If 𝜑 = PREPARE and 𝑣 accepts commit for one or more ballots, set 𝑐 to the lowest such ballot, then set ℎ to the highest ballot such that 𝑣 accepts all {commit 𝑏′ ∣ 𝑐 ≲ 𝑏′ ≲ ℎ}, and set 𝜑 ← CONFIRM. Also set 𝑧 ← ℎ.𝑥 after updating ℎ, and unless ℎ ≲ 𝑏, set 𝑏 ← ℎ. (5) If 𝜑 = CONFIRM and the received message lets 𝑣 accept new ballots prepared, raise 𝑝 to the highest accepted prepared ballot such that 𝑝 ∼ 𝑐. (6) If 𝜑 = CONFIRM and 𝑣 accepts more commit messages or raises 𝑏, then let ℎ′ be the highest ballot such that 𝑣 accepts all {commit 𝑏′ ∣ 𝑏 ≲ 𝑏′ ≲ ℎ′} (if any). If there exists such an ℎ′ and ℎ′ > ℎ, then set ℎ ← ℎ′, and, if necessary, raise 𝑐 to the lowest ballot such that 𝑣 accepts all {commit 𝑏′ ∣ 𝑐 ≲ 𝑏′ ≲ ℎ}. (7) If𝜑 = CONFIRM and𝑣conﬁrmscommit𝑐′ forany𝑐′,set𝑐 andℎtothelowestand highest such ballots, set 𝜑 ← EXTERNALIZE, externalize 𝑐.𝑥, and terminate. (8) If 𝜑 ∈ {PREPARE,CONFIRM} and 𝑏 < ℎ, then set 𝑏 ← ℎ. (9) If 𝜑 ∈ {PREPARE,CONFIRM} and ∃𝑆 ⊆ 𝑀𝑣 such that the set of senders {𝑣𝑚′ ∣ 𝑚′ ∈ 𝑆} is 𝑣-blocking and ∀𝑚′ ∈ 𝑆, 𝑏𝑚′.𝑛 > 𝑏𝑣.𝑛, then set 𝑏 ←⟨𝑛,𝑧⟩, where 𝑛 is the lowest counter for which no such 𝑆 exists. Repeat the previous steps after updating 𝑏. While𝑐 = 𝟎,theaboveprotocolimplementsfederatedvotingtoconﬁrm𝑏isprepared. Once 𝑐≠𝟎, the protocol implements federated voting on commit 𝑐′ for every 𝑐 ≲ 𝑐′ ≲ ℎ. For the CONFIRM phase, once a well-behaved node accepts commit 𝑐, the node never accepts, and hence never attempts to conﬁrm, commit 𝑐′ for any 𝑐′ ≁ 𝑐. Once a commit isconﬁrmed,thevalueofitsballotissafetoexternalizeassumingquorumintersection. All messages sent by a particular node are totally ordered by⟨𝜑,𝑏,𝑝,𝑝′,ℎ⟩, with 𝜑 the most signiﬁcant and ℎ the least signiﬁcant ﬁeld. The values of these ﬁelds can be determined from messages, as described in Figure 17. All PREPARE messages precede all CONFIRM messages, which in turn precede the single EXTERNALIZE message for a given slot. The ordering makes it possible to ensure 𝑀 contains only the latest ballot
The Stellar Consensus Protocol 25
from each node without relying on timing to order the messages, since the network may re-order messages. A few details of the protocol merit explanation. The statements implied by PREPARE of the form “abort 𝑏′ ∨ accept(abort 𝑏′)” do not specify whether 𝑣 is voting for or conﬁrming abort 𝑏′. The distinction is unimportant for the deﬁnition of accept. Glossing over the distinction allows 𝑣 to forget about old ballots it voted to commit (and hence cannot vote to abort), so long as it accepted an abort message for them. Indeed, the only time 𝑣 modiﬁes 𝑐 when 𝑐 ≠𝟎 is to set it back to 𝟎 after accepting abort for every ballot it is voting to commit in step 1 on the preceding page. Conversely, the only time 𝑣 modiﬁes 𝑐 when 𝑐 = 𝟎 is to set it to a value 𝑐≥𝑏 in step 3. Because nodes never vote abort 𝑐 for any 𝑐≥𝑏, no past abort votes can conﬂict with commit 𝑐. Theorem 11 requires that nodes rebroadcast what they have accepted. It follows from the deﬁnition of prepare that the two highest incompatible ballots a node has accepted as prepared subsume all ballots the node has accepted as prepared. Hence, including 𝑝 and 𝑝′ in every message ensures that nodes converge on ℎ—a conﬁrmed prepared ballot. Note further that the ballots a node accepts as prepared must be a superset of the ballots the node conﬁrms as prepared; hence, step 2 can never set ℎ such that ℎ ≁ 𝑐≠𝟎, as step 1 will set 𝑐 ← 𝟎 if the new ℎ is incompatible with the old 𝑐. At the time 𝑣 sends an EXTERNALIZE message, it has accepted {commit 𝑏′ ∣ 𝑏′ ≳ 𝑐}. More importantly, however, it has conﬁrmed {commit 𝑏′ ∣ 𝑐 ≲ 𝑏′ ≲ ℎ}. 𝑣 can assert its acceptance of conﬁrmed statements without regard to 𝐐(𝑣), because it has already checked that one of its slices unanimously agrees; this explains the appearance of {{𝑣}} in place of 𝐷 for the second implicit CONFIRM message in the description of EXTERNALIZE. Eliminating 𝐷 allows a single static EXTERNALIZE message to help other nodes catch up arbitrarily far in the future, even if quorum slices have changed signiﬁcantly in the meantime. Only one RPC is needed to exchange ballot messages. The argument is the sender’s latest message and the return value is the receiver’s latest message. As with NOMINATE, if 𝐷 or the values 𝑥 in ballots are cryptographic hashes, then a separate RPC is needed to retrieve uncached hash preimages. 6.2.2. Timeouts and ballot updates. If all intact nodes start with the same ballot 𝑏, then steps 1 to 9 on the previous page are sufﬁcient to conﬁrm commit 𝑏 and externalize value 𝑏.𝑥. Unfortunately, if the ballot protocol starts before the nomination protocol has converged, nodes may start off with different values for 𝑧. If a ballot fails, or takes long enough that it may fail because of unresponsive nodes, then nodes must time out and try again with a higher ballot. For this reason, nodes employ a timer as follows: (a) A node 𝑣 with 𝜑𝑣 ≠ EXTERNALIZE arms a timer whenever ∃𝑆 ⊆ 𝑀𝑣 such that the set of senders 𝑈 = {𝑣𝑚 ∣ 𝑚 ∈ 𝑆} is a quorum, 𝑣 ∈ 𝑈, and ∀𝑚 ∈ 𝑆, 𝑏𝑚.𝑛≥𝑏𝑣.𝑛. (b) If the timer ﬁres, 𝑣 updates its ballot by setting 𝑏𝑣 ←⟨𝑏𝑣.𝑛+1,𝑧𝑣⟩. Different nodes may start ballots at different times. However, condition (a) delays setting a timer at a node 𝑣 that has gotten ahead of a quorum. Conversely, step 9 on the preceding page allows nodes that have fallen too far behind to catch up without waiting for timers. Taken together, these rules ensure that given long enough timers, intactnodeswillspendtimetogetheronthesameballot;moreover,thistimewillgrow proportionally to the timer duration. To ensure timeouts are long enough without predicting latencies, an implementation can increase the timeout as a function of 𝑏.𝑛. 6.3. Correctness An SCP node cannot vote to conﬁrm commit 𝑏 until it has voted to conﬁrm abort for all lower-numberedincompatibleballots.Becauseawell-behavednodecannotaccept(and
26 D. Mazi`eres hence vote to conﬁrm) contradictory statements, this means that for a given⟨𝐕,𝐐⟩, Theorem 5 ensures a set 𝑆 of well-behaved nodes cannot externalize contradictory valuessolongas𝑆 enjoysquorumintersectiondespite𝐕⧵𝑆.Thissafetyholdsif𝐕and 𝐐changeonlybetweenslots,butwhatiftheychangemid-slot(forinstance,inreaction to node crashes)? To reason about safety under reconﬁguration, we join all old and new quorum slice sets, reﬂecting the fact that nodes may make decisions based on a combination of messages from different conﬁguration eras. To be very conservative, we might require quorum intersection of the aggregation of the present conﬁguration with every past conﬁguration. However, we can relax this slightly by separating nodes that have sent illegal messages from those that have merely crashed. THEOREM 13. Let⟨𝐕1,𝐐1⟩,…,⟨𝐕𝑘,𝐐𝑘⟩be the set of conﬁgurations an FBAS has experienced during agreement on a single slot. Let 𝐕 = 𝐕1 ∪ ⋯ ∪ 𝐕𝑘 and 𝐐(𝑣) = {𝑞 ∣ ∃𝑗 such that 𝑣 ∈ 𝐕𝑗 ∧ 𝑞 ∈ 𝐐𝑗(𝑣)}. Let 𝐵 ⊆ 𝐕 be a set such that 𝐵 contains all illbehaved nodes that have sent illegal messages, though 𝐕⧵𝐵 may still contain crashed (unresponsive) nodes. Suppose nodes 𝑣1 and 𝑣2 are well-behaved, 𝑣1 externalizes 𝑥1 for the slot, and 𝑣2 externalizes 𝑥2. If⟨𝐕,𝐐⟩𝐵 enjoys quorum intersection, then 𝑥1 = 𝑥2. PROOF. For 𝑣1 to externalize 𝑥1, it must have ratiﬁed accept(commit⟨𝑛1,𝑥1⟩) in collaboration with a pseudo-quorum 𝑈1 ⊆ 𝐕. We say pseudo-quorum because 𝑈1 might not be a quorum in⟨𝐕𝑗,𝐐𝑗⟩for any particular 𝑗, as ratiﬁcation may have involved messages spanning multiple conﬁgurations. Nonetheless, for ratiﬁcation to succeed ∀𝑣 ∈ 𝑈1,∃𝑗,∃𝑞 ∈ 𝐐𝑗(𝑣) such that 𝑞 ⊆ 𝑈1. It follows from the construction of 𝐐 that 𝑞 ∈ 𝐐(𝑣). Hence 𝑈1 is a quorum in⟨𝐕,𝐐⟩. By a similar argument a pseudo-quorum 𝑈2 must have ratiﬁed accept(commit⟨𝑛2,𝑥2⟩), and 𝑈2 must be a quorum in⟨𝐕,𝐐⟩. By quorum intersection of⟨𝐕,𝐐⟩𝐵, there must exist some 𝑣 ∈ 𝐕⧵𝐵 such that 𝑣 ∈ 𝑈1 ∩𝑈2. By assumption, such a 𝑣 ∉ 𝐵 could not claim to accept incompatible ballots. Since 𝑣 conﬁrmed accepting commit for ballots with both 𝑥1 and 𝑥2, it must be that 𝑥1 = 𝑥2. For liveness of a node 𝑣, we care about several things when an FBAS has undergone a series of reconﬁgurations⟨𝐕1,𝐐1⟩,…,⟨𝐕𝑘,𝐐𝑘⟩within a single slot. First, the safety prerequisites of Theorem 13 must hold for 𝑣 and the set of nodes 𝑣 cares about, since violating safety undermines liveness and Theorem 11 requires quorum intersection. Second,thesetofill-behavednodesinthelateststate,⟨𝐕𝑘,𝐐𝑘⟩,mustnotbe𝑣-blocking, as this could deny 𝑣 a quorum and prevent it from ratifying statements. Finally, 𝑣’s state must never have been poisoned by a 𝑣-blocking set falsely claiming to accept a statement. To summarize, then, if 𝐵 is the set of nodes that have sent illegal messages, we consider a node 𝑣 to be cumulatively intact when the following conditions hold: (1) 𝑣 is intact in the latest conﬁguration⟨𝐕𝑘,𝐐𝑘⟩, (2) The aggregation of the present and all past conﬁgurations has quorum intersection despite 𝐵 (i.e., the prerequisite for Theorem 13 holds), and (3) 𝐵 is not 𝑣-blocking in⟨𝐕𝑗,𝐐𝑗⟩for any 1≤𝑗≤𝑘. The next few theorems show that ill-behaved nodes cannot drive intact nodes into dead-end stuck states: THEOREM 14. In an FBAS with quorum intersection, if no intact node is in the EXTERNALIZE phase and an intact node with ballot⟨𝑛,𝑥⟩arms its timer as described inSection6.2.2,then,givensufﬁcientcommunication,everyintactnode𝑣canset𝑏𝑣≥𝑛 before any timer ﬁres.
The Stellar Consensus Protocol 27 PROOF. Let𝑆 = {𝑣 ∣ 𝑏𝑣≥𝑛}bethesetofnodeswithcountersatleast𝑛.Byassumption,𝑆 containsanintactnode.Furthermore,becausethatintactnodearmeditstimer, 𝑆 must also encompass a quorum. Let 𝑆+ be the intact subset of 𝑆, and 𝑆− be the set of intact nodes not in 𝑆. By Theorem 10, either 𝑆− = ∅ (in which case the theorem is trivial), or 𝑆+ is 𝑣-blocking for some 𝑣 ∈ 𝑆. By step 9 on page 24, 𝑣 will adjust its ballot so 𝑏𝑣.𝑛 ≥ 𝑛. At this point, repeat the argument with 𝑆 ← 𝑆 ∪{𝑣} until such point as 𝑆− = ∅. THEOREM 15. Given long enough timeouts, if an intact node has reached the CONFIRM phase with 𝑏.𝑥 = 𝑥, then eventually all intact nodes will terminate. PROOF. If an intact node has reached the EXTERNALIZE phase, it has conﬁrmed commit 𝑐 for some ballot 𝑐. By Theorem 11, all intact nodes will conﬁrm commit 𝑐, after which they will terminate in step 7 on page 24. Otherwise, an intact node in the CONFIRM phase has accepted commit 𝑐 where 𝑐 = ⟨𝑛,𝑥⟩. Beforehand, an intact node conﬁrmed 𝑐 was prepared. By Theorem 11, all intact nodes will eventually have ℎ≥𝑐. Moreover, by Theorem 8, no intact node 𝑣 can accept abort 𝑐, so no intact node can accept as prepared any ballot 𝑝 such that 𝑝 ⋧ 𝑐. Hence, after sufﬁcient communication, every intact node will permanently have ℎ ≳ 𝑐. The intact node or nodes with the lowest 𝑏 will, by Theorem 14, raise their ballots until such point as all intact nodes with armed timers have the same ballot counter. Since they also have identical 𝑧 = ℎ.𝑥 = 𝑥, they will all have the same ballot. If they cannot complete the protocol because one or more intact nodes have higher ballots, the nodes with higher numbered ballots will not have timers set. Hence, the nodes with lowernumbered ballots will after a timeout set set 𝑏 ←⟨𝑏.𝑛+1,𝑥⟩until eventually all intact nodes are on the same ballot and can complete the protocol THEOREM 16. Regardless of past ill-behavior, given long enough timeouts and periods in which ill-behaved nodes do not send new messages, intact nodes running SCP will terminate. PROOF. By Theorem 12, all intact nodes will eventually have identical sets 𝑍 of candidate values. Assume this point has passed and every intact node 𝑣 has the same composite value 𝑧 = combine(𝑍). If no intact node ever conﬁrms any ballot 𝑏 prepared without 𝑏.𝑥 = 𝑧, then after at most one timeout, all new ballots of intact nodes will have value 𝑧 and, given a sufﬁcient timeout, complete the protocol. By Theorem 15, nodes will also complete if any intact node has progressed beyond the PREPARE phase. The remaining case is that an intact node has ℎ≠𝟎 and all intact nodes have 𝜑 = PREPARE.ByTheorem14,whentheintactnodeornodeswiththehighest𝑏.𝑛armtheir timers, if timers are long enough, other nodes will catch up. Moreover, by Theorem 11, if timers are long enough, nodes will converge on the value of ℎ (the highest conﬁrmed prepared ballot) before the next timeout, at which point all intact nodes will raise 𝑏 to the same value and complete the protocol. Theorem 16 assures us there are no dead-end states in SCP. However, a set of illbehaved nodes with very good timing could perpetually preempt an SCP system by delaying messages so that some fraction of intact nodes update ℎ right before timers ﬁre and the remaining update it after, preventing intact nodes from converging on the nextballot.Nodescanrecoverfromsuchanattackbyremovingill-behavednodesfrom their slices. An alternative would be to add randomness to the protocol, for instance changing step 2 on page 24 to update 𝑧 with probability 1∕2 (or even with probability proportional to the fraction of the timer remaining). Such an approach would terminate with
28 D. Mazi`eres
probability1, but in worse expected running time for the common case that most or all nodes are well-behaved or fail-stop.
7. LIMITATIONS SCPcanonlyguaranteesafetywhennodeschooseadequatequorumslices.Section3.2 discusses why we can reasonably expect them to do so. Nonetheless, when security depends upon a user-conﬁgurable parameter, there is always the possibility people will set it wrong. Even when people set quorum slices correctly and SCP guarantees safety, safety alone does not rule out other security issues that may arise in a federated system. For example, in a ﬁnancial market, widely trusted nodes could leverage their position in the network to gain information with which to engage in front running or other unethical activities. Byzantine nodes may attempt to ﬁlter transactions on the input side of SCP while otherwise producing the correct output. If well-behaved nodes accept all transactions, the combine function takes the union of transactions, and there are intact nodes, then such ﬁltering will eventually fail to block victim transactions with probability 1, but may nonetheless impose delays. ThoughSCP’ssafetyisoptimal,itsperformanceandcommunicationlatencyarenot. In the common case that nodes have not previously voted to commit ballots incompatible with the current one, it is possible to reduce the number of communication rounds by one. An earlier version of SCP did so, but the protocol description was more complex. First, it required nodes to cache and retransmit signed messages previously sent by failed nodes. Second, it was no longer possible to gloss over the distinction between votes and conﬁrmations of abort statements in PREPARE messages, so nodes had to send around potentially unbounded lists of exceptions to their abort votes. SCPcansufferperpetualpreemptionasdiscussedinSection6.3.Anopenquestionis whether,withoutrandomness,adifferentprotocolcouldguaranteeterminationassuming bounded communication latency but tolerating Byzantine nodes that continuously to inject bad messages at exactly the point where timeouts ﬁre. Such a protocol is not ruled out by the FLP impossibility result [Fischer et al. 1985]. However, the two main techniques to guarantee termination assuming synchrony do not directly apply in the FBA model: PBFT [Castro and Liskov 1999] chooses a leader in round-robin fashion, which is not directly applicable when nodes do not agree on membership. (Possibly something along the lines of priority in Section 6.1 could be adapted.) The Byzantine Generals protocol [Lamport et al. 1982] relays messages so as to compensate for ill-behaved nodes saying different things to different honest nodes, an approach that cannot help when nodes depend on distinct ill-behaved nodes in their slices. Still another possibility might be to leverage both randomness and synchrony to terminate with probability 1, but in shorter expected time than Ben Or-style randomized protocols [Ben-Or 1983] that make no synchrony assumptions. Public coin techniques [?] thatspeeduprandomizedcentralizedByzantineagreementprotocolsappeartobedifﬁcult to adapt to the federated model, barring some cryptographic breakthrough in federated threshold signatures. Unfortunately, changing slices mid-slot to accommodate failed nodes is problematic for liveness if a well-behaved node 𝑣 has ever experienced a wholly malicious and colluding 𝑣-blocking set. The good news is that Theorem 13 guarantees safety to any set 𝑆 of well-behaved nodes enjoying quorum intersection despite 𝐕⧵𝑆, even when 𝑆 has befouled members. The bad news is that updating 𝐐 may be insufﬁcient to unblock nodesifwell-behavednodesweretrickedintovotingtoconﬁrmabadcommitmessage. In such a situation, nodes must disavow past votes, which they can do only by rejoining the system under a new node names. There may exist a way to automate such
The Stellar Consensus Protocol 29
recovery, such as having other nodes recognize reincarnated nodes and automatically update their slices. The FBA model requires continuity of participants over time. Should all nodes simultaneously and permanently leave, restarting consensus would require central coordination or human-level agreement. By contrast, a proof-of-work system such as Bitcoincouldundergosuddencompleteturnoveryetcontinuetooperatewithlittlehuman intervention. On the other hand, if nodes do return, an FBAS can recover from an arbitrarily long outage, while a proof-of-work scheme would face the possibility of an attacker working on a fork during the outage. An intriguing possibility is to leverage SCP to mediate tussles [Clark et al. 2005] by voting on changes to conﬁguration parameters or upgrades to an application protocol. Onewaytodothisistonominatespecialmessagesthatupdateparameters.Candidate values could then consist of both a set of values and a set of parameter updates. A big limitation of this approach is that a set of malicious nodes large enough to deny the system a quorum but not large enough to undermine safety could nonetheless trigger conﬁgurationchangesbylyingandputtingconﬁgurationchangesin𝑌 thatwerenever ratiﬁed. It remains an open question how to vote on parameter changes in a way that requires the consent of a full quorum but also never jeopardizes liveness. 8. SUMMARY Byzantine agreement has long enabled distributed systems to achieve consensus with efﬁciency, standard cryptographic security, and ﬂexibility in designating trusted participants. More recently, Bitcoin introduced the revolutionary notion of decentralized consensus, leading to many new systems and research challenges. This paper introduces federated Byzantine agreement (FBA), a model for achieving decentralized consensus while preserving the traditional beneﬁts of Byzantine agreement. The key distinction between FBA and prior Byzantine agreement systems is that FBA forms quorums from participants’ individual trust decisions, allowing an organic growth model similar to that of the Internet. The Stellar Consensus Protocol (SCP) is a construction for FBA that achieves optimal safety against ill-behaved participants. Acknowledgments Jed McCaleb inspired this work and provided feedback, terminology suggestions, and help thinking through numerous conjectures. Jessica Collier collaborated on writing the paper. Stan Polu created the ﬁrst implementation of SCP and provided invaluable corrections, suggestions, simpliﬁcations, and feedback in the process. Jelle van den Hooff provided the key idea to restructure the paper around quorum intersection and federated voting, as well as other crucial suggestions for terminology, organization, and presentation. Nicolas Barry found several bugs in the paper as he implemented the protocol, as well as identifying necessary clariﬁcations. Ken Birman, Bekki Bolthouse, Joseph Bonneau, Mike Hamburg, Graydon Hoare, Joyce Kim, Tim Makarios, Mark Moir, Robert Morris, Lucas Ryan, and Katherine Tom slogged through drafts of the paper, identifying errors and sources of confusion as well as providing helpful suggestions. Eva Gantz provided helpful motivation and references. Winnie Lim provided guidance on ﬁgures. The reddit community and Tahoe-LAFS group pointed out a censorship weakness in an earlier version of SCP, leading to the improved nomination protocol. Finally, the author would like to thank the whole Stellar team for their support, feedback, and encouragement. Disclaimer ProfessorMazi`eres’scontributiontothispublicationwasasapaidconsultant,andwas not part of his Stanford University duties or responsibilities.