SHARE
TWEET

Untitled

a guest Jun 19th, 2019 55 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. --------------------------------- MODULE RaftMongo ---------------------------------
  2. \* This is the formal specification for the Raft consensus algorithm in MongoDB
  3.  
  4. EXTENDS Naturals, FiniteSets, Sequences, TLC, Animation, Integers
  5.  
  6. \* The set of server IDs
  7. CONSTANTS Server
  8.  
  9. \* Server states.
  10. \* Candidate is not used, but this is fine.
  11. CONSTANTS Follower, Candidate, Leader
  12.  
  13. \* A reserved value.
  14. CONSTANTS Nil
  15.  
  16. ----
  17. \* Global variables
  18.  
  19. \* The server's term number.
  20. VARIABLE globalCurrentTerm
  21.  
  22. ----
  23. \* The following variables are all per server (functions with domain Server).
  24.  
  25. \* The server's state (Follower, Candidate, or Leader).
  26. VARIABLE state
  27.  
  28. \* The commit point learned by each server.
  29. VARIABLE commitPoint
  30.  
  31. \* The current sync source of each server, if it has one.
  32. VARIABLE syncSource
  33.  
  34. electionVars == <<globalCurrentTerm, state>>
  35. serverVars == <<electionVars, commitPoint, syncSource>>
  36.  
  37. \* A Sequence of log entries. The index into this sequence is the index of the
  38. \* log entry. Unfortunately, the Sequence module defines Head(s) as the entry
  39. \* with index 1, so be careful not to use that!
  40. VARIABLE log
  41. logVars == <<log>>
  42.  
  43. \* End of per server variables.
  44. ----
  45.  
  46. \* All variables; used for stuttering (asserting state hasn't changed).
  47. vars == <<serverVars, logVars>>
  48.  
  49. ----
  50. \* Helpers
  51.  
  52. \* The set of all quorums. This just calculates simple majorities, but the only
  53. \* important property is that every quorum overlaps with every other.
  54. Quorum == {i \in SUBSET(Server) : Cardinality(i) * 2 > Cardinality(Server)}
  55.  
  56. \* The term of the last entry in a log, or 0 if the log is empty.
  57. GetTerm(xlog, index) == IF index = 0 THEN 0 ELSE xlog[index].term
  58. LogTerm(i, index) == GetTerm(log[i], index)
  59. LastTerm(xlog) == GetTerm(xlog, Len(xlog))
  60.  
  61. \* Return the minimum value from a set, or undefined if the set is empty.
  62. Min(s) == CHOOSE x \in s : \A y \in s : x <= y
  63. \* Return the maximum value from a set, or undefined if the set is empty.
  64. Max(s) == CHOOSE x \in s : \A y \in s : x >= y
  65.  
  66. ----
  67. \* Define initial values for all variables
  68.  
  69. InitServerVars == /\ globalCurrentTerm = 0
  70.                   /\ state             = [i \in Server |-> Follower]
  71.                   /\ commitPoint       = [i \in Server |-> [term |-> 0, index |-> 0]]
  72.                   /\ syncSource        = [i \in Server |-> Nil]
  73. InitLogVars == /\ log          = [i \in Server |-> << >>]
  74. Init == /\ InitServerVars
  75.         /\ InitLogVars
  76.  
  77. ----
  78. \* Message handlers
  79. \* i = recipient, j = sender, m = message
  80.  
  81. \* Is log[i] a prefix of log[j].
  82. IsLogPrefix(i, j) ==
  83.     /\ Len(log[i]) < Len(log[j])
  84.     /\ LastTerm(log[i]) = LogTerm(j, Len(log[i]))
  85.  
  86. AppendOplog(i, j) ==
  87.     \* /\ state[i] = Follower  \* Disable primary catchup and draining
  88.     /\ syncSource[i] = j \* Optionally enable sync source selection rules.
  89.     /\ Len(log[i]) < Len(log[j])
  90.     /\ LastTerm(log[i]) = LogTerm(j, Len(log[i]))
  91.     /\ log' = [log EXCEPT ![i] = Append(log[i], log[j][Len(log[i]) + 1])]
  92.     /\ UNCHANGED <<serverVars>>
  93.  
  94. CanRollbackOplog(i, j) ==
  95.     /\ Len(log[i]) > 0
  96.     /\ \* The log with later term is more up-to-date
  97.        LastTerm(log[i]) < LastTerm(log[j])
  98.     /\
  99.        \/ Len(log[i]) > Len(log[j])
  100.        \* There seems no short-cut of OR clauses, so I have to specify the negative case
  101.        \/ /\ Len(log[i]) <= Len(log[j])
  102.           /\ LastTerm(log[i]) /= LogTerm(j, Len(log[i]))
  103.  
  104. RollbackOplog(i, j) ==
  105.     /\ CanRollbackOplog(i, j)
  106.     \* Rollback 1 oplog entry
  107.     /\ LET new == [index2 \in 1..(Len(log[i]) - 1) |-> log[i][index2]]
  108.          IN log' = [log EXCEPT ![i] = new]
  109.     \* For any nodes that are currently syncing from you, clear their sync sources.
  110.     /\ syncSource' = [s \in Server |-> IF syncSource[s] = i THEN Nil ELSE syncSource[s]]
  111.     /\ UNCHANGED <<electionVars, commitPoint>>
  112.  
  113. \* The set of nodes that has log[me][logIndex] in their oplog
  114. Agree(me, logIndex) ==
  115.     { node \in Server :
  116.         /\ Len(log[node]) >= logIndex
  117.         /\ LogTerm(me, logIndex) = LogTerm(node, logIndex) }
  118.  
  119. IsCommitted(me, logIndex) ==
  120.     /\ Agree(me, logIndex) \in Quorum
  121.     \* If we comment out the following line, a replicated log entry from old primary will voilate the safety.
  122.     \* [ P (2), S (), S ()]
  123.     \* [ S (2), S (), P (3)]
  124.     \* [ S (2), S (2), P (3)] !!! the log from term 2 shouldn't be considered as committed.
  125.     /\ LogTerm(me, logIndex) = globalCurrentTerm
  126.  
  127. \* RollbackCommitted and NeverRollbackCommitted are not actions.
  128. \* They are used for verification.
  129. RollbackCommitted(i) ==
  130.     \E j \in Server:
  131.         /\ CanRollbackOplog(i, j)
  132.         /\ IsCommitted(i, Len(log[i]))
  133.  
  134. NeverRollbackCommitted ==
  135.     \A i \in Server: ~RollbackCommitted(i)
  136.  
  137. \* ACTION
  138. \* i = the new primary node.
  139. BecomePrimaryByMagic(i) ==
  140.     LET notBehind(me, j) ==
  141.             \/ LastTerm(log[me]) > LastTerm(log[j])
  142.             \/ /\ LastTerm(log[me]) = LastTerm(log[j])
  143.                /\ Len(log[me]) >= Len(log[j])
  144.         ayeVoters(me) ==
  145.             { index \in Server : notBehind(me, index) }
  146.     IN /\ ayeVoters(i) \in Quorum
  147.        /\ state' = [index \in Server |-> IF index = i THEN Leader ELSE Follower]
  148.        /\ globalCurrentTerm' = globalCurrentTerm + 1
  149.        /\ syncSource' = [syncSource EXCEPT ![i] = Nil] \* clear sync source.
  150.        /\ UNCHANGED <<commitPoint, logVars>>
  151.  
  152. \* ACTION
  153. \* Leader i receives a client request to add v to the log.
  154. ClientWrite(i) ==
  155.     /\ state[i] = Leader
  156.     /\ LET entry == [term  |-> globalCurrentTerm]
  157.            newLog == Append(log[i], entry)
  158.        IN  log' = [log EXCEPT ![i] = newLog]
  159.     /\ UNCHANGED <<serverVars>>
  160.  
  161. \* ACTION
  162. AdvanceCommitPoint ==
  163.     \E leader \in Server :
  164.         /\ state[leader] = Leader
  165.         /\ IsCommitted(leader, Len(log[leader]))
  166.         /\ commitPoint' = [commitPoint EXCEPT ![leader] = [term |-> LastTerm(log[leader]), index |-> Len(log[leader])]]
  167.         /\ UNCHANGED <<electionVars, logVars, syncSource>>
  168.  
  169. \* Return whether Node i can learn the commit point from Node j.
  170. CommitPointLessThan(i, j) ==
  171.    \/ commitPoint[i].term < commitPoint[j].term
  172.    \/ /\ commitPoint[i].term = commitPoint[j].term
  173.       /\ commitPoint[i].index < commitPoint[j].index
  174.  
  175. \* ACTION
  176. \* Node i learns the commit point from j via heartbeat.
  177. LearnCommitPoint(i, j) ==
  178.     /\ CommitPointLessThan(i, j)
  179.     /\ commitPoint' = [commitPoint EXCEPT ![i] = commitPoint[j]]
  180.     /\ UNCHANGED <<electionVars, logVars, syncSource>>
  181.  
  182. \* ACTION
  183. \* Node i learns the commit point from j via heartbeat with term check
  184. LearnCommitPointWithTermCheck(i, j) ==
  185.     /\ LastTerm(log[i]) = commitPoint[j].term
  186.     /\ LearnCommitPoint(i, j)
  187.  
  188. \* ACTION
  189. LearnCommitPointFromSyncSource(i, j) ==
  190.     /\ ENABLED AppendOplog(i, j)
  191.     /\ LearnCommitPoint(i, j)
  192.  
  193. \* ACTION
  194. LearnCommitPointFromSyncSourceNeverBeyondLastApplied(i, j) ==
  195.     \* From sync source
  196.     /\ ENABLED AppendOplog(i, j)
  197.     /\ CommitPointLessThan(i, j)
  198.     \* Never beyond last applied
  199.     /\ LET myCommitPoint ==
  200.             \* If they have the same term, commit point can be ahead.
  201.             IF commitPoint[j].term <= LastTerm(log[i])
  202.             THEN commitPoint[j]
  203.             ELSE [term |-> LastTerm(log[i]), index |-> Len(log[i])]
  204.        IN commitPoint' = [commitPoint EXCEPT ![i] = myCommitPoint]
  205.     /\ UNCHANGED <<electionVars, logVars>>
  206.  
  207. \* ACTION
  208. AppendEntryAndLearnCommitPointFromSyncSource(i, j) ==
  209.     \* Append entry
  210.     /\ syncSource[i] = j \* Optionally enable sync source selection rules.
  211.     /\ Len(log[i]) < Len(log[j])
  212.     /\ LastTerm(log[i]) = LogTerm(j, Len(log[i]))
  213.     /\ log' = [log EXCEPT ![i] = Append(log[i], log[j][Len(log[i]) + 1])]
  214.     \* Learn commit point
  215.     /\ CommitPointLessThan(i, j)
  216.     /\ commitPoint' = [commitPoint EXCEPT ![i] = commitPoint[j]]
  217.     /\ UNCHANGED <<electionVars, syncSource>>
  218.  
  219. \* ACTION
  220. \* Server i chooses server j as its new sync source.
  221. \* i can choose j as a sync source if log[i] is a prefix of log[j] and
  222. \* log[j] is longer than log[i].
  223. ChooseNewSyncSource(i, j) ==
  224.     /\ \/ IsLogPrefix(i, j)
  225.        \* If logs are equal, allow choosing sync source if it has a newer commit point.
  226.        \/ /\ log[i] = log[j]
  227.           /\ commitPoint[j].index > commitPoint[i].index
  228.     /\ state[i] = Follower \* leaders don't need to sync oplog entries.
  229.     /\ syncSource' = [syncSource EXCEPT ![i] = j]
  230.     /\ UNCHANGED <<electionVars, logVars, commitPoint>>
  231.  
  232. \* Does a 2 node sync source cycle exist?
  233. SyncSourceCycleTwoNode ==
  234.     \E s, t \in Server :
  235.         /\ s # t
  236.         /\ syncSource[s] = t
  237.         /\ syncSource[t] = s
  238.  
  239. \* The set of all sequences with elements in set 's', of maximum length 'n'.
  240. BoundedSeq(s, n) == [1..n -> s]
  241.  
  242. \* The set of all paths in the graph that consists of edges <<s,t>> where s has t as a sync
  243. \* source.
  244. SyncSourcePaths ==
  245.     {p \in BoundedSeq(Server, Cardinality(Server)) :
  246.         \A i \in 1..(Len(p)-1) : syncSource[p[i]] = p[i+1]}
  247.  
  248. \* Is there a non-trivial path in the sync source graph from node i to node j?
  249. \* This rules out trivial paths i.e. those of length 1.
  250. SyncSourcePath(i, j) ==
  251.     \E p \in SyncSourcePaths :
  252.         /\ Len(p) > 1
  253.         /\ p[1] = i \* the source node.
  254.         /\ p[Len(p)]=j \* the target node.
  255.  
  256. \* Does a general (possibly multi-node) sync source cycle exist?
  257. SyncSourceCycle ==
  258.     \E s \in Server : SyncSourcePath(s, s)
  259.        
  260. ----
  261. AppendOplogAction ==
  262.     \E i,j \in Server : AppendOplog(i, j)
  263.  
  264. RollbackOplogAction ==
  265.     \E i,j \in Server : RollbackOplog(i, j)
  266.  
  267. BecomePrimaryByMagicAction ==
  268.     \E i \in Server : BecomePrimaryByMagic(i)
  269.  
  270. ClientWriteAction ==
  271.     \E i \in Server : ClientWrite(i)
  272.  
  273. LearnCommitPointAction ==
  274.     \E i, j \in Server : LearnCommitPoint(i, j)
  275.  
  276. LearnCommitPointWithTermCheckAction ==
  277.     \E i, j \in Server : LearnCommitPointWithTermCheck(i, j)
  278.  
  279. LearnCommitPointFromSyncSourceAction ==
  280.     \E i, j \in Server : LearnCommitPointFromSyncSource(i, j)
  281.  
  282. LearnCommitPointFromSyncSourceNeverBeyondLastAppliedAction ==
  283.     \E i, j \in Server : LearnCommitPointFromSyncSourceNeverBeyondLastApplied(i, j)
  284.  
  285. AppendEntryAndLearnCommitPointFromSyncSourceAction ==
  286.     \E i, j \in Server : AppendEntryAndLearnCommitPointFromSyncSource(i, j)
  287.  
  288. ChooseNewSyncSourceAction ==
  289.     \E i, j \in Server : ChooseNewSyncSource(i, j)
  290.    
  291. ----
  292. \* Properties to check
  293.  
  294. RollbackBeforeCommitPoint(i) ==
  295.     /\ \E j \in Server:
  296.         /\ CanRollbackOplog(i, j)
  297.     /\ \/ LastTerm(log[i]) < commitPoint[i].term
  298.        \/ /\ LastTerm(log[i]) = commitPoint[i].term
  299.           /\ Len(log[i]) <= commitPoint[i].index
  300. \* todo: clean up
  301.  
  302. NeverRollbackBeforeCommitPoint == \A i \in Server: ~RollbackBeforeCommitPoint(i)
  303.  
  304. \* Liveness check
  305.  
  306. \* This isn't accurate for any infinite behavior specified by Spec, but it's fine
  307. \* for any finite behavior with the liveness we can check with the model checker.
  308. \* This is to check at any time, if two nodes' commit points are not the same, they
  309. \* will be the same eventually.
  310. \* This is checked after all possible rollback is done.
  311. CommitPointEventuallyPropagates ==
  312.     /\ \A i, j \in Server:
  313.         [](commitPoint[i] # commitPoint[j] ~>
  314.                <>(~ENABLED RollbackOplogAction => commitPoint[i] = commitPoint[j]))
  315.  
  316. ----
  317. \* Defines how the variables may transition.
  318. Next ==
  319.     \* --- Replication protocol
  320.     \/ AppendOplogAction
  321.     \/ RollbackOplogAction
  322.     \/ BecomePrimaryByMagicAction
  323.     \/ ClientWriteAction
  324.     \/ ChooseNewSyncSourceAction
  325.     \*
  326.     \* --- Commit point learning protocol
  327.     \/ AdvanceCommitPoint
  328.     \* \/ LearnCommitPointAction
  329.     \/ LearnCommitPointFromSyncSourceAction
  330.     \* \/ AppendEntryAndLearnCommitPointFromSyncSourceAction
  331.     \* \/ LearnCommitPointWithTermCheckAction
  332.     \* \/ LearnCommitPointFromSyncSourceNeverBeyondLastAppliedAction
  333.  
  334. Liveness ==
  335.     /\ SF_vars(AppendOplogAction)
  336.     /\ SF_vars(RollbackOplogAction)
  337.     \* A new primary should eventually write one entry.
  338.     /\ WF_vars(\E i \in Server : LastTerm(log[i]) # globalCurrentTerm /\ ClientWrite(i))
  339.     \* /\ WF_vars(ClientWriteAction)
  340.     \*
  341.     \* --- Commit point learning protocol
  342.     /\ WF_vars(AdvanceCommitPoint)
  343.     \* /\ WF_vars(LearnCommitPointAction)
  344.     /\ SF_vars(LearnCommitPointFromSyncSourceAction)
  345.     \* /\ SF_vars(AppendEntryAndLearnCommitPointFromSyncSourceAction)
  346.     \* /\ SF_vars(LearnCommitPointWithTermCheckAction)
  347.     \* /\ SF_vars(LearnCommitPointFromSyncSourceNeverBeyondLastAppliedAction)
  348.  
  349. \* The specification must start with the initial state and transition according
  350. \* to Next.
  351. Spec == Init /\ [][Next]_vars /\ Liveness
  352.  
  353.  
  354. \*
  355. \*
  356. \* TRACE ANIMATION.
  357. \*
  358. \*
  359.  
  360. Pick(S) == CHOOSE x \in S : TRUE
  361.  
  362. RECURSIVE SetToSeq(_)
  363. SetToSeq(S) == IF S = {} THEN <<>>
  364.                ELSE LET v == Pick(S) IN <<v>> \o SetToSeq(S \ {v})
  365.    
  366. Injective(f) == \A x, y \in DOMAIN f : f[x] = f[y] => x = y
  367.  
  368.            
  369. ServerId == CHOOSE f \in [Server -> 1..Cardinality(Server)] : Injective(f)
  370.                
  371. \* This animation assumes 4 servers.
  372. Base == [x |-> 50, y |-> 50]
  373. ServerPositions ==
  374.       1 :> [x |-> Base.x, y |-> Base.y] @@
  375.       2 :> [x |-> Base.x + 200, y |-> Base.y + 0] @@
  376.       3 :> [x |-> Base.x + 200, y |-> Base.y + 150] @@              
  377.       4 :> [x |-> Base.x + 0, y |-> Base.y + 150]              
  378.                
  379.  
  380. \* Colors to represent the state of a server.
  381. StateColor ==
  382.       Leader :> "green" @@
  383.       Candidate :> "yellow" @@
  384.       Follower :> "darkgray"
  385.                
  386. \* Generates a graphic element representing server 'sid'.
  387. ServerElem(sid) ==
  388.     LET pos == ServerPositions[ServerId[sid]]
  389. \*        circleAttrs == ("stroke" :> "black" @@ "stroke-width" :> "1" @@ "fill" :> "blue")
  390.         circleAttrs == [fill |-> StateColor[state[sid]]]
  391.         circle == Circle(pos.x, pos.y, 12, circleAttrs) IN
  392.         Group(<<circle>>, <<>>)
  393.                
  394.  
  395. ServerElems == {Group(<<ServerElem(s)>>, <<>>) : s \in Server}              
  396.  
  397. \* Linearly interpolate a position halfway between two given points. 't', the interpolation factor, is measured
  398. \* on a scale from 0 to 100.
  399. Interp(p1, p2, t) ==
  400.     [ x |-> (t * p1.x + (100-t) * p2.x) \div 100 ,
  401.       y |-> (t * p1.y + (100-t) * p2.y) \div 100]            
  402.  
  403.  
  404. \* The position of a log slot for a server and log index.
  405. LogEntryDims == [w |-> 22, h |-> 22]
  406. LogPos(s, i) == [x |-> ServerPositions[ServerId[s]].x + 40,
  407.                  y |-> ServerPositions[ServerId[s]].y + 20 - (i * (LogEntryDims.w + 3))]
  408.  
  409. \* Generate a single log entry element at index 'i' for server 'sid'.
  410. LogEntryElem(sid, i) ==
  411.     LET pos == LogPos(sid, i)
  412.         entry == Rect(pos.x, pos.y, LogEntryDims.w, LogEntryDims.h, [fill |-> "lightblue"])
  413.         term == Text((pos.x + LogEntryDims.w \div 2 - 6), (pos.y + LogEntryDims.h \div 2 + 8), ToString(log[sid][i].term),  <<>>) IN
  414.     Group(<<entry, term>>, <<>>)
  415.    
  416. LogElem(sid) ==
  417.     LET entryElems == {LogEntryElem(sid, i) : i \in DOMAIN log[sid]} IN
  418.     Group(SetToSeq(entryElems), <<>>)
  419.    
  420. LogElems == {LogElem(s) : s \in Server}
  421.  
  422. \* Log Slot Elements
  423. MaxLogSlots == 2
  424. LogSlot(sid, i) == Rect(LogPos(sid, i).x, LogPos(sid, i).y, LogEntryDims.w + 2, LogEntryDims.h, [stroke |-> (IF commitPoint[sid].index = i THEN "red" ELSE "black"),  fill |-> "lightgray"])
  425. LogSlotElem(sid) ==
  426.     LET slotElems == {LogSlot(sid, i) : i \in 1..MaxLogSlots} IN
  427.     Group(SetToSeq(slotElems), <<>>)
  428. LogSlotElems == {LogSlotElem(s) : s \in Server}
  429.  
  430.  
  431. SyncSource(s) ==
  432.     LET src == ServerPositions[ServerId[s]]
  433.         target == ServerPositions[ServerId[syncSource[s]]]
  434.         line == Line(src.x, src.y, target.x, target.y, [stroke |-> "black"])
  435.         headP == Interp(src, target, 5)
  436.         head == Circle(headP.x, headP.y, 3, [fill |-> "orange"]) IN
  437.         Group(<<line, head>>, <<>>)
  438.  
  439. SyncSourceElems == {SyncSource(i) : i \in {s \in Server : syncSource[s] # Nil}}
  440.  
  441. \*AllElems == SetToSeq(LogElems) \o SetToSeq(ServerElems)
  442. AllElems == SetToSeq(ServerElems)
  443.             \o SetToSeq(SyncSourceElems)
  444.             \o SetToSeq(LogSlotElems)
  445.             \o SetToSeq(LogElems)
  446.            
  447.  
  448. View == Group(AllElems, <<>>)  
  449.  
  450.  
  451. ===============================================================================
RAW Paste Data
We use cookies for various purposes including analytics. By continuing to use Pastebin, you agree to our use of cookies as described in the Cookies Policy. OK, I Understand
 
Top