Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLA+ spec for safekeeper membership change #9966

Merged
merged 9 commits into from
Jan 9, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions safekeeper/spec/MCProposerAcceptorReconfig.tla
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
---- MODULE MCProposerAcceptorReconfig ----
EXTENDS TLC, ProposerAcceptorReconfig

\* Augments the spec with model checking constraints.

\* It slightly duplicates MCProposerAcceptorStatic, but we can't EXTENDS it
\* because it EXTENDS ProposerAcceptorStatic in turn. The duplication isn't big
\* anyway.
arssher marked this conversation as resolved.
Show resolved Hide resolved

\* For model checking.
CONSTANTS
max_entries, \* model constraint: max log entries acceptor/proposer can hold
max_term, \* model constraint: max allowed term
max_generation \* mode constraint: max config generation

ASSUME max_entries \in Nat /\ max_term \in Nat /\ max_generation \in Nat

\* Model space constraint.
StateConstraint == /\ \A p \in proposers:
/\ prop_state[p].term <= max_term
/\ Len(prop_state[p].wal) <= max_entries
/\ conf_store.generation <= max_generation

\* Sets of proposers and acceptors and symmetric because we don't take any
\* actions depending on some concrete proposer/acceptor (like IF p = p1 THEN
\* ...)
ProposerAcceptorSymmetry == Permutations(proposers) \union Permutations(acceptors)

\* enforce order of the vars in the error trace with ALIAS
\* Note that ALIAS is supported only since version 1.8.0 which is pre-release
\* as of writing this.
Alias == [
prop_state |-> prop_state,
prop_conf |-> prop_conf,
acc_state |-> acc_state,
acc_conf |-> acc_conf,
committed |-> committed,
conf_store |-> conf_store
]

====
310 changes: 310 additions & 0 deletions safekeeper/spec/ProposerAcceptorReconfig.tla
Original file line number Diff line number Diff line change
@@ -0,0 +1,310 @@
---- MODULE ProposerAcceptorReconfig ----

(*
Spec for https://github.com/neondatabase/neon/blob/538e2312a617c65d489d391892c70b2e4d7407b5/docs/rfcs/035-safekeeper-dynamic-membership-change.md
*)

EXTENDS Integers, Sequences, FiniteSets, TLC

VARIABLES
\* state which is the same in the static spec
prop_state,
acc_state,
committed,
elected_history,
\* reconfiguration only state
prop_conf, \* prop_conf[p] is current configuration of proposer p
acc_conf, \* acc_conf[a] is current configuration of acceptor a
conf_store \* configuration in the configuration store.

CONSTANT
acceptors,
proposers

CONSTANT NULL

\* Import ProposerAcceptorStatic under PAS.
\*
\* Note that all vars and consts are named the same and thus substituted
\* implicitly.
PAS == INSTANCE ProposerAcceptorStatic

\********************************************************************************
\* Helpers
\********************************************************************************

\********************************************************************************
\* Type assertion
\********************************************************************************

\* Is c a valid config?
IsConfig(c) ==
/\ DOMAIN c = {"generation", "members", "newMembers"}
\* Unique id of the configuration.
/\ c.generation \in Nat
/\ c.members \in SUBSET acceptors
\* newMembers is NULL when it is not a joint conf.
/\ \/ c.newMembers = NULL
\/ c.newMembers \in SUBSET acceptors

TypeOk ==
/\ PAS!TypeOk
/\ \A p \in proposers: IsConfig(prop_conf[p])
/\ \A a \in acceptors: IsConfig(acc_conf[a])
/\ IsConfig(conf_store)

\********************************************************************************
\* Initial
\********************************************************************************

Init ==
/\ PAS!Init
/\ \E init_members \in SUBSET acceptors:
LET init_conf == [generation |-> 1, members |-> init_members, newMembers |-> NULL] IN
\* refer to RestartProposer why it is not NULL
/\ prop_conf = [p \in proposers |-> init_conf]
/\ acc_conf = [a \in acceptors |-> init_conf]
arssher marked this conversation as resolved.
Show resolved Hide resolved
/\ conf_store = init_conf
\* We could start with anything, but to reduce space state start with
\* the most reasonable total acceptors - 1 conf size, which e.g.
arssher marked this conversation as resolved.
Show resolved Hide resolved
\* makes basic {a1} -> {a2} change in {a1, a2} acceptors and {a1, a2,
\* a3} -> {a2, a3, a4} in {a1, a2, a3, a4} acceptors models even in
\* the smallest models with single change.
/\ Cardinality(init_members) = Cardinality(acceptors) - 1
arssher marked this conversation as resolved.
Show resolved Hide resolved

\********************************************************************************
\* Actions
\********************************************************************************

\* Proposer p loses all state, restarting. In the static spec we bump restarted
\* proposer term to max of some quorum + 1 which is a minimal term which can win
\* election. With reconfigurations it's harder to calculate such a term, so keep
\* it simple and take random acceptor one + 1.
\*
\* Also make proposer to adopt configuration of another random acceptor. In the
\* impl proposer starts with NULL configuration until handshake with first
\* acceptor. Removing this NULL special case makes the spec a bit simpler.
RestartProposer(p) ==
/\ \E a \in acceptors: PAS!RestartProposerWithTerm(p, acc_state[a].term + 1)
/\ \E a \in acceptors: prop_conf' = [prop_conf EXCEPT ![p] = acc_conf[a]]
/\ UNCHANGED <<acc_conf, conf_store>>

\* Acceptor a immediately votes for proposer p.
Vote(p, a) ==
\* Configuration must be the same.
/\ prop_conf[p].generation = acc_conf[a].generation
\* And a is expected be a member of it. This is likely redundant as long as
\* becoming leader checks membership (though vote also contributes to max
\* <term, lsn> calculation).
/\ \/ a \in prop_conf[p].members
\/ (prop_conf[p].newMembers /= NULL) /\ (a \in prop_conf[p].newMembers)
/\ PAS!Vote(p, a)
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

\* Proposer p gets elected.
BecomeLeader(p) ==
/\ prop_state[p].state = "campaign"
\* Votes must form quorum in both sets (if the newMembers exists).
/\ PAS!FormsQuorum(DOMAIN prop_state[p].votes, prop_conf[p].members)
/\ \/ prop_conf[p].newMembers = NULL
\* TLA+ disjunction evaluation doesn't short-circuit for a good reason:
\* https://groups.google.com/g/tlaplus/c/U6tOJ4dsjVM/m/UdOznPCVBwAJ
\* so repeat the null check.
\/ (prop_conf[p].newMembers /= NULL) /\ (PAS!FormsQuorum(DOMAIN prop_state[p].votes, prop_conf[p].newMembers))
\* DoBecomeLeader will copy WAL of the highest voter to proposer's WAL, so
\* ensure its conf is still the same. In the impl WAL fetching also has to
\* check the configuration.
/\ prop_conf[p].generation = acc_conf[PAS!MaxVoteAcc(p)].generation
/\ \A a \in DOMAIN prop_state[p].votes: prop_conf[p].generation = acc_conf[a].generation
/\ PAS!DoBecomeLeader(p)
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

UpdateTerm(p, a) ==
/\ PAS!UpdateTerm(p, a)
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

TruncateWal(p, a) ==
/\ prop_state[p].state = "leader"
\* Configuration must be the same.
/\ prop_conf[p].generation = acc_conf[a].generation
/\ PAS!TruncateWal(p, a)
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

NewEntry(p) ==
/\ PAS!NewEntry(p)
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

AppendEntry(p, a) ==
/\ prop_state[p].state = "leader"
\* Configuration must be the same.
/\ prop_conf[p].generation = acc_conf[a].generation
\* And a is member of it. Ignoring this likely wouldn't hurt, but not useful
\* either.
/\ \/ a \in prop_conf[p].members
\/ (prop_conf[p].newMembers /= NULL) /\ (a \in prop_conf[p].newMembers)
/\ PAS!AppendEntry(p, a)
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

\* see PAS!CommitEntries for comments.
CommitEntries(p) ==
/\ prop_state[p].state = "leader"
/\ \E q1 \in PAS!AllMinQuorums(prop_conf[p].members):
LET q1_commit_lsn == PAS!QuorumCommitLsn(p, q1) IN
\* Configuration must be the same.
/\ \A a \in q1: prop_conf[p].generation = acc_conf[a].generation
/\ q1_commit_lsn /= NULL
\* We must collect acks from both quorums, if newMembers is present.
/\ IF prop_conf[p].newMembers = NULL THEN
PAS!DoCommitEntries(p, q1_commit_lsn)
ELSE
\E q2 \in PAS!AllMinQuorums(prop_conf[p].newMembers):
LET q2_commit_lsn == PAS!QuorumCommitLsn(p, q2) IN
\* Configuration must be the same.
/\ \A a \in q1: prop_conf[p].generation = acc_conf[a].generation
/\ q2_commit_lsn /= NULL
/\ PAS!DoCommitEntries(p, PAS!Min(q1_commit_lsn, q2_commit_lsn))
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

\* Proposer p adopts higher conf from acceptor a.
ProposerSwitchConf(p, a) ==
\* p's conf is lower than a's.
/\ (acc_conf[a].generation > prop_conf[p].generation)
\* We allow to seamlessly bump conf only when proposer is already elected.
\* If it isn't, some of the votes already collected could have been from
\* non-members of updated conf. It is easier (both here and in the impl) to
\* restart instead of figuring and removing these out.
\*
\* So if proposer is in 'campaign' in the impl we would restart preserving
\* conf and increasing term. In the spec this transition is already covered
\* by more a generic RestartProposer, so we don't specify it here.
/\ prop_state[p].state = "leader"
/\ prop_conf' = [prop_conf EXCEPT ![p] = acc_conf[a]]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, acc_conf, conf_store>>

\* Do CAS on the conf store, starting change into the new_members conf.
StartChange(new_members) ==
\* Possible only if we don't already have the change in progress.
/\ conf_store.newMembers = NULL
\* Not necessary, but reduces space a bit.
/\ new_members /= conf_store.members
/\ conf_store' = [generation |-> conf_store.generation + 1, members |-> conf_store.members, newMembers |-> new_members]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, acc_conf>>

\* Acceptor's last_log_term.
AccLastLogTerm(acc) ==
PAS!LastLogTerm(PAS!AcceptorTermHistory(acc))

\* Do CAS on the conf store, transferring joint conf into the newMembers only.
FinishChange ==
\* have joint conf
/\ conf_store.newMembers /= NULL
\* The conditions for finishing the change are:
/\ \E qo \in PAS!AllMinQuorums(conf_store.members):
arssher marked this conversation as resolved.
Show resolved Hide resolved
\* 1) Old majority must be aware of the joint conf.
/\ \A a \in qo: conf_store.generation = acc_conf[a].generation
arssher marked this conversation as resolved.
Show resolved Hide resolved
\* 2) New member set must have log synced, i.e. some majority needs
\* to have <last_log_term, lsn> at least as high as max of some
\* old majority.
\* 3) Term must be synced, i.e. some majority of the new set must
\* have term >= than max term of some old majority.
\* This ensures that two leaders are never elected with the same
\* term even after config change (which would be bad unless we treat
\* generation as a part of term which we don't).
\* 4) A majority of the new set must be aware of the joint conf.
\* This allows to safely destoy acceptor state if it is not a
\* member of its current conf (which is useful for cleanup after
\* migration as well as for aborts).
/\ LET sync_pos == PAS!MaxTermLsn({[term |-> AccLastLogTerm(a), lsn |-> PAS!FlushLsn(a)]: a \in qo})
sync_term == PAS!Maximum({acc_state[a].term: a \in qo})
IN
\E qn \in PAS!AllMinQuorums(conf_store.newMembers):
\A a \in qn:
/\ PAS!TermLsnGE([term |-> AccLastLogTerm(a), lsn |-> PAS!FlushLsn(a)], sync_pos)
/\ acc_state[a].term >= sync_term
/\ conf_store.generation = acc_conf[a].generation
arssher marked this conversation as resolved.
Show resolved Hide resolved
/\ conf_store' = [generation |-> conf_store.generation + 1, members |-> conf_store.newMembers, newMembers |-> NULL]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, acc_conf>>

\* Do CAS on the conf store, aborting the change in progress.
AbortChange ==
\* have joint conf
/\ conf_store.newMembers /= NULL
/\ conf_store' = [generation |-> conf_store.generation + 1, members |-> conf_store.members, newMembers |-> NULL]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, acc_conf>>

\* Acceptor a switches to higher configuration (we model it as taken from the
\* store, but it can be from proposer/peers as well).
AccSwitchConf(a) ==
/\ acc_conf[a].generation < conf_store.generation
arssher marked this conversation as resolved.
Show resolved Hide resolved
/\ acc_conf' = [acc_conf EXCEPT ![a] = conf_store]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, conf_store>>

\* Nuke all acceptor state if it is not a member of its current conf. Models
\* cleanup after migration/abort.
AccReset(a) ==
/\ \/ (acc_conf[a].newMembers = NULL) /\ (a \notin acc_conf[a].members)
\/ (acc_conf[a].newMembers /= NULL) /\ (a \notin (acc_conf[a].members \union acc_conf[a].newMembers))
/\ acc_state' = [acc_state EXCEPT ![a] = PAS!InitAcc]
\* Set nextSendLsn to `a` to NULL everywhere. nextSendLsn serves as a mark
\* that elected proposer performed TruncateWal on the acceptor, which isn't
\* true anymore after state reset. In the impl local deletion is expected to
\* terminate all existing connections.
/\ prop_state' = [p \in proposers |-> [prop_state[p] EXCEPT !.nextSendLsn[a] = NULL]]
/\ UNCHANGED <<committed, elected_history, prop_conf, acc_conf, conf_store>>


\*******************************************************************************
\* Final spec
\*******************************************************************************

Next ==
\/ \E p \in proposers: RestartProposer(p)
\/ \E p \in proposers: \E a \in acceptors: Vote(p, a)
\/ \E p \in proposers: BecomeLeader(p)
\/ \E p \in proposers: \E a \in acceptors: UpdateTerm(p, a)
\/ \E p \in proposers: \E a \in acceptors: TruncateWal(p, a)
\/ \E p \in proposers: NewEntry(p)
\/ \E p \in proposers: \E a \in acceptors: AppendEntry(p, a)
\/ \E p \in proposers: CommitEntries(p)
\/ \E new_members \in SUBSET acceptors: StartChange(new_members)
\/ FinishChange
\/ AbortChange
\/ \E p \in proposers: \E a \in acceptors: ProposerSwitchConf(p, a)
\/ \E a \in acceptors: AccSwitchConf(a)
\/ \E a \in acceptors: AccReset(a)

Spec == Init /\ [][Next]_<<prop_state, acc_state, committed, elected_history, prop_conf, acc_conf, conf_store>>

\********************************************************************************
\* Invariants
\********************************************************************************
arssher marked this conversation as resolved.
Show resolved Hide resolved

ElectionSafety == PAS!ElectionSafety

ElectionSafetyFull == PAS!ElectionSafetyFull

LogIsMonotonic == PAS!LogIsMonotonic

LogSafety == PAS!LogSafety

\********************************************************************************
\* Invariants which don't need to hold, but useful for playing/debugging.
\********************************************************************************

CommittedNotTruncated == PAS!CommittedNotTruncated

MaxTerm == PAS!MaxTerm
\* MaxTerm == \A p \in proposers: prop_state[p].term <= 1
arssher marked this conversation as resolved.
Show resolved Hide resolved

MaxStoreConf == conf_store.generation <= 1

\* Check that we ever switch into non joint conf.
MaxAccConf == ~ \E a \in acceptors:
/\ acc_conf[a].generation = 3
arssher marked this conversation as resolved.
Show resolved Hide resolved
/\ acc_conf[a].newMembers /= NULL

MaxAccWalLen == PAS!MaxAccWalLen

MaxCommitLsn == PAS!MaxCommitLsn

====
Loading
Loading