Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLA+ spec for safekeeper membership change #9966

Merged
merged 9 commits into from
Jan 9, 2025
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions safekeeper/spec/MCProposerAcceptorStatic.tla
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@ EXTENDS TLC, ProposerAcceptorStatic

\* Augments the spec with model checking constraints.

\* Note that MCProposerAcceptorReconfig duplicates it and might need to
\* be updated as well.

\* For model checking.
CONSTANTS
max_entries, \* model constraint: max log entries acceptor/proposer can hold
Expand Down
100 changes: 70 additions & 30 deletions safekeeper/spec/ProposerAcceptorReconfig.tla
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,29 @@

(*
Spec for https://github.com/neondatabase/neon/blob/538e2312a617c65d489d391892c70b2e4d7407b5/docs/rfcs/035-safekeeper-dynamic-membership-change.md

Simplifications:
- The ones inherited from ProposerAcceptorStatic.
- We don't model transient state of the configuration change driver process
(storage controller in the implementation). Its actions StartChange and FinishChange
are taken based on the persistent state of safekeepers and conf store. The
justification for that is the following: once new configuration n is
created (e.g with StartChange or FinishChange), any old configuration
change driver working on older conf < n will never be able to commit
it to the conf store because it is protected by CAS. The
propagation of these older confs is still possible though, and
spec allows to do it through acceptors.
Plus the model is already pretty huge.
- Previous point also means that the FinishChange action is
based only on the current state of safekeepers, not from
the past. That's ok because while individual
acceptor <last_log_term, flush_lsn> may go down,
quorum one never does. So the FinishChange
condition which collects max of the quorum may get
only more strict over time.

The invariants expectedly break if any of FinishChange
required conditions are removed.
*)

EXTENDS Integers, Sequences, FiniteSets, TLC
Expand Down Expand Up @@ -65,7 +88,7 @@ Init ==
/\ prop_conf = [p \in proposers |-> init_conf]
/\ acc_conf = [a \in acceptors |-> init_conf]
/\ conf_store = init_conf
\* We could start with anything, but to reduce space state start with
\* We could start with anything, but to reduce state space state with
\* the most reasonable total acceptors - 1 conf size, which e.g.
\* makes basic {a1} -> {a2} change in {a1, a2} acceptors and {a1, a2,
\* a3} -> {a2, a3, a4} in {a1, a2, a3, a4} acceptors models even in
Expand Down Expand Up @@ -165,21 +188,21 @@ CommitEntries(p) ==
/\ PAS!DoCommitEntries(p, PAS!Min(q1_commit_lsn, q2_commit_lsn))
/\ UNCHANGED <<prop_conf, acc_conf, conf_store>>

\* Proposer p adopts higher conf from acceptor a.
ProposerSwitchConf(p, a) ==
\* p's conf is lower than a's.
/\ (acc_conf[a].generation > prop_conf[p].generation)
\* We allow to seamlessly bump conf only when proposer is already elected.
\* If it isn't, some of the votes already collected could have been from
\* non-members of updated conf. It is easier (both here and in the impl) to
\* restart instead of figuring and removing these out.
\*
\* So if proposer is in 'campaign' in the impl we would restart preserving
\* conf and increasing term. In the spec this transition is already covered
\* by more a generic RestartProposer, so we don't specify it here.
/\ prop_state[p].state = "leader"
/\ prop_conf' = [prop_conf EXCEPT ![p] = acc_conf[a]]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, acc_conf, conf_store>>
\* Proposer p adopts higher conf c from conf store or from some acceptor.
ProposerSwitchConf(p) ==
/\ \E c \in ({conf_store} \union {acc_conf[a]: a \in acceptors}):
\* p's conf is lower than c.
/\ (c.generation > prop_conf[p].generation)
\* We allow to bump conf without restart only when wp is already elected.
\* If it isn't, the votes it has already collected are from the previous
\* configuration and can't be used.
\*
\* So if proposer is in 'campaign' in the impl we would restart preserving
\* conf and increasing term. In the spec this transition is already covered
\* by more a generic RestartProposer, so we don't specify it here.
/\ prop_state[p].state = "leader"
/\ prop_conf' = [prop_conf EXCEPT ![p] = c]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, acc_conf, conf_store>>

\* Do CAS on the conf store, starting change into the new_members conf.
StartChange(new_members) ==
Expand All @@ -201,8 +224,16 @@ FinishChange ==
\* The conditions for finishing the change are:
/\ \E qo \in PAS!AllMinQuorums(conf_store.members):
arssher marked this conversation as resolved.
Show resolved Hide resolved
\* 1) Old majority must be aware of the joint conf.
\* Note: generally the driver can't know current acceptor
\* generation, it can only know that it once had been the
\* expected one, but it might have advanced since then.
\* But as explained at the top of the file if acceptor gen
\* advanced, FinishChange will never be able to complete
\* due to CAS anyway. We use strict equality here because
\* that's what makes sense conceptually (old driver should
\* abandon its attempt if it observes that conf has advanced).
/\ \A a \in qo: conf_store.generation = acc_conf[a].generation
arssher marked this conversation as resolved.
Show resolved Hide resolved
\* 2) New member set must have log synced, i.e. some majority needs
\* 2) New member set must have log synced, i.e. some its majority needs
\* to have <last_log_term, lsn> at least as high as max of some
\* old majority.
\* 3) Term must be synced, i.e. some majority of the new set must
Expand All @@ -221,6 +252,7 @@ FinishChange ==
\A a \in qn:
/\ PAS!TermLsnGE([term |-> AccLastLogTerm(a), lsn |-> PAS!FlushLsn(a)], sync_pos)
/\ acc_state[a].term >= sync_term
\* The same note as above about strict equality applies here.
/\ conf_store.generation = acc_conf[a].generation
arssher marked this conversation as resolved.
Show resolved Hide resolved
/\ conf_store' = [generation |-> conf_store.generation + 1, members |-> conf_store.newMembers, newMembers |-> NULL]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, acc_conf>>
Expand All @@ -232,12 +264,13 @@ AbortChange ==
/\ conf_store' = [generation |-> conf_store.generation + 1, members |-> conf_store.members, newMembers |-> NULL]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, acc_conf>>

\* Acceptor a switches to higher configuration (we model it as taken from the
\* store, but it can be from proposer/peers as well).
\* Acceptor a switches to higher configuration from the conf store
\* or from some proposer.
AccSwitchConf(a) ==
/\ acc_conf[a].generation < conf_store.generation
/\ acc_conf' = [acc_conf EXCEPT ![a] = conf_store]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, conf_store>>
/\ \E c \in ({conf_store} \union {prop_conf[p]: p \in proposers}):
/\ acc_conf[a].generation < c.generation
/\ acc_conf' = [acc_conf EXCEPT ![a] = c]
/\ UNCHANGED <<prop_state, acc_state, committed, elected_history, prop_conf, conf_store>>

\* Nuke all acceptor state if it is not a member of its current conf. Models
\* cleanup after migration/abort.
Expand All @@ -252,7 +285,6 @@ AccReset(a) ==
/\ prop_state' = [p \in proposers |-> [prop_state[p] EXCEPT !.nextSendLsn[a] = NULL]]
/\ UNCHANGED <<committed, elected_history, prop_conf, acc_conf, conf_store>>


\*******************************************************************************
\* Final spec
\*******************************************************************************
Expand All @@ -269,7 +301,7 @@ Next ==
\/ \E new_members \in SUBSET acceptors: StartChange(new_members)
\/ FinishChange
\/ AbortChange
\/ \E p \in proposers: \E a \in acceptors: ProposerSwitchConf(p, a)
\/ \E p \in proposers: ProposerSwitchConf(p)
\/ \E a \in acceptors: AccSwitchConf(a)
\/ \E a \in acceptors: AccReset(a)

Expand All @@ -279,6 +311,15 @@ Spec == Init /\ [][Next]_<<prop_state, acc_state, committed, elected_history, pr
\* Invariants
\********************************************************************************

AllConfs ==
{conf_store} \union {prop_conf[p]: p \in proposers} \union {acc_conf[a]: a \in acceptors}

\* Fairly trivial (given the conf store) invariant that different configurations
\* with the same generation are never issued.
ConfigSafety ==
\A c1, c2 \in AllConfs:
(c1.generation = c2.generation) => (c1 = c2)

ElectionSafety == PAS!ElectionSafety

ElectionSafetyFull == PAS!ElectionSafetyFull
Expand All @@ -291,18 +332,17 @@ LogSafety == PAS!LogSafety
\* Invariants which don't need to hold, but useful for playing/debugging.
\********************************************************************************

\* Check that we ever switch into non joint conf.
MaxAccConf == ~ \E a \in acceptors:
/\ acc_conf[a].generation = 3
/\ acc_conf[a].newMembers /= NULL
arssher marked this conversation as resolved.
Show resolved Hide resolved

CommittedNotTruncated == PAS!CommittedNotTruncated

MaxTerm == PAS!MaxTerm
\* MaxTerm == \A p \in proposers: prop_state[p].term <= 1

MaxStoreConf == conf_store.generation <= 1

\* Check that we ever switch into non joint conf.
MaxAccConf == ~ \E a \in acceptors:
/\ acc_conf[a].generation = 3
/\ acc_conf[a].newMembers /= NULL

MaxAccWalLen == PAS!MaxAccWalLen

MaxCommitLsn == PAS!MaxCommitLsn
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,13 @@ SPECIFICATION Spec
CONSTRAINT StateConstraint
INVARIANT
TypeOk
ConfigSafety
ElectionSafetyFull
LogIsMonotonic
LogSafety
\* CommittedNotTruncated
\* As its comment explains generally it is not expected to hold, but
\* in such small model it is true.
CommittedNotTruncated
SYMMETRY ProposerAcceptorSymmetry
CHECK_DEADLOCK FALSE
ALIAS Alias
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,11 @@ SPECIFICATION Spec
CONSTRAINT StateConstraint
INVARIANT
TypeOk
ConfigSafety
ElectionSafetyFull
LogIsMonotonic
LogSafety
\* CommittedNotTruncated
CommittedNotTruncated
SYMMETRY ProposerAcceptorSymmetry
CHECK_DEADLOCK FALSE
ALIAS Alias
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ SPECIFICATION Spec
CONSTRAINT StateConstraint
INVARIANT
TypeOk
ConfigSafety
ElectionSafetyFull
LogIsMonotonic
LogSafety
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ TypeOk
ElectionSafetyFull
LogIsMonotonic
LogSafety
CommittedNotTruncated
SYMMETRY ProposerAcceptorSymmetry
CHECK_DEADLOCK FALSE
ALIAS Alias
Expand Down
Loading