-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[EFM] Update protocol.DKG
to use IndexMap
#6338
[EFM] Update protocol.DKG
to use IndexMap
#6338
Conversation
…oved unused logic
…ctionality to produce a correct EpochCommit. Updated tests and fixtures
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/efm-recovery #6338 +/- ##
========================================================
- Coverage 41.52% 41.51% -0.02%
========================================================
Files 2012 2012
Lines 143350 143296 -54
========================================================
- Hits 59530 59490 -40
- Misses 77625 77631 +6
+ Partials 6195 6175 -20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote up the reasons I disagree with some of the new validity checks. We can discuss those here and make sure we're on the same page prior to merging.
Other than that, looks great.
state/protocol/protocol_state/epochs/fallback_statemachine_test.go
Outdated
Show resolved
Hide resolved
state/protocol/protocol_state/epochs/fallback_statemachine_test.go
Outdated
Show resolved
Hide resolved
...ochFallbackStateMachine_TestProcessingMultipleEventsInTheSameBlock-20240813141845-80985.fail
Outdated
Show resolved
Hide resolved
state/protocol/validity.go
Outdated
if len(participants) != len(commit.DKGParticipantKeys) { | ||
return NewInvalidServiceEventErrorf("participant list (len=%d) does not match dkg key list (len=%d)", len(participants), len(commit.DKGParticipantKeys)) | ||
// at max DKG participant count is equal to the number of participants, | ||
// but it can be less if some participants took part in the DKG but are not part of the committee anymore(eg. ejected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with two of the checks here:
len(participants) <= len(commit.DKGParticipantKeys)
- all nodes in
setup.Participants.Filter(filter.IsValidDKGParticipant)
have a DKG index
From the issue, there are 2 additional cases we want to be able to represent:
- (1) there is a node that participated in the DKG, but isn't in the committee
- (2) there is a node that is in the committee, but didn't participate in the DKG
Each of the problematic checks will cause us to not accept DKG models where property (2) is true.
Why would we want to allow EpochCommit events satisfying property (2)?
During epoch recovery, the EpochRecover event contains:
- a copy of the most recent, successful DKG (final epoch prior to entering EFM)
- an identity table consistent with the current staking state (we should exclude nodes which have unstaked, and include nodes which have staked)
Suppose the last successful DKG occurred in Epoch 10, and 100 consensus nodes participated. In Epoch 11, 5 consensus nodes staked to join the network. Also in Epoch 11, the network enters EFM.
When we recover, we will have:
- a DKG with 100 participants (
commit.DKGParticipantKeys
) - an identity table (
participants
) with 105 consensus nodes (105 nodes eligible to participate in DKGs)
In my opinion, this is a valid situation that we should allow. But the resulting EpochRecover would fail the two checks here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed those two checks and adjusted tests. Now we should support any combination of participants and DKG data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I finally got to the bottom of this PR to contribute to this conversation: please see my comment below
Co-authored-by: Jordan Schalm <[email protected]>
…om/onflow/flow-go into yurii/6321-update-DKG-implementors
…apping implementation, the latter still being limited to handling DKG data from a trusted dealer only
… has the important documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work: code is clear, consistent and well tested. Thank you.
From my perspective, there are two aspects:
- I have created PR [EFM] Extended documentation where generalized protocol logic meets bootstrapping implementation #6452 (targeting your branch) with largely documentation revisions. If you wouldn't mind including this into your PR, that would be great and then I am happy with the documentation.
- The only other aspect that still might require a few lines of extra code is this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the function IsValidEpochCommit
I think it implements the absolutely necessary consistency checks, which is great. Though, I am concerned that there is no check enforcing sufficient overlap between the DKG committee 𝒟 and the consensus committee 𝒞. Only consensus committee members can vote and a sufficient subset of them must have Random beacon key shares - otherwise consensus cannot progress:
-
We define the random beacon committee
$ℛ$ as the subset of the consensus nodes, which successfully completed the DKG. Hence,$ℛ \subseteq 𝒞$ . (see documentation offlow.DKGIndexMap
for further details) -
Note that there might be nodes
$f$ such that$f \in 𝒞$ and$f \in 𝒟$ , which failed the DKG. In other words, despite$f \in 𝒞 \cap 𝒟$ it might still be the case that$f \notin ℛ$ . Hence,$ℛ \subseteq 𝒞 \cap 𝒟$ . -
At the absolute minimum, we require
signature.RandomBeaconThreshold
nodes in$ℛ$ :(i) Formally,
signature.RandomBeaconThreshold
$\leq |ℛ| \leq | 𝒞 \cap 𝒟~ |$ -
Note that (i) is the absolute lower bound (it assumed there is no node
$𝒞 \cap 𝒟$ that failed the DKG) where even a single offline node could halt consensus❗ This is not practical for a production network. -
Tarak has heuristically derived more practically-reliable bounds - for details, please see
DKG contract success threshold
(notion). Specifically, I would be inclined to reject the inputEpochCommit
as invalid unless it satisfies the heuristical bound(ii) 0.6·
signature.RandomBeaconThreshold
$\leq | 𝒞 \cap 𝒟~ |$ with-
$𝒟$ is the ket setEpochCommit.DKGIndexMap
-
$𝒞$ is theEpochSetup.Participants.Filter(filter.IsConsensusCommitteeMember)
-
@jordanschalm what do you think about enforcing the bound (ii)? At least I think we should enforce the less strict bound (i) with is necessary for consensus progress. In comparison, bound (ii) is heuristically sufficient for consensus progress under non-extreme conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexHentschel @jordanschalm Either way we will implement some sort of check so I went with (ii) ec92c60
(#6338), let me know if you want it updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I'm onboard with adding the suggested checks. But I'm confused by a couple things 🤔👇
Either way we will implement some sort of check so I went with (ii)
The comment says that we are implementing (ii), but the implementation is actually implementing (i).
We are using this code, which is equivalent to (i):
if signature.RandomBeaconThreshold(n) > numberOfRandomBeaconParticipants { ... }
(ii) 0.6·signature.RandomBeaconThreshold ≤|𝒞∩𝒟|
I'm reading this as 0.6 multiplied by signature.RandomBeaconThreshold
, in other words 60% of the threshold value (making the threshold lower!). But it should be (0.6)(𝒟) -- 60% of the DKG committee size.
Also, the value we use on the smart contract is 0.7, or 70%, which covers the worst-case scenario. We may as well use the same value here.
% flow scripts execute transactions/dkg/scripts/get_thresholds.cdc -n mainnet
Result: s.56c14e18f445f3a648d2455303c75a984cd93be3d4530a65fa17a6e4e1f38e07.Thresholds(native: 42, safe: 59, safePercentage: 0.70000000)
Suggestions
- Use 70% of the DKG committee size as the safety threshold (define a
RandomBeaconSafetyThreshold
constant/function alongsidesignature.RandomBeaconThreshold
that documents what this separate threshold means) - It would be ideal to include this new check when we generate the transaction arguments, so that any failures are caught much sooner and are easier to deal with. If it's non-trivial to add there, though, I'm fine to skip this suggestion, since the likelihood of this failure is low, and it will be caught eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexHentschel @jordanschalm let me know if you have any concerns about godoc 790b016
(#6338), can address in separate PR if needs changes.
[EFM] Extended documentation where generalized protocol logic meets bootstrapping implementation
Co-authored-by: Alexander Hentschel <[email protected]>
Co-authored-by: Alexander Hentschel <[email protected]>
Co-authored-by: Alexander Hentschel <[email protected]>
…for random beacon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I'm onboard with adding the suggested checks. But I'm confused by a couple things 🤔👇
Either way we will implement some sort of check so I went with (ii)
The comment says that we are implementing (ii), but the implementation is actually implementing (i).
We are using this code, which is equivalent to (i):
if signature.RandomBeaconThreshold(n) > numberOfRandomBeaconParticipants { ... }
(ii) 0.6·signature.RandomBeaconThreshold ≤|𝒞∩𝒟|
I'm reading this as 0.6 multiplied by signature.RandomBeaconThreshold
, in other words 60% of the threshold value (making the threshold lower!). But it should be (0.6)(𝒟) -- 60% of the DKG committee size.
Also, the value we use on the smart contract is 0.7, or 70%, which covers the worst-case scenario. We may as well use the same value here.
% flow scripts execute transactions/dkg/scripts/get_thresholds.cdc -n mainnet
Result: s.56c14e18f445f3a648d2455303c75a984cd93be3d4530a65fa17a6e4e1f38e07.Thresholds(native: 42, safe: 59, safePercentage: 0.70000000)
Suggestions
- Use 70% of the DKG committee size as the safety threshold (define a
RandomBeaconSafetyThreshold
constant/function alongsidesignature.RandomBeaconThreshold
that documents what this separate threshold means) - It would be ideal to include this new check when we generate the transaction arguments, so that any failures are caught much sooner and are easier to deal with. If it's non-trivial to add there, though, I'm fine to skip this suggestion, since the likelihood of this failure is low, and it will be caught eventually.
Co-authored-by: Jordan Schalm <[email protected]>
Co-authored-by: Jordan Schalm <[email protected]>
Co-authored-by: Jordan Schalm <[email protected]>
Co-authored-by: Jordan Schalm <[email protected]>
Co-authored-by: Jordan Schalm <[email protected]>
…om/onflow/flow-go into yurii/6321-update-DKG-implementors
#6321
Context
This PR makes modifications to the current code base by using
EpochCommit
to implement theprotocol.DKG
interface.Most of the changes are centered around usage of
IndexMap
in places where we previously used epoch participants from the setup event.Additionally made another attempt to fix the fuzzy test, added test data to check if the issue is reproducible.