SIMD-0046: Optimistic cluster restart automation #46
Co-authored-by: mvines <[email protected]>
…-improvement-documents into smart-restart-proposal
proposals/0024-repair-and-restart.md
Outdated
So after a validator sees that 75% of the validators received 75% of the votes,
wait for 10 more minutes so that the message it sent out have propagated, then
restart from the Heaviest slot everyone agreed on.
From our last call I was thinking that once each validator has figured out the heaviest fork and repaired up to the highest optimistically confirmed (oc) slot, the validator would:
- Issue a "hard fork" at the highest oc slot, which also changes the gossip shred version
- Execute the existing "--wait-for-supermajority" logic (i.e., purge all slots above the highest oc slot, wait for 80%)
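As a rough illustration of the two steps proposed in this comment, the sketch below models them in Python. Everything in it is hypothetical: `purge_above`, `supermajority_reached`, and the data shapes are invented for illustration and are not the validator's actual implementation. The 80% default is taken from the comment.

```python
def purge_above(slots, highest_oc_slot):
    """Keep only slots at or below the highest optimistically confirmed slot."""
    return [s for s in slots if s <= highest_oc_slot]


def supermajority_reached(stake_by_validator, restarted, threshold_pct=80):
    """Return True once restarted validators hold at least threshold_pct of stake.

    stake_by_validator: dict of validator id -> stake (hypothetical shape)
    restarted: set of validator ids seen on the new shred version (assumption)
    """
    total = sum(stake_by_validator.values())
    online = sum(stake for v, stake in stake_by_validator.items() if v in restarted)
    # Integer comparison avoids floating-point rounding at the boundary.
    return online * 100 >= threshold_pct * total
```

With stakes of 50, 30, and 20, the check passes once the first two validators are back, since together they hold exactly 80%.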
Changed. I think we should probably wait for 75% here because we assume 5% could be non-conforming.
proposals/0024-repair-and-restart.md
Outdated
We calculate "enough" stake as follows. When there are 80% validators joining
the restart, assuming 5% restarted validators can make mistakes in voting, any
block with more than 67% - 5% - (100-80)% = 42% could potentially be
I don't understand this calculation.
What if the other 100% - 42% = 58% pick some other block?
Why should the minority 42% block be optimistically confirmed?
Why should a block with less than 67% of the vote ever be optimistically confirmed?
The goal here is to prevent false negatives (if a slot was oc'ed before the restart, you must pick it here), not false positives (it's okay if we pick a slot here that wasn't oc'ed), because when we select Heaviest later we should see the competing fork and count the votes accordingly.
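The arithmetic behind the 42% figure can be written out as a small sketch. The function and parameter names are hypothetical; the inputs (67% optimistic confirmation threshold, 5% non-conforming stake, 80% of stake in the restart) are the ones quoted from the proposal. The reading assumed here is that a block oc'ed before the restart had at least 67% of votes, of which up to 20% may belong to absent validators and up to 5% may be misreported, so it can show as little as 42% among the restarted validators.

```python
def min_stake_possibly_oc(stake_in_restart_pct, oc_threshold_pct=67, non_conforming_pct=5):
    """Lowest observed vote share (in % of total stake) that a previously
    optimistically confirmed block could still show during the restart."""
    absent_pct = 100 - stake_in_restart_pct  # stake that did not join the restart
    return oc_threshold_pct - non_conforming_pct - absent_pct


print(min_stake_possibly_oc(80))  # 42
```

Any block above this bound cannot be ruled out as having been oc'ed, which is why it is treated as a candidate even though it is below 67%.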
Probably add the motivation and justification for these values to the document.
Added.
proposals/0024-repair-and-restart.md
Outdated
2.1 If vote_on_child + stake_on_validators_not_in_restart >= 62%, pick child.
For example, if 80% validators are in restart, child has 42% votes, then
42 + (100-80) = 62%, pick child. 62% is chosen instead of 67% because 5%
could make the wrong votes.
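Rule 2.1 as quoted can be expressed as a one-line check. This is only a sketch: `should_pick_child` and its parameters are hypothetical names, and percentages are kept as integers so the boundary case from the example compares exactly.

```python
def should_pick_child(vote_on_child_pct, stake_in_restart_pct, threshold_pct=62):
    """Pick the child if its votes plus all stake absent from the restart
    could reach the threshold (62% = 67% minus the assumed 5% wrong votes)."""
    stake_not_in_restart_pct = 100 - stake_in_restart_pct
    return vote_on_child_pct + stake_not_in_restart_pct >= threshold_pct


print(should_pick_child(42, 80))  # True
print(should_pick_child(41, 80))  # False
```

The absent stake is counted in the child's favor because the rule is meant to avoid missing a slot that was oc'ed before the restart, at the cost of occasionally picking one that was not.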
similarly here, I am not sure why it is safe to go below 67%?!
The goal here is to prevent false negatives at all costs; false positives are okay. Let's say X is the first block having only 62% but not 67%. We know that if 75% of the validators decide to pick this fork, it will be instantly oc'ed, and we won't kick another oc'ed slot out. Does that make sense?
similarly, add the motivation and justification in the doc
Added.
@t-nelson Want to give the oldest open SIMD another look?
Congrats on getting this over the line and thanks to those who contributed. Solana Mainnet Beta™ will be safer for it 🫡