-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD-0083: Relax Entry Constraints #83
SIMD-0083: Relax Entry Constraints #83
Conversation
## Security Considerations | ||
|
||
None | ||
|
||
## Drawbacks | ||
|
||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change breaks consensus in a pretty fundamental way, which is both a security issue and has some drawbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lheeger-jump can you please elaborate how will it break the consensus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Huzaifa696 - Per the below snippet, a self-conflicting entry is currently considered invalid, and will raise an error:
https://github.com/solana-labs/solana/blob/230779d4590ac625e6710117987ef505c0cf71c6/ledger/src/blockstore_processor.rs#L606-L619
That error results in the slot getting marked dead by ReplayStage
:
https://github.com/solana-labs/solana/blob/230779d4590ac625e6710117987ef505c0cf71c6/core/src/replay_stage.rs#L2782-L2786
If the "rules" are changed to allow a self-conflicting entry, then a slot with a self-conflicting entry will no longer error / result in dead slot.
Suppose that the change to allow self-conflicting entries first lands in v1.18
. Nodes running v1.17
will mark slots with a self-conflicting entry dead whereas nodes running v1.18
will allow it. This is obviously problematic during an upgrade period where the stake is evenly distributed between v1.17
and v1.18
. As such, the change to allow self-conflicting entries would have to sit behind a feature gate, and only get enabled once the entire cluster has adopted v1.18
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree with the problem and the solution you explained.
In conclusion, the required changes for this SIMD to allow self-conflicting entries have to sit behind a feature gate.
So, as far as I understand, the flow is: approved SIMD -> feature gate issue -> PR (that implements the SIMD with a feature gate). Is that right? @steviez
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, as far as I understand, the flow is: approved SIMD -> feature gate issue -> PR (that implements the SIMD with a feature gate). Is that right?
I believe so, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, as far as I understand, the flow is: approved SIMD -> feature gate issue -> PR (that implements the SIMD with a feature gate). Is that right? @steviez
@jacobcreech would know best.
@lheeger-jump can you please elaborate how will it break the consensus.
As @steviez mentioned, this changes the validation criteria for blocks, which affects whether they are rejected. It needs a feature gate which implies coordination with other clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved SIMD -> feature gate issue -> PR
That would be ideal to avoid any work that may be thrown away due to never getting consensus on a SIMD.
## Detailed Design | ||
|
||
Currently, if a transaction entry within a block contains transactions that | ||
conflict, the entire block is marked invalid. | ||
The proposal is that this constraint is removed entirely, and entries are | ||
allowed to contain transactions that conflict with each other. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are clients supposed to handle write contention? Is it sequential in the entry? There is a lot to consider here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The execution in an entry is always sequential whether its execution or validation. So, we have to change the lock function in a way that allows conflicting txs in the same entry as they will be executed in a sequential manner. You can refer to this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldnt hurt to add that the execution now becomes sequential in the transactions intra microblock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This proposal is not making execution sequential within an entry. It has already been sequential, it is only that individual entries run in parallel (provided that they are not conflicting with each other).
@apfitzge can you please confirm this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not currently sequential, and actually has no implied or guaranteed execution order during replay.
I will explicitly state that an entry has a sequential ordering, so for conflicting txs they must now be executed sequentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added clarifying comments in f76e7d3.
The proposed behavior is to execute conflciting transactions sequentially, in the order they appear in the entry.
## Impact | ||
|
||
- The protocol is simplified | ||
- Block-production is more flexible | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has a large affect on consensus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't disagree with your comments that a consensus altering change could fit many of the sections on this document (such as Impact
/ Security
/ Drawbacks
), it could get repetitive to mention it multiple times. I'm wondering if mentioning it once is sufficient; maybe here given the prompt:
solana-improvement-documents/XXXX-template.md
Lines 64 to 67 in ffb3cca
## Backwards Compatibility *(Optional)* | |
Does the feature introduce any breaking changes? All incompatibilities and | |
consequences should be listed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I was under the assumption that all SIMDs are consensus breaking. Is that not the case?
I was under the asssumption that a consensus change and feature-gating was implicit all proposals here; I can be explicit about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I was under the assumption that all SIMDs are consensus breaking. Is that not the case?
I was under the asssumption that a consensus change and feature-gating was implicit all proposals here; I can be explicit about that
I haven't been involved with enough SIMD's to know 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All feature gates should be consensus changing, SIMDs do not need to be. A SIMD can propose application standards or networking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f76e7d3: Added comments about consensus breakage under security considerations.
This proposal is backwards compatible with the current protocol, since it only | ||
relaxes constraints, and does not add any new constraints. | ||
All previously valid blocks would still be valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to beat a dead horse, but Liam is correct about this being a consensus altering change.
All previously valid blocks would still be valid.
This is true. However, a block that was previously invalid might now be valid. Or stated another way, a block might be valid under this new set of rules but invalid under the old set. As with other consensus altering changes, this could create a problem during a network upgrade where say 60% is running with old rules / 40% with new rules. As such, this change would need to be rolled out with a feature gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment here: #83 (comment)
However, a block that was previously invalid might now be valid.
Your statement is true, but this seems like a *forward compatibility issue to me.
After the activation of the feature, we don't have previously invalid blocks because they cannot, by definition, be in the history. Therefore all previous (backward) blocks are still valid with the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*forward compatibility issue to me.
Yep, agreed. I guess this gets back to whether a SIMD is implicitly thought of as a consensus breaking change or not. In any case, you're seemingly well aware of the implications of the change so marking this item as resolved, and we can figure out how explicit we need to be in stating consensus implications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a small comment that newly produced blocks may not be valid under the old protocol: f76e7d3
## Impact | ||
|
||
- The protocol is simplified | ||
- Block-production is more flexible | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't disagree with your comments that a consensus altering change could fit many of the sections on this document (such as Impact
/ Security
/ Drawbacks
), it could get repetitive to mention it multiple times. I'm wondering if mentioning it once is sufficient; maybe here given the prompt:
solana-improvement-documents/XXXX-template.md
Lines 64 to 67 in ffb3cca
## Backwards Compatibility *(Optional)* | |
Does the feature introduce any breaking changes? All incompatibilities and | |
consequences should be listed. |
The current constraint that transaction entries must not conflict with each | ||
other is unnecessary. | ||
It is a constraint, that made the current labs-client implementation easier, | ||
but is not required for the protocol to function correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should describe more what is intended behavior incase of transactions writing the same accounts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@godmodegalactus currently, what happens is that before actual execution, all accounts of an entry are loaded and then the same state is used for execution. This behavior does not allow "writing the same accounts" as is.
But if we carry the temporary changed state as the execution proceeds sequentially in an entry and then commit the final state then there is no issue in writing the same accounts in an entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an entry all the transactions should be able to run in parallel. If in an entry two transactions are loading same accounts, these transactions cannot be executed in parallel right. Even they will have to maintain ordering of the executing because if one is executed before another in replay then consensus will break. So i guess just explicitly mentioning these cases is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest implementation, txs "within an entry" are not executed in parallel in block production or in replay, rather multiple entries run in parallel if they don't conflict with each other.
so, we are safe from the race condition scenario you mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@godmodegalactus added comments in f76e7d3. Conflicting transactions within an entry must be executed sequentially, in the order they appear in the entry.
@Huzaifa696 current master still does rebatching of transactions though. Prior versions, which are the majority of the network, even do randomization on the order of txs within the non-conflicting entries.
conflict, the entire block is marked invalid. | ||
The proposal is that this constraint is removed entirely, and entries are | ||
allowed to contain transactions that conflict with each other. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We implemented and tested the required changes to enable self conflicting entries in our lab setup and this improves the performance a lot depending on how you schedule the transactions on every execution thread.
The changes include:
- changing account locking function
- carrying temporary account state within the execution with every transaction in an entry
As the functions changed are used by both the executor and validator, so it doesn't break any consensus and rules of the chain, and is backward compatible.
These changes can be implemented through a feature gate.
@apfitzge we can add these points to the detailed design if you think it's appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the functions changed are used by both the executor and validator, so it doesn't break any consensus and rules of the chain, and is backward compatible.
These changes can be implemented through a feature gate.
This changes the structure of blocks and that directly affects consensus. In Firedancer today you would break validation criteria for a block and there would be a hard fork from Solana Labs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lheeger-jump is correct, this is a consensus breaking change. If you produce a block that contains entries with conflicting transactions, existing validators will mark it as invalid. For this reason, it needs to sit behind a feature, and only be activated when both validator clients have implemented the feature-gate and those versions have been rolled out to cluster(s).
@apfitzge we can add these points to the detailed design if you think it's appropriate.
This doesn't seem to be appropriate for a SIMD. SIMDs should be independent of any specific client's implementation. The issue in the labs repo you created would be a more appropriate place to share implementation-specific code and any results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've shared some implementation specifics on the issue—please take a look.
@lheeger-jump finally found time to update this. Added a list of the remaining constraints on individual entries. I think I've now hit all your requested changes, lmk if this is not the case. |
@lheeger-jump please review requested changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Simple, concise and precise such that I can implement. Good work.
@tao-stones could you or someone from Anza please review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am happy with this change
@t-nelson Could you please review then comment or approve from Anza side? |
Since we have an approval from Anza in the form of @tao-stones (please become an official reviewer via process in SIMD-0007), and no substantive comments since January, I am going to merge this PR so it does not go stale. Thanks @apfitzge for your patience. |
The second in a series of SIMDs aimed at the relaxation of constraints blocking us from (potentially) having asynchronous execution.
This proposal does not lock Solana into moving towards asynchronous execution.
This proposal stands on its own to simplify the protocol and give more flexibility in implementation of block-production.