-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(code): Some WAL entries fail to decode #840
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #840 +/- ##
==========================================
- Coverage 75.79% 75.73% -0.06%
==========================================
Files 169 172 +3
Lines 14682 14801 +119
==========================================
+ Hits 11128 11209 +81
- Misses 3554 3592 +38
Flags with carried forward coverage won't be shown. Click here to find out more.
|
// Write timeout | ||
encode_timeout(timeout, &mut buf)?; | ||
// Write tag and timeout if applicable | ||
encode_timeout(Self::TAG_TIMEOUT, timeout, &mut buf)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Thanks so much for figuring this out 🙌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if we should have consensus decide which entries should be logged and have wal write everything it's told to (by consensus).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code/crates/starknet/test/src/lib.rs
Outdated
break 'inner; | ||
} | ||
} | ||
Event::WalReplayError => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is the best place to catch errors (WAL or others) in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure either, but since we already detect errors in the background task, I moved it there.
* Add test that shows: failed to fill whole buffer * Add WAL replay error when unable to decode all entries * Don't add tag w/o value for skipped timers or empty buf to wal * Move new test to wal * Logging cleanup * Fix missing info in logs * Add error detail to WalReplayError event * Add tag to wal only if vote encoding is succesful * Do not instruct WAL to persist step timeouts * Move WalReplayError detection to background task * WAL writes all timeouts it is given, leave choice up to consensus --------- Co-authored-by: Romain Ruetschi <[email protected]>
* feat(test): Add test app based on the example app * Improve logging in tests * Better error handling * Formatting * Fix test suite * Implement `RestreamProposal` in example and test app * Disable `proposal_only` consensus mode tests * Start from latest height in store plus one * Do not run MBT tests in integration test suite * Fix missing logs * Store synced values in database for later retrieval, increase history length to 100 * Formatting * Run integration tests for test app in their own job * Run test app tests with `nextest` for now * Do not fail CI for test app integration test failure * Go back to running tests with maelstrom * Exclude test app from code coverage for now * Ignore unsupported tests * Add back full nodes tests * Wait a bit until starting the first height * Fix off-by-one error in starting height of test and example apps * Fix typo * Slow down test app to ease debugging * Update state.rs Co-authored-by: Anca Zamfir <[email protected]> Signed-off-by: Romain Ruetschi <[email protected]> * feat(code/app/starknet): Adapt Starknet app to latest P2P protos (#819) * Update Starknet protos to their latest version * Adapt Starknet app to new protos * Use height and round as stream id in example app * Use `ConsensusStreamId` proto message for stream id in Starknet app * Fix formatting of stream ids * Update protos for interop * Add `ProposalCommitment` type * Add `BlockInfo` type * Update proposal building and assembly from parts to account for BlockInfo and ProposalCommitment * Update protos * Adapt to latest proto changes * Post-merge fixes * Disable integration tests with proposal mode, keep parts only * Set parts-only in spwan.bash * Add detail to panic when explicit proposal * Fix fmt --------- Co-authored-by: Anca Zamfir <[email protected]> * chore(code): Some WAL entries fail to decode (#840) * Add test that shows: failed to fill whole buffer * Add WAL replay error when unable to decode all entries * Don't add tag w/o value for skipped timers or empty buf to wal * Move new test to wal * Logging cleanup * Fix missing info in logs * Add error detail to WalReplayError event * Add tag to wal only if vote encoding is succesful * Do not instruct WAL to persist step timeouts * Move WalReplayError detection to background task * WAL writes all timeouts it is given, leave choice up to consensus --------- Co-authored-by: Romain Ruetschi <[email protected]> * Post-merge fixes * Refactor parts signature verification * Add parts signature verification to test app * Restream proposal parts when reproposing a value * Formatting * Remove decided proposal from undecided proposals table * Remove duplicated code * Include test app tests in code coverage * Abort whole test on failure * Fix full node test * Show log when request sync value --------- Signed-off-by: Romain Ruetschi <[email protected]> Co-authored-by: Anca Zamfir <[email protected]> Co-authored-by: Anca Zamfir <[email protected]>
…ny app (#836) * feat(test): Add test app based on the example app * Improve logging in tests * Better error handling * Formatting * Fix test suite * Implement `RestreamProposal` in example and test app * Disable `proposal_only` consensus mode tests * Start from latest height in store plus one * Do not run MBT tests in integration test suite * Fix missing logs * Store synced values in database for later retrieval, increase history length to 100 * Formatting * Run integration tests for test app in their own job * Run test app tests with `nextest` for now * Do not fail CI for test app integration test failure * Generalize the test framework to make it work with any app * Fix unit test * Fix bug in config generation * Cleanup * Typo * Cleanup * Fix late starting full node test * Cleanup * Introduce `NodeHandle` abstraction * Cleanup * Wait until the main actor is dead before returning * Go back to running tests with maelstrom * Fix full node tests * Fix config generation in test app tests * Exclude test app from code coverage for now * Ignore unsupported tests * Add back full nodes tests * Remove to initiate logger externally * Wait a bit until starting the first height * Fix off-by-one error in starting height of test and example apps * Fix typo * Slow down test app to ease debugging * Cleanup * Update state.rs Co-authored-by: Anca Zamfir <[email protected]> Signed-off-by: Romain Ruetschi <[email protected]> * feat(code/app/starknet): Adapt Starknet app to latest P2P protos (#819) * Update Starknet protos to their latest version * Adapt Starknet app to new protos * Use height and round as stream id in example app * Use `ConsensusStreamId` proto message for stream id in Starknet app * Fix formatting of stream ids * Update protos for interop * Add `ProposalCommitment` type * Add `BlockInfo` type * Update proposal building and assembly from parts to account for BlockInfo and ProposalCommitment * Update protos * Adapt to latest proto changes * Post-merge fixes * Disable integration tests with proposal mode, keep parts only * Set parts-only in spwan.bash * Add detail to panic when explicit proposal * Fix fmt --------- Co-authored-by: Anca Zamfir <[email protected]> * chore(code): Some WAL entries fail to decode (#840) * Add test that shows: failed to fill whole buffer * Add WAL replay error when unable to decode all entries * Don't add tag w/o value for skipped timers or empty buf to wal * Move new test to wal * Logging cleanup * Fix missing info in logs * Add error detail to WalReplayError event * Add tag to wal only if vote encoding is succesful * Do not instruct WAL to persist step timeouts * Move WalReplayError detection to background task * WAL writes all timeouts it is given, leave choice up to consensus --------- Co-authored-by: Romain Ruetschi <[email protected]> * Post-merge fixes * Refactor parts signature verification * Add parts signature verification to test app * Restream proposal parts when reproposing a value * Formatting * Remove decided proposal from undecided proposals table * Remove duplicated code * Include test app tests in code coverage * Abort whole test on failure * Fix full node test * Show log when request sync value * Make sure Starknet app does not go into round 1 at height 1 * Fix WAL tests for test app * Fix off by one error in logs due to potential race condition in test framework * Clarify that `Default` impl of Height must return the zero height * Require both Starknet and test app tests to pass on CI * Update coverage exclude list * Post-merge fixes --------- Signed-off-by: Romain Ruetschi <[email protected]> Co-authored-by: Anca Zamfir <[email protected]> Co-authored-by: Anca Zamfir <[email protected]>
Closes: #XXX
Sometime wal entries fail to decode with errors like this:
Looks like for
PrevoteTimeLimit
andPrecommitTimeLimit
we still write something to the WAL (just the tag) and later fail to decode.Note: A better fix might be to move the entry filtering logic in consensus who would decide which entries should be logged. For timeouts it could check the type here
PR author checklist
For all contributors
For external contributors