-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestMaxBlockSize flake #3086
Comments
I got this to reproduce locally so here's more info: bad_block_EJVzDj.json My hypothesis was that this test starts with app version 1 and then the bad block gets produced with app version 2 because some other test upgraded the testnode version but I re-ran the test locally with debug logs and this time it passed and all blocks appear to use app version 2.
|
It looks like the test node app version is on 0 but the blocks are getting produced with app version 2.
|
Notes while revisiting this:
|
The app version of the block is actually set by comet which is probably on version 2 whereas prepare proposal and process proposal have v0 because the context its not correct set |
Sorry I put this on pause because a few weeks ago we thought this would be resolved by some app version fixes:
TBH I thought there was one other issue / PR version related but can't find it (cc: @cmwaters ). Regardless, since this is still occurring on main, it's likely that none of the version fixes actually resolved this issue. |
To repro more quickly, one can run:
|
The test was skipped in #3142 so this issue should make it unflaky and then re-enable it |
I wasn't able to get visibility because I was trying to log the version in prepare/process proposal using: fmt.Printf("app version in prepare proposal %v\n", sdkCtx.ConsensusParams().Version.AppVersion) which was silently panicing so I only observed test timeouts. It is possible to print the version via: fmt.Printf("app version in prepare proposal %v\n", sdkCtx.ConsensusParams().Version.GetAppVersion()) |
Interestingly can't repro with single blob or multi blob test cases. Can only repro with
|
There is a discrepancy in the square sizes used in celestia-app/app/process_proposal.go Line 122 in f3fb834
celestia-app/app/extend_block.go Line 18 in f3fb834
|
At least locally, I was able to resolve flakiness via the diff in #3150 |
Fortunately it looks like celestia-node doesn't use the So Separately, celestia-node needs a way to figure out the max effective square size from app and then use that to extend the block. |
A few concrete action items:
|
this might stem all the way back to #1772, where we might have |
Copying from a Slack convo, we may actually want to use the upper bound in process proposal and extend block so that celestia-node isn't stateful (i.e. doesn't need access to gov max square size) when it tries to extend squares. |
It can query the gov max square size and perform the same method of taking the min of the two. The problem is that if this were to change As an aside, I thought the whole point of |
+1 but celestia-node doesn't use it currently.
I vaguely recall the same outcome. I think we agreed to try and avoid celestia-node having to query the gov max square size param each block. If celestia-node doesn't query the gov max square size then this should hold: |
On celestia-app v1.x, ProcessProposal and ExtendBlock both have a worstCaseShareIndexes that always uses 128. celestia-app/pkg/square/builder.go Lines 424 to 425 in b25766f
On celestia-app main, we use go-square's implementation of worstCaseShareIndexes which uses a max square size plumbed in via square.Construct(). See here: func worstCaseShareIndexes(blobs, maxSquareSize int) []uint32 {
shareIndexes := make([]uint32, blobs) ProcessProposal calls square.Construct with Therefore, the bug is not present on celestia-app v1.x or any celestia-node release. The bug is present on go-square v1.0.0 and celestia-app main. |
Created celestiaorg/go-square#47 |
Blocked on a release of go-square with celestiaorg/go-square#49. The team has different opinions on whether it should be released as 1.0.1 or 2.0.0. We're meeting tomorrow to decide. |
Problem
Occurences
The text was updated successfully, but these errors were encountered: