-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gossip: Random test failure in gossip test "test filtering values works" #157
Comments
Sorry, I'd like to keep this open until we actually have it solved. The code for helping us solve it is in place, it just needs to randomly occur at some point, or the code needs to fundamentally change such that we can guarantee it won't. |
just hit it - The failing seed is: '1718031575764' |
Can confirm that using '1718031575764' as the seed consistently reproduces the failure state. |
another one: (first in list below) EDIT: i'll keep adding them to this list as i see them:
|
Can't reproduce the behavior using '1718158590375'. |
I just noticed there's another non-deterministic element, the filter parameter: sig/src/gossip/pull_response.zig Line 136 in 488fb37
generated here: sig/src/gossip/pull_response.zig Lines 102 to 110 in 488fb37
is random as well, because of this chain of code:
which means there's another source of RNG also based on I think we should do an audit on this, and overall replace all uses of |
yeah pulling out the random seeds to deterministic would be good - we can create a ticket - kind of low priority right now https://github.com/orgs/Syndica/projects/2/views/1?pane=issue&itemId=67168458 |
New failing seed caught in CI: |
https://github.com/Syndica/sig/actions/runs/9897406056/job/27341819918#step:4:1 We got another one: '1720726415143' |
- changed the imports across various files to match the current approach (sig.core.account; not @import) - remove excess path naming of tests (test "bloom.bloom: ..." -> test "...") - move `getWallclockMs` into `sig.time` (doesnt make sense to be in gossip) - add seed parameter to `buildMessages` which is used for all random operations (prev seed was wallclock) - add `seed` parameter to `PullRequestTask` (instead of using wallclock) - add `now` parameter to `buildPullRequests` for more deterministic results (prev was reading wallclock) - modifications to `test "test build pull requests"` to be a more deterministic test note: the last two changes were aimed to make `test "test build pull requests"` more deterministic and fix #157 -- ive re-run the tests many times and it seems to be fixed - remove unnecessary log lines in gossip - add additional context to readme.md
This is still a problem. I saw a failure on the latest commit for seed 1725458185024. See my comment above for a list I'll keep updating as I see more failures. |
Description
At random, this line in the referenced test will fail:
sig/src/gossip/pull_response.zig
Line 143 in a5d1f34
I currently do not have a stacktrace demonstrating this behavior, because I do not know the circumstances under which this failure occurs.
How to Reproduce the Bug
run
zig build test -Dfilter="gossip.pull_response: test filtering values works"
and prayAdditional Context
It should be noted that the function being tested, named
filterSignedGossipDatas
, currently instantiates and uses a PRNG seeded withstd.time.milliTimeStamp()
, which could explain its seemingly random and unreproducible behavior.What would be ideal is capturing a stack trace of this occurring, as well as the exact or approximate timestamp at which this failure occurs, to diagnose what random values can cause such a failure, and identify what circumstances they lead to which must be guarded against.
At the time of writing I plan on pushing a change which would help in the above goals, without needing to actively search for this mythical seed.
The text was updated successfully, but these errors were encountered: