-
Notifications
You must be signed in to change notification settings - Fork 470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: flaky tests fixes #5487
tests: flaky tests fixes #5487
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5487 +/- ##
==========================================
+ Coverage 55.41% 55.70% +0.29%
==========================================
Files 446 446
Lines 63289 63289
==========================================
+ Hits 35069 35253 +184
+ Misses 25826 25664 -162
+ Partials 2394 2372 -22 see 37 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
tools/network/dnssec/sort_test.go
Outdated
} | ||
retryCounter++ | ||
goto retry | ||
} | ||
a.Equal(net.SRV{Priority: 1, Weight: 200}, *arr[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error seems to be here, in faulty expectations about who the first value of a randomized sort is going to be, rather than in not retrying it enough times? I don't understand the point of this test or the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, but most of the cases (202/204) we expect this entry to be at the very beginning of the randomized list according to its weight. Happy to change to better scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe set the weight to a million? and always run it ten times, and assert the large weight is only out of place at most once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should likely be rewritten with Rapid at some point - can generate the values with appropriate constraints defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe run one array with just with weights 2 and 1 and run it a hundred times and put sigma expectation bars around the count of the unique orderings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want waste time on this minor test with fancy non-obvious conditions. I guess having a w=1M and re-running it once is good enough. I'll update to reflect this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, weight is 16 bit, well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 attempts with max uint16
tools/network/dnssec/sort_test.go
Outdated
srvRecArray(arr).sortAndRand() | ||
a.Equal(net.SRV{Priority: 1, Weight: 200}, *arr[0]) | ||
if (*arr[0] != net.SRV{Priority: 1, Weight: 0xFFFF}) { | ||
// there is a small change that a random number from 0 to max uint15 would be 0 or 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to approve in the interest of fixing tests, but why not just limit the range of the random number to be greater than 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the spec requires to pick a random number from 0 to sum of all weights to be ordered... I guess it is better to be more-less spec compliant.
@@ -629,6 +632,32 @@ func TestLedgerErrorValidate(t *testing.T) { | |||
case <-expectedMessages: | |||
// only debug messages should be reported | |||
case um := <-unexpectedMessages: | |||
if strings.Contains(um, "before dbRound") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ledger_test.go:632:
Error Trace: /Users/shantkarakashian/go/src/github.com/algorand/go-algorand/data/ledger_test.go:632
Error: Should be empty, but was data.EnsureBlock: could not write block 1926 to the ledger: round 1925 before dbRound 1933
Test: TestLedgerErrorValidate
Messages: data.EnsureBlock: {C} could not write block 1926 to the ledger: round 1925 before dbRound 1933
--- FAIL: TestLedgerErrorValidate (0.27s)
ledger_test.go:628:
Error Trace: /Users/shantkarakashian/go/src/github.com/algorand/go-algorand/data/ledger_test.go:628
Error: Received unexpected error:
round 0 before dbRound 2
Test: TestLedgerErrorValidate
First is coming from data/ledger.go:
363 case ledgercore.BlockInLedgerError:
364 // The block is already in the ledger. Catchup and agreement could be competing
365 // It is sufficient to report this as a Debug message
366 l.log.Debugf("data.EnsureBlock: {B} could not write block %d to the ledger: %v", round, err)
367 return
368 default:
369 l.log.Errorf("data.EnsureBlock: {C} could not write block %d to the ledger: %v", round, err)
370 }
371
372 // If there was an error add a short delay before the next attempt.
373 time.Sleep(100 * time.Millisecond)
The second from acctupdates.go:
1403// lookupWithoutRewards returns the account data for a given address at a given round.
1404func (au *accountUpdates) lookupWithoutRewards(rnd basics.Round, addr basics.Address, synchronized bool) (data ledgercore.AccountData, validThrough basics.Round\
, rewardsVersion protocol.ConsensusVersion, rewardsLevel uint64, err error) {
1405 needUnlock := false
1406 if synchronized {
1407 au.accountsMu.RLock()
1408 needUnlock = true
1409 }
1410 defer func() {
1411 if needUnlock {
1412 au.accountsMu.RUnlock()
1413 }
1414 }()
1415 var offset uint64
1416 var persistedData trackerdb.PersistedAccountData
1417 for {
1418 currentDbRound := au.cachedDBRound
1419 currentDeltaLen := len(au.deltas)
1420 offset, err = au.roundOffset(rnd, "{F}")
1421 if err != nil {
1422 return
1423 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two places I could reproduce that send messages which break the test.
Are both of these legitimate? If so, the filter below should take care of the second case as well.
Note: {B} {C} {F}
annotations are not in the code, I added them to track down where these messages are originated from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
round 0 before dbRound 2
Handled this as well. This is from Lookup path that is part of block evaluation. The first one handled came from EnsureBlock and the second from AddBlock but because of the same reason - while AddBlock
goroutine was suspended, the EnsureValidatedBlock
add 6+ blocks so that ledger committed account changed and trackers advanced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we uncover another error...
ledger_test.go:677:
Error Trace: /Users/shantkarakashian/go/src/github.com/algorand/go-algorand/data/ledger_test.go:677
Error: Should be empty, but was failed to delete hash '000001830087367e5228afa552f1eaf821f979a8352e527bb1e8119b761ae4598d5a5bf8' from merkle trie for account 7777777777777777777777777777777777777777777777777774MSJUVU
Test: TestLedgerErrorValidate
Messages: failed to delete hash '000001830087367e5228afa552f1eaf821f979a8352e527bb1e8119b761ae4598d5a5bf8' from merkle trie for account 7777777777777777777777777777777777777777777777777774MSJUVU
coming from catchpointtracker.go:
952// accountsUpdateBalances applies the given compactAccountDeltas to the merkle trie
953func (ct *catchpointTracker) accountsUpdateBalances(accountsDeltas compactAccountDeltas, resourcesDeltas compactResourcesDeltas, kvDeltas map[string]modifiedKvV$
954 if !ct.catchpointEnabled() {
955 return nil
956 }
957 accumulatedChanges := 0
958
959 for i := 0; i < accountsDeltas.len(); i++ {
960 delta := accountsDeltas.getByIdx(i)
961 if !delta.oldAcct.AccountData.IsEmpty() {
962 deleteHash := trackerdb.AccountHashBuilderV6(delta.address, &delta.oldAcct.AccountData, protocol.Encode(&delta.oldAcct.AccountData))
963 deleted, err := ct.balancesTrie.Delete(deleteHash)
964 if err != nil {
965 return fmt.Errorf("failed to delete hash '%s' from merkle trie for account %v: %w", hex.EncodeToString(deleteHash), delta.addres$
966 }
967 if !deleted {
968 ct.log.Errorf("failed to delete hash '%s' from merkle trie for account %v", hex.EncodeToString(deleteHash), delta.address)
969 } else {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing the issues.
Looks good. One more failure detected, but it is extremely rare.
The cause of the database error needs consideration outside the contest of this test.
Summary
libsodium-fork/libsodium-fork
nested directory for example in this path:./crypto/copies/darwin/amd64/libsodium-fork/libsodium-fork/src/libsodium/crypto_sign/ed25519/ref10
. This prevents from making changes in libsodium sources and compiling it with make likemake crypto/libs/darwin/amd64/lib/libsodium.a
.Test Plan
go test ./tools/network/dnssec -run TestSrvSort -count 100
and gone after the fix.