WIP txHandler: do not rebroadcast to peers sent duplicate messages #5424

Draft · wants to merge 25 commits into master

Conversation

@algorandskiy (Contributor) commented May 26, 2023

Summary

When transaction deduplication was added, it had the side effect of re-sending transactions to peers that had sent a message the handler has already seen. This PR fixes this.

Before dedup: if A sent T, all but A get T; then, if B sent us T, no one gets T again.
With dedup: if A sent T, all but A get T; then, if B sent us T, no one gets T again.
With this PR: if A and B sent T almost simultaneously, all but A and B get T.

Implementation overview:

  1. txSaltedCache now stores a map of peers seen for a particular message.
  2. CheckAndPut returns this map to avoid additional lookups.
  3. The map is later passed to RelayArray to let networking know whom to skip (sketched below).
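A minimal, self-contained sketch of this flow, assuming simplified stand-in types; it omits the salting, the read-lock fast path, and the cur/prev pages of the real data/txDupCache.go:

package main

import (
	"fmt"
	"sync"
)

// Sketch only: NOT the real txSaltedCache. The types are stand-ins and the
// locking is simplified to a single mutex.
type peer string

type dupCache struct {
	mu  sync.Mutex
	cur map[string]*sync.Map // message digest -> set of peers that sent it
}

// CheckAndPut records the sender and returns the shared peers map, so the
// caller needs no additional lookup before relaying.
func (c *dupCache) CheckAndPut(msg string, sender peer) (seenPeers *sync.Map, dup bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	seenPeers, dup = c.cur[msg]
	if !dup {
		seenPeers = &sync.Map{}
		c.cur[msg] = seenPeers
	}
	seenPeers.Store(sender, struct{}{})
	return seenPeers, dup
}

func main() {
	c := &dupCache{cur: make(map[string]*sync.Map)}
	seen, dup := c.CheckAndPut("txn T", "A") // first copy: handler proceeds to verify and relay
	fmt.Println("duplicate:", dup)           // false
	c.CheckAndPut("txn T", "B")              // duplicate from B: dropped, but B is recorded

	// at relay time the same map tells networking whom to skip (RelayArray in the real code)
	seen.Range(func(k, v any) bool {
		fmt.Println("skip peer", k)
		return true
	})
}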

Implementation Details

CheckAndPut is now more complex because it needs to update the map value even if there is a match.
The original idea of a fast check under a read lock is preserved, but innerCheck also returns the current value and the "page" (cur/prev map) where it was found, in order to update it. Note that innerCheck can return the prev page, and this also needs to be considered when running an update with the write lock taken.
The found && senderFound condition denotes a new fast path that does not modify the underlying maps.

Note that a reference data structure (*sync.Map in this case) is crucial for the implementation to work, because of the following scenario (sketched after the list):

  1. Received txn A from N1, wrote it to the cache; the cache value (peers list) is attached to the work item txBacklogMsg (wi) of this transaction.
  2. Received txn A from N2, updated the peers list. The reference in wi is updated automatically.
  3. It is time to relay: give the network the same peers list.
  4. Received txn A from N3, updated the cache value (peers list).
  5. The network now sees the updated peers list including N3.
  6. After broadcasting, received txn A from N4 and updated the cache value (peers list), but because it is a duplicate it does not make its way to the network.
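A small sketch of the reference-semantics point in the scenario above; workItem and the peer names are stand-ins, only the shared *sync.Map matters here:

package main

import (
	"fmt"
	"sync"
)

// The work item and the cache hold the same *sync.Map, so peers recorded after
// the work item is enqueued are still visible when the network reads the map.
type workItem struct {
	seenPeers *sync.Map // shared reference, not a snapshot/copy
}

func main() {
	cacheEntry := &sync.Map{}          // cache value for txn A
	cacheEntry.Store("N1", struct{}{}) // step 1: txn A from N1, entry attached to wi
	wi := &workItem{seenPeers: cacheEntry}

	cacheEntry.Store("N2", struct{}{}) // step 2: duplicate from N2 updates the same map

	// steps 3-5: at relay time the network walks wi.seenPeers and skips N1 and N2
	// (and N3, had it arrived before the broadcast)
	wi.seenPeers.Range(func(k, v any) bool {
		fmt.Println("skip", k)
		return true
	})
}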

Test Plan

Added a new unit test

Benchmarks

master

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-8         	 1000000	      1575 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-8         	  769404	      1886 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-8         	 1347638	       992.8 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-8         	  777824	      2042 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-8        	 1248975	       984.3 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-8        	  709402	      2234 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-8       	 1233063	       996.1 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-8       	  619723	      2289 ns/op

feature

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-8         	  978464	      1460 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-8         	  575578	      2303 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=1-8      	  550164	      2238 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-8         	 1395024	       877.9 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-8         	  499202	      2773 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=4-8      	  651159	      1883 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-8        	 1308243	       948.8 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-8        	  430381	      3130 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=16-8     	  954093	      1338 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-8       	 1305469	       952.8 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-8       	  443257	      3026 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=128-8    	 1756513	       686.5 ns/op

codecov bot commented May 26, 2023

Codecov Report

Attention: Patch coverage is 69.49153% with 18 lines in your changes missing coverage. Please review.

Project coverage is 50.43%. Comparing base (5ff0c22) to head (2fa4046).
Report is 511 commits behind head on master.

Files with missing lines    Patch %    Lines
data/txDupCache.go          79.48%     7 Missing and 1 partial ⚠️
network/wsNetwork.go        38.46%     6 Missing and 2 partials ⚠️
data/txHandler.go           71.42%     1 Missing and 1 partial ⚠️

❗ There is a different number of reports uploaded between BASE (5ff0c22) and HEAD (2fa4046).

HEAD has 42 fewer uploads than BASE.
Uploads by flag: BASE (5ff0c22): 64, HEAD (2fa4046): 22
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5424      +/-   ##
==========================================
- Coverage   55.60%   50.43%   -5.18%     
==========================================
  Files         447      447              
  Lines       63395    63422      +27     
==========================================
- Hits        35253    31986    -3267     
- Misses      25760    28931    +3171     
- Partials     2382     2505     +123     


@AlgoAxel (Contributor) previously approved these changes Jun 2, 2023 and left a comment:

One opportunity to use Eventually, but up to you if you take it.
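(For reference, a sketch of the suggested require.Eventually pattern; the test name and the polled counter below are illustrative stand-ins, not the actual assertion in txHandler_test.go:)

package data

import (
	"sync/atomic"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

// Sketch of the suggested testify pattern: poll until the asynchronous relay
// path has done its work instead of sleeping for a fixed duration.
func TestRelayedEventually(t *testing.T) {
	var relayed uint64
	go func() { atomic.AddUint64(&relayed, 1) }() // stands in for the async relay path

	require.Eventually(t, func() bool {
		return atomic.LoadUint64(&relayed) > 0
	}, 2*time.Second, 10*time.Millisecond, "txn was never relayed")
}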

algorandskiy dismissed stale reviews from AlgoAxel and iansuvak via a9da829 on June 6, 2023 at 15:02
@algorandskiy (Contributor, Author):

Fixed outstanding review comments and merged master in

vals, found = c.cur[*d]
if found {
    if _, senderFound = vals.Load(sender); senderFound {
        return d, vals, true
Contributor:

missing a test for this case?

Contributor Author:

it is hard to test since the value needs to appear in the current page between the read and write locks

@algonautshant (Contributor) previously approved these changes Jun 6, 2023 and left a comment:

Looks good.

Many more locks, map lookups, code complexity, and memory footprint are added for possibly very limited gain.
It would be nice to evaluate this tradeoff thoroughly and be prepared to reverse this if it is not sufficiently helpful.

iansuvak previously approved these changes Jun 6, 2023
@algorandskiy (Contributor, Author):

The current delay between accepting and relaying consists of:

  1. Hashing + cache access
  2. Decoding
  3. Signature verification within 2ms batch
  4. txpool Check + Remember delay

That is, there is some delay, but it might not be enough to collect enough duplicates to make the re-broadcast filtering effective.
TODO, as a follow-up PR: benchmark and implement a ~50ms (?) delay before re-broadcasting in order to collect as many duplicates as possible without introducing too much tx latency.
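(A purely illustrative sketch of that follow-up idea; no such code exists in this PR, and workItem/delayedRelayLoop are hypothetical names:)

package main

import (
	"fmt"
	"time"
)

// stand-in for the real work item (txBacklogMsg in data/txHandler.go)
type workItem struct{ id string }

// Hold relay candidates for ~50ms so more duplicate senders get recorded in the
// shared peers map before the transaction is broadcast.
func delayedRelayLoop(pending <-chan *workItem, relay func(*workItem)) {
	ticker := time.NewTicker(50 * time.Millisecond)
	defer ticker.Stop()
	var batch []*workItem
	for {
		select {
		case wi := <-pending:
			batch = append(batch, wi)
		case <-ticker.C:
			for _, wi := range batch {
				relay(wi) // by now the peers map has accumulated the duplicate senders
			}
			batch = batch[:0]
		}
	}
}

func main() {
	pending := make(chan *workItem, 16)
	go delayedRelayLoop(pending, func(wi *workItem) { fmt.Println("relaying", wi.id) })
	pending <- &workItem{id: "txn A"}
	time.Sleep(120 * time.Millisecond) // let at least one tick fire
}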

algorandskiy marked this pull request as draft on June 9, 2023 at 02:15
@algorandskiy (Contributor, Author):

Do not merge: the benchmark showed 6.9k TPS vs 7.7k TPS on master, so the networking code appears to be much slower now.

algorandskiy changed the title from "txHandler: do not rebroadcast to peers sent duplicate messages" to "WIP txHandler: do not rebroadcast to peers sent duplicate messages" on Jun 9, 2023
@cce (Contributor) commented Jun 9, 2023

Given the performance impact, would it be possible to make this behavior optional? For example, keep the original txSaltedCache with the old struct{} value type, and add a new txPeerTrackingCache type that could be optionally enabled?
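(Purely as an illustration of this suggestion, a minimal sketch of how such a split could look; txDedupCache and the stand-in types are hypothetical, only txSaltedCache exists today:)

package dedupsketch

import "sync"

// hypothetical stand-ins; the real code would use crypto.Digest and network.Peer
type msgDigest [32]byte
type msgPeer interface{}

// A hypothetical common interface for the two caches: the original salted cache
// could return a nil peers map, while the optional peer-tracking cache returns
// the shared *sync.Map used to filter the relay.
type txDedupCache interface {
	CheckAndPut(msg []byte, sender msgPeer) (d *msgDigest, seenPeers *sync.Map, dup bool)
}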

@@ -315,7 +315,7 @@ func (p *digestCachePusher) push() {
 func (p *saltedCachePusher) push() {
     var d [crypto.DigestSize]byte
     crypto.RandBytes(d[:])
-    p.c.CheckAndPut(d[:]) // saltedCache hashes inside
+    p.c.CheckAndPut(d[:], struct{}{}) // saltedCache hashes inside
Contributor:

I didn't realize this before, but your benchmark is only checking the raw performance of totally unique txns, with no duplicates, which is a synthetic scenario good for performance comparison but the real workload will probably have at least a few duplicates (and peers) per digest

Contributor Author:

yes, updated the benchmark and posted results

Contributor:

maybe give sync.Map a pointer to an object rather than struct{}{}? I can't imagine how it will convert that to a map key
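(To illustrate the concern: an empty struct is a valid, comparable sync.Map key, but every struct{}{} compares equal, so all benchmark pushers would look like the same sender; distinct peer pointers stay distinct. A minimal demonstration with a stand-in peer type:)

package main

import (
	"fmt"
	"sync"
)

// stand-in for a real peer object; pointers to distinct non-empty objects are distinct keys
type peerStub struct{ name string }

func main() {
	var m sync.Map
	m.Store(struct{}{}, "first")
	m.Store(struct{}{}, "second") // same key as above: the entry is overwritten

	m.Store(&peerStub{name: "N1"}, "first")
	m.Store(&peerStub{name: "N2"}, "second") // distinct pointer keys: both kept

	count := 0
	m.Range(func(k, v any) bool { count++; return true })
	fmt.Println(count) // prints 3: one struct{}{} entry plus two pointer entries
}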

 if found {
-    return d, found
+    if _, senderFound = vals.Load(sender); senderFound {
Contributor:

Is it necessary to have all these checks for whether you've seen the sender already? Or can we just optimistically call Store again for that case? It seems like this would simplify the code a bit, e.g.

func (c *txSaltedCache) CheckAndPut(msg []byte, sender network.Peer) (d *crypto.Digest, vals *sync.Map, found bool) {
    c.mu.RLock()
    d, vals, _, found = c.innerCheck(msg)
    c.mu.RUnlock()
    salt := c.curSalt
    // fast read-only path: assuming most messages are duplicates, hash msg and check cache
    if found {
        vals.Store(sender, struct{}{}) // record the sender for this txn
        return d, vals, true
    }

    // not found: acquire write lock to add this msg hash to cache
    c.mu.Lock()
    defer c.mu.Unlock()
    // salt may have changed between RUnlock() and Lock(), rehash if needed
    if salt != c.curSalt {
        d, vals, _, found = c.innerCheck(msg)
        if found {
            // already added to cache between RUnlock() and Lock(), return
            vals.Store(sender, struct{}{}) // record the sender for this txn
            return d, vals, true
        }
    } else { // not found or found in cur page
        // Do another check to see if another copy of the transaction won the race to write it to the cache
        // Only check current to save a lookup since swap is handled in the first branch
        vals, found = c.cur[*d]
        if found {
            vals.Store(sender, struct{}{}) // record the sender for this txn
            return d, vals, true
        }
    }

    if len(c.cur) >= c.maxSize {
        c.innerSwap()
        ptr := saltedPool.Get()
        defer saltedPool.Put(ptr)

        buf := ptr.([]byte)
        toBeHashed := append(buf[:0], msg...)
        toBeHashed = append(toBeHashed, c.curSalt[:]...)
        toBeHashed = toBeHashed[:len(msg)+len(c.curSalt)]

        dn := crypto.Digest(blake2b.Sum256(toBeHashed))
        d = &dn
    }

    vals = &sync.Map{}
    vals.Store(sender, struct{}{}) // record the sender for this txn
    c.cur[*d] = vals
    return d, vals, false
}

Contributor Author:

I'll benchmark this version vs the Load + Store version I have

Contributor Author:

BenchmarkDigestCaches/data.digestCacheMaker/threads=1-8         	  944719	      1435 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=1-8         	  585506	      2342 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=1-8      	  568954	      2254 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=4-8         	 1327412	       893.9 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=4-8         	  495157	      2733 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=4-8      	  627554	      2041 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=16-8        	 1287717	      1023 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=16-8        	  382932	      3128 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=16-8     	  981458	      1280 ns/op
BenchmarkDigestCaches/data.digestCacheMaker/threads=128-8       	 1307422	       988.9 ns/op
BenchmarkDigestCaches/data.saltedCacheMaker/threads=128-8       	  406440	      3055 ns/op
BenchmarkDigestCaches/data.saltedCacheDupMaker/threads=128-8    	 1475484	       787.5 ns/op

idk, kind of the same.

The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.

It appears to be a sharded hashmap, and according to (1) it is better to read rather than rewrite.

Contributor:

My thinking was, the only reason to do the Load() before Store() is to optimize for the case where the same peer gives you the same transaction multiple times, which seems unlikely.

Contributor:

It's definitely not the case that "multiple goroutines read, write, and overwrite entries for disjoint sets of keys" in your benchmarks.

However in practice, assuming every peer sends the same txn once, and there are 20 handlers that are randomly assigned to the txns from different peers, this is more likely to be true.

Contributor Author:

sync.Map.Store actually calls Load at the very beginning, so it makes sense to call Store directly.
But anyway, allocating a new sync.Map per new txn appears to be a main contributor to the TPS slowdown.

algorandskiy dismissed stale reviews from iansuvak and algonautshant via bf3bad5 on June 9, 2023 at 16:11