feat: behaviour penalty when non-priority queue reaches maxNumElementsInNonPriorityQueue #1083
base: master
Conversation
The branch was force-pushed from b21e90f to a6237bd.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## unstable #1083 +/- ##
============================================
- Coverage 84.82% 84.60% -0.23%
============================================
Files 91 91
Lines 15428 15486 +58
============================================
+ Hits 13087 13102 +15
- Misses 2341 2384 +43
This log shows the moment when peer 16U*23VggQ's non-prio queue reaches maxNumElementsInNonPriorityQueue.
p.closeSendConn(PubSubPeerEventKind.DisconnectionRequested)
else:
  Future[void].completed()
p.behaviourPenalty += 0.0001
I'd suggest moving the value used here to a constant.
done
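Presumably the change looks something like this (a sketch only; the constant name is hypothetical, not necessarily what the PR uses):

const
  # Hypothetical name: penalty applied per message discarded because the
  # peer's non-priority queue is already at maxNumElementsInNonPriorityQueue.
  NonPriorityQueueFullPenalty = 0.0001

p.behaviourPenalty += NonPriorityQueueFullPenalty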
p.behaviourPenalty += 0.0001
trace "Peer has reached maxNumElementsInNonPriorityQueue. Discarding message and applying behaviour penalty.",
  peer = p, score = p.score, behaviourPenalty = p.behaviourPenalty, agent = p.getAgent()
Future[void].completed()
Given most return paths are Future[void].completed(), how about moving it outside the if statement and doing an explicit return on each arm for the cases that are not Future[void].completed()?
could you please elaborate on what you believe is bad with the current approach?
There's nothing wrong per se. What I was hinting at was, when you have code akin to this:

if <cond>:
  foo()
  return A
elif <cond>:
  bar()
  return A
elif <cond>:
  baz()
  return B
...
else:
  foobar()
  return A

Sometimes it's clearer to group/extract the returns:

if <cond>:
  foo()
elif <cond>:
  bar()
elif <cond>:
  baz()
  return B
...
else:
  foobar()
return A
But I just realised here it's almost 50/50, so not a lot of improvement can be made.
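For illustration, the grouped form applied to the completed-future shape in this PR might read like this (a sketch under assumptions: the helper names and conditions are hypothetical, not the PR's actual code):

proc trySend(p: PubSubPeer): Future[void] =
  if nonPriorityQueueFull(p):      # hypothetical helper
    p.behaviourPenalty += 0.0001   # discard the message, apply the penalty
  elif mustDisconnect(p):          # hypothetical helper
    return p.closeSendConn(PubSubPeerEventKind.DisconnectionRequested)
  # every arm that falls through shares this single completed future
  Future[void].completed()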
proc clearNonPriorityQueue*(p: PubSubPeer) =
  if len(p.rpcmessagequeue.nonPriorityQueue) > 0:
    p.rpcmessagequeue.nonPriorityQueue.clear()

    when defined(pubsubpeer_queue_metrics):
      libp2p_gossipsub_non_priority_queue_size.set(labelValues = [$p.peerId], value = 0)
How asynchronous is this operation? That is: how feasible is it that, between the clear call and the metrics update, some value is added to the nonPriorityQueue? 🤔
This proc isn't async and won't be suspended, so nothing can run between the clear call and the metrics update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's right. nvm then
This PR applies a behaviour penalty to peers whose non-prio queue reaches the configured maximum, instead of the previous strategy of disconnecting the peer. A conservative penalty of 0.0001 is added to behaviourPenalty for each message we attempt to send while the queue is over the limit, and the message is discarded. This usually results in a behaviourPenalty around [0.1, 0.2] by the time the score is updated, with a score contribution around [-0.4, -0.1]. That causes the peer to be pruned due to its negative score. This PR also clears the non-prio queue at that moment.
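A rough sanity check on those numbers (an assumption on my part, not stated in the PR: gossipsub v1.1's behaviour-penalty term is quadratic, score contribution = behaviourPenaltyWeight * behaviourPenalty^2, with behaviourPenaltyWeight = -10 assumed here):

# Hypothetical back-of-the-envelope check; the message count is illustrative.
let
  penaltyPerMessage = 0.0001
  discardedMessages = 1500.0
  behaviourPenalty = penaltyPerMessage * discardedMessages        # 0.15
  scoreContribution = -10.0 * behaviourPenalty * behaviourPenalty
echo scoreContribution   # ~ -0.225, inside the reported [-0.4, -0.1] range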