Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gossipsub extension for Epidemic Meshes (v1.2.0) #413

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
80 changes: 80 additions & 0 deletions pubsub/gossipsub/gossipsub-v2.0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
# Episub - Gossipsub Extension


# Overview

This document aims to provide a minimal extension to the [gossipsub v1.1](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md) protocol, that supersedes the previous [episub](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/episub.md) proposal.

The proposed extensions are backwards-compatible and aim to enhance the efficiency (minimize amplification/duplicates and decrease message latency) of the gossip mesh networks by dynamically adjusting the messages sent by mesh peers based on a local view of message duplication and latency.

In more specific terms, two new control messages are introduced, `CHOKE` and `UNCHOKE`. When a Gossipsub router is receiving many duplicates on a particular mesh, it can send a `CHOKE` message to it's mesh peers that are sending duplicates slower than its fellow mesh peers. Upon receiving a `CHOKE` message, a peer is informed to no longer propagate mesh messages to the sender of the `CHOKE` message, rather lazily (in every heartbeat) send it's gossip.
Copy link
Contributor

@vyzo vyzo May 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make the conditions more explicit once we have more data.
Also, messages originating in the choked node will still have to be directly sent, we should probably mention this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed


A Gossipsub router may notice that it is receiving messages via gossip from it's `CHOKE`'d peers faster than it receives them via the mesh. In this case it may send an `UNCHOKE` message to the peer to inform the peer to resume propagating messages in the mesh.
The router may also notice that it is receiving messages from gossip from peers not in the mesh (fanout) faster than it receives messages in the mesh. It may then add these peers into the mesh.

The modifications outlined above intend to optimize the Gossipsub mesh to receive minimal duplicates from peers with the lowest latency.

# Specification

## Protocol Id

Nodes that support this Gossipsub extension should additionally advertise the version number `2.0.0`. Gossipsub nodes can advertise their own protocol-id prefix, by default this is `meshsub` giving the default protocol id:
AgeManning marked this conversation as resolved.
Show resolved Hide resolved
- `/meshsub/2.0.0`

## Parameters

This section lists the configuration parameters that control the behaviour of the mesh optimisation.


| Parameter | Description | Reasonable Default |
| -------- | -------- | -------- |
| `D_non_choke` | The minimum number of peers in a mesh that must remain unchoked. | `D_lo` |
| `choke_heartbeat_interval` | The number of heartbeats before assessing and applying `CHOKE`/`UNCHOKE` control messages and adding `D_max_add`. | 20 |
| `D_max_add` | The maximum number of peers to add into the mesh (from fanout) if they are performing well per `choke_heartbeat_interval`. | 1 |
AgeManning marked this conversation as resolved.
Show resolved Hide resolved
| `choke_duplicates_threshold` | The minimum number of duplicates as a percentage of received messages a peer must send before being eligible of being `CHOKE`'d. | 60 |
| `choke_churn` | The maximum number of peers that can be `CHOKE`'d or `UNCHOKE`'d in any `choke_heartbeat_interval`. | 2 |
|` unchoke_threshold` | Determines how aggressively we unchoke peers. The percentage of messages that we receive in the `choke_heartbeat_interval` that were received by gossip from a choked peer. | 50 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt this be higher than choke_threshold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont have a feel for this.

They are measuring different things. I think the duplicates we get from a specific node is fairly dependent on the topology of the network. A triangle-like network would see more duplicates from specific peers (I think).

Once choked, there is a race condition between how fast we get messages from the mesh (which is like a single round trip) vs the request/response from an IWANT (which I think would be more correlated with latency of the node). I don't see a direct relationship between these thresholds or a feel for which should be higher than the other.
You probably have a better intuition than I do, is there something obvious I'm missing that should make this higher than choke_threshold?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, you are right, this is not clear what these values should be, lets look at some data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you don’t mind if I add some considerations.

In general, and without choking, each message travels at least once on each mesh link. It would be exactly one if there would not be propagation delay, but there is, and it will become even bigger with large messages.

If node A receives a duplicate from B, that also means that:

  • A has already received from someone else (otherwise this wouldn’t be a duplicate) and was queuing the message for sending to B
  • B did receive from someone else and queued for sending to B before getting the above from A (otherwise there would be no duplicate)

In other words, this relation is symmetric. If A is receiving a duplicate from B, then B will also receive a duplicate from A. At least I think no implementations aborts the sending of queued messages once there is a message from the other side. They mostly can’t even do it because queuing is in-network. This might be an issue for the absolute percentage based choke_duplicates_threshold’, because the peers would race to choke each other. 

But you are speaking of “largest average latency” in the selection. What latency do you mean here?


As for the ‘unchoke_threshold’: There are I think two things to look at when evaluating who to unchoke:


  1. whether they would provide chunks we would otherwise miss

  2. whether they would improve our latency.

The difficulty is that IHAVE messages provide a largely delayed view compared to actual message reception. Maybe looking at the freshest messages only in IHAVEs could be an indicator for the second. For the first, one could look at the IHAVEs received, or the messages received as a consequence of IWANTs.

Copy link
Contributor Author

@AgeManning AgeManning May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cskiraly - Thanks for the extra considerations.

In regards to the sources of duplicates, I think its a little tricky and I'm not sure the relation is symmetric. I think there are three main reasons we see duplicates:

  • Propagation delay (as you've mentioned) - Two nodes could send to each other because of propagation delay.
  • Validation delay - Depending on the application, a node could receive a message, and take a while to validate it before propagating it in which time others could send it duplicates before it gets a chance to forward it.
  • Topology - This one is the more tricky one to reason about imo, because there's lots of ways nodes can be connected and depending on the network the publisher can be random. But let me try come up with a simple example:

Consider 4 nodes:

                A
           |         |
           B         C
                 |
                 D

A is connected to B and C who in turn are connected to D. Lets say A constantly publishes, it sends to B and C. B and C will then send to D. Lets say on average one path is faster, A->B->D. D will receive a message from B before C. Depending on propagation speed and validation time etc it will then send to C. If this timing is consistently faster than A-> C and C's validation time, then C will be constantly receiving duplicates from D (and should not respond by sending a message back, making this not symmetric).
The aim in this proposal is for C to choke D (in this example).

What latency do you mean here?

I was thinking that for all the duplicates we see, we measure the time it took for it to reach us measured relative to the first message sent to us. I assume this will create a somewhat normal distribution of timing amongst our peers. If a few peers are eligible for choking we preference them by the average time of messages they sent to us. i.e if most of the time one peer is sending us a duplicate 700ms later than the first message, when another is sending us duplicates 100ms later, we choke the one sending 700ms later on average.

The difficulty is that IHAVE messages provide a largely delayed view compared to actual message reception. Maybe looking at the freshest messages only in IHAVEs could be an indicator for the second. For the first, one could look at the IHAVEs received, or the messages received as a consequence of IWANTs

Yes. I agree. IHAVE's are very slow. They are stochastic also, in that we randomly select peers to send them to in the heartbeat. The speed is very much related to heartbeat timing and gossip_lazy parameters.
When we get choked a peer, the idea is to always send IHAVE messages to the peer that choked us in the heartbeat. That way, if the peer that choked us, has malicious or slow nodes, and its source of messages are coming from IWANT based on our IHAVE's, then they know this is an error and must unchoke us. The issue, as you've pointed out, is that if the gossip (IHAVE/IWANT) system is going to be fast enough to beat the mesh in a statistically meaningful way to know if we should unchoke.
The complications are that the speed of the gossip is very much parameter dependent. We could make the heartbeats very fast, like 10's of ms and I would imagine it would be fine, but I dont have a good feel if we can use second-level heartbeats to also achieve appropriate unchoking.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cskiraly - Thanks for the extra considerations.

In regards to the sources of duplicates, I think its a little tricky and I'm not sure the relation is symmetric. I think there are three main reasons we see duplicates:

  • Propagation delay (as you've mentioned) - Two nodes could send to each other because of propagation delay.
  • Validation delay - Depending on the application, a node could receive a message, and take a while to validate it before propagating it in which time others could send it duplicates before it gets a chance to forward it.
  • Topology - This one is the more tricky one to reason about imo, because there's lots of ways nodes can be connected and depending on the network the publisher can be random. But let me try come up with a simple example:

Right, I've forgot about validation delay. It is indeed there, but I had a reason to forget about it. We have added duplicate metrics in nim-libp2p and thus in Nimbus, and it shows that in that use case, the duplicates during validation are negligible compared to all duplicates. Here is an example.

image

I'm happy to provide more data on this.

Now this is just one use case of gossipsub, and in general we should consider the validation delay as you've rightly pointed out.

Consider 4 nodes:

                A
           |         |
           B         C
                 |
                 D

A is connected to B and C who in turn are connected to D. Lets say A constantly publishes, it sends to B and C. B and C will then send to D. Lets say on average one path is faster, A->B->D. D will receive a message from B before C. Depending on propagation speed and validation time etc it will then send to C. If this timing is consistently faster than A-> C and C's validation time, then C will be constantly receiving duplicates from D (and should not respond by sending a message back, making this not symmetric).

I see, but this really depends on the assumption that validation time is long enough to create such a time window. And data as above tells me it is much smaller than network induced delays, so you mostly end up with symmetry.

The aim in this proposal is for C to choke D (in this example).

Right. I'm not sure how this symmetry plays out, just raised it as something that might modify the actual effect compared to what we expect.

What latency do you mean here?

I was thinking that for all the duplicates we see, we measure the time it took for it to reach us measured relative to the first message sent to us. I assume this will create a somewhat normal distribution of timing amongst our peers. If a few peers are eligible for choking we preference them by the average time of messages they sent to us. i.e if most of the time one peer is sending us a duplicate 700ms later than the first message, when another is sending us duplicates 100ms later, we choke the one sending 700ms later on average.

That's nice! I wasn't thinking of that delay distribution. Now say your 700ms peer is sending you first copies on 30% of chunks, and has long delay on 70%. The 100ms peer is sending you only duplicates, but it sends those with a 100ms average. Which one would you choke? I think I would go with the 100ms one. In effect I think we need another distribution, one that also adds with negative weight the cases when we receive first. This weight could be the time between the first and the 2nd receive. Or even that boosted, as 1st receive is very important compared to all the duplicates.

The difficulty is that IHAVE messages provide a largely delayed view compared to actual message reception. Maybe looking at the freshest messages only in IHAVEs could be an indicator for the second. For the first, one could look at the IHAVEs received, or the messages received as a consequence of IWANTs

Yes. I agree. IHAVE's are very slow. They are stochastic also, in that we randomly select peers to send them to in the heartbeat. The speed is very much related to heartbeat timing and gossip_lazy parameters. When we get choked a peer, the idea is to always send IHAVE messages to the peer that choked us in the heartbeat. That way, if the peer that choked us, has malicious or slow nodes, and its source of messages are coming from IWANT based on our IHAVE's, then they know this is an error and must unchoke us. The issue, as you've pointed out, is that if the gossip (IHAVE/IWANT) system is going to be fast enough to beat the mesh in a statistically meaningful way to know if we should unchoke. The complications are that the speed of the gossip is very much parameter dependent. We could make the heartbeats very fast, like 10's of ms and I would imagine it would be fine, but I dont have a good feel if we can use second-level heartbeats to also achieve appropriate unchoking.

The speed, overhead, and usefulness of gossip is also very much dependent on message size. With very small it is almost useless, with large messages it saves a lot of bandwidth. Having said that, we also have metrics live on IHAVE/IWANT hit ratios, that can help. Happy to share those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all this.

I think the current plan is to build some generic functions that decide whether to choke/unchoke peers based on some message timing data. So we can play with some arbitrary distributions and decisions about choking/unchoking and see how they play out in real networks.

Once we have something concrete, I'll run some ideas past you. :)

| `fanout_addition_threshold` | How aggressively we add peers from the fanout into the mesh. The percentage of messages that we receive in the `choke_heartbeat_interval` that were received from a fanout peer. | 10 |

## Choking

Every `choke_heartbeat_interval` the router counts the number of valid (or not invalid) duplicate messages (note that the first message of its kind received is not a duplicate) and the time it took to receive each duplicate for each peer in each mesh that are not `CHOKE`'d. The router then filters which peers have sent duplicates over the `choke_duplicates_threshold` and sends `CHOKE` messages to at most `choke_churn` peers ordered by largest average latency. A router should ensure that at least `D_non_choke` peers remain in the mesh (and should not send `CHOKE` messages if this limit is to be violated) and should perform this check every heartbeat with the mesh maintenance.

## UnChoking

Every `choke_heartbeat_interval` the router counts the number of received valid messages obtained via `IWANT` (and hence gossip) from a `CHOKE`'d peer. If the percentage of received valid messages is greater then `unchoke_threshold` we send an `UNCHOKE` to randomly selected peers up to the `choke_churn` limit.

## Fanout Addition

Every `choke_heartbeat_interval` the router counts the number of received valid messages obtained via `IWANT` (and hence gossip) from `fanout` peers. If the percentage of received messages is greater than `fanout_addition_threshold` a random selection of these peers up to `D_max_add` are added to the mesh (provided the mesh bounds remain valid, i.e `D_high`).

## Handling Gossipsub Scoring For Choked Peers

TODO

## Protobuf Extension

The protobuf messages are identical to those specified in the [gossipsub v1.0.0 specification](https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.0.md) with the following control message modifications:

```protobuf
message RPC {
// ... see definition in the gossipsub specification
}

message ControlMessage {
repeated ControlIHave ihave = 1;
repeated ControlIWant iwant = 2;
repeated ControlGraft graft = 3;
repeated ControlPrune prune = 4;
repeated ControlChoke choke = 5;
repeated ControlUnChoke unchoke = 6;
}

message ControlChoke {
optional string topicID = 1;
}

message ControlUnChoke {
optional string topicID = 1;
}
```