
doc: specifies DataChannel protocol in consensus reactor and relevant parts of StateChannel communication #1129

Merged: 14 commits into main on Nov 8, 2023

Conversation

@staheri14 (Contributor) commented Oct 28, 2023:

Closes #1080 by capturing the second-most bandwidth-intensive communication protocol, i.e., the DataChannel protocol.
Initially, the StateChannel was not included in the scope of this PR. However, because the DataChannel protocol relies heavily on the exchange of peer state, which occurs over the StateChannel, the document has been expanded to address the pertinent aspects of the StateChannel.

Please also note that this document will undergo additional updates in future rounds of editing as more details and insights are gathered about message flow, particularly concerning the updates transmitted over the StateChannel, which is intricately linked with the inner workings of the consensus state machine.

@staheri14 self-assigned this on Oct 28, 2023
@staheri14 changed the title from "WIP: adds the consensus reactor spec" to "doc: specifies DataChannel protocol in consensus reactor and relevant parts of StateChannel communication" on Nov 1, 2023
@staheri14 marked this pull request as ready for review on November 1, 2023 at 20:46
@@ -0,0 +1,261 @@
# Consensus Reactor

The consensus reactor handles message propagation for four different channels, namely `StateChannel`, `DataChannel`, `VoteChannel`, and `VoteSetBitsChannel`.
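(For reference, the four channels are registered under the following wire IDs in the reactor; the values below follow the upstream `consensus/reactor.go` and are reproduced here only as a sketch:)

```go
// Channel IDs used by the consensus reactor, as defined in the upstream
// consensus/reactor.go (CometBFT / Tendermint lineage).
const (
	StateChannel       = byte(0x20)
	DataChannel        = byte(0x21)
	VoteChannel        = byte(0x22)
	VoteSetBitsChannel = byte(0x23)
)
```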
@staheri14 (Contributor Author) commented Nov 1, 2023:
[Note to the Reviewers] This document contains additional information and questions presented as inline comments. If you possess insights regarding the questions, kindly incorporate them into your review. As for the other comments containing side information, I am currently examining them; once confirmed, I intend to integrate them into the main text. However, if you have any relevant input on these matters, please share your thoughts in the comments. Thanks!

rootulp previously approved these changes Nov 2, 2023

@rootulp (Collaborator) left a comment:

Overall LGTM, but my eyes glazed over halfway through the doc. I can do a re-review later if needed.

@@ -0,0 +1,261 @@
# Consensus Reactor
Collaborator commented:

[question] Does the consensus reactor in celestia-core have any modifications from upstream? Is this doc something we should consider upstreaming?

Contributor Author replied:

> does the consensus reactor in celestia-core have any modifications from upstream?

Nothing that I am currently aware of (@evan would know better).
However, I can confirm that the version in CometBFT has undergone modifications that are not yet integrated into celestia-core. For example, they have enhanced block part propagation to increase efficiency, a feature not yet available in celestia-core. Despite this, upstreaming still seems viable, as I couldn't find any specific documentation regarding the consensus reactor in the CometBFT repository.

Collaborator replied:

Awesome. Definitely not blocking on upstreaming it.

@@ -133,7 +133,7 @@ func (psh *PartSetHeader) ToProto() cmtproto.PartSetHeader {
 	}
 }

-// FromProto sets a protobuf PartSetHeader to the given pointer
+// PartSetHeaderFromProto sets a protobuf PartSetHeader to the given pointer
Collaborator commented:

[nit] I'm totally on board with these comment fixes, but it may make sense to make them upstream to minimize celestia-core's diff with CometBFT. Otherwise, they may accidentally get overwritten the next time we pull from upstream.

Contributor Author replied:

That is a good idea, will do that.

Contributor Author added:

I'll open a separate PR to upstream the Go documentation fixes.

evan-forbes previously approved these changes Nov 2, 2023

@evan-forbes (Member) left a comment:

Great write-up! I think we can use this as a base set of expectations for our future improvements.

The current system can't do this, but in the future hopefully we can update the ideal case to be that a peer never sends another peer a block part that it already has (or even one it will be getting from another peer!).

Left a comment or two for discussion, but imo we can merge.

Peers keep exchanging block parts of a proposal until either 1) all block parts of the intended proposal have been successfully transmitted between the two peers or 2) one of the peers' round state updates (and points to a new height and round with a different proposal).
In the latter case, the peer whose state has advanced still sends block parts to the other peer until all the parts are transmitted or until the receiving peer's round state is also updated.

Worst Case Scenario: The worst case occurs when both peers coincidentally choose the same block part index at the same moment and initiate the sending process concurrently, a scenario that is rather unlikely.
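The exchange loop described above can be sketched in Go. This is a minimal illustration, loosely modeled on the reactor's `gossipDataRoutine`; `bits.BitArray` comes from CometBFT's `libs/bits` package (celestia-core's fork uses a different import path), and `send` is a stand-in for the actual peer send call. The real routine also reacts to round-state changes and handles catch-up, both of which are omitted here:

```go
package consensus

import "github.com/cometbft/cometbft/libs/bits"

// gossipBlockParts repeatedly picks a random block part that we have and
// the peer is missing, and sends it, until the peer has every part we do.
// Sketch only: the real gossipDataRoutine also watches for round-state
// updates, which is how case 2) above terminates the exchange.
func gossipBlockParts(ours, theirs *bits.BitArray, send func(index int) bool) {
	for {
		// Indices set in ours but not (yet) in theirs.
		missing := ours.Sub(theirs)
		index, ok := missing.PickRandom()
		if !ok {
			return // nothing left to send for this proposal
		}
		if send(index) {
			// Record that the peer now has this part so we don't resend it.
			theirs.SetIndex(index, true)
		}
	}
}
```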
Member commented:

I'm not sure if this is actually what we are seeing in the testnet experiments, but we often see that at some heights the block parts are not distributed randomly, which I think results in a worst case scenario more often than we would expect.

[Figure: normalized receipt of block parts, with completion times, per height]

Member added:

To expand on this slightly: example heights that are less random and more in-order are 30 and 31, whereas heights such as 33 and 20 are randomly distributed relative to the peer that was tracing.

@staheri14 (Contributor Author) replied Nov 2, 2023:

Thanks for sharing this! I can see that the higher indices are mostly chosen close to the end of block completion. ~~This might not only be due to a non-random selection process but also could be influenced by the sequence in which parts of the block proposal are released to the network. I guess, based on this figure, the initial indices are released first, and that could be why we see the first indices circulated earlier.~~
Double-checked the struck-through part above: the initial distribution of proposal block parts should be random as well. However, for the heights you mentioned this does not seem to hold (even for other heights, higher indices are received much later).

If we were to verify whether this figure indicates the worst case scenario (whether peers exchange the same block parts at the same time), we would need to examine the index of the individual block parts communicated per p2p connection and check whether more than one instance of each block part shows up. (I am going to extract this, or see if we have enough metrics to verify it.)

evan-forbes previously approved these changes Nov 5, 2023
rootulp previously approved these changes Nov 6, 2023
@rootulp (Collaborator) left a comment:

Still LGTM

@staheri14 (Contributor Author) commented:

After examining the influxDB data points concerning block part exchanges for a node connected to mocha-4, I have compiled findings from a 1-hour period covering approximately 300 blocks.

[Figures: downloaded block parts per height/round; uploaded block parts per height/round]

Observations: The provided plots indicate that the behavior for downloaded block parts aligns with expectations: each block part index appears only once per p2p connection for every block height and round. However, the pattern diverges for uploaded block parts: multiple instances of the same block part index have been uploaded during the same height and round over a single p2p connection, which is contrary to the expected behavior. I will investigate this further and share my findings.
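To make the described check concrete, here is one hypothetical way to flag such repeated uploads from flattened trace rows. The `BlockPartRecord` type and its field names are invented for illustration and do not reflect the actual influxDB schema:

```go
package main

// BlockPartRecord is a hypothetical flattened trace row keyed by the
// fields that should be unique per upload: (height, round, part index,
// p2p connection). The real schema may differ.
type BlockPartRecord struct {
	Height int64
	Round  int32
	Index  int    // block part index
	PeerID string // identifies the p2p connection
}

// duplicateUploads returns, for each (height, round, index, peer) key that
// appeared more than once, the number of redundant occurrences.
func duplicateUploads(records []BlockPartRecord) map[BlockPartRecord]int {
	seen := map[BlockPartRecord]int{}
	for _, r := range records {
		seen[r]++
	}
	dups := map[BlockPartRecord]int{}
	for k, n := range seen {
		if n > 1 {
			dups[k] = n - 1 // occurrences beyond the first are redundant
		}
	}
	return dups
}
```

Under the expected behavior described above, this map would be empty for downloaded parts and non-empty for the anomalous uploaded parts.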

@rootulp (Collaborator) commented Nov 8, 2023:

Nice plots, thanks for sharing!

@staheri14 merged commit 426edcf into main on Nov 8, 2023
15 checks passed
@staheri14 deleted the sanaz/consensus-reactor-specs branch on November 8, 2023 at 17:04
Successfully merging this pull request may close: EPIC: Identifying Expected Network Traffic Rate in celestia-core