[Performance] DA Bridge Node Not Utilising Full Storage/Network Capacity During Sync #4108
Start: 2025-02-12 15:34:00.969Z, height = 14570
Synced to height 18607 → 18607 − 14570 = 4037 blocks
Elapsed: 3498.597 s
Blocks per second = 4037 ÷ 3498.597 s ≈ 1.15 blocks/s
Thank you for reporting this detailed performance issue. From your data, the Bridge Node (BN) appears to sync at approximately 1.15 blocks/s for 32MB blocks, which translates to around 36.8 MB/s of throughput. Considering the 10 Gbps network on the BN side (and 3.2 Gbps on the validator), this is roughly 9.2% utilization of the validator’s network capacity—far below what the hardware should support. Additionally, the node has 32 CPU cores and 128 GB RAM, indicating that system resources should not be the limiting factor. We need to run benchmarks of BN sync in a controlled environment and identify the bottleneck behind such low bandwidth utilisation.
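The arithmetic above can be rechecked in a few lines. This is only a sanity check on the reported figures, not a measurement; all constants are copied from the report:

```go
package main

import "fmt"

func main() {
	// Figures taken from the report above.
	const (
		startHeight   = 14570
		endHeight     = 18607
		elapsedSec    = 3498.597 // sync duration in seconds
		blockSizeMB   = 32.0     // MB per block
		validatorGbps = 3.2      // validator-side link capacity
	)

	blocks := float64(endHeight - startHeight)
	blocksPerSec := blocks / elapsedSec
	mbPerSec := blocksPerSec * blockSizeMB
	// MB/s -> Gbps, then as a percentage of the validator link.
	utilPct := mbPerSec * 8 / 1000 / validatorGbps * 100

	fmt.Printf("%.2f blocks/s, %.1f MB/s, %.1f%% of the 3.2 Gbps link\n",
		blocksPerSec, mbPerSec, utilPct)
}
```

This reproduces the ~1.15 blocks/s, ~37 MB/s, and ~9.2% utilization quoted above.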
Hypothesis
One possible explanation for the performance bottleneck is how worker parallelization is currently managed in the BN sync process. Unlike the DASer approach—where a fixed number of workers continuously process tasks—the BN sync uses an errgroup that splits the work among workers and waits for all to finish. If one task (e.g., fetching or processing a single block) is slow, the entire group can stall, leaving multiple workers idle and underutilizing available CPU and network resources. Ensuring that work is distributed in a way that prevents a single slow task from blocking others could significantly improve overall sync throughput.
Test
Need a blocking-profile benchmark.
Potential solution
Use DASer for core sync.
Using DASer for core sync will be a very complex change.
Another way to test the hypothesis is to run with a concurrency limit of 1–2 and see whether throughput changes. If it doesn't, then we should be looking at the TCP connection itself, e.g. enabling MPTCP or BBR (or socket IPC if both processes run on the same host).
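For reference, the kernel-side half of that test could look like the following sketch (standard Linux sysctls, not celestia-node settings; BBR needs the `tcp_bbr` module and MPTCP needs kernel 5.6+):

```shell
# Check which congestion-control algorithms the kernel offers.
sysctl net.ipv4.tcp_available_congestion_control

# Load and enable BBR (persist via a file in /etc/sysctl.d/ if it helps).
sudo modprobe tcp_bbr
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

# MPTCP: check current state and enable it.
sysctl net.mptcp.enabled
sudo sysctl -w net.mptcp.enabled=1
```

Either change affects only new connections, so the node would need to be restarted between runs for a clean comparison.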
Description
During performance testing of the DA bridge node, we discovered that the node is significantly underutilizing available system resources during synchronization, particularly when syncing from scratch.
Network - 32MB-100k
Current Behavior
- ~800 write OPs
- ~62–63 Mb/s
Existing Capabilities
DA Bridge Node
Validator
DA Configuration used
config.toml
Investigation Points
Impact
This significantly affects node operators who need to:
It would be great to have the ability to increase/fine-tune the DA node configuration parameters to match the hardware capacity for faster synchronisation.