feat: add max-radius param with 5% default #1647
Conversation
Force-pushed from d3a94af to 83ceaed.
```diff
@@ -146,7 +148,7 @@ pub async fn run_trin(
         &discovery,
         utp_socket.clone(),
         portalnet_config.clone(),
-        storage_config_factory.create(&Subnetwork::History)?,
+        storage_config_factory.create(&Subnetwork::History, trin_config.max_radius)?,
```
Is there any reason to leave this as `Distance::MAX`? Since it seems like the bottlenecking volume is only on the state network, iiuc.
The bottlenecking is just more extreme on State. This issue occurs on History too; it's just that the radii on our fleet are settled now. If we are targeting low hardware requirements, I think this setting should be used on the history network as well. I would argue there isn't a good reason to leave it at `Distance::MAX` compared to 5%.
Also, since we run History and State at the same time, they affect each other in terms of performance, so we should do both: bad performance on the History network would degrade performance of the State network.
In general, I'm in favor of this direction.
A few concerns:
1. We currently have only ~25 state nodes. Setting their radius to 5% would leave us with gaps.
   - We can/should get more state nodes.
   - Once live, big storage nodes should make this less of an issue (though it would still be nice to have the entire domain space covered without them).
2. Once this is merged, with the code as is, nothing would change for already running nodes, which is what we want to accomplish in the first place. If you look at `IdIndexedV1Store::init`, it will fall under this condition, which means that it will initialize the radius to the distance of the farthest content in the db. See my comment as well.
3. Even when 2 is fixed, we have a not-properly-tested scenario where we prune a lot of data, and how that affects trin startup time. This might not be a problem at all (it's a one-time thing), but I wanted to highlight it.
```diff
@@ -432,7 +433,7 @@ impl<TContentKey: OverlayContentKey> IdIndexedV1Store<TContentKey> {
                 self.radius = Distance::ZERO;
             } else {
                 error!(Db = %self.config.content_type, "Farthest not found!");
-                self.radius = Distance::MAX;
+                self.radius = self.config.max_radius;
             }
         }
         Some(farthest) => {
```
Should we cap `self.radius` so it's not higher than `max_radius` in this case as well?
Sure, I will do `min()`.
Ok, I added the min so it will not return higher than the `max_radius` now.
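For readers following along, here is a minimal, self-contained sketch of the clamping being discussed. It is not the real `IdIndexedV1Store` code: `Distance` below is a hypothetical stand-in type, and the error/zero branches are simplified away. The point is only that the radius derived from the farthest stored content is capped with `min()` so it never exceeds the configured `max_radius`.

```rust
use std::cmp::min;

// Hypothetical stand-in for trin's Distance type, just to make the sketch runnable.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Distance(u128);

// Simplified radius initialization: fall back to max_radius when no content is
// stored, and clamp the farthest-content distance to max_radius otherwise.
fn init_radius(farthest_content: Option<Distance>, max_radius: Distance) -> Distance {
    match farthest_content {
        None => max_radius,
        Some(farthest) => min(farthest, max_radius),
    }
}

fn main() {
    let max_radius = Distance(100);
    // Empty store: start at the configured cap rather than Distance::MAX.
    assert_eq!(init_radius(None, max_radius), max_radius);
    // Existing store with content farther out than the cap: clamp down.
    assert_eq!(init_radius(Some(Distance(250)), max_radius), max_radius);
    // Existing store already inside the cap: keep the smaller radius.
    assert_eq!(init_radius(Some(Distance(40)), max_radius), Distance(40));
}
```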
All 25 state nodes are expected to settle well below 1% radius. We can add more state nodes; there wouldn't be any pushback against this.
> All 25 state nodes are expected to settle well below 1% radius. We can add more state nodes; there wouldn't be any pushback against this.
When do you think we should add more nodes?
In general, I'm ok with this being merged, but I would like to see 1-2 more approvals from others.
I'm still concerned that we might have holes in the domain space. But luckily we have big_storage nodes and there are other clients as well, so none of the content should be lost.
Once this is merged, and before it's deployed, we have to make sure that ansible configuration for big_storage nodes is updated with this flag.
We can double our fleet now if you want. I am not necessarily in a rush, but it couldn't hurt.
Looks good! Just a few nitpicks. And I'll echo what was already said, but it feels pretty important:
> Once this is merged, and before it's deployed, we have to make sure that ansible configuration for big_storage nodes is updated with this flag.
crates/utils/src/cli.rs (Outdated)
```rust
        ));
    }

    // If the max radius is 100%, return the maximum distance as without it we will be 35 less than
```
Not sure I understand this comment?
I tried to clarify it a bit
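To spell out what that code comment is getting at: converting a percentage into a distance with integer division leaves a small gap at the top of the domain space, because `U256::MAX / 100 * 100` comes out 35 less than `U256::MAX` (the 35 in the comment). Here is a hedged sketch of that arithmetic, using `u128` as a stand-in for `U256` so it runs without external crates; for `u128` the gap happens to be 55 instead of 35.

```rust
// Sketch of the percentage -> max distance conversion discussed above,
// using u128 instead of U256.
fn percentage_to_max_distance(percentage: u128) -> u128 {
    if percentage >= 100 {
        // Special-case 100%: integer division would otherwise round down and
        // leave the top of the domain space uncovered.
        return u128::MAX;
    }
    u128::MAX / 100 * percentage
}

fn main() {
    // Without the special case, "100%" would fall short of the true maximum.
    let rounded = u128::MAX / 100 * 100;
    assert_eq!(u128::MAX - rounded, 55); // for U256 the gap is 35
    assert_eq!(percentage_to_max_distance(100), u128::MAX);

    // The 5% default covers roughly one twentieth of the domain space.
    assert!(percentage_to_max_distance(5) < u128::MAX / 10);
}
```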
```diff
@@ -72,3 +73,30 @@ pub fn subnetwork_parser(subnetwork_string: &str) -> Result<Arc<Vec<Subnetwork>>
     Ok(Arc::new(subnetworks))
 }
+
+pub fn max_radius_parser(max_radius_str: &str) -> Result<Distance, String> {
+    let max_radius_percentage = match max_radius_str.parse::<U256>() {
```
```rust
let Ok(max_radius_percentage) = max_radius_str.parse::<U256>() else {
    return Err(...);
};
```
I want to print the error from parse
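For context, a minimal sketch of the trade-off being discussed, using `u64` as a stand-in for `U256` and placeholder error strings (not trin's actual wording): `let Ok(..) = .. else { .. }` discards the `Err` value, while a `match` (or `map_err`) keeps it, so the underlying parse error can be included in the message returned to the user.

```rust
// Stand-in parser: the real function parses a U256 percentage and converts it
// into a Distance; u64 keeps this sketch dependency-free.
fn max_radius_parser(max_radius_str: &str) -> Result<u64, String> {
    // A match keeps the parse error around so it can be surfaced to the user;
    // a let-else would drop it.
    let max_radius_percentage = match max_radius_str.parse::<u64>() {
        Ok(value) => value,
        Err(err) => return Err(format!("Invalid max radius percentage: {err}")),
    };
    if max_radius_percentage > 100 {
        return Err(format!(
            "Max radius percentage must be between 0 and 100, got {max_radius_percentage}"
        ));
    }
    Ok(max_radius_percentage)
}

fn main() {
    assert_eq!(max_radius_parser("5"), Ok(5));
    // The returned error includes the underlying parse failure.
    assert!(max_radius_parser("five").unwrap_err().contains("invalid digit"));
    assert!(max_radius_parser("101").is_err());
}
```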
Force-pushed from cbc2cf2 to d64dda0.
The ansible script is updated in https://github.com/ethereum/cluster/pull/1144 (or will be on Monday when this PR is merged).
I am going to merge this PR on Monday, after the cluster PR is merged.
What was wrong?
A larger description of the problem can be viewed in #1574.
The TL;DR is that we currently start at a 100 percent radius when the node will end up at 1-3%. This means we download and purge content our node won't end up storing, which wastes a lot of CPU cycles and also takes up our uTP transfer limit, which could instead be used for content within our final 1-3% radius.
This is a big issue on the state network: currently we are trying to bridge latest state, but our nodes' uTP limit is always full, so offers of state diffs from `trin-execution` are often declined because all State nodes are at transfer capacity. The screenshot above (taken a while ago) shows, at the bottom, that we successfully offered to 0 peers and the content was declined 8 times. This is due to the State nodes being at uTP transfer capacity (or, in other words, hitting our rate limit).
Recently this issue has gotten worse, but even if it hadn't, this PR is required if we want to bridge latest state. Bridging state content is very expensive because there is just so much state to gossip. If we want Portal and the State network to be successful, we need at least a 10x performance improvement in Trin: currently Trin can only handle 5 Mbps up and down, which is way too slow; we won't be able to take a real load on the network with this.
So this PR is the first step on a road of changes which will hopefully greatly increase performance in Trin. I don't think we have a choice: we need to be performant if we want users.
How was it fixed?
Add a CLI flag called `max-radius`; by default the max radius is 5 percent. Assuming we have a network of over 20 nodes, we should have full coverage, since the nodes' radii should end up below 3%, especially as the network storage requirements keep growing over time. So I don't think we should have a default bigger than 5%.

For testing on hive we will use `--max-radius 100`.
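As a rough illustration only (this is not trin's actual CLI definition; the field type and range validation are simplified, and it assumes clap with the derive feature), a flag with a 5% default that hive runs could override with `--max-radius 100` might look like:

```rust
use clap::Parser;

#[derive(Parser, Debug)]
struct TrinConfig {
    /// Maximum storage radius, as a percentage of the full domain space.
    #[arg(
        long = "max-radius",
        default_value_t = 5,
        value_parser = clap::value_parser!(u32).range(0..=100)
    )]
    max_radius: u32,
}

fn main() {
    // Default: 5% max radius.
    let default_config = TrinConfig::parse_from(["trin"]);
    assert_eq!(default_config.max_radius, 5);

    // Hive testing would pass the full domain space explicitly.
    let hive_config = TrinConfig::parse_from(["trin", "--max-radius", "100"]);
    assert_eq!(hive_config.max_radius, 100);
}
```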
I think this solution is the best one for the foreseeable future. We could do a more complex scheme, but that wouldn't be worth it right now; instead we should focus on finding other bottlenecks in Trin, as this PR is good enough and gets us most of what we want. A more complex scheme would give diminishing returns, so I wouldn't expect us to look into it until Trin is very optimized; further optimizing this is a very low priority.