
Benchmark storage arg to read fraction of kv pairs. #1233

Conversation

@adelarja (Contributor)

This pull request introduces an argument to the benchmark storage CLI that lets you select the percentage of database keys used when running the benchmark.

The argument is named db-fraction and accepts values between 1 and 100 (1% to 100%).

Example usage (using 10% of keys):

./target/release/substrate benchmark storage --dev --state-version=0 --db=paritydb --db-fraction 10

Closes #146.
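
For reference, a minimal sketch of how such a flag could be declared with clap's derive API. This is an illustration, not the PR's code: the struct name, default value, and clap 4 syntax are assumptions; only the db_fraction name and the 1..=100 range come from the discussion below.

```rust
// Hypothetical declaration of the `--db-fraction` flag (clap 4 derive syntax).
// Struct name and default are illustrative; the 1..=100 range mirrors the
// restriction mentioned later in the review.
use clap::Parser;

#[derive(Debug, Parser)]
pub struct StorageParams {
    /// Fraction of database keys to read, in percent (1 to 100).
    #[arg(long, default_value_t = 100, value_parser = clap::value_parser!(u32).range(1..=100))]
    pub db_fraction: u32,
}

fn main() {
    let params = StorageParams::parse();
    println!("db-fraction = {}%", params.db_fraction);
}
```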

@cla-bot-2021 (bot) commented Aug 29, 2023

User @adelarja, please sign the CLA here.

@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/3463878

@ggwpez self-requested a review — September 3, 2023 12:51
Review thread on substrate/utils/frame/benchmarking-cli/src/storage/read.rs (outdated):

info!("Reading {} keys of {} keys", number_of_keys, keys.len());
let keys = &keys[0..number_of_keys];
let mut child_nodes = Vec::new();
(Member)

I don't know if we should also apply this to the child nodes; otherwise there could be cases where one large top key pulls in a huge child tree.

WDYT @crystalin?

(Member)

I guess we should also do it for the child_nodes below.

(Reviewer)

I agree. Capping the vec size to db_fraction can at most double the number of potential key reads (root + child), but I presume that's still a major improvement over what it is right now.

@adelarja (author)

@nbaztec @ggwpez I applied it to the child nodes here and here. WDYT?
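
To make the idea concrete, here is a minimal, hypothetical helper (not the PR's code) that shuffles a key set and keeps only a given percentage of it, so the same cap could be applied to both the top-level keys and the child nodes:

```rust
// Illustrative only: a generic "keep a fraction" helper, under the assumption
// that both the top keys and the child nodes are collected into Vecs first.
use rand::seq::SliceRandom;

fn take_fraction<T, R: rand::Rng>(mut items: Vec<T>, percent: usize, rng: &mut R) -> Vec<T> {
    items.shuffle(rng);
    let keep = (items.len() * percent) / 100;
    items.truncate(keep);
    items
}

fn main() {
    let mut rng = rand::thread_rng();
    let top_keys: Vec<u32> = (0..1_000).collect();
    let child_nodes: Vec<u32> = (0..500).collect();

    // Apply the same 10% cap to both sets, as discussed above.
    let top = take_fraction(top_keys, 10, &mut rng);
    let children = take_fraction(child_nodes, 10, &mut rng);
    println!("{} top keys, {} child nodes", top.len(), children.len());
}
```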

@@ -44,12 +44,15 @@ impl StorageCmd {
let mut keys: Vec<_> = client.storage_keys(best_hash, None, None)?.collect();
let (mut rng, _) = new_rng(None);
keys.shuffle(&mut rng);
let number_of_keys = (keys.len() * self.params.db_fraction as usize) / 100;
(Member)

It should probably error if the result is zero.

@adelarja (author) — Sep 3, 2023

We could do something like:

let number_of_keys = if number_of_keys != 0 { number_of_keys } else { keys.len() };

If we have number_of_keys = 0, we use all the keys by default. Another option would be to set the minimum limit to 1. WDYT?

(Member)

But the full key set could also be empty; afaik the check makes sense either way. Just returning an error is no problem.

(Reviewer)

If number_of_keys is 0, then keys.len() or self.params.db_fraction (or both) was 0. But self.params.db_fraction is restricted via clap to 1..=100, so it cannot be 0; the only remaining case is an empty key set, where keys.iter().take(0) is equivalent to keys.iter(), keeping the old behaviour intact. I believe there's no special case to handle here, unless we're worried about someone/something overriding db_fraction to 0, in which case that is what should be checked.

(Member)

What I am saying is that the old behaviour was probably also wrong.

@ggwpez (Member) left a review comment

Okay, approving now. You can still add the != 0 check if you want, but it's an edge case anyway, so I don't want to block it for that.
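
A minimal sketch of what the discussed != 0 guard could look like, kept separate from the PR's actual code; the function, error type, and message are illustrative:

```rust
// Hypothetical guard: refuse to run the benchmark when the selected fraction
// rounds down to zero keys. Not taken from the PR.
fn select_keys(total: usize, db_fraction: usize) -> Result<usize, String> {
    let number_of_keys = (total * db_fraction) / 100;
    if number_of_keys == 0 {
        return Err(format!(
            "db-fraction {}% of {} keys selects nothing to benchmark",
            db_fraction, total
        ));
    }
    Ok(number_of_keys)
}

fn main() {
    assert_eq!(select_keys(1_000, 10), Ok(100));
    assert!(select_keys(5, 10).is_err()); // 10% of 5 keys rounds down to 0
}
```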

@crystalin

I'd like to test it before getting it merged. One change that I could think of is to have the fraction be a per 1000th instead of percent. I'll verify it today

@ggwpez (Member) commented Sep 6, 2023

> I'd like to test it before getting it merged. One change that I could think of is to have the fraction be a per 1000th instead of percent. I'll verify it today

Yes, that could be good for very large states. We could use a float percent or a value in the unit interval [0, 1].
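
As an illustration of that variant, here is a hypothetical flag taking a float in (0, 1]; the names, default, and validator are assumptions, not anything that was implemented:

```rust
// Hypothetical variant: express the fraction as a float in the unit interval.
use clap::Parser;

// Illustrative validator for values in (0, 1].
fn parse_unit_fraction(s: &str) -> Result<f64, String> {
    let v: f64 = s.parse().map_err(|e| format!("not a number: {e}"))?;
    if v > 0.0 && v <= 1.0 {
        Ok(v)
    } else {
        Err(format!("{v} is outside (0, 1]"))
    }
}

#[derive(Debug, Parser)]
struct StorageParams {
    /// Fraction of database keys to read, e.g. 0.001 for one key per thousand.
    #[arg(long, default_value_t = 1.0, value_parser = parse_unit_fraction)]
    db_fraction: f64,
}

fn main() {
    let params = StorageParams::parse();
    println!("reading {:.3} of all keys", params.db_fraction);
}
```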

@crystalin commented Sep 7, 2023

I tried on Moonbeam:

  • re-indexing took 20h (not sure why, but that is also out of scope here)
  • the storage benchmark never finished (already 15h into it)

I'm not sure what is limiting it, but it doesn't seem to use much CPU (2%). IOPS is at 1500 (6 MB/s disk read), yet nothing appears in the logs.

@ggwpez (Member) commented Sep 12, 2023

Okay, so it's not working as intended, probably because we first collect all keys with client.storage_keys(best_hash, None, None)?.collect().
I don't think it's possible to request a random key from the DB, @cheme, or is it? What we are trying to do is run the benchmark on only x% of the KV pairs.

@bkchr (Member) commented Jan 22, 2024

Do we still want this PR, @ggwpez?

@ggwpez (Member) commented Jan 22, 2024

It did not actually solve the issue, since the slow part comes from iterating through all keys to pick a random subset.
The DB backend does not allow picking a random key, so it's only marginally faster with this fix.
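
A conceptual sketch of that bottleneck, using a stand-in key iterator rather than the real client API: the full key walk happens before any sampling, so reading only a fraction afterwards saves comparatively little.

```rust
// Stand-in for the client call quoted above; it still yields every key.
fn storage_keys() -> impl Iterator<Item = Vec<u8>> {
    (0u32..1_000_000).map(|i| i.to_be_bytes().to_vec())
}

fn main() {
    // This walk over all keys dominates the runtime...
    let keys: Vec<_> = storage_keys().collect();
    // ...so benchmarking only a fraction of `keys` afterwards helps little.
    let fraction = &keys[0..keys.len() / 100];
    println!("collected {} keys, benchmarking {}", keys.len(), fraction.len());
}
```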

@ggwpez closed this on Jan 22, 2024