Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: split lengthy fn(s) to improve concurrency #187

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dan-da
Copy link
Collaborator

@dan-da dan-da commented Sep 23, 2024

addresses #183.
obsoletes #186.

Summary: This PR gives us a knob we can twist to favor either concurrency or read+write atomicity. The intent is to give us greater flexibility as we move forward with processing "real" data... fuller blocks, longer proof times, etc.

Illustration:

There are several methods with the following general behavior:

fn update_some_state(&mut self, ..) -> Result<()> {
    // perform one or more loops that gather data and update state.
}

Often the reading can be quite slow, such as fetching blocks from disk in a loop in order to examine the inputs and outputs.

This is bad for concurrency as the caller must be holding a write-lock over GlobalState in order to call an &mut self method. So all other tasks that read or write app-state are blocked.

So the above fn can be split as follows:

fn prepare_update_some_state(&self, ..) -> Result<Updates> {
    // perform one or more loops that gather data. (slowly)
}

fn finalize_update_some_state(&mut self, updates: Updates) -> Result<()> {
    // update state (as fast as possible)
}

Now the i/o intensive read ops are performed in an &self method that only requires the caller to hold a read-lock over GlobalState. This allows all other read-only tasks to progress. Only tasks that attempt to modify state will be blocked.

The finalize method performs the state mutation as before but now it can do so much more quickly because all the data gathering and calcs are already done.

This is a win for concurrency but it comes at a cost. If the caller acquires read-lock for read method and then write-lock for the write method, we have lost atomicity over the entire read+write. That may be too high of a price to pay, depending on the operation. Still the caller now has the option to do it either way. The caller can either:

  1. favor atomic: acquire read+write lock, perform read, perform write
  2. favor concurrent: acquire read lock, perform read, acquire write lock, perform write.

Note that for (2) the read is atomic and the write is atomic, but the read+write is not atomic.

Batching

Some of the read methods are potentially updating too much data to keep it all in RAM for the caller to pass it to the write method. In this case, batching is used and the pattern becomes:

fn prepare_update_some_state(&self, start_idx, batch_size, ..) -> Result<Updates> {
    // perform one or more loops that gather data. (slowly)
}

fn finalize_update_some_state(&mut self, updates: Updates) -> Result<()> {
    // update state (as fast as possible)
}

So the caller may then call the read and write methods in a loop. If concurrency is favored then the caller would acquire read-lock for each read and write-lock for each write. Note that in this case, the reads and writes are only atomic per batch.

Backwards compatibility

wrapper methods are provided for the caller that preserve the original &mut self behavior. eg:

fn update_some_state(&mut self, ..) -> Result<()> {
    let updates = prepare_update_some_state(..)?;
    finalize_update_some_state(updates)
}

Smaller Fns

There is also a small win in that this refactoring breaks some of these fn into smaller units such that the logic/purpose of each becomes clearer in isolation. The best example is update_mutator_set().

Methods changed

  • WalletState::update_wallet_state_with_new_block()
  • ArchivalState::update_mutator_set()
  • GlobalState::resync_membership_proofs()
  • GlobalState::set_new_tip()

of special note:

  • In this PR the calling code is (still) favoring read+write atomicity for all of the above except resync_membership_proofs(). I think that should be ok, but am also fine with making it favor atomicity for now (until such time that it becomes a perf problem).
  • update_mutator_set() is complex and potentially updates a lot of data. It was separated into several fn and batching is used.
  • set_new_tip() has two separate impls. GlobalState::set_new_tip_atomic() and GlobalStateLock::set_new_tip_concurrent(). The latter is available if we should ever need it but presently unused.

ScopeDurationLogger

This PR also adds a new type, ScopeDurationLogger that makes it a one-liner to enable logging of [slow] methods (or any scope). By default "slow" means > 0.1 seconds. If a given scope takes longer than the "slow" threshold, a warning will be emitted to the log. Otherwise nothing is logged. This makes it convenient to sprinkle the ScopeDurationLogger anywhere we think a fn might be slow eventually but it won't clutter up the log unless it actually is slow.

The slow logging threshold defaults to 0.1 seconds and can be changed via:

  • env var: LOG_SLOW_FN_THRESHOLD. (can also be set in Cargo.toml)
  • ScopeDurationLogger::new_with_threshold()

This replaces the older duration_* macros that had to be used by the caller.

ScopeDurationLogger could actually be made into a fn decoration macro for slightly cleaner usage. But I think that requires use syn to modify the fn and I didn't think it worth the time to learn/impl, especially as it is already a one-liner. so, maybe later.

Details:

  • log genesis block digest
  • rename GlobalState::set_new_tip() -> set_new_tip_atomic()
  • impl ScopeDurationLogger and use it to log method durations
  • split WalletState::update_wallet_state_with_new_block()
  • split ArchivalState::update_mutator_set()
  • add experimental GlobalStateLock::set_new_tip_concurrent() (not used)
  • split GlobalState::resync_membership_proofs()
  • remove duration_* macros
  • add fn_name, fn_name_bare macros

Primary changes are:
 1. splits lengthy &mut self methods into read-only &self
    method(s) and faster &mut self method(s).  This makes
    these methods more concurrency friendly.

 2. provides original functionality with methods that have
    _atomic() suffix.  These take &mut self and call both
    the &self and &mut self methods.

 3. Adds ScopeDurationLogger to easily enable logging of
    slow methods.  by default "slow" means > 0.1 seconds.

Details:
 * log genesis block digest
 * rename GlobalState::set_new_tip() -> set_new_tip_atomic()
 * impl ScopeDurationLogger and use it to log method durations
 * split WalletState::update_wallet_state_with_new_block()
 * split ArchivalState::update_mutator_set()
 * add experimental GlobalStateLock::set_new_tip_concurrent()
    (not used)
 * split GlobalState::resync_membership_proofs()
 * remove duration_* macros
 * add fn_name, fn_name_bare macros
@dan-da
Copy link
Collaborator Author

dan-da commented Nov 3, 2024

This PR is pretty ambitious and is probably too hard to merge now.

I pulled ScopeDurationLogger out of it and committed it to master. see de96f98 for details. This can help us gain insight into how long various execution entry points are taking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant