perf: split lengthy fn(s) to improve concurrency #187

dan-da · 2024-09-23T18:57:33Z

addresses #183.
obsoletes #186.

Summary: This PR gives us a knob we can twist to favor either concurrency or read+write atomicity. The intent is to give us greater flexibility as we move forward with processing "real" data... fuller blocks, longer proof times, etc.

Illustration:

There are several methods with the following general behavior:

fn update_some_state(&mut self, ..) -> Result<()> {
    // perform one or more loops that gather data and update state.
}

Often the reading can be quite slow, such as fetching blocks from disk in a loop in order to examine the inputs and outputs.

This is bad for concurrency as the caller must be holding a write-lock over GlobalState in order to call an &mut self method. So all other tasks that read or write app-state are blocked.

So the above fn can be split as follows:

fn prepare_update_some_state(&self, ..) -> Result<Updates> {
    // perform one or more loops that gather data. (slowly)
}

fn finalize_update_some_state(&mut self, updates: Updates) -> Result<()> {
    // update state (as fast as possible)
}

Now the i/o intensive read ops are performed in an &self method that only requires the caller to hold a read-lock over GlobalState. This allows all other read-only tasks to progress. Only tasks that attempt to modify state will be blocked.

The finalize method performs the state mutation as before but now it can do so much more quickly because all the data gathering and calcs are already done.

This is a win for concurrency but it comes at a cost. If the caller acquires read-lock for read method and then write-lock for the write method, we have lost atomicity over the entire read+write. That may be too high of a price to pay, depending on the operation. Still the caller now has the option to do it either way. The caller can either:

favor atomic: acquire read+write lock, perform read, perform write
favor concurrent: acquire read lock, perform read, acquire write lock, perform write.

Note that for (2) the read is atomic and the write is atomic, but the read+write is not atomic.

Batching

Some of the read methods are potentially updating too much data to keep it all in RAM for the caller to pass it to the write method. In this case, batching is used and the pattern becomes:

fn prepare_update_some_state(&self, start_idx, batch_size, ..) -> Result<Updates> {
    // perform one or more loops that gather data. (slowly)
}

fn finalize_update_some_state(&mut self, updates: Updates) -> Result<()> {
    // update state (as fast as possible)
}

So the caller may then call the read and write methods in a loop. If concurrency is favored then the caller would acquire read-lock for each read and write-lock for each write. Note that in this case, the reads and writes are only atomic per batch.

Backwards compatibility

wrapper methods are provided for the caller that preserve the original &mut self behavior. eg:

fn update_some_state(&mut self, ..) -> Result<()> {
    let updates = prepare_update_some_state(..)?;
    finalize_update_some_state(updates)
}

Smaller Fns

There is also a small win in that this refactoring breaks some of these fn into smaller units such that the logic/purpose of each becomes clearer in isolation. The best example is update_mutator_set().

Methods changed

WalletState::update_wallet_state_with_new_block()
ArchivalState::update_mutator_set()
GlobalState::resync_membership_proofs()
GlobalState::set_new_tip()

of special note:

In this PR the calling code is (still) favoring read+write atomicity for all of the above except resync_membership_proofs(). I think that should be ok, but am also fine with making it favor atomicity for now (until such time that it becomes a perf problem).
update_mutator_set() is complex and potentially updates a lot of data. It was separated into several fn and batching is used.
set_new_tip() has two separate impls. GlobalState::set_new_tip_atomic() and GlobalStateLock::set_new_tip_concurrent(). The latter is available if we should ever need it but presently unused.

ScopeDurationLogger

This PR also adds a new type, ScopeDurationLogger that makes it a one-liner to enable logging of [slow] methods (or any scope). By default "slow" means > 0.1 seconds. If a given scope takes longer than the "slow" threshold, a warning will be emitted to the log. Otherwise nothing is logged. This makes it convenient to sprinkle the ScopeDurationLogger anywhere we think a fn might be slow eventually but it won't clutter up the log unless it actually is slow.

The slow logging threshold defaults to 0.1 seconds and can be changed via:

env var: LOG_SLOW_FN_THRESHOLD. (can also be set in Cargo.toml)
ScopeDurationLogger::new_with_threshold()

This replaces the older duration_* macros that had to be used by the caller.

ScopeDurationLogger could actually be made into a fn decoration macro for slightly cleaner usage. But I think that requires use syn to modify the fn and I didn't think it worth the time to learn/impl, especially as it is already a one-liner. so, maybe later.

Details:

log genesis block digest
rename GlobalState::set_new_tip() -> set_new_tip_atomic()
impl ScopeDurationLogger and use it to log method durations
split WalletState::update_wallet_state_with_new_block()
split ArchivalState::update_mutator_set()
add experimental GlobalStateLock::set_new_tip_concurrent() (not used)
split GlobalState::resync_membership_proofs()
remove duration_* macros
add fn_name, fn_name_bare macros

Primary changes are: 1. splits lengthy &mut self methods into read-only &self method(s) and faster &mut self method(s). This makes these methods more concurrency friendly. 2. provides original functionality with methods that have _atomic() suffix. These take &mut self and call both the &self and &mut self methods. 3. Adds ScopeDurationLogger to easily enable logging of slow methods. by default "slow" means > 0.1 seconds. Details: * log genesis block digest * rename GlobalState::set_new_tip() -> set_new_tip_atomic() * impl ScopeDurationLogger and use it to log method durations * split WalletState::update_wallet_state_with_new_block() * split ArchivalState::update_mutator_set() * add experimental GlobalStateLock::set_new_tip_concurrent() (not used) * split GlobalState::resync_membership_proofs() * remove duration_* macros * add fn_name, fn_name_bare macros

dan-da · 2024-11-03T20:46:43Z

This PR is pretty ambitious and is probably too hard to merge now.

I pulled ScopeDurationLogger out of it and committed it to master. see de96f98 for details. This can help us gain insight into how long various execution entry points are taking.

dan-da mentioned this pull request Sep 23, 2024

split resync membership proofs #186

Closed

Sword-Smith force-pushed the master branch from bb72b9d to cb012ff Compare November 4, 2024 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: split lengthy fn(s) to improve concurrency #187

perf: split lengthy fn(s) to improve concurrency #187

dan-da commented Sep 23, 2024 •

edited

Loading

dan-da commented Nov 3, 2024

perf: split lengthy fn(s) to improve concurrency #187

Are you sure you want to change the base?

perf: split lengthy fn(s) to improve concurrency #187

Conversation

dan-da commented Sep 23, 2024 • edited Loading

Illustration:

Batching

Backwards compatibility

Smaller Fns

Methods changed

ScopeDurationLogger

dan-da commented Nov 3, 2024

dan-da commented Sep 23, 2024 •

edited

Loading