Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#334) Refactor / modularize shared code in splinter_test perf tests #385

Merged
merged 1 commit into from
Mar 23, 2022

Conversation

gapisback
Copy link
Collaborator

@gapisback gapisback commented Mar 21, 2022

Extract out shared / repeated code between --perf and --parallel-perf test cases of splinter_test to sub-functions. Introduce the following:

  • compute_per_table_inserts()
  • load_thread_params()
  • do_n_thread_creates()
  • do_n_async_ctxt_inits(), do_n_async_ctxt_deinits()
  • splinter_perf_inserts() - which wraps up few of above functions to
    drive the inserts-phase of the workload
  • splinter_perf_lookups() - drives lookups phase of the workload
  • splinter_perf_range_lookups() - drives range lookups phase

Rework chunks of test_splinter_perf() and test_splinter_parallel_perf()
to invoke these helper / refactored functions.

This commit lays down the groundwork for future refactoring of other test cases in this test, to improve code-sharing.


NOTE: This splinter_test --perf is included in CI-runs, but without the --num-range-lookup-threads arg.
With range-lookups, the overall performance becomes very slow, so this test has been commented out of nightly run tests set.

This refactoring is one in a small line of 4-some issues I have opened to investigate and improve the overall execution of --perf tests, with range lookup threads included.

I have opened issue #355 to come-in immediately after this, to re-use the modularized functions, and refactor the bodies of the existing functions for other periodic & delete perf test cases.


Testing status: Just to re-confirm, I re-ran this test case on this machine: sdb-oss-test-vm, with 32 cores and 64 GiB RAM.

sdb-oss-test-vm:[20] $ /usr/bin/time ./bin/driver_test splinter_test --perf --max-async-inflight 0 --num-insert-threads 8 --num-lookup-threads 8 --num-range-lookup-threads 8 --lookup-positive-percent 10 --tree-size-mib 1024 --db-capacity-gib 60 --db-location splinter_test.perf.db

Test ran cleanly:

splinter_perf_range_lookups() completed.
After destroy:
58999.74user 1533.20system 2:06:11elapsed 799%CPU (0avgtext+0avgdata 2076256maxresident)k
0inputs+3868440outputs (45major+12474780minor)pagefaults 0swaps

Re-run of --parallel-perf test on the same machine: on-going ...

sdb-oss-test-vm:[22] $ /usr/bin/time ./bin/driver_test splinter_test --parallel-perf --max-async-inflight 0 --num-pthreads 12 --lookup-positive-percent 10 --tree-size-gib 8 --db-capacity-gib 60 --db-location splinter_test.pll_perf.db

uint64 range_min;
uint64 range_max;

} trunk_range_perf_params;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struct introduced so we can call range-lookups perf test sub-case with diff min / max / num parameters.

@@ -92,6 +102,9 @@ test_all_done(const uint8 done, const uint8 num_tables)
return (done == ((1 << num_tables) - 1));
}

/*
* test_trunk_insert_thread() -- Per-thread function to drive inserts.
*/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much code is basically modularized in-place to sub-functions, for code reuse.

spl_idx);
platform_default_log(
PLATFORM_CR
"Thread %lu inserting %3lu%% complete for table %u ... ",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Report thread-ID also, so you will see which thread is actively running ... nice updates in the UI when this test is run manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally multiple threads are running concurrently. Isn't this a giant mess then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed in the PR meeting this afternoon. When this test is run manually, the message gets continually over-written with a new thread-ID. So, ... not, it's not a mess. You see progress %age changing for diff threads, actually quite nicely.

continue;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conform to coding standard ...

continue;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QS: @ajhconway , @rtjohnso - For cleanup like these ... what's our protocol? I think it's better to log a separate issue for such cleanup, so there is a record of this ask.

Then, do you wish to see a separate review cycle? Or, can I move such clean-up items as independent PRs on their own issue, w/o a review?

I want to be careful about forging ahead aggressively with "no review" PR-merges, as that may lead to lots of unsupervised check-ins, creating chaos. (Let's talk off-line, too ...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm I've two minds here. Clearly it makes review easier if these cleanups are separate PRs, but it's also convenient to just take care of this stuff as we encounter it. Let's talk offline.

* test_trunk_insert_lookup_thread
*
* do number of insert_granularity inserts followed by number of
RESOLVE: Comments need rewording: ... Suggestions?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindly advise .... I tried to glean what the code author meant with comment below ... but failed.

Need suggestions for rewording ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not really following what's going on here. Clearly this thread is doing some mixture of operations, but the exact details aren't immediately obvious to me either from the comment or the code itself.

I think that what's happening is that there is a global op_granularity of say 100. Then the thread will do say 30 inserts, 50 lookups and 20 range lookups in each 100 operation block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded the comment to be clearer, .. somewhat.

compute_per_table_inserts(uint64 *per_table_inserts, // OUT
trunk_config *cfg, // IN
test_config *test_cfg, // IN
uint8 num_tables)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here on, you will see chunks of code modularized out into sub-functions. No logic has been done, I assure you.

}
if (num_range_threads > num_threads) {
num_threads = num_range_threads;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deleted chunk can be seen in test_splinter_perf() in this chunk as:

1242    uint64 num_threads = MAX(num_insert_threads, num_lookup_threads);
1243    num_threads        = MAX(num_threads, num_range_threads);

Nothing is missing ...

* Helper function to load thread-specific parameters to drive the workload
*/
static void
load_thread_params(test_splinter_thread_params *params,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted out common code to this function, and left test sub-case-specific param init fn inline, as, e.g. in these lines below, in test_splinter_parallel_perf():

1870    // Load thread-specific execution parameters for the parallelism between
1871    // inserts & lookups in the workload
1872    for (uint64 i = 0; i < num_threads; i++) {
1873       params[i].spl                           = spl_tables;
[...]


for (uint64 i = 0; i < num_insert_threads; i++) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed into the fn below ...


uint64 num_async_lookups = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted block is seen below on L1062 onwards. Used MAX() to simplify if-else construction to find Max.

* -----------------------------------------------------------------------------
*/
static platform_status
test_splinter_perf(trunk_config *cfg,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New body of refactored test-case, which shrinks quite a lot. Improves overall code readability.

for (uint64 i = 0; i < num_range_threads; i++) {
if (!SUCCESS(params[i].rc)) {
rc = params[i].rc;
for (int rctr = 0; rctr < ARRAY_SIZE(perf_ranges); rctr++) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One immediate benefit from this refactoring ... We can now drive multiple test cases for range-lookups with different num / min / max values, defined by the trunk_range_perf_params perf_ranges[] array.

Copy link
Contributor

@ajhconway ajhconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overall a very good change imo. No major comments.

src/util.h Outdated
@@ -281,6 +281,9 @@ try_string_to_int8(const char *nptr, // IN
/* Disk-resident structures that should be packed are tagged with this. */
#define ONDISK __attribute__((__packed__))

/* Mnemonic to indicate that a parameter is unused / not-needed */
#define PARAM_UNUSED 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exists in the platform as UNUSED_PARAM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm ... looking closer, that UNUSED_PARAM is a compiler directive, __attribute__((__unused__)).

What I was adding was a mnemonic for an arg's value as 0, to indicate it's not used by the callee.

In any case, as both names are too close & similar, I withdrew mine. Locally fixed it in the splinter_test.c, to use hard-coded value of 0, with a comment.

spl_idx);
platform_default_log(
PLATFORM_CR
"Thread %lu inserting %3lu%% complete for table %u ... ",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally multiple threads are running concurrently. Isn't this a giant mess then?

continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm I've two minds here. Clearly it makes review easier if these cleanups are separate PRs, but it's also convenient to just take care of this stuff as we encounter it. Let's talk offline.

* test_trunk_insert_lookup_thread
*
* do number of insert_granularity inserts followed by number of
RESOLVE: Comments need rewording: ... Suggestions?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not really following what's going on here. Clearly this thread is doing some mixture of operations, but the exact details aren't immediately obvious to me either from the comment or the code itself.

I think that what's happening is that there is a global op_granularity of say 100. Then the thread will do say 30 inserts, 50 lookups and 20 range lookups in each 100 operation block.

test_splinter_thread_params *params,
task_system *ts,
platform_heap_id hid,
void (*thread_hdlr)(void *arg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets typedef this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 1122 to 1123
"range_min=%lu"
", range_max=%lu ...\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine these lines

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Extract out shared / repeated code between --perf and --parallel-perf
test cases of splinter_test to sub-functions. Introduce the following:
 - compute_per_table_inserts()
 - load_thread_params()
 - do_n_thread_creates()
 - do_n_async_ctxt_inits(), do_n_async_ctxt_deinits()
 - splinter_perf_inserts() - which wraps up few of above functions to
   drive the inserts-phase of the workload
 - splinter_perf_lookups() - drives lookups phase of the workload
 - splinter_perf_range_lookups() - drives range lookups phase

Rework chunks of test_splinter_perf() and test_splinter_parallel_perf()
to invoke these helper / refactored functions.

This commit lays down the groundwork for future refactoring of other
test cases in this test, to improve code-sharing.
@gapisback gapisback force-pushed the agurajada/334-Refactor-perf-tests branch from cac6aa9 to fd6617e Compare March 23, 2022 19:35
@gapisback gapisback enabled auto-merge (rebase) March 23, 2022 19:48
@gapisback gapisback merged commit 3b7bc58 into main Mar 23, 2022
@gapisback gapisback deleted the agurajada/334-Refactor-perf-tests branch March 23, 2022 20:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants