plugin: add estimation of cores-per-node count on system during initialization #469

cmoussa1 · 2024-07-16T17:17:11Z

Problem

The priority plugin does not know about basic system information it will need in order to enforce a max-cores limit per association, such as the number of cores on a node.

This PR adds an estimation of a cores-per-node count estimate during the initialization of the priority plugin by fetching resource.R from the KVS. It stores this estimate in a global variable in the plugin. The plan is to use this count when calculating the number of cores used by a job when only nnodes are specified. I think that this count might not be exactly right for systems where the core count per-node might be different throughout the system, but I figure this could at least be a good estimate and a start for tracking and enforcing resource limit per-association across all of their running jobs (see conversation in flux-framework/flux-core#6091).

This count is also added to the list of information returned in the callback for plugin.query.

grondo

Some early comments on a first pass-through

grondo · 2024-07-17T13:45:31Z

src/plugins/mf_priority.cpp

+    if (flux_kvs_lookup_get_unpack (f,
+                                    "{s{s[{s{s:s}}]}}",
+                                    "execution",
+                                      "R_lite",
+                                        "children",
+                                          "core", &core) < 0) {
+        flux_log_error (h, "flux_kvs_lookup_unpack");
+        return -1;
+    }


Normally I would request that this synchronous get be replaced with an asynchronous flux_future_then(3) and the parsing of R handled in a callback. However, it seems like the result is required for validation of jobs, and you probably don't want to let some jobs through erroneously while waiting for resource.R, so perhaps this is actually the right solution. The job manager will block here while waiting for R when mf_priority.so is loaded, but it should be a very short time, and while this is occurring job management will also pause. Most of the time this will occur during job manager module load, which has other synchronous work anyway.

I'd at least suggest a comment here describing why a synchronous get is used in this case.

@garlick: any other thoughts?

The FLUX_KVS_WATCH flag should be dropped. That's only used when you want to receive a response for every change to the key, and here the future is being destroyed after the first response.

IRL, we only load mf_priority.so in the system instance so resource.R should exist already and the KVS lookup should be fast. FLUX_KVS_WAITCREATE is probably is needed in test where the plugin is loaded in a test instance and resource.R is dynamically discovered though.

Yes a comment would be good since synchronous activities always raise eyebrows.

grondo · 2024-07-17T13:49:56Z

src/plugins/mf_priority.cpp

+                                    "{s{s[{s{s:s}}]}}",
+                                    "execution",
+                                      "R_lite",
+                                        "children",
+                                          "core", &core) < 0) {


R may have multiple entries in the R_lite array. To handle possible heterogeneity, you could iterate each entry and use the maximum number of cores found.

Probably ok if this is just a first cut, though. If so, I'd put a comment stating that "equal number of cores on all nodes in R is assumed, so we only look at the first entry" or simimlar.

Thanks for mentioning this, I was unaware that it could have multiple entries. If it has multiple entries, would it look like this?

{ "version": 1, "execution": { "R_lite": [ { "rank": "19-22", "children": { "core": "0-47", } }, { "rank": "23-29", "children": { "core": "0-15", } } ] } }

grondo · 2024-07-17T13:52:21Z

src/plugins/mf_priority.cpp

+    // calculate number of cores-per-node on system
+    ncores_per_node = calculate_range (core);


Instead of parsing the idset with custom code here, you can use libflux-idset.so exported by flux-core.
See idset_decode(3), then just use idset_count(3) to get the number of cores.

grondo · 2024-07-17T13:56:13Z

t/t1036-mf-priority-resource-counting.t

+	flux jobtap query mf_priority.so > query.json &&
+	test_debug "jq -S . <query.json" &&
+	cat query.json &&
+	jq -e ".ncores_per_node == 2" <query.json


Will this check work on all systems? If you need to check the actual cores/node you could use nproc or possibly flux resource R --include=0 | flux R decode --count=core (sorry if I missed someplace where you've guaranteed 2 cores per node)

codecov · 2024-07-18T18:45:53Z

Codecov Report

Attention: Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.08%. Comparing base (a6bd897) to head (5747337).
Report is 6 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #469      +/-   ##
==========================================
- Coverage   83.30%   83.08%   -0.22%     
==========================================
  Files          23       23              
  Lines        1557     1573      +16     
==========================================
+ Hits         1297     1307      +10     
- Misses        260      266       +6

Files	Coverage Δ
src/plugins/mf_priority.cpp	`82.19% <64.70%> (-0.87%)`	⬇️

cmoussa1 · 2024-07-18T18:55:55Z

Thank you both for the first pass! I've force-pushed up some changes to drop FLUX_KVS_WATCH, use libflux-idset, and change how the test is structured to instead use flux resource R --include=0 | flux R decode --count=core (this check makes a lot more sense than what I had).

I've also added some comments in code where suggested.

Problem: The paths for FLUX_IDSET_LIBS/FLUX_IDSET_CFLAGS are not listed in the output of ./configure. There is also slight misformatting in the output of the FLUX_CORE_CFLAGS label. Switch the period and colon at the end of the FLUX_CORE_CFLAGS line. Add FLUX_IDSET_LIBS and FLUX_IDSET_CFLAGS to the output of ./configure.

Problem: The priority plugin does not know about basic system information it will need in order to enforce a max-cores limit per association, such as the number of cores on a node. Add an estimation of a cores-per-node count estimate during the initialization of the priority plugin by fetching resource.R from the KVS. Store this estimate in a global variable in the plugin. Add this estimate to the list of information returned in the plugin.query callback.

Problem: flux-accounting has no tests for estimating the cores-per-node count by fetching resource.R from the KVS and then querying it from the plugin. Add some basic tests.

cmoussa1 added new feature new feature plugin related to the multi-factor priority plugin labels Jul 16, 2024

cmoussa1 mentioned this pull request Jul 16, 2024

is there a way to count/estimate the number of cores-per-node on a system instance? flux-framework/flux-core#6091

Open

grondo reviewed Jul 17, 2024

View reviewed changes

cmoussa1 force-pushed the resource.counting.plugin branch 2 times, most recently from 0c4af54 to 5747337 Compare July 18, 2024 18:33

cmoussa1 added 3 commits September 18, 2024 09:52

t: add test for querying cores-per-node info

409649f

Problem: flux-accounting has no tests for estimating the cores-per-node count by fetching resource.R from the KVS and then querying it from the plugin. Add some basic tests.

cmoussa1 force-pushed the resource.counting.plugin branch from 5747337 to 409649f Compare September 18, 2024 17:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plugin: add estimation of cores-per-node count on system during initialization #469

plugin: add estimation of cores-per-node count on system during initialization #469

cmoussa1 commented Jul 16, 2024

grondo left a comment

grondo Jul 17, 2024

garlick Jul 17, 2024

grondo Jul 17, 2024

cmoussa1 Jul 17, 2024

grondo Jul 17, 2024

grondo Jul 17, 2024

codecov bot commented Jul 18, 2024

cmoussa1 commented Jul 18, 2024

		// calculate number of cores-per-node on system
		ncores_per_node = calculate_range (core);

plugin: add estimation of cores-per-node count on system during initialization #469

Are you sure you want to change the base?

plugin: add estimation of cores-per-node count on system during initialization #469

Conversation

cmoussa1 commented Jul 16, 2024

Problem

grondo left a comment

Choose a reason for hiding this comment

grondo Jul 17, 2024

Choose a reason for hiding this comment

garlick Jul 17, 2024

Choose a reason for hiding this comment

grondo Jul 17, 2024

Choose a reason for hiding this comment

cmoussa1 Jul 17, 2024

Choose a reason for hiding this comment

grondo Jul 17, 2024

Choose a reason for hiding this comment

grondo Jul 17, 2024

Choose a reason for hiding this comment

codecov bot commented Jul 18, 2024

Codecov Report

cmoussa1 commented Jul 18, 2024