Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add code to programmatically set parameters #184

Merged
merged 23 commits into from
Jun 9, 2020

Conversation

rhaas80
Copy link
Collaborator

@rhaas80 rhaas80 commented May 20, 2020

The introduces a function:

/*****************
 * Programmatically change configuration options
 ****************/
/* config: is a string of the form of a line in a configurartion file
 * if it consists only of KEY=VALUE pairs then a value is set (including all
 * parent values)
 * if it ends in "KEY" then a copy of the value of KEY is returned, it must be
 * free()ed
 * if it ends in "KEY=" then KEY is marked as removed (value of NULL)
 *
 * the return value is the value of the gotten KEY or the last VALUE set
 */
const char* SCR_Config(const char* config);

that takes as argument a configuration string of the form found on scr.conf:

NAME1=value1 [NAME2=value2 ...]

which is interpreted the same way as an configuration file line, or a query string which lacks the final value assignment

NAME1

or

NAME1=value1 NAME2

Options must be set on all MPI ranks, no broadcast takes place.

Options set this way override all other options.

Both C and Fortran bindings are provided and the routines are tested in examples/test_config.c which runs as part of ctest.

Configuration options set in the code are stored in a file .scr/app.conf to be used by Perl scripts.

This addresses issue #40 .

Caveats:

  • SCR_Config must be called before SCR_init
  • SCR_Config will initialize scr's MPI communicator to read in .scr/app.conf when the first SCR_Config call is made
  • due to underlying inconsistencies in the get_param functions there are memory leaks when querying certain configuration options

rhaas80 and others added 5 commits May 20, 2020 12:38
scr_globals.h provides access to variables set from parameters during
scr_init and which should not be accessed by tools to make tracking
parameter dependencies more feasible
scr_globals.h provides access to variables set from parameters during
scr_init and which should not be accessed by tools to make tracking
parameter dependencies more feasible
this introduces another config hash of config options "app" that take
precendence over all other hashes.
@rhaas80 rhaas80 force-pushed the rhaas/scr_param_set branch from b8068ad to 0b6d3b2 Compare May 20, 2020 17:52
@rhaas80
Copy link
Collaborator Author

rhaas80 commented May 20, 2020

hmm. some rebase was required (git confused me when working with multiple remotes), so I did not properly rebase before. Now tests fail. Will update once tests pass once more.

@rhaas80
Copy link
Collaborator Author

rhaas80 commented May 20, 2020

Actually already fails for me in the develop branch:

test 2
      Start  2: serial_test_api_start

2: Test command: /usr/bin/mpirun "-np" "1" "./test_api"
2: Environment variables:
2:  SCR_CONF_FILE=/data/rhaas/ncsa/LLNL/scr/examples/test.conf
2:  SCR_JOB_ID=439
2: Test timeout computed to be: 1500
2: SCR v1.2.0: rank 0 on ekohaes8: Failed to record cluster name @ /data/rhaas/ncsa/LLNL/scr/src/scr.c:600
2: SCR v1.2.0 ABORT: rank 0 on ekohaes8: Failed to create store descriptor for control directory [/tmp] @ /data/rhaas/ncsa/LLNL/scr/src/scr_storedesc.c:409
2: --------------------------------------------------------------------------
2: MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
2: with errorcode -1.
2:
2: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
2: You may or may not see output from other processes, depending on
2: exactly when Open MPI kills them.
2: --------------------------------------------------------------------------
 1/12 Test  #2: serial_test_api_start ................***Failed    1.36 sec

the file test.conf refers to /dev/shm only.

@rhaas80 rhaas80 force-pushed the rhaas/scr_param_set branch from 0b6d3b2 to 6fd51ee Compare May 20, 2020 19:42
@rhaas80
Copy link
Collaborator Author

rhaas80 commented May 20, 2020

updating all components, re-running cmake and compiling form scratch seems to have fixed this.

@rhaas80 rhaas80 requested a review from adammoody May 20, 2020 19:43
@adammoody
Copy link
Contributor

adammoody commented May 20, 2020

Thanks @rhaas80 . This is great! I'll try to review this later.

One thing that comes to mind in reading your description. I think it'd be good to make the settings from environment variables or config files override settings from the Config API. This would allow a user to override a setting the application developer has hardcoded via the config API. Without that, one would have to recompile.

src/scr.c Outdated

scr_param_app_hash_write_file(app_config);

free(app_config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
free(app_config);
scr_free(&app_config);

Copy link
Collaborator Author

@rhaas80 rhaas80 May 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The variable (app_config) goes out of scope a line after this so I did not change (I nottd the same free vs. scr_free potential issue). Will change to make this more obvious since I am apparently not the only one taking note of this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way in GitHub to jump to the current version of those lines to show my fix?

src/scr_param.c Outdated
value = kvtree_elem_get_first_val(scr_app_hash, name);
if (value != NULL) {
/* evaluate environment variables */
if(*value == '$'){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case there is a variable embedded anywhere in the value, like "STORE=/tmp/$USER"

Suggested change
if(*value == '$'){
if(strchr(value, '$')){

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I thought I had got all of those when re-doing things to support expandenv.

@rhaas80
Copy link
Collaborator Author

rhaas80 commented May 21, 2020

One thing that comes to mind in reading your description. I think it'd be good to make the settings from environment variables or config files override settings from the Config API. This would allow a user to override a setting the application developer has hardcoded via the config API. Without that, one would have to recompile.

Fine with me (I had thought of SCR_Config something the application code would call based on its own input deck and env variables while the ENV variables are set inherited from a templated / canned submit scritpt). Certainly keeping env variables at highest priority is sensible if they are controlled directly by the user. I have moved (in d96a69a) the order so that now it is:

  • env vars
  • SCR_Config
  • user config file
  • other stuff

Before final merge I would squash commits for the various bits (Perl code, scr_param changes) to have one commit per bit.

@adammoody
Copy link
Contributor

I was able to take a closer look last night. Things generally look fine to me. User will love this feature.

A few more things:

  • We need to bump the user config file ahead of SCR_Config settings, because there are some things a user can only set in the config file (no environment variable).
  • Can you guard the SCR_Config calls that are in test_api behind a new command line option. It would be nice to be able to flip that on and off while testing, so that we can run test_api with and without Config calls.

Style:

  • Can you add braces around all if/while/for blocks?
  • I saw one or two spots where spacing was off.
  • My compiler throws errors when declaring variables in for loops, e.g., for (int i = 0). I know this is ancient, but it's the default compiler on redhat, so not uncommon for users. We should either move those declarations out of the for loop or add -std=c99 in our build. For now, I've been moving those declarations just above the for loop since I wasn't sure about the build.

@rhaas80
Copy link
Collaborator Author

rhaas80 commented May 26, 2020

  • I added options -a, --config_api=<BOOL> and -c, --conf_file=<BOOL> to test_api so that one can selectively ignore / not ignore test.conf and the config api. The default is to use test.conf only.

  • I moved the priority of SCR_Config settings below user config files. Note that my choice of SCR_Config taking priority (over everything) was deliberate (and not just b/c it was added last) and mimics what eg X11 does with its resource files. There the order is, in increasing order of priority: resource files in /etc (app-defaults), user file in .Xdefaults, application setting calls. The application calls take highest priority since the application is expected to make these calls in response to command line options or other user interaction (https://www.oreilly.com/library/view/x-window-system/9780937175149/Chapter09.html#page_182). Exactly what one wants to do may depend on what the user config file from SCR_CONF_FILE is: is truely a "user" file that would like eg in $HOME (due to eg having to set the path to a burst buffer than cannot be set system-wide) or is it one file per simulation. Either way is obviously fine (in particular with me), as long as it is documented and the choice is made deliberately.

  • I took a look through the added code and believe to have made the spacing and use of braces consistent with existing code, ie 1 space after if/for/while, none after switch. Braces for all if/for/while, on the same line, with a space in front

  • I removed use of c99 features (variables declared in for loop headers, C++ style comments)

Right now these changes are individual commits but I will squash them down with the original commits once the pull request is reviewed.

There are some whitespace changes in code not touched due to SCR_Config, mostly in expandenv and its callers.

@adammoody
Copy link
Contributor

adammoody commented May 27, 2020

Thanks @rhaas80 . This all looks good to me.

I keep going back and forth on how to prioritize the SCR_Config settings, too. As I talk with more code teams I'm hoping that might shed more light on which way to go with this. We can talk about these options on our next call. I think we're generally on the same page as to letting the end user have final control.

The SCR system config file is meant to be something a system admin might create with settings that apply to the machine which is useful for all users. This could contain info about expected failure groups and available storage mount points that one can opt to use as cache locations. This might be installed to /etc/scr/scr.conf. In practice, I don't think admins will really be creating or maintaining these files. Even though these files aren't likely to be used, they're at least an option.

The SCR user config file is meant to be for the end user of the application. SCR will read the file pointed to by SCR_CONF_FILE if that is set. That might be a conf file the user has in their home directory in case they can use one config file across lots of their runs. If SCR_CONF_FILE is not set, SCR automatically looks in the SCR_PREFIX directory of the job for an existing ".scrconf" file. This lets users more easily manage a config file per application if they want to, where each simulation (prefix directory) has its own config file.

The application developers now have three options for their users:

  1. They can code in SCR_Config calls. Those calls might use static values that cannot be changed at runtime, or they might construct dynamic values based on parameters from the command line or an application input file.
  2. They can distribute a recommended .scrconf file with the application binary that users should use when running the application. Users might then tweak settings directly in that file.
  3. Some developers distribute run scripts with their apps, and those scripts could encode SCR environment variables to configure SCR. Users might tweak those settings, or they might set variables before executing the application run script.

My goal was to make things so users can use environment variables or config files to override what the developers might have coded up in SCR_Config calls. If the settings that the developer picked are wrong, this gives users an escape hatch to override those settings.

@adammoody
Copy link
Contributor

@rhaas80 , I think this looks good to merge. Is it good to go for you?

While working with apps, I'm already marking places in their code where they will be able to use this. This will be a popular feature.

@adammoody
Copy link
Contributor

It does remind me of our existing issue where our scripts can see different values from the application depending on where parameters get set. I know you've already worked through this and encoded it directly into the config params code. To get a grasp of where we stand, we should put together a matrix of SCR parameter vs user-facing scripts (scr_prerun, scr_run, library, scr_postrun). That way we can show what gets used and where. We could include that on our readthedocs page if we think users would benefit, but if not there, we should definitely put that somewhere where developers can see it.

@adammoody adammoody merged commit 8f032d9 into LLNL:develop Jun 9, 2020
@adammoody
Copy link
Contributor

Thanks , @rhaas80 .

@rhaas80
Copy link
Collaborator Author

rhaas80 commented Jun 10, 2020

oha. I am way too slow for you here, sorry. No cleanup of the branch anymore :-).

I don't have a full matrix but I have the list of parameters that are used by user-facing scripts that run before the main app runs: "CACHE", "CACHEDIR", "CNTLDIR", "SCR_CACHE_BASE", "SCR_CACHE_SIZE", "SCR_CNTL_BASE", "SCR_CONF_FILE", "SCR_DB_DEBUG", "SCR_DB_ENABLE", "SCR_DB_HOST", "SCR_DB_NAME", "SCR_DB_PASS", "SCR_DB_USER", "SCR_EXCLUDE_NODES", "SCR_JOB_NAME", "SCR_LOG_ENABLE", "SCR_PREFIX", "SCR_WATCHDOG_TIMEOUT", "SCR_WATCHDOG_TIMEOUT_PFS"

These are used by scripts / utililties (maybe not all user facing):

  • scr_list_down_nodes: SCR_EXCLUDE_NODES, CACHEDIR, CNTLDIR, SCR_PREFIX_DIR
  • scr_list_dir: CACHE, SCR_CNTL_BASE, SCR_PREFIX
  • scr_watchdog: SCR_WATCHDOG_TIMEOUT, SCR_WATCHDOG_TIMEOUT_PFS, SCR_PREFIX
  • scr_log_event: "CNTLDIR", SCR_CACHE_BASE, SCR_CACHE_SIZE, SCR_CNTL_BASE, SCR_CONF_FILE, SCR_DB_DEBUG, SCR_DB_ENABLE, SCR_DB_HOST, SCR_DB_NAME, SCR_DB_PASS, SCR_DB_USER, SCR_EXCLUDE_NODES, SCR_JOB_NAME, SCR_LOG_ENABLE, SCR_PREFIX, SCR_WATCHDOG_TIMEOUT, SCR_WATCHDOG_TIMEOUT_PFS
  • scr_log_transfer: "CNTLDIR", SCR_CACHE_BASE, SCR_CACHE_SIZE, SCR_CNTL_BASE, SCR_CONF_FILE, SCR_DB_DEBUG, SCR_DB_ENABLE, SCR_DB_HOST, SCR_DB_NAME, SCR_DB_PASS, SCR_DB_USER, SCR_EXCLUDE_NODES, SCR_JOB_NAME, SCR_LOG_ENABLE, SCR_PREFIX, SCR_WATCHDOG_TIMEOUT, SCR_WATCHDOG_TIMEOUT_PFS

@rhaas80 rhaas80 deleted the rhaas/scr_param_set branch November 4, 2020 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants