Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use common config section for logging #80

Merged
merged 14 commits into from
Sep 18, 2023
Merged

Conversation

smoors
Copy link
Collaborator

@smoors smoors commented Aug 21, 2023

changes in this PR:

  • use common config section for logging
  • fix mympirun launcher for hortense
  • use reframe options --prefix and --save-log-files to make sure all output goes into the prefix
  • support overriding the prefix for file logs with RFM_PREFIX

fixes #64
fixes #61

@boegel boegel added this to the 0.1.0 milestone Aug 25, 2023
@boegel
Copy link
Contributor

boegel commented Aug 25, 2023

@smoors Let's sync this with current main and also update config/aws_citc.py (cfr. #53 which got merged yesterday)

boegel
boegel previously requested changes Aug 25, 2023
config/izum_vega.py Show resolved Hide resolved
@laraPPr
Copy link
Collaborator

laraPPr commented Aug 28, 2023

I did the testing on the aws cluster and hortense and the updated config works.

I also found that in some configs the staged directory is cleaned and in others it is kept. Do we want this to be uniform?

I have one unrelated suggestion that might easily be added to this pr. For the TensorFlow tests on hortense the following needs to be added to processor of every partition: 'num_cpus_per_core': 1,. This is a parameter is used in the TensorFlow tests.

config/github_actions.py Outdated Show resolved Hide resolved
@casparvl
Copy link
Collaborator

I have one unrelated suggestion that might easily be added to this pr. For the TensorFlow tests on hortense the following needs to be added to processor of every partition: 'num_cpus_per_core': 1,. This is a parameter is used in the TensorFlow tests.

This should be present if autodetecting processor information. Rather than adding num_cpus_per_core to the hortense config, I'd suggest adding CPU autodetection to it :) But it is indeed outside of the scope of this PR :)

Copy link
Collaborator

@casparvl casparvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't really reviewed the code, but did test runs.

Runs fine on Vega and AWS.

Had issues on Snellius (/var/spool/slurm/slurmd/job3551158/slurm_script: line 11: /cvmfs/pilot.eessi-hpc.org/latest/init/bash: Too many levels of symbolic links), but they seem unrelated to the config (get the same issue when running from main branch). So no blocker for this PR, just means I can't properly verify the integrity of the config there at this time.

},
{
'type': 'file',
'name': 'reframe.log',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this log file does not end up in the prefix set for the system. While --save-log-files copies this to the root of the output directory (which by default will be in <prefix>/output), it's not very nice as it also leaves a log file in the current working directory.
A much cleaner structure would be:

<prefix>/output
<prefix>/staging
<prefix>/perflogs
<prefix>/logs

The first three are created this way. If we can convince the file log to end up in <prefix>/logs, we are done.

We discussed options on how to do this. The idea was that common_logging_config should probably be a function that you can pass the prefix as an argument. Then, we can set

'name': '<prefix_arg>/logs/reframe.log'

The function then returns the common_logging_config.

Copy link
Collaborator Author

@smoors smoors Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the downside of this approach is that if the user specifies the prefix via --prefix, that will be ignored. we can check for RFM_PREFIX though.

i'll open an issue about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now implemented.
you can override the prefix specified in the config file with RFM_PREFIX, but as noted above, --prefix does not work for file logs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work like a charm when RFM_PREFIX is not specified. Now testing with RFM_PREFIX and waiting for the runs to complete to check that the perflogs end up in the right place. System is quite busy though, so waiting in the queue...

config/surf_snellius.py Outdated Show resolved Hide resolved
eessi/testsuite/common_config.py Show resolved Hide resolved
Copy link
Collaborator

@casparvl casparvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Tested on Snellius and Vega, logs end up where you expect them to. Nice improvement! :)

@casparvl casparvl dismissed boegel’s stale review September 18, 2023 15:48

Changes were implemented

@casparvl casparvl merged commit b1bc87d into EESSI:main Sep 18, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants