[release/public-v2.2.0] Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms #939
…e for new platforms (ufs-community#934) The option to create an experiment with the option USE_CRON_TO_RELAUNCH=True is currently broken on Cheyenne and Derecho due to some bad python logic. This fixes that issue. Also took the opportunity to update the PR template to include the new supported platforms (Derecho, Hercules, and Gaea C5)
@mkavulich - Similar to PR #934, these changes look good to me. I also ran the grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 test on Derecho using cron; the test ran successfully and ultimately passed. After the job passed, I checked the crontab and the entry had been removed.
Approving this work now.
@mkavulich - Will you be including the additional fix for removing old crontab entries on Cheyenne and Derecho in a subsequent PR to develop? I ask because when I ran the |
Looks good
Given that the Jenkins tests successfully passed for PR #934 and the only additional modifications were made for Cheyenne and Derecho, which aren't supported via Jenkins, I have completed running the WE2E coverage tests on Derecho and all tests successfully passed:
I will now move forward with merging this work.
@MichaelLueken turns out there was a bad assumption in the crontab script: Derecho does not suffer from the same problem as Cheyenne where a different |
…d new winter weather verification test with staged data (#997)

New test
* The new test MET_ensemble_verification_winter_wx is added. This test will exercise a number of yet-untested capabilities in the workflow, including a 10-member ensemble, snowfall verification with staged data (so can be run on all platforms, not just Jet and Hera), and several SPP settings.
* As part of this new test, snowfall observations will now be staged on all tier-1 platforms, as well as netCDF GFS data and other observation types, all for the date 2022020300

Resolved issues
* Incorrect octal notation causing ensemble vx to fail #966 Resolved: In several locations, an explicit conversion is done to ENSMEM_INDX to ensure it is a base-10 integer, to avoid problems with bash interpreting numbers with a leading zero as octal.
* Should "EXPT_SUBDIR" be a mandatory variable? #978 resolved: per discussion in a recent SRW code management meeting, give EXPT_SUBDIR a default value "experiment" to avoid unnecessary complications and work for users. Additionally, the default behavior if an experiment directory already exists is changed to "quit" rather than "delete"
* Issue mentioned in this discussion; the setting fhzero=6 is removed from the weather model namelist for CCPP suite FV3_GFS_v17_p8, which allows precipitation and other accumulations to be made every hour rather than 6 hours (SRW output is always hourly, so this makes sense). Also, update diag_table.FV3_GFS_v17_p8 so that all output files will be hourly
* Per discussion in [release/public-v2.2.0] Fix crontab bug for Cheyenne and Derecho, update PR template for new platforms #939, remove an unnecessary special case in get_crontab_contents.py for Derecho

Other fixes
* Some old files for unsupported/removed CCPP suites are removed
* Add some missing task dependencies for retrieving verification obs

General improvements
* Many improvements to verification obs-pulling task
* NDAS observations are now retrieved for forecast hour zero, and a better obs file is retrieved for major obs times (00z, 06z, 12z, 18z) per EMC guidance
* Better in-line comments/documentation
* Standardize order and messaging for file-on-disk checks across all observation types
* Added explanatory comments for reflectivity field in diag_table files
* Update diag_table.FV3_GFS_v17_p8 so that all output files will be hourly
* Simplify task dependencies that rely on staged verification observations; these "get_obs" tasks should always be run (they check that the data exists before trying to retrieve it), so no need to make the dependency conditional
* Add a check in monitor_jobs.py to ensure the yaml file does not contain duplicate experiment directories
* Make sure the key in the experiment dictionary used by is unique by appending the current date/time to the exptdir name; additionally, set this key as the WORKFLOW_ID variable (so that it could be used in the workflow if necessary).
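
The last two items in the commit message above (rejecting duplicate experiment directories and making the experiment key unique with a date/time suffix) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the actual monitor_jobs.py code: the function names, the "exptdir" field name, and the timestamp format are all hypothetical.

```python
from datetime import datetime


def find_duplicate_exptdirs(experiments):
    """Return experiment directories that appear more than once.

    `experiments` is assumed (hypothetically) to be a dict of
    {name: {"exptdir": path, ...}}, e.g. as loaded from a yaml file.
    """
    seen = set()
    duplicates = []
    for entry in experiments.values():
        exptdir = entry["exptdir"]
        if exptdir in seen and exptdir not in duplicates:
            duplicates.append(exptdir)
        seen.add(exptdir)
    return duplicates


def make_unique_key(exptdir_name, now=None):
    """Append a date/time stamp to the directory name to build a unique key.

    The stamp format here is an assumption chosen for illustration.
    """
    now = now or datetime.now()
    return f"{exptdir_name}_{now.strftime('%Y%m%d%H%M%S')}"


experiments = {
    "test_a": {"exptdir": "/scratch/expts/test_a"},
    "test_b": {"exptdir": "/scratch/expts/test_b"},
    "test_a_again": {"exptdir": "/scratch/expts/test_a"},
}
duplicates = find_duplicate_exptdirs(experiments)
key = make_unique_key("test_a", now=datetime(2022, 2, 3, 0, 0, 0))
```

Passing `now` explicitly keeps the key reproducible in tests; in normal use the current time would be taken.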
Note: this is identical to #934 except it contains an additional fix for removing old crontab entries on Cheyenne and Derecho
DESCRIPTION OF CHANGES:
The option to create an experiment with USE_CRON_TO_RELAUNCH=True is currently broken on Cheyenne and Derecho due to faulty Python logic. This PR fixes that issue. I also took the opportunity to update the PR template to include the newly supported platforms (Derecho, Hercules, and Gaea C5).
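
For readers unfamiliar with the cron relaunch mechanism, the bookkeeping involved (adding a relaunch entry when the experiment is created and removing it once the workflow finishes) can be sketched as plain string handling on the crontab contents. This is a minimal illustration and not the code in get_crontab_contents.py: the function names and the example cron line are hypothetical, and the platform-specific step of reading and writing the crontab on Cheyenne or Derecho is deliberately left out.

```python
def add_cron_line(crontab_text, cron_line):
    """Return crontab text with cron_line appended, unless already present."""
    lines = crontab_text.splitlines()
    if any(line.strip() == cron_line.strip() for line in lines):
        return crontab_text
    lines.append(cron_line)
    return "\n".join(lines) + "\n"


def remove_cron_line(crontab_text, cron_line):
    """Return crontab text with every copy of cron_line removed."""
    kept = [
        line for line in crontab_text.splitlines()
        if line.strip() != cron_line.strip()
    ]
    return "\n".join(kept) + "\n" if kept else ""


# Hypothetical pre-existing crontab and relaunch entry, for illustration only.
existing = "0 * * * * /usr/bin/true\n"
relaunch = "*/3 * * * * cd /path/to/expt && ./launch.sh"

with_entry = add_cron_line(existing, relaunch)
after_cleanup = remove_cron_line(with_entry, relaunch)
```

Keeping the text manipulation separate from the command that fetches the crontab makes the add/remove logic testable without touching a real crontab.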
Type of change
TESTS CONDUCTED:
Ran WE2E fundamental tests with the option --launch=cron on three platforms. Previously failing on Cheyenne and Derecho, these tests now all succeed except for the grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 test on Cheyenne: this is a pre-existing failure (see Issue #933).
DEPENDENCIES:
None
DOCUMENTATION:
None
ISSUE:
Fixes #932
CHECKLIST