Eliminate race conditions and remove DATAROOT last in cleanup #2893

Merged
WalterKolczynski-NOAA merged 3 commits into NOAA-EMC:develop on Sep 6, 2024

Conversation

DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Sep 5, 2024

Description

This PR reorders the cleanup job so that the working directory (DATAROOT) is deleted last. It also adds the `-ignore_readdir_race` flag to `find` so that it does not report an error when a file is deleted after the list of files has been collected. This can happen when two consecutive cycles run the cleanup job at the same time.
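
A minimal sketch of the two ideas described above, not the actual cleanup script: purge old files with GNU `find -ignore_readdir_race`, then remove the working directory as the final step. The variable names `DATAROOT_DEMO` and `COMROOT_DEMO`, the directory layout, and the 5-day retention threshold are all assumptions made for illustration. The example also relies on GNU `find` and GNU `touch`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-ins for the workflow's directories (not the real paths).
demo_root="$(mktemp -d)"
DATAROOT_DEMO="${demo_root}/dataroot"
COMROOT_DEMO="${demo_root}/comroot"
mkdir -p "${DATAROOT_DEMO}/work" "${COMROOT_DEMO}/old_cycle" "${COMROOT_DEMO}/new_cycle"
touch "${DATAROOT_DEMO}/work/scratch.tmp"
touch -d '10 days ago' "${COMROOT_DEMO}/old_cycle/file.nc"
touch "${COMROOT_DEMO}/new_cycle/file.nc"

# Step 1: purge files older than the (assumed) retention window.
# -ignore_readdir_race keeps find from reporting an error when a file
# disappears between being listed and being examined, for example because
# another cycle's cleanup job removed it first.
find "${COMROOT_DEMO}" -ignore_readdir_race -type f -mtime +5 -delete

# Remove any directories that the purge left empty.
find "${COMROOT_DEMO}" -ignore_readdir_race -mindepth 1 -type d -empty -delete

# Step 2: remove the working directory only after everything else is done.
rm -rf "${DATAROOT_DEMO}"
```

After this runs, `old_cycle` and the working directory are gone, while `new_cycle/file.nc` is kept because it is newer than the threshold.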

Resolves #2880

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

A 5-cycle test on WCOSS2

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Fix/cleanup Eliminate race conditions in cleanup Sep 5, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Eliminate race conditions in cleanup Eliminate race conditions and remove DATAROOT last in cleanup Sep 5, 2024
aerorahul
aerorahul previously approved these changes Sep 5, 2024
Contributor

@aerorahul aerorahul left a comment

Looks good.
I found a suggestion online to use `sync`, which ensures all modifications are synchronized.
It should help at the least, but please look it up and validate it.
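
As a rough illustration of the `sync` suggestion, the sketch below calls `sync` after a batch of deletions and before the directory itself is removed. Where `sync` would sit in the real cleanup script is not shown in this conversation, so the placement, the `scratch_dir` name, and the file names are assumptions. `sync` asks the system to write cached changes to storage. Whether that is enough to avoid the race on a shared file system is exactly what the reviewer asks to be validated.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory standing in for a directory being cleaned.
scratch_dir="$(mktemp -d)"
touch "${scratch_dir}/a.tmp" "${scratch_dir}/b.tmp"

# Delete the files, then flush cached file system changes before continuing.
rm -f "${scratch_dir}"/*.tmp
sync

# Remove the now-empty directory as the last step.
rmdir "${scratch_dir}"
```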

@XuanliLi-NOAA XuanliLi-NOAA left a comment

Changes look good to me. I had removed the files in my directories due to the disk quota limit, but Sean Casey at QOSAP tested the changes (including those suggested by @aerorahul) on Hera and confirmed that they solved the problem.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS label Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

CI Update on Wcoss2 at 09/05/24 08:40:17 PM
=================================================
PR:2893 Reset to Wcoss2-Ready by user and is now restarting CI tests
No current experiments to cancel in PR: 2893 on Wcoss2

@emcbot emcbot added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS and removed CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

CI Update on Wcoss2 at 09/05/24 08:40:21 PM
============================================
Cloning and Building global-workflow PR: 2893
with PID: 146574 on host: dlogin03

@emcbot emcbot added CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Sep 5, 2024
@emcbot

emcbot commented Sep 5, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Thu Sep  5 20:43:11 UTC 2024 on dlogin03
---------------------------------------------------
Build: Completed at 09/05/24 09:27:58 PM
Case setup: Completed for experiment C48_ATM_b502db38
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_b502db38
Case setup: Skipped for experiment C48_S2SWA_gefs_b502db38
Case setup: Completed for experiment C48_S2SW_b502db38
Case setup: Completed for experiment C96_atm3DVar_extended_b502db38
Case setup: Skipped for experiment C96_atm3DVar_b502db38
Case setup: Completed for experiment C96C48_hybatmaerosnowDA_b502db38
Case setup: Completed for experiment C96C48_hybatmDA_b502db38
Case setup: Completed for experiment C96C48_ufs_hybatmDA_b502db38

@emcbot emcbot added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Sep 6, 2024
@emcbot

emcbot commented Sep 6, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_b502db38 *** SUCCESS *** at 09/05/24 10:57:10 PM
Experiment C48_S2SW_b502db38 *** SUCCESS *** at 09/05/24 11:00:17 PM
Experiment C96C48_hybatmDA_b502db38 *** SUCCESS *** at 09/06/24 12:06:41 AM
Experiment C96C48_hybatmaerosnowDA_b502db38 *** SUCCESS *** at 09/06/24 01:03:33 AM
Experiment C96C48_ufs_hybatmDA_b502db38 *** SUCCESS *** at 09/06/24 01:51:25 AM
Experiment C96_atm3DVar_extended_b502db38 *** SUCCESS *** at 09/06/24 10:42:49 AM

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 6519211 into NOAA-EMC:develop Sep 6, 2024
5 checks passed
DavidHuber-NOAA added a commit to DavidHuber-NOAA/global-workflow that referenced this pull request Sep 9, 2024
* origin/develop:
  Create JEDI class (NOAA-EMC#2805)
  Restructure the bufr sounding job    (NOAA-EMC#2853)
  Add an archive task to GEFS system to archive files locally (NOAA-EMC#2816)
  Reenable Orion Cycling Support (NOAA-EMC#2877)
  Eliminate race conditions and remove DATAROOT last in cleanup (NOAA-EMC#2893)
  Update aerosol climatology to 2013-2024 mean (NOAA-EMC#2888)
  Add ability to run CI test C96_atm3DVar.yaml to Gaea-C5 (NOAA-EMC#2885)
  Support global-workflow GEFS C48 on Google Cloud (NOAA-EMC#2861)
  Add 3 and 9 hr increment files to IC staging (NOAA-EMC#2876)
  Add diffusion/diag B for aerosol DA and some other needed changes (NOAA-EMC#2738)
  Correct ocean `MOM.res_#` stage copy (NOAA-EMC#2868)
  Support coupling on AWS (NOAA-EMC#2859)
  Add JEDI ATM lgetkf observer and solver jobs (NOAA-EMC#2833)
  Fix gdas build on Gaea and add Gaea to available CI list (NOAA-EMC#2857)
  Support ATM forecast only on Google (NOAA-EMC#2832)
  Add GEFS C48 support on AWS (NOAA-EMC#2818)
  Update omega calculation (NOAA-EMC#2751)
  Add snow DA update and recentering for the EnKF forecasts (NOAA-EMC#2690)
  support ATM forecast only on Azure (NOAA-EMC#2827)
  Convert staging job to python and yaml (NOAA-EMC#2651)
  Fixed test on UNAVAILBLE in python Rocoto check (NOAA-EMC#2842)
Development

Successfully merging this pull request may close these issues.

gdascleanup and enkfgdascleanup failures
5 participants