Restructure the bufr sounding job #2853

BoCui-NOAA · 2024-08-21T20:10:17Z

Description

The current operational BUFR job starts concurrently with the GFS model run. This PR updates the script and ush to process data for all forecast hours simultaneously, then combine the temporary outputs into BUFR sounding products for each station. The updated job starts processing only after the GFS model completes its 180-hour run, and handles all forecast files from 000hr to 180hr in a single pass. The new job requires 7 nodes instead of the 4 nodes used operationally.
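As a rough illustration of the combine step, here is a minimal sketch. It assumes each forecast-hour task writes one temporary file per station named `<station>.<fhr>` (a hypothetical naming scheme, with `fhr` zero-padded to three digits). It also assumes the per-station product is simply those files concatenated in forecast-hour order, which is workable because a BUFR file is a sequence of self-contained messages. The actual gfs-utils code may combine the outputs differently.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical file layout, not the actual gfs_bufr.sh logic.
# Assumes temporary outputs named <station>.<fhr> (fhr zero-padded to 3 digits)
# in tmp_dir, and writes one combined file per station to out_dir.
combine_station_outputs() {
  local tmp_dir="$1" out_dir="$2"
  local f station
  local -A seen=()
  mkdir -p "${out_dir}"
  # Collect the unique station IDs from the temporary file names.
  for f in "${tmp_dir}"/*.[0-9][0-9][0-9]; do
    [[ -e "${f}" ]] || continue          # no matches: the glob stays literal
    station=$(basename "${f}")
    station="${station%.*}"              # strip the .<fhr> suffix
    seen["${station}"]=1
  done
  for station in "${!seen[@]}"; do
    # Glob expansion is sorted, so zero-padded hours concatenate in order.
    cat "${tmp_dir}/${station}".[0-9][0-9][0-9] > "${out_dir}/${station}.bufr"
  done
}
```

Zero-padding the hour is what makes plain glob ordering put 000, 001, ..., 180 in forecast order without a separate sort.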

This PR depends on the GFS bufr code update NOAA-EMC/gfs-utils#75

With the updates of bufr codes and scripts, there is no need to add restart capability to GFS post-process job JGFS_ATMOS_POSTSND.

This PR also includes the following changes:

Rename the following table files:

parm/product/bufr_ij13km.txt to parm/product/bufr_ij_gfs_C768.txt
parm/product/bufr_ij9km.txt to parm/product/bufr_ij_gfs_C1152.txt

Add a new table file: parm/product/bufr_ij_gfs_C96.txt for GFSv17 C96 testing.

Added a new capability to the BUFR package: the job first tries to read bufr_ij_gfs_${CASE}.txt. If that table file is not available, the code automatically finds the nearest-neighbor grid point (i, j).
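The lookup order can be sketched in shell as below. This is a hedged illustration: the function name is hypothetical, and in the real package the check may well live in the Fortran code rather than in a script. Here a nonzero return simply tells the caller to fall back to the nearest-neighbor (i, j) search.

```shell
#!/usr/bin/env bash
# Sketch of the lookup order: prefer the precomputed bufr_ij_gfs_${CASE}.txt
# table, otherwise signal that (i, j) must come from a nearest-neighbor search.
# parm_dir is whatever directory contains product/ in a given installation.
select_bufr_table() {
  local parm_dir="$1" case_res="$2"
  local table="${parm_dir}/product/bufr_ij_gfs_${case_res}.txt"
  if [[ -s "${table}" ]]; then           # exists and is non-empty
    echo "${table}"
    return 0
  fi
  return 1                               # no table: caller does nearest-neighbor search
}
```

For example, `if table=$(select_bufr_table "${HOMEgfs}/parm" "${CASE}"); then ...; else ...; fi`, assuming `HOMEgfs` points at the workflow checkout. With the tables in this PR, C96, C768 and C1152 would take the first branch and any other resolution the fallback.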

Refs #1257
Refs NOAA-EMC/gfs-utils#75

Type of change

Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

Is this a breaking change (a change in existing functionality)? NO
Does this change require a documentation update? NO
Does this change require an update to any of the following submodules? YES (If YES, please add a link to any PRs that are pending.)
- GFS-utils

How has this been tested?

Cycled test on WCOSS2

Checklist

Any dependent changes have been merged and published
My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
New and existing tests pass with my changes
I have made corresponding changes to the documentation if necessary

scripts/exgfs_atmos_postsnd.sh

ush/gfs_bufr.sh

scripts/exgfs_atmos_postsnd.sh

This PR includes the following changes to the bufr sounding codes:
1. added a function to detect and process 3D soil variables, which are new outputs from GFSv17
2. modified the code to process each forecast hour individually
3. code clean-up and removal of nemsio input files that are no longer used
4. added a new module, modpr_module.f90, which is a simplified version of sigio_module.f
5. removed linking with the 'nemsio' and 'sigio' libraries in CMakeLists.txt

With the updates of bufr codes and scripts, there is no need to add restart capability to GFS post-process job JGFS_ATMOS_POSTSND.

The related bufr job script update is in another PR, NOAA-EMC/global-workflow#2853.

Refs NOAA-EMC/global-workflow#1257
Refs NOAA-EMC/global-workflow#2853

WalterKolczynski-NOAA

I don't see an updated gfs_utils hash in this PR. I can help if you don't know how to commit that.

scripts/exgfs_atmos_postsnd.sh

WalterKolczynski-NOAA · 2024-08-28T00:51:15Z

scripts/exgfs_atmos_postsnd.sh

+# allocate 21 processes per node
+# don't allocate more processes, or it might have memory issue
+num_ppn=21
+export APRUN="mpiexec -np ${num_hours} -ppn ${num_ppn} --cpu-bind core cfp "
+
+if [ -s "${DATA}/poescript_bufr" ]; then
+  rm ${DATA}/poescript_bufr
+fi


We have a utility script, ush/run_mpmd.sh, that now handles setting up an MPMD job. That is the preferred method, as it correctly handles both Slurm and PBS/Torque. You just need to pass it the file containing your list of commands as an argument. See the atmos products ex-script for an example.
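A minimal sketch of the command-file side of this, assuming run_mpmd.sh takes a plain file with one shell command per line, as described above. The per-hour command (`gfs_bufr.sh <fhr>`) and the hour list (hourly to 120 h, then 3-hourly to 180 h) are illustrative assumptions, not taken from this PR.

```shell
#!/usr/bin/env bash
# Sketch: write one command per forecast hour into an MPMD command file.
# The per-hour command and the hour list are hypothetical placeholders.
bufr_forecast_hours() {
  # Assumed output frequency: hourly to 120 h, then 3-hourly to 180 h.
  local h
  for ((h = 0; h <= 120; h++)); do printf '%03d\n' "${h}"; done
  for ((h = 123; h <= 180; h += 3)); do printf '%03d\n' "${h}"; done
}

write_bufr_cmdfile() {
  local cmdfile="$1" ush_dir="$2"
  local fhr
  rm -f "${cmdfile}"                     # start from a clean command file
  while read -r fhr; do
    echo "${ush_dir}/gfs_bufr.sh ${fhr}" >> "${cmdfile}"
  done < <(bufr_forecast_hours)
}
```

In the job this would be followed by something like `"${USHgfs}/run_mpmd.sh" "${DATA}/poescript_bufr"`, assuming `USHgfs` points at the workflow's ush directory. Clearing the old command file first plays the same role as the `rm ${DATA}/poescript_bufr` step in the diff above.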

In ush/run_mpmd.sh, the mpiexec command does not set the number of processes per node, which the bufr job exgfs_atmos_postsnd.sh needs. Will run_mpmd.sh be updated to support this in the future?

I would try without the ppn setting first to confirm it is actually an issue (ideally, the MPMD tasks should be distributed equally across all nodes anyway). If it is still required, add an entry to the env script on each machine where it is necessary, updating the mpmd_opt setting to include -ppn for the sounding job.

I tested the bufr job using run_mpmd.sh without setting the ppn parameter, and the job failed. After adding the ppn setting, the bufr job completed successfully. The PBS setting in my jobcard is:
#PBS -l place=vscatter,select=7:ncpus=128:mpiprocs=128

Please let me know if I’m wrong.

See new review. It should maintain the ppn setting while switching to run_mpmd.sh.

scripts/exgfs_atmos_postsnd.sh

ush/gfs_bufr.sh

BoCui-NOAA · 2024-08-28T17:58:28Z

I don't see an updated gfs_utils hash in this PR. I can help if you don't know how to commit that.

Yes, please tell me where to update gfs_utils hash and how to commit this. Thanks!

WalterKolczynski-NOAA · 2024-08-29T02:21:36Z


In your global-workflow clone, go to the sorc/gfs_utils.fd directory and checkout the appropriate hash. Then go back up and do a git add sorc/gfs_utils.fd and commit.
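The steps above can be wrapped in a small helper. This is a sketch under the assumption that it is run from the top of the global-workflow clone and that the submodule's `origin` remote has the target commit. The function name is hypothetical and the commit to check out is left to the caller.

```shell
#!/usr/bin/env bash
# Sketch: move a submodule to a given commit and record the new pointer
# (the "hash") in the superproject. Run from the top of the clone.
update_submodule_pin() {
  local sub_path="$1" commit="$2"
  git -C "${sub_path}" fetch origin || return 1            # make sure the commit is available locally
  git -C "${sub_path}" checkout --quiet "${commit}" || return 1
  git add "${sub_path}"                                    # stage the new gitlink
  git commit --quiet -m "Update ${sub_path} hash to ${commit}"
}
```

For this PR that would be `update_submodule_pin sorc/gfs_utils.fd <gfs-utils-commit>`, with the gfs-utils commit from NOAA-EMC/gfs-utils#75.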

BoCui-NOAA · 2024-08-29T18:27:49Z


Thanks, I just committed the updated gfs-utils hash.

env/WCOSS2.env

scripts/exgfs_atmos_postsnd.sh

WalterKolczynski-NOAA · 2024-09-04T02:44:18Z


Co-authored-by: Walter Kolczynski - NOAA <[email protected]>

emcbot · 2024-09-06T16:28:09Z

CI Update on Wcoss2 at 09/06/24 04:28:09 PM
============================================
Cloning and Building global-workflow PR: 2853
with PID: 182099 on host: dlogin03

BoCui-NOAA · 2024-09-06T16:45:17Z

@WalterKolczynski-NOAA I committed some changes 5 minutes ago. I didn't realize you had already approved the PR. Should I submit a new PR? My changes include:

Rename the following table files:

parm/product/bufr_ij13km.txt to bufr_ij_gfs_C768.txt
parm/product/bufr_ij9km.txt to bufr_ij_gfs_C1152.txt

Add a new table file: parm/product/bufr_ij_gfs_C96.txt for GFSv17 C96 testing.

Added a new capability to the BUFR package: the job first tries to read bufr_ij_gfs_${CASE}.txt. If that table file is not available, the code automatically finds the nearest-neighbor grid point (i, j).

WalterKolczynski-NOAA · 2024-09-06T17:00:27Z


No, this PR is fine. I'm going to restart the CI, since it will likely fail anyway.

emcbot · 2024-09-06T17:04:09Z

CI Update on Wcoss2 at 09/06/24 05:04:08 PM
=================================================
PR:2853 Reset to Wcoss2-Ready by user and is now restarting CI tests
Driver PID: Requested termination of 182099 and children on dlogin03
Driver PID: has restarted as 109597 on dlogin03
No current experiments to cancel in PR: 2853 on Wcoss2

emcbot · 2024-09-06T17:04:49Z

CI Update on Wcoss2 at 09/06/24 05:04:48 PM
============================================
Cloning and Building global-workflow PR: 2853
with PID: 109597 on host: dlogin03

BoCui-NOAA · 2024-09-06T17:37:28Z

@WalterKolczynski-NOAA I have a question about the ppn setting, which is currently set to 21. This value is based on running at C768 resolution and may need adjustment for C1152. Can we make this number depend on the resolution? The total number of nodes would also need to be adjusted.

WalterKolczynski-NOAA · 2024-09-06T17:40:34Z


Yes, but let's explore that further and make a follow-up PR for that.

BoCui-NOAA · 2024-09-06T17:42:36Z


Sure. Thanks for the comments.
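A sketch of how ppn and node count could be tied together for that follow-up. The node count is the integer ceiling of tasks over ppn. Only the C768 value of 21 comes from this thread. The `ppn_for_case` lookup is hypothetical and deliberately has no C1152 entry until a value has been tested.

```shell
#!/usr/bin/env bash
# Sketch: nodes needed to run num_tasks MPMD tasks with at most ppn per node.
nodes_needed() {
  local num_tasks="$1" ppn="$2"
  if (( ppn <= 0 )); then
    echo "ppn must be positive" >&2
    return 1
  fi
  echo $(( (num_tasks + ppn - 1) / ppn ))   # integer ceiling division
}

# Hypothetical per-resolution lookup; only C768 (21) comes from this PR.
ppn_for_case() {
  case "$1" in
    C768) echo 21 ;;
    *) return 1 ;;                          # no tested value yet
  esac
}
```

If the output is hourly through 120 h and 3-hourly through 180 h (an assumption here), that is 141 tasks, and `nodes_needed 141 21` gives 7. That is in line with the 7 nodes quoted in the description.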

emcbot · 2024-09-06T17:49:58Z

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Fri Sep  6 17:08:47 UTC 2024 on dlogin03
---------------------------------------------------
Build: Completed at 09/06/24 05:49:35 PM
Case setup: Completed for experiment C48_ATM_5f5e542f
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_5f5e542f
Case setup: Skipped for experiment C48_S2SWA_gefs_5f5e542f
Case setup: Completed for experiment C48_S2SW_5f5e542f
Case setup: Completed for experiment C96_atm3DVar_extended_5f5e542f
Case setup: Skipped for experiment C96_atm3DVar_5f5e542f
Case setup: Completed for experiment C96_atmaerosnowDA_5f5e542f
Case setup: Completed for experiment C96C48_hybatmDA_5f5e542f
Case setup: Completed for experiment C96C48_ufs_hybatmDA_5f5e542f

emcbot · 2024-09-07T07:45:14Z

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_5f5e542f *** SUCCESS *** at 09/06/24 07:21:24 PM
Experiment C48_S2SW_5f5e542f *** SUCCESS *** at 09/06/24 07:33:18 PM
Experiment C96C48_hybatmDA_5f5e542f *** SUCCESS *** at 09/06/24 08:36:30 PM
Experiment C96_atmaerosnowDA_5f5e542f *** SUCCESS *** at 09/06/24 09:33:29 PM
Experiment C96C48_ufs_hybatmDA_5f5e542f *** SUCCESS *** at 09/06/24 09:57:19 PM
Experiment C96_atm3DVar_extended_5f5e542f *** SUCCESS *** at 09/07/24 07:42:35 AM

BoCui-NOAA · 2024-09-07T18:55:49Z

Great, thanks @WalterKolczynski-NOAA

DavidHuber-NOAA · 2024-09-09T13:04:27Z

env/WCOSS2.env

@@ -223,7 +223,7 @@ elif [[ "${step}" = "postsnd" ]]; then
    export OMP_NUM_THREADS=1

    export NTHREADS_POSTSND=${NTHREADS1}
-    export APRUN_POSTSND="${APRUN} --depth=${NTHREADS_POSTSND} --cpu-bind depth"
+    export mpmd_opt="-ppn 21 ${mpmd_opt}"


@BoCui-NOAA @WalterKolczynski-NOAA Was it intentional to get rid of APRUN_POSTSND? This is referenced by ush/gfs_bufr_netcdf.sh.

Yes, the 'APRUN_POSTSND' setting is no longer needed, and the ush/gfs_bufr_netcdf.sh script is no longer in use.
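A question like this can be checked with a recursive whole-word grep before a variable is dropped. Here is a minimal sketch with a hypothetical helper name. It prints `path:line:text` for each remaining reference and returns success only if at least one is found.

```shell
#!/usr/bin/env bash
# Sketch: list files under root that still reference a retired variable.
# -w keeps e.g. APRUN_POSTSND from matching APRUN_POSTSND_X.
find_references() {
  local var="$1" root="$2"
  grep -rnw -- "${var}" "${root}"
}
```

Run against a checkout, `find_references APRUN_POSTSND .` would list the hit in ush/gfs_bufr_netcdf.sh, which is fine here only because that script is no longer used.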


BoCui-NOAA and others added 4 commits June 20, 2024 14:34

Add restart capability to job JGFS_ATMOS_POSTSND

c734f41

Merge branch 'NOAA-EMC:develop' into develop

bb89562

Merge branch 'NOAA-EMC:develop' into develop

8e42d49

Convert bufr to handle one forecast at a time

b0637b7

github-advanced-security bot found potential problems Aug 21, 2024


update to fix shellcheck warning

c4f0399

github-advanced-security bot found potential problems Aug 21, 2024

scripts/exgfs_atmos_postsnd.sh Fixed

fix shellcheck issue

f711c3f

BoCui-NOAA mentioned this pull request Aug 22, 2024

Update bufr codes to handle one forecast at a time NOAA-EMC/gfs-utils#75

Merged


WalterKolczynski-NOAA requested changes Aug 28, 2024


Updated gfs_utils hash

202c449

20240903 gfsv17 bufr package update

133755a

WalterKolczynski-NOAA requested changes Sep 4, 2024


BoCui-NOAA and others added 2 commits September 4, 2024 13:06

Update env/WCOSS2.env

e293370

Co-authored-by: Walter Kolczynski - NOAA <[email protected]>

Update scripts/exgfs_atmos_postsnd.sh

0e94d92

Co-authored-by: Walter Kolczynski - NOAA <[email protected]>

WalterKolczynski-NOAA previously approved these changes Sep 6, 2024


WalterKolczynski-NOAA added the CI-Wcoss2-Ready label Sep 6, 2024

emcbot added CI-Wcoss2-Building and removed CI-Wcoss2-Ready labels Sep 6, 2024

Add new bufr table for GFS C96 output

5f5e542

BoCui-NOAA dismissed WalterKolczynski-NOAA’s stale review via 5f5e542 September 6, 2024 16:37

WalterKolczynski-NOAA approved these changes Sep 6, 2024


WalterKolczynski-NOAA removed the CI-Wcoss2-Building label Sep 6, 2024

WalterKolczynski-NOAA added the CI-Wcoss2-Ready label Sep 6, 2024

emcbot added CI-Wcoss2-Building and removed CI-Wcoss2-Ready labels Sep 6, 2024

emcbot added CI-Wcoss2-Running and removed CI-Wcoss2-Building labels Sep 6, 2024

emcbot added CI-Wcoss2-Passed and removed CI-Wcoss2-Running labels Sep 7, 2024

WalterKolczynski-NOAA approved these changes Sep 7, 2024


WalterKolczynski-NOAA merged commit b8080cd into NOAA-EMC:develop Sep 7, 2024
5 checks passed

DavidHuber-NOAA reviewed Sep 9, 2024


BoCui-NOAA mentioned this pull request Oct 1, 2024

[NCO Bug] Add restart capability for GFS atmos post processing jobs #1257

Closed
