Restructure the bufr sounding job #2853
Conversation
This PR includes the following changes to the bufr sounding codes:
1. Added a function to detect and process 3D soil variables, which are new outputs from GFSv17.
2. Modified the code to process each forecast hour individually.
3. Code clean-up and removal of nemsio input files that are no longer used.
4. Added a new module, modpr_module.f90, which is a simplified version of sigio_module.f.
5. Removed linking with the 'nemsio' and 'sigio' libraries in CMakeLists.txt.
With these updates to the bufr codes and scripts, there is no need to add restart capability to the GFS post-processing job JGFS_ATMOS_POSTSND. The related bufr job script update is in a separate PR, NOAA-EMC/global-workflow#2853.
Refs NOAA-EMC/global-workflow#1257
Refs NOAA-EMC/global-workflow#2853
I don't see an updated gfs_utils hash in this PR. I can help if you don't know how to commit that.
scripts/exgfs_atmos_postsnd.sh
Outdated
# allocate 21 processes per node
# don't allocate more processes, or it might have memory issues
num_ppn=21
export APRUN="mpiexec -np ${num_hours} -ppn ${num_ppn} --cpu-bind core cfp "

if [ -s "${DATA}/poescript_bufr" ]; then
  rm ${DATA}/poescript_bufr
fi
We have a utility script, ush/run_mpmd.sh, that handles setting up an MPMD job now. That is the preferred method, as it correctly handles both Slurm and PBS/Torque. You just need to give it the file with your list of commands as an argument. See the atmos products ex-script for an example.
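As a rough sketch of that usage (the command-file name, the per-hour task command, and the ${USHgfs} paths below are assumptions based on this thread, not the exact PR code):
# Write one task per line to a command file, then hand the file to
# run_mpmd.sh, which builds and launches the MPMD job for the scheduler.
rm -f "${DATA}/poescript_bufr"
for fhr in 000 003 006; do
  # hypothetical per-forecast-hour task command
  echo "${USHgfs}/gfs_bufr.sh ${fhr}" >> "${DATA}/poescript_bufr"
done
${USHgfs}/run_mpmd.sh "${DATA}/poescript_bufr"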
In ush/run_mpmd.sh, the mpiexec command is missing the per-node process count setting used in the bufr job exgfs_atmos_postsnd.sh. Will there be an update to run_mpmd.sh in the future?
I would try without the ppn setting first to confirm it is actually an issue (ideally the MPMD tasks should be equally distributed across all nodes anyway). If it is still required, an entry should be added to the env script on any machine where it is necessary to update the mpmd_opt setting to include -ppn for the sounding job.
I tested the bufr job using run_mpmd.sh without setting the ppn parameter, and the job failed. After adding the ppn setting, the bufr job completed successfully. The PBS setting in my jobcard is:
#PBS -l place=vscatter,select=7:ncpus=128:mpiprocs=128
Please let me know if I’m wrong.
See new review. It should maintain the ppn setting while switching to run_mpmd.sh.
Yes, please tell me where to update the gfs_utils hash and how to commit this. Thanks!
In your global-workflow clone, go to the …
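(For reference, the usual git submodule sequence looks roughly like the following; the sorc/gfs_utils.fd path and the commit placeholder are assumptions, not a quote of the comment above:)
# From the root of the global-workflow clone (submodule path assumed):
cd sorc/gfs_utils.fd
git fetch origin
git checkout <new-gfs-utils-commit>   # placeholder hash
cd ../..
git add sorc/gfs_utils.fd             # records the new submodule hash
git commit -m "Update gfs-utils hash"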
Thanks, just committed using the updated gfs-utils hash.
@WalterKolczynski-NOAA I committed some changes 5 minutes ago. I didn't realize you had approved the PR. Should I submit a new PR? My changes include:
Rename the following table files: parm/product/bufr_ij13km.txt to parm/product/bufr_ij_gfs_C768.txt, and parm/product/bufr_ij9km.txt to parm/product/bufr_ij_gfs_C1152.txt.
Add a new table file: parm/product/bufr_ij_gfs_C96.txt for GFSv17 C96 testing.
Added a new capability to the BUFR package: the job first tries to read bufr_ij_gfs_${CASE}.txt; if the table file is not available, the code automatically finds the nearest-neighbor grid point (i, j).
No, this PR is fine. I'm going to restart the CI, since it will likely fail anyway.
@WalterKolczynski-NOAA I have a question about the ppn setting, which is currently set to 21. This setting is based on running the C768 resolution; however, it may need adjustment for C1152. Can we make this number flexible according to the resolution? The total number of nodes would also need to be adjusted.
Yes, but let's explore that further and make a follow-up PR for that.
Sure. Thanks for the comments.
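Purely as an illustration of that follow-up idea (the non-C768 value below is a placeholder, not a tested number):
# Illustrative sketch: select processes per node by resolution (CASE).
# 21 is the value tested for C768 in this PR; other entries would need
# their own memory testing before use.
case "${CASE}" in
  C1152) num_ppn=18 ;;   # placeholder only, untested
  *)     num_ppn=21 ;;
esac
export mpmd_opt="-ppn ${num_ppn} ${mpmd_opt}"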
Automated global-workflow Testing Results:
All CI Test Cases Passed on Wcoss2:
Great, thanks @WalterKolczynski-NOAA |
@@ -223,7 +223,7 @@ elif [[ "${step}" = "postsnd" ]]; then
 export OMP_NUM_THREADS=1

 export NTHREADS_POSTSND=${NTHREADS1}
-export APRUN_POSTSND="${APRUN} --depth=${NTHREADS_POSTSND} --cpu-bind depth"
+export mpmd_opt="-ppn 21 ${mpmd_opt}"
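(This prepends -ppn 21 to the MPMD launch options, so the launcher places at most 21 ranks per node for the sounding job, per the memory constraint discussed above.)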
@BoCui-NOAA @WalterKolczynski-NOAA Was it intentional to get rid of APRUN_POSTSND? This is referenced by ush/gfs_bufr_netcdf.sh.
Yes, the 'APRUN_POSTSND' setting is no longer needed, and the ush/gfs_bufr_netcdf.sh script is no longer in use.
* origin/develop:
  Create JEDI class (NOAA-EMC#2805)
  Restructure the bufr sounding job (NOAA-EMC#2853)
  Add an archive task to GEFS system to archive files locally (NOAA-EMC#2816)
  Reenable Orion Cycling Support (NOAA-EMC#2877)
  Eliminate race conditions and remove DATAROOT last in cleanup (NOAA-EMC#2893)
  Update aerosol climatology to 2013-2024 mean (NOAA-EMC#2888)
  Add ability to run CI test C96_atm3DVar.yaml to Gaea-C5 (NOAA-EMC#2885)
  Support global-workflow GEFS C48 on Google Cloud (NOAA-EMC#2861)
  Add 3 and 9 hr increment files to IC staging (NOAA-EMC#2876)
  Add diffusion/diag B for aerosol DA and some other needed changes (NOAA-EMC#2738)
  Correct ocean `MOM.res_#` stage copy (NOAA-EMC#2868)
  Support coupling on AWS (NOAA-EMC#2859)
  Add JEDI ATM lgetkf observer and solver jobs (NOAA-EMC#2833)
  Fix gdas build on Gaea and add Gaea to available CI list (NOAA-EMC#2857)
  Support ATM forecast only on Google (NOAA-EMC#2832)
  Add GEFS C48 support on AWS (NOAA-EMC#2818)
  Update omega calculation (NOAA-EMC#2751)
  Add snow DA update and recentering for the EnKF forecasts (NOAA-EMC#2690)
  support ATM forecast only on Azure (NOAA-EMC#2827)
  Convert staging job to python and yaml (NOAA-EMC#2651)
  Fixed test on UNAVAILBLE in python Rocoto check (NOAA-EMC#2842)
Description
The current operational BUFR job begins concurrently with the GFS model run. This PR updates the script and ush code to process all forecast-hour data simultaneously and then combine the temporary outputs to create BUFR sounding products for each station. The updated job starts processing data only after the GFS model completes its 180-hour run, handling all forecast files from 000 hr to 180 hr at once. The new version of the job needs 7 nodes instead of the current operational 4 nodes.
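(For scale: assuming the operational sounding cadence of hourly files through 120 hr plus 3-hourly files from 123 to 180 hr, that is 121 + 20 = 141 forecast hours; at 21 MPMD ranks per node, ceil(141 / 21) = 7 nodes, consistent with the count above.)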
This PR depends on the GFS bufr code update NOAA-EMC/gfs-utils#75
With the updated bufr codes and scripts, there is no need to add restart capability to the GFS post-processing job JGFS_ATMOS_POSTSND.
This PR also includes the following changes:
Rename the following table files:
parm/product/bufr_ij13km.txt to parm/product/bufr_ij_gfs_C768.txt
parm/product/bufr_ij9km.txt to parm/product/bufr_ij_gfs_C1152.txt
Add a new table file: parm/product/bufr_ij_gfs_C96.txt for GFSv17 C96 testing.
Added a new capability to the BUFR package: the job first attempts to read bufr_ij_gfs_${CASE}.txt; if the table file is not available, the code automatically finds the nearest-neighbor grid point (i, j).
Refs #1257
Refs NOAA-EMC/gfs-utils#75
Type of change
Change characteristics
Is this a breaking change (a change in existing functionality)? NO
Does this change require a documentation update? NO
Does this change require an update to any of the following submodules? YES (gfs-utils: NOAA-EMC/gfs-utils#75)
How has this been tested?
Checklist