Add build script and move build-related files to sorc dir #49

Merged
chan-hoo merged 53 commits into ufs-community:develop from feature/nco_sorc on Feb 27, 2024

Conversation

chan-hoo
Collaborator

@chan-hoo chan-hoo commented Feb 13, 2024

Description

  • All build-related files and directories are moved to a new directory sorc to meet the NCO implementation standards.
  • The conda environment is built by the build script.
  • A build script is added:
cd land-DA_workflow/sorc
./app_build.sh

To load the conda environment:

module use modulefiles
module load wflow_hera
conda activate land_da
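
A small guard can confirm that the activation above took effect before any workflow command runs. This is only a sketch: require_conda_env is a hypothetical helper name, and it relies on conda setting CONDA_DEFAULT_ENV when an environment is activated.

```shell
# Hypothetical guard: succeed only if the named conda environment is active.
# CONDA_DEFAULT_ENV is set by `conda activate`; "land_da" is the env name used above.
require_conda_env() (
    wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        echo "conda environment '$wanted' is active"
    else
        echo "expected conda environment '$wanted', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        exit 1   # the function body is a subshell, so this only sets the return code
    fi
)

# Assumed usage at the top of a job script:
#   require_conda_env land_da || exit 1
```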

To run a sample xml file:

cd parm
(modify the XML file as needed, e.g. ACCOUNT)
rocotorun -w land_analysis_test.xml -d land_analysis_test.db

To generate an XML file from the YAML file using uwtools:

cd parm
(modify the yaml file) 
uw rocoto realize --input-file land_analysis.yaml --output-file land_analysis.xml
rocotorun -w land_analysis.xml -d land_analysis.db
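
The two commands above could be wrapped so that the workflow is only launched when the XML was regenerated successfully. This is a sketch, not part of the PR: realize_and_run is a hypothetical helper, and it uses only the uw and rocotorun invocations already shown.

```shell
# Sketch: regenerate the Rocoto XML from YAML, then launch it.
# realize_and_run is a hypothetical helper; run it from the parm directory.
realize_and_run() (
    name="$1"    # file stem, e.g. land_analysis
    for tool in uw rocotorun; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing command: $tool (is the land_da environment loaded?)" >&2
            exit 1
        }
    done
    [ -f "$name.yaml" ] || { echo "missing input: $name.yaml" >&2; exit 1; }
    # Stop here if uw fails, so a stale XML file is never submitted.
    uw rocoto realize --input-file "$name.yaml" --output-file "$name.xml" || exit 1
    rocotorun -w "$name.xml" -d "$name.db"
)

# Assumed usage:
#   cd parm && realize_and_run land_analysis
```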

Anticipated changes to regression tests:

  • Is a baseline change expected?

Subcomponents involved:

  • DA_update
  • vector2tile
  • ufs-land-driver
  • none

Linked PRs and Issues:

Testing (for CM's):

  • RDHPCS
    • Hera
    • Orion
    • Jet
    • Gaea
    • Cheyenne
  • CI
    • Completed
  • PW-Clouds
    • AWS
    • AZURE
    • GCP

@chan-hoo
Collaborator Author

@christinaholtNOAA, I integrated conda and uwtools into the build script. I copied the part from the build script of the UFS SRW App. I didn't have any errors in building the env, but I got the following error when I ran uw rocoto realize:

(land_da) [Chan-hoo.Jeon@hfe01 parm]$ uw rocoto realize --input-file land_analysis.yaml --output-file land_analysis.xml
[Errno 2] No such file or directory: '/scratch2/NCEPDEV/fv3-cam/Chan-hoo.Jeon/LAND-DA-WORKFLOW/land-DA_workflow/sorc/conda/envs/land_da/lib/python3.11/site-packages/uwtools/resources/rocoto.jsonschema'

When I checked the uwtools repo, I found that some files were not copied to the above path:
https://github.com/ufs-community/uwtools/tree/2.0.1/src/uwtools/resources

Can you let me know how to fix this issue?
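
One way to confirm the symptom independently of uw is to look for the schema file inside the environment. The helper below is a hypothetical sketch; the relative path mirrors the one in the error message, and CONDA_PREFIX is assumed to point at the active environment.

```shell
# Hypothetical check: does the uwtools package inside a conda env ship the
# Rocoto schema? Prints the path if found, fails otherwise.
find_rocoto_schema() (
    env_prefix="$1"    # e.g. "$CONDA_PREFIX" after `conda activate land_da`
    for f in "$env_prefix"/lib/python*/site-packages/uwtools/resources/rocoto.jsonschema; do
        if [ -f "$f" ]; then
            echo "$f"
            exit 0
        fi
    done
    echo "rocoto.jsonschema not found under $env_prefix" >&2
    exit 1
)

# Assumed usage:
#   find_rocoto_schema "$CONDA_PREFIX"
```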

@jkbk2004
Collaborator

@WeirAE FYI: this might be an item to pick up in the next patch version.

@jkbk2004
Collaborator

@chan-hoo
Collaborator Author

@jkbk2004, updated, but the ufsLand ctest still fails:

+ /apps/oneapi/mpi/2021.5.1/bin/mpiexec -n 1 /scratch2/NCEPDEV/fv3-cam/Chan-hoo.Jeon/ctest_landda/land-DA_workflow/sorc/build/bin/ufsLand.exe
MPI startup(): Warning: I_MPI_PMI_LIBRARY will be ignored since the hydra process manager was found
 No such file or directory
Stopped

The error says an input file is missing, but I haven't found which one yet.
Any ideas?
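
When a program reports only "No such file or directory", tracing its file-open system calls is one way to recover the path. The filter below is a sketch under assumptions: missing_paths is a hypothetical helper, strace must be available on the node, and many ENOENT hits (library search paths, for example) are harmless.

```shell
# Sketch: given strace output on stdin, list the paths that system calls
# failed to find (ENOENT). missing_paths is a hypothetical helper name.
missing_paths() {
    grep 'ENOENT' | sed -n 's/.*"\([^"]*\)".*/\1/p' | sort -u
}

# Assumed usage (trace.log is an arbitrary file name):
#   strace -f -e trace=open,openat -o trace.log <failing command>
#   missing_paths < trace.log
```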

@jkbk2004
Collaborator

(base) jongkim@Orion-login-2:/work/noaa/epic/jongkim/land-DA/merge/pr-49-v1/build$ ctest
Test project /work/noaa/epic/jongkim/land-DA/merge/pr-49-v1/build
    Start 1: test_vector2tile
1/7 Test #1: test_vector2tile .................   Passed   10.75 sec
    Start 2: test_create_ens
2/7 Test #2: test_create_ens ..................   Passed   10.08 sec
    Start 3: test_letkfoi_snowda
3/7 Test #3: test_letkfoi_snowda ..............   Passed   63.07 sec
    Start 4: test_apply_jediincr
4/7 Test #4: test_apply_jediincr ..............   Passed    5.44 sec
    Start 5: test_tile2vector
5/7 Test #5: test_tile2vector .................   Passed   15.15 sec
    Start 6: test_land_driver
6/7 Test #6: test_land_driver .................   Passed   11.11 sec
    Start 7: test_ufs_datm_land
7/7 Test #7: test_ufs_datm_land ...............   Passed  585.38 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 701.00 sec

@jkbk2004
Collaborator

@chan-hoo, can you try building one level up, outside sorc, with ./app_build.sh -c=intel --conda=off --build --build-dir=/work/noaa/epic/jongkim/land-DA/merge/pr-49-v1/build ?

@chan-hoo
Collaborator Author

@jkbk2004, It passed all ctests on Orion:

Test project /work/noaa/epic/chjeon/land_da_workflow/land-DA_workflow/sorc/build
    Start 1: test_vector2tile
1/7 Test #1: test_vector2tile .................   Passed    7.39 sec
    Start 2: test_create_ens
2/7 Test #2: test_create_ens ..................   Passed    9.92 sec
    Start 3: test_letkfoi_snowda
3/7 Test #3: test_letkfoi_snowda ..............   Passed   58.22 sec
    Start 4: test_apply_jediincr
4/7 Test #4: test_apply_jediincr ..............   Passed    5.22 sec
    Start 5: test_tile2vector
5/7 Test #5: test_tile2vector .................   Passed   11.95 sec
    Start 6: test_land_driver
6/7 Test #6: test_land_driver .................   Passed   11.14 sec
    Start 7: test_ufs_datm_land
7/7 Test #7: test_ufs_datm_land ...............   Passed  605.77 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 709.63 sec

I think the input files (/scratch2/NAGAPE/epic/UFS_Land-DA/inputs) are incomplete on Hera.
I'll open a new issue for this soon.
The following PR should be merged first:
ufs-community/land-DA#5

@jkbk2004
Collaborator

(base) [role.epic@Hera:/scratch1/NCEPDEV/nems/role.epic/testing/landda-2024/pr-49/build]# ctest
Test project /scratch1/NCEPDEV/nems/role.epic/testing/landda-2024/pr-49/build
    Start 1: test_vector2tile
1/7 Test #1: test_vector2tile .................   Passed    4.24 sec
    Start 2: test_create_ens
2/7 Test #2: test_create_ens ..................   Passed    5.66 sec
    Start 3: test_letkfoi_snowda
3/7 Test #3: test_letkfoi_snowda ..............   Passed   25.99 sec
    Start 4: test_apply_jediincr
4/7 Test #4: test_apply_jediincr ..............   Passed    1.84 sec
    Start 5: test_tile2vector
5/7 Test #5: test_tile2vector .................   Passed    1.90 sec
    Start 6: test_land_driver
6/7 Test #6: test_land_driver .................   Passed    7.52 sec
    Start 7: test_ufs_datm_land
7/7 Test #7: test_ufs_datm_land ...............   Passed   56.41 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) = 103.62 sec

@chan-hoo
Collaborator Author

@jkbk2004, one test still fails on Hera on my end:

Test project /scratch2/NAGAPE/epic/Chan-hoo.Jeon/landda_ctest/land-DA_workflow/build
    Start 1: test_vector2tile
1/7 Test #1: test_vector2tile .................   Passed    0.98 sec
    Start 2: test_create_ens
2/7 Test #2: test_create_ens ..................   Passed    1.34 sec
    Start 3: test_letkfoi_snowda
3/7 Test #3: test_letkfoi_snowda ..............   Passed    4.98 sec
    Start 4: test_apply_jediincr
4/7 Test #4: test_apply_jediincr ..............   Passed    1.22 sec
    Start 5: test_tile2vector
5/7 Test #5: test_tile2vector .................   Passed    1.47 sec
    Start 6: test_land_driver
6/7 Test #6: test_land_driver .................***Failed    1.09 sec
    Start 7: test_ufs_datm_land
7/7 Test #7: test_ufs_datm_land ...............   Passed   45.88 sec

Since your test completed successfully, I suspect a file-permission issue with the input files on Hera.
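
To narrow down a run like the one above, the failed test names can be pulled out of the ctest summary and rerun on their own. This is a sketch: failed_ctests is a hypothetical helper that assumes the summary line layout shown in this thread.

```shell
# Sketch: print the names of failed tests from ctest summary output on stdin.
# Relies on lines of the form "6/7 Test #6: <name> ....***Failed  1.09 sec".
failed_ctests() {
    awk '/\*\*\*Failed/ { print $4 }'
}

# Assumed usage from the build directory:
#   ctest | failed_ctests
#   ctest -R test_land_driver --output-on-failure
```

The -R option restricts ctest to tests whose names match the pattern, and --output-on-failure prints the failing test's own output, which is usually where the missing path shows up.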

@chan-hoo
Collaborator Author

@jkbk2004, I found the following issue:

-rw-r--r-- 1 Rhaesung.Kim epic 7991744 Aug 17  2023 ufs-land_C96_static_fields.nc

Can you run chmod -R +x inputs on /scratch2/NAGAPE/epic/UFS_Land-DA/?

cd /scratch2/NAGAPE/epic/UFS_Land-DA
chmod -R +x inputs
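
Reading a file requires read permission on the file and search (x) permission on every directory leading to it, so it can help to check the whole path rather than only the file. The function below is a hypothetical sketch of that check for the current user.

```shell
# Sketch: walk from a file up through its parent directories and report every
# component the current user cannot use. check_path_access is a hypothetical name.
check_path_access() (
    target="$1"
    status=0
    [ -r "$target" ] || { echo "not readable: $target"; status=1; }
    dir=$(dirname "$target")
    while :; do
        [ -x "$dir" ] || { echo "not searchable: $dir"; status=1; }
        if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
    exit "$status"
)

# Assumed usage:
#   check_path_access /scratch2/NAGAPE/epic/UFS_Land-DA/inputs/<path to file>
```

If several lines are printed, the last one is the topmost blocked directory and the one to fix first, since everything beneath it is unreachable as a consequence. Where util-linux is installed, namei -l <path> shows the same information per path component.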

@jkbk2004
Collaborator

@chan-hoo can you try to use /scratch1/NCEPDEV/nems/role.epic/testing/landda-2024/inputs ?

@chan-hoo
Collaborator Author

@jkbk2004, the file in the new path doesn't have x permission either:

-rw-r--r-- 1 role.epic epic 7991744 Jan 25 21:04 ufs-land_C96_static_fields.nc

@jkbk2004
Collaborator

try one more time

@chan-hoo
Collaborator Author

@jkbk2004, that error is gone, but another one appeared:

 workflow/../inputs/forcing/era5/datm/C96//C96_ERA5_forcing_2019-12-21.
 No such file or directory

Please add x to inputs and its sub-directories with -R:

cd /scratch1/NCEPDEV/nems/role.epic/testing/landda-2024
chmod -R +x inputs
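
The chmod -R +x used in this thread also marks every data file as executable. A narrower alternative, which is not what was run here, is the symbolic mode a+rX: it adds read permission for everyone and adds x only to directories (and to files that are already executable). The wrapper name below is hypothetical.

```shell
# Sketch: make a directory tree readable and traversable by all users
# without turning data files into executables.
open_tree_for_reading() (
    tree="$1"
    [ -d "$tree" ] || { echo "not a directory: $tree" >&2; exit 1; }
    # Capital X: execute/search bit only for directories and already-executable files.
    chmod -R a+rX "$tree"
)

# Assumed usage, run by the owner of the files:
#   open_tree_for_reading /scratch1/NCEPDEV/nems/role.epic/testing/landda-2024/inputs
```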

@jkbk2004
Collaborator

forcing/era5/datm/C96/

Try one more time

@chan-hoo
Collaborator Author

@jkbk2004, it works now!

Test project /scratch2/NAGAPE/epic/Chan-hoo.Jeon/show_p2/land-DA_workflow/sorc/build
    Start 1: test_vector2tile
1/7 Test #1: test_vector2tile .................   Passed    7.00 sec
    Start 2: test_create_ens
2/7 Test #2: test_create_ens ..................   Passed    4.81 sec
    Start 3: test_letkfoi_snowda
3/7 Test #3: test_letkfoi_snowda ..............   Passed   20.22 sec
    Start 4: test_apply_jediincr
4/7 Test #4: test_apply_jediincr ..............   Passed    1.64 sec
    Start 5: test_tile2vector
5/7 Test #5: test_tile2vector .................   Passed    1.77 sec
    Start 6: test_land_driver
6/7 Test #6: test_land_driver .................   Passed    7.79 sec
    Start 7: test_ufs_datm_land
7/7 Test #7: test_ufs_datm_land ...............   Passed   52.03 sec

100% tests passed, 0 tests failed out of 7

Thank you!

@chan-hoo
Collaborator Author

@jkbk2004, ctest was completed successfully on Orion:

Test project /work/noaa/epic/chjeon/show_p2/land-DA_workflow/sorc/build
    Start 1: test_vector2tile
1/7 Test #1: test_vector2tile .................   Passed    8.19 sec
    Start 2: test_create_ens
2/7 Test #2: test_create_ens ..................   Passed    8.72 sec
    Start 3: test_letkfoi_snowda
3/7 Test #3: test_letkfoi_snowda ..............   Passed   60.03 sec
    Start 4: test_apply_jediincr
4/7 Test #4: test_apply_jediincr ..............   Passed    4.42 sec
    Start 5: test_tile2vector
5/7 Test #5: test_tile2vector .................   Passed   12.46 sec
    Start 6: test_land_driver
6/7 Test #6: test_land_driver .................   Passed    8.24 sec
    Start 7: test_ufs_datm_land
7/7 Test #7: test_ufs_datm_land ...............   Passed  438.21 sec

100% tests passed, 0 tests failed out of 7

@chan-hoo chan-hoo merged commit 61bff0d into ufs-community:develop Feb 27, 2024
1 check passed
@chan-hoo chan-hoo deleted the feature/nco_sorc branch February 28, 2024 15:08