[production/RRFS.v1] saSAS sigmab initialization changes and inline post bug fix for RUC LSM #2488

Merged 6 commits into ufs-community:production/RRFS.v1 on Jan 28, 2025

JiliDong-NOAA
Contributor

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR follows @JongilHan66's suggestion and aims to reduce the large convective reflectivity caused by the saSAS adjustment in the first timestep of a warm start. The issue is likely related to an inconsistency: DA updates the moisture at time t but not the moisture from the previous timestep (t-36s). The moisture from the previous timestep is needed to calculate qadv (the q advection or tendency term), which is used in initializing sigmab (updraft area fraction).

The PR forces qadv to zero in the first timestep when the namelist parameter sigmab_coldstart is set to .true. It also reduces the lower limit of sigmab from 0.01 to 0.0 in the first timestep.
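
As a rough illustration of the first-timestep behaviour described here (not the actual ccpp-physics Fortran), the rule could be sketched in Python as follows. The function name, the `first_time_step` argument, and the choice to tie the relaxed lower limit to `sigmab_coldstart` are assumptions made for this sketch; only `sigmab_coldstart`, `qadv`, `sigmab`, and the 0.01 and 0.0 limits come from the description.

```python
def first_step_adjust(qadv, sigmab, first_time_step, sigmab_coldstart):
    """Return (qadv, sigmab) after applying the first-timestep rules.

    Hypothetical helper that mirrors the PR description; it is not the
    saSAS source code.
    """
    # Assumption: both changes apply only on the first timestep and only
    # when the namelist flag is .true.
    use_coldstart = first_time_step and sigmab_coldstart

    if use_coldstart:
        # The previous-timestep moisture is inconsistent after DA,
        # so the advection/tendency term is forced to zero.
        qadv = 0.0

    # Lower limit on the updraft area fraction: 0.0 on a cold-started
    # first timestep, 0.01 otherwise.
    sigmab_min = 0.0 if use_coldstart else 0.01
    return qadv, max(sigmab, sigmab_min)
```

With the flag off, or after the first timestep, the sketch leaves qadv untouched and keeps the 0.01 floor on sigmab.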

The PR also resolves an issue found by @ericaligo-NOAA where inline post does not correctly output soil temperature and moisture in grib2 when using RUC LSM. We tracked this to post_fv3, where iSF_SURFACE_PHYSICS is not updated for RUC LSM. This parameter is used in UPP SURFACE.f for soil variable output. iSF_SURFACE_PHYSICS is initially set to 2 in post_fv3 and should be updated to 3 when RUC LSM is used. In this PR we set iSF_SURFACE_PHYSICS = 3 when nsoil = 9, which successfully outputs the 9-level soil variables.
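
The selection rule for the inline post fix can be sketched in the same spirit. This is a Python paraphrase of the logic described above, not the post_fv3 Fortran; the function name and the `initial` argument are hypothetical, while the values 2, 3, and nsoil = 9 are taken from the description.

```python
def select_isf_surface_physics(nsoil, initial=2):
    """Pick the iSF_SURFACE_PHYSICS value handed to UPP.

    Hypothetical helper: returns 3 (RUC LSM) when the model reports
    9 soil levels, otherwise keeps the initial value of 2.
    """
    return 3 if nsoil == 9 else initial
```

Keying the switch on nsoil follows the PR text; other ways of detecting RUC LSM are possible but are not described here.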

Commit Message:

* UFSWM - 
  * AQM - 
  * CDEPS - 
  * CICE - 
  * CMEPS - 
  * CMakeModules - 
  * FV3 - https://github.com/NOAA-EMC/fv3atm/pull/869
    * ccpp-physics - https://github.com/ufs-community/ccpp-physics/pull/225
    * atmos_cubed_sphere - 
  * GOCART - 
  * HYCOM - 
  * MOM6 - 
  * NOAHMP - 
  * WW3 - 
  * fire_behavior
  * stochastic_physics - 

Priority:

  • Critical Bugfix: Reason
  • High: Reason
  • Normal

Git Tracking

UFSWM:

  • Closes #
  • None

Sub component Pull Requests:

  • AQM:
  • CDEPS:
  • CICE:
  • CMEPS:
  • CMakeModules:
  • FV3:
    • ccpp-physics:
    • atmos_cubed_sphere:
  • GOCART:
  • HYCOM:
  • MOM6:
  • NOAHMP:
  • WW3:
  • fire_behavior:
  • stochastic_physics:
  • None

UFSWM Blocking Dependencies:

  • Blocked by #
  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • New input data.
  • Updated input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@jkbk2004
Collaborator

@JiliDong-NOAA @MatthewPyle-NOAA @rhaesung A floating point exception occurs in rap_clm_lake_debug_intel.

146: forrtl: error (75): floating point exception
146: Image              PC                Routine            Line        Source
146: fv3.exe            00000000174CEE3B  Unknown               Unknown  Unknown
146: libpthread-2.28.s  00001473E3BB4D10  Unknown               Unknown  Unknown
146: fv3.exe            0000000016DDD533  set_lvlsxml_              130  SET_LVLSXML.f
146: fv3.exe            0000000015DD27F7  set_outflds_              122  SET_OUTFLDS.f
146: fv3.exe            0000000001DA85DF  post_fv3_mp_post_         187  post_fv3.F90
146: fv3.exe            0000000001D13FEB  module_wrt_grid_c        2123  module_wrt_grid_comp.F90

experiment path on hera: /scratch1/NCEPDEV/stmp2/Jong.Kim/FV3_RT/rt_2987226/rap_clm_lake_debug_intel/err

@JiliDong-NOAA
Contributor Author

The crash seems to be related to inline post. I will take a look. @jkbk2004 @MatthewPyle-NOAA @rhaesung

@JiliDong-NOAA
Contributor Author

> The crash seems to be related to inline post. I will take a look. @jkbk2004 @MatthewPyle-NOAA @rhaesung

@jkbk2004 The failure is likely caused by not updating FV3. Now it has been updated. Let me know if you still see problems.

On a different note, I am having trouble compiling production/RRFS.v1 on Hera, which worked well before. This is what I did:

$ git clone https://github.com/ufs-community/ufs-weather-model.git
$ cd ufs-weather-model/
$ git checkout production/RRFS.v1
$ git submodule update --init --recursive
$ module use modulefiles
$ module load ufs_hera.intel
$ setenv CMAKE_FLAGS "-DAPP=ATM -DCCPP_SUITES=FV3_RAP_clm_lake -D32BIT=ON -DDEBUG=ON"
$ ./build.sh >& compile.log &

The compilation error is:

/scratch1/NCEPDEV/fv3-cam/Jili.Dong/git_work/debug/ufs-weather-model/FV3/atmos_cubed_sphere/tools/fv_io.F90(98): error #6580: Name in only-list does not exist or is not accessible.   [MPP_COMM_NULL]
                                     mpp_npes, MPP_COMM_NULL
-----------------------------------------------^
/scratch1/NCEPDEV/fv3-cam/Jili.Dong/git_work/debug/ufs-weather-model/FV3/atmos_cubed_sphere/tools/fv_io.F90(106): error #6580: Name in only-list does not exist or is not accessible.   [MPP_GET_DOMAIN_TILE_COMMID]
                                     mpp_get_global_domain, mpp_get_domain_tile_commid, mpp_define_io_domain
------------------------------------------------------------^
/scratch1/NCEPDEV/fv3-cam/Jili.Dong/git_work/debug/ufs-weather-model/FV3/atmos_cubed_sphere/tools/fv_io.F90(483): error #6460: This is not a component name that is defined in the encompassing structure.   [USE_COLLECTIVE]
    Atm(1)%Fv_restart_tile%use_collective = .true.
---------------------------^
...
compilation aborted for /scratch1/NCEPDEV/fv3-cam/Jili.Dong/git_work/debug/ufs-weather-model/FV3/atmos_cubed_sphere/tools/fv_io.F90 (code 1)
make[2]: *** [FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/build.make:452: FV3/atmos_cubed_sphere/CMakeFiles/fv3.dir/tools/fv_io.F90.o] Error 1

Running the RTs shows the same compilation error.
BTW, develop compiles successfully. Any thoughts or suggestions?

@jkbk2004
Collaborator

@JiliDong-NOAA I checked production/RRFS.v1 branch. The production branch is running ok.

@JiliDong-NOAA
Contributor Author

> @JiliDong-NOAA I checked production/RRFS.v1 branch. The production branch is running ok.

@jkbk2004 please let me know if you still see a crash after the update on this PR. Also, could you let me know how you compile production/RRFS.v1 on Hera?

@grantfirl
Collaborator

> > @JiliDong-NOAA I checked production/RRFS.v1 branch. The production branch is running ok.
>
> @jkbk2004 please let me know if you still see a crash after the update on this PR. Also, could you let me know how you compile production/RRFS.v1 on Hera?

@JiliDong-NOAA I can try to reproduce the problem on Hera if you'd like. Last I knew, the regression tests for the RRFSv1 branch use rt.conf_rrfs, so when running rt.sh, specify -l rt.conf_rrfs to run only the tests that are necessary on Hera. When this goes into the develop branch, however, the full rt.conf needs to be run. I have access to Hera and can try to compile/run this PR branch there if you need help. Let me know.

@JiliDong-NOAA
Contributor Author

@grantfirl It would be great if you could try compiling or running production/RRFS.v1 on Hera. I will also try again, explicitly compiling and running the RTs, after Hera is back from maintenance. Thanks.

@grantfirl
Collaborator

@JiliDong-NOAA Hera maintenance got done early. I kicked off the rt.conf_rrfs RTs on Hera and will report back tomorrow since I'm about to fly home from AMS.

@grantfirl
Collaborator

@JiliDong-NOAA I just realized that I misread the comments, that you were having trouble with the TARGET branch compiling on Hera. The tests that I kicked off were with your PR branch. I can try the production/RRFS.v1 branch when it's finished.

@grantfirl
Collaborator

@JiliDong-NOAA @jkbk2004 I've attached my RT log for rt.conf_rrfs below. There were no issues compiling or running. The only failures are changed baselines. @JiliDong-NOAA Could you please verify that the failing tests are what you would expect given the changes to saSAS?

RegressionTests_hera.log

@grantfirl
Collaborator

grantfirl commented Jan 15, 2025

@JiliDong-NOAA @jkbk2004 I can reproduce the compilation error on Hera for the production/rrfs.v1 branch checked out recursively when running rt.sh with -n rap_clm_lake_debug intel. Note that this test is not part of rt.conf_rrfs, which contains the subset of tests that have been used for code management purposes for the production/rrfs.v1 branch.

@JiliDong-NOAA
Contributor Author

@grantfirl thanks for confirming that. After some digging, it turns out the compilation failure is related to the option

"-DENABLE_PARALLELRESTART"

It appears this option is set to True or Yes by default. When compiling explicitly, or when using "rt.sh -n", without setting it to NO, compilation fails. Most entries in rt.conf have "-DENABLE_PARALLELRESTART=NO", which I guess is why there is no problem running with the original rt confs.

With that settled, I can run RTs with this PR on Hera. I will look at the failed RTs you posted and get back to you and @jkbk2004 on that.
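
For anyone assembling CMAKE_FLAGS by hand, a small guard along these lines could make the workaround explicit. This is only a sketch: the helper name is hypothetical, and it simply forces the "-DENABLE_PARALLELRESTART=NO" setting mentioned above onto a flag string such as the one used in the earlier build commands.

```python
import shlex


def force_parallelrestart_off(cmake_flags):
    """Return cmake_flags with -DENABLE_PARALLELRESTART=NO enforced.

    Hypothetical helper: drops any existing ENABLE_PARALLELRESTART
    setting and appends the NO value used by most rt.conf entries.
    """
    tokens = [
        tok for tok in shlex.split(cmake_flags)
        if not tok.startswith("-DENABLE_PARALLELRESTART")
    ]
    tokens.append("-DENABLE_PARALLELRESTART=NO")
    return shlex.join(tokens)


flags = "-DAPP=ATM -DCCPP_SUITES=FV3_RAP_clm_lake -D32BIT=ON -DDEBUG=ON"
print(force_parallelrestart_off(flags))
```

The returned string can then be exported as CMAKE_FLAGS before calling build.sh.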

@grantfirl
Collaborator

@JiliDong-NOAA Ah yes. That makes sense. Please see the discussion related to this issue here: #2529

and here: NOAA-EMC/fv3atm#896

@JiliDong-NOAA
Contributor Author

@jkbk2004 @grantfirl The failed RTs look reasonable. Here are some notes:

  1. None of the RRFS RTs use saSAS, so the baseline changes are unrelated to the saSAS fix in this PR.
  2. The baseline changes come from the inline post fix for RUC LSM. The change in sfc.nc is to the global attribute "nsoil", which is corrected from 4 to 9 for RUC.
  3. The other changes are in grib2 files. The PR fixes the RUC soil output in grib2 to use the correct 9 soil layers. However, the RUC soil layers in the RRFS RTs remain incorrect in grib2 because the soil layers are not set correctly in the plain-text post file (postxconfig-NT.txt). In other words, the soil layers differ from the baseline but are still not correct.
  4. To get the correct soil layers for RUC LSM, the plain-text post file needs to be updated with the correct soil layers. This requires correcting the related post xml file and recompiling.

Hope this helps with merging this PR.

@JiliDong-NOAA
Contributor Author

@jkbk2004 @grantfirl is there anything we can do to move this PR forward? Thanks

@jkbk2004
Collaborator

> @jkbk2004 @grantfirl is there anything we can do to move this PR forward? Thanks

@JiliDong-NOAA Let me test a bit today.

@grantfirl
Collaborator

grantfirl commented Jan 23, 2025

> @jkbk2004 @grantfirl is there anything we can do to move this PR forward? Thanks

I don't think that there is anything left to do, IMO. The failures outside of the rt.conf_rrfs are documented elsewhere (see #2529) and can be fixed in that context. I suppose that it might behoove us to merge the parallelrestart machine confinement back to production/rrfs.v1 once #2529 is updated and merged since it causes errors on some platforms and isn't really needed outside of WCOSS at the moment.

@jkbk2004
Collaborator

@MatthewPyle-NOAA I created a new baseline, and the regression tests pass on Hera and Hercules. This PR can be merged once it has been tested on WCOSS2.

@MatthewPyle-NOAA
Collaborator

MatthewPyle-NOAA commented Jan 24, 2025

Thanks @jkbk2004 I've started working on the WCOSS tests.

@JiliDong-NOAA
Contributor Author

FV3 updated to @81e6d10 and .gitmodules reverted

@jkbk2004
Collaborator

@MatthewPyle-NOAA this pr can be merged with your approval.

@jkbk2004
Collaborator

@MatthewPyle-NOAA I am merging this pr.

jkbk2004 merged commit 970fa85 into ufs-community:production/RRFS.v1 on Jan 28, 2025
1 check passed
4 participants