Fullchem run overflows #2686

Xinying331 · 2025-01-16T06:55:00Z

Your name

Xyw

Your affiliation

Rutgers University

What happened? What did you expect to happen?

The GCHP full-chem run crashed during the execution of GCHPchem in ucx_mod.F90.

What are the steps to reproduce the bug?

I am running a global 4x5 fullchem simulation using GCHP v14.5.0 on the Pleiades platform with GEOS-FP input data. To debug the issue, I built GCHP with debug flags enabled. The error message indicates that the problem occurs in the ucx_mod.F90 code.

Here are the steps I have taken so far:

1) I referred to issues #243 and #526 and updated the gfortran compiler to version 12.3.0, but this did not resolve the problem.
2) I verified the restart file for v14.5.0 from AWS and checked the meteorology files (e.g., by doing a test run for July 2019 only) to ensure they are valid.
3) I printed the values of the variables that might be causing the overflow error in ucx_mod.F90, based on the run log.

From ucx_mod.F90 the error occurs at:

After printing the related variables, I noticed a drastic drop in H2O and HNO3 concentrations, which might be causing the overflow. (More information is available in run.log.)

I am unsure whether the issue is related to the ESMF version or another configuration on Pleiades. I would greatly appreciate any advice you could provide. Thank you so much! @yantosca @lizziel @msulprizio

Please attach any relevant configuration and log files.

allPEs.log
ExtData.rc.txt
HEMCO_Config.rc.txt
run.log.txt

What GEOS-Chem version were you using?

14.5.0

What environment were you running GEOS-Chem on?

Other (please explain below)

What compiler and version were you using?

Compilers: GNU 12.3.0; ESMF-8.3.1

Will you be addressing this bug yourself?

No

In what configuration were you running GEOS-Chem?

GCHP

What simulation were you running?

Full chemistry

At what resolution were you running GEOS-Chem?

C24

What meteorology fields did you use?

GEOS-FP

Additional information

No response

Xinying331 · 2025-01-16T06:59:54Z

My apologies. In ucx_mod.F90, the error occurs at line 1868:

lizziel · 2025-01-16T21:38:26Z

Hi @Xinying331, it looks like there was an error in the log file earlier than the overflow that might shed light on the problem. It is here:

     GCHPctmEnv: INFO: Configured to expect 'bottom-up' meteorological data from 'ExtData'
     GCHPctmEnv: INFO: Configured to use dry air pressure in advection
     GCHPctmEnv: INFO: Configured to correct native mass flux (if using) for humidity
 Real*4 Resource Parameter: GCHPchem_DT:1200.000000
 Integer*4 Resource Parameter: GCHPchem_REFERENCE_TIME:1000
pe=00111 FAIL at line=08412    MAPL_Generic.F90                         <status=41>
pe=00111 FAIL at line=08334    MAPL_Generic.F90                         <status=41>
pe=00097 FAIL at line=08412    MAPL_Generic.F90                         <status=41>
pe=00097 FAIL at line=08334    MAPL_Generic.F90                         <status=41>
pe=00098 FAIL at line=08412    MAPL_Generic.F90                         <status=41>

The run kept going, but I wonder if there was a problem reading a file that ultimately caused a problem later on. Would you be able to go to the relevant lines in MAPL and add some prints to see what it was looking for? You can search for the file at the top of the code directory with find . -name MAPL_Generic.F90.
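For anyone who prefers a script to `find` plus an editor, here is a minimal Python sketch that locates the file and shows the source around the failing line numbers from the log (8334 and 8412). The function names `find_source` and `show_context` are made up for illustration and are not part of MAPL or GCHP.

```python
from pathlib import Path


def find_source(root, filename="MAPL_Generic.F90"):
    """Return every path under root with this file name (like `find . -name`)."""
    return sorted(Path(root).rglob(filename))


def show_context(path, line_no, radius=3):
    """Return the numbered source lines within `radius` of line_no (1-indexed)."""
    lines = Path(path).read_text(errors="replace").splitlines()
    start = max(line_no - radius, 1)
    end = min(line_no + radius, len(lines))
    # An empty list is returned if the file is shorter than the requested window.
    return [f"{n:6d}: {lines[n - 1]}" for n in range(start, end + 1)]
```

From the top of the code directory, something like `for src in find_source("."): print("\n".join(show_context(src, 8412)))` would print the statements around the reported failure, which is where the extra prints would go.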

A few tips for the debug:

In your logging.yaml file you can change level: DEBUG to level: WARNING. A previous iteration of our docs said to set both level and root_level to DEBUG but this was incorrect. You only need to set root_level. This will greatly cut down on the file length of allPEs.log.
Run with a minimal number of cores for the debug runs.
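
Early `FAIL at line=` entries like the ones quoted above are easy to miss in a long log. As a rough sketch, the following Python groups them by source file, line, and status so the first occurrence stands out. The regular expression is an assumption based only on the five lines shown in this thread, and `summarize_failures` is a hypothetical helper name.

```python
import re

# Pattern inferred from the log excerpt in this thread (an assumption, not a MAPL spec).
FAIL_RE = re.compile(r"pe=(\d+)\s+FAIL at line=(\d+)\s+(\S+)\s+<status=(\d+)>")


def summarize_failures(log_lines):
    """Group FAIL entries by (file, line, status), keeping first-seen order."""
    summary = {}
    for idx, text in enumerate(log_lines, start=1):
        match = FAIL_RE.search(text)
        if match is None:
            continue
        pe, line, src, status = match.groups()
        key = (src, int(line), int(status))
        entry = summary.setdefault(key, {"first_log_line": idx, "pes": set()})
        entry["pes"].add(int(pe))
    return summary


# Sample lines copied from the log excerpt quoted earlier in this comment.
sample = [
    " Integer*4 Resource Parameter: GCHPchem_REFERENCE_TIME:1000",
    "pe=00111 FAIL at line=08412    MAPL_Generic.F90                         <status=41>",
    "pe=00111 FAIL at line=08334    MAPL_Generic.F90                         <status=41>",
    "pe=00097 FAIL at line=08412    MAPL_Generic.F90                         <status=41>",
]

for (src, line, status), info in summarize_failures(sample).items():
    print(src, line, status, "first seen at log line", info["first_log_line"],
          "on PEs", sorted(info["pes"]))
```

To scan a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.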

Xinying331 · 2025-01-22T22:24:35Z

Hi Lizziel,

Thank you for getting back to me. However, Pleiades is currently undergoing its annual maintenance. I will update this request once Pleiades is back online.

Thanks!
Xinying

Xinying331 · 2025-02-05T21:52:42Z

Hi, below is the section of the code where the issue occurred (lines 8411-8417):

I've also attached the run.log and allPEs.log files.

allPEs.log.txt
run-EDGARv50-20250205_1228.log.txt

Thank you!

lizziel · 2025-02-06T15:50:50Z

Thanks @Xinying331, the error is here:

 label = GCHPchem_INTERNAL_RESTART_FILE:
 val before call = 
                                                                                
                                                                                
                                                                                
                    
 status before call =           41

What value does the configuration file GCHP.rc have for GCHPchem_INTERNAL_RESTART_FILE:? It seems to be empty. If it is the gchp restart symbolic link, check whether that link was actually set. Does it point to anything?
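
Both checks can be scripted. This is a minimal sketch and it rests on assumptions: GCHP.rc is treated as plain `label: value` lines with `#` starting a comment, and the value is treated as a plain path relative to the run directory. `read_rc_value` and `describe_link` are hypothetical helper names, not part of GCHP or MAPL.

```python
import os


def read_rc_value(rc_text, label):
    """Return the text after `label`, "" if the label has no value, None if absent."""
    for raw in rc_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if line.startswith(label):
            return line[len(label):].strip()
    return None


def describe_link(path):
    """Classify a path as 'missing', 'dangling symlink', or 'ok'."""
    if not os.path.lexists(path):
        return "missing"
    if os.path.islink(path) and not os.path.exists(path):
        return "dangling symlink"
    return "ok"
```

For example, `read_rc_value(open("GCHP.rc").read(), "GCHPchem_INTERNAL_RESTART_FILE:")` returning an empty string or `None` would match the blank value in the debug output above, and `describe_link` on a non-empty value would show whether the restart link resolves to a real file.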

Xinying331 · 2025-02-13T19:27:07Z

Thank you @lizziel. The setting for GCHPchem_INTERNAL_RESTART_FILE in GCHP.rc was somehow missing, even though the gchp_restart symbolic link was correctly set and pointed to a valid file. I manually fixed GCHP.rc, and the model is now running successfully! Do you have any idea why the GCHPchem_INTERNAL_RESTART_FILE entry in GCHP.rc was empty? Thanks again!

lizziel · 2025-02-18T19:31:41Z

Hmm, I have never seen that before. If it happens again definitely let us know! I am glad it is working now. I will close out this issue.

Xinying331 added the "category: Bug" label Jan 16, 2025

yantosca assigned lizziel Jan 16, 2025

yantosca added the "topic: GCHP" and "topic: Runtime Error" labels Jan 16, 2025

lizziel closed this as completed Feb 18, 2025
