Fullchem run overflows #2686

Closed
Xinying331 opened this issue Jan 16, 2025 · 7 comments
Labels: category: Bug, topic: GCHP, topic: Runtime Error

Comments

@Xinying331

Your name

Xyw

Your affiliation

Rutgers University

What happened? What did you expect to happen?

The GCHP full-chem run crashed during the execution of GCHPchem in ucx_mod.F90.

What are the steps to reproduce the bug?

I am running a global 4x5 fullchem simulation using GCHP v14.5.0 on the Pleiades platform with GEOS-FP input data. To debug the issue, I built GCHP with debug flags enabled. The error message indicates that the problem occurs in the ucx_mod.F90 code.

Here are the steps I have taken so far:

1) I referred to issues #243 and #526 and updated the gfortran compiler to version 12.3.0, but this did not resolve the problem.
2) I verified the v14.5.0 restart file from AWS and checked the meteorology files (e.g., with a test run covering only July 2019) to make sure they are valid.
3) Based on the run log, I printed the values of the variables in ucx_mod.F90 that might be causing the overflow error.
[image]
In ucx_mod.F90, the error occurs at:
[image]

After printing the related variables, I noticed a drastic drop in H2O and HNO3 concentrations, which might be causing the overflow. (More information is available in run.log.)
[image]

I am unsure whether the issue is related to the ESMF version or another configuration on Pleiades. I would greatly appreciate any advice you could provide. Thank you so much! @yantosca @lizziel @msulprizio

Please attach any relevant configuration and log files.

allPEs.log
ExtData.rc.txt
HEMCO_Config.rc.txt
run.log.txt

What GEOS-Chem version were you using?

14.5.0

What environment were you running GEOS-Chem on?

Other (please explain below)

What compiler and version were you using?

Compilers: GNU 12.3.0; ESMF-8.3.1

Will you be addressing this bug yourself?

No

In what configuration were you running GEOS-Chem?

GCHP

What simulation were you running?

Full chemistry

At what resolution were you running GEOS-Chem?

C24

What meteorology fields did you use?

GEOS-FP

Additional information

No response

@Xinying331 Xinying331 added the category: Bug Something isn't working label Jan 16, 2025
@Xinying331
Author

My apologies. In ucx_mod.F90, the error occurs at line 1868:
[image]

@yantosca yantosca added topic: GCHP Related to GCHP only topic: Runtime Error Related to runtime issues (e.g. simulation stopped w/ error) labels Jan 16, 2025
@lizziel
Contributor

lizziel commented Jan 16, 2025

Hi @Xinying331, it looks like there was an error in the log file earlier than the overflow that might shed light on the problem. It is here:

     GCHPctmEnv: INFO: Configured to expect 'bottom-up' meteorological data from 'ExtData'
     GCHPctmEnv: INFO: Configured to use dry air pressure in advection
     GCHPctmEnv: INFO: Configured to correct native mass flux (if using) for humidity
 Real*4 Resource Parameter: GCHPchem_DT:1200.000000
 Integer*4 Resource Parameter: GCHPchem_REFERENCE_TIME:1000
pe=00111 FAIL at line=08412    MAPL_Generic.F90                         <status=41>
pe=00111 FAIL at line=08334    MAPL_Generic.F90                         <status=41>
pe=00097 FAIL at line=08412    MAPL_Generic.F90                         <status=41>
pe=00097 FAIL at line=08334    MAPL_Generic.F90                         <status=41>
pe=00098 FAIL at line=08412    MAPL_Generic.F90                         <status=41>

The run kept going, but I wonder if there was a problem reading a file that ultimately caused a problem later on. Would you be able to go to the relevant lines in MAPL and add some prints to see what it was looking for? You can search for the file at the top of the code directory with find . -name MAPL_Generic.F90.

A few tips for debugging:

  1. In your logging.yaml file you can change level: DEBUG to level: WARNING. A previous iteration of our docs said to set both level and root_level to DEBUG but this was incorrect. You only need to set root_level. This will greatly cut down on the file length of allPEs.log.
  2. Run with a minimal number of cores for the debug runs.
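
When a log is long, the early FAIL lines quoted above are easy to miss. The following is a minimal sketch, not part of GCHP or MAPL, that groups such lines by source file, line number, and status so the first failure stands out. The helper name `summarize_failures` and the regular expression are assumptions derived only from the log excerpt in this thread; other log formats would need a different pattern.

```python
import re
from collections import OrderedDict

# Pattern inferred from the excerpt in this thread, e.g.
# pe=00111 FAIL at line=08412    MAPL_Generic.F90    <status=41>
# This is an assumption about the log format, not a documented MAPL contract.
FAIL_RE = re.compile(
    r"pe=(?P<pe>\d+)\s+FAIL at line=(?P<line>\d+)\s+(?P<file>\S+)\s+<status=(?P<status>-?\d+)>"
)


def summarize_failures(log_text):
    """Map (file, line, status) to the list of PEs reporting it, in order of first appearance."""
    summary = OrderedDict()
    for raw in log_text.splitlines():
        match = FAIL_RE.search(raw)
        if match is None:
            continue  # not a FAIL line
        key = (match.group("file"), int(match.group("line")), int(match.group("status")))
        summary.setdefault(key, []).append(int(match.group("pe")))
    return summary


# Sample text copied from the log excerpt quoted in this thread.
SAMPLE = """\
 Integer*4 Resource Parameter: GCHPchem_REFERENCE_TIME:1000
pe=00111 FAIL at line=08412    MAPL_Generic.F90                         <status=41>
pe=00111 FAIL at line=08334    MAPL_Generic.F90                         <status=41>
pe=00097 FAIL at line=08412    MAPL_Generic.F90                         <status=41>
"""

if __name__ == "__main__":
    for (fname, line, status), pes in summarize_failures(SAMPLE).items():
        print(f"{fname}:{line} status={status} reported by {len(pes)} PE(s)")
```

To use it on a real run, read the log file into a string and pass it to `summarize_failures`; the first key returned is the earliest failure location in the file.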

@Xinying331
Author

Hi Lizziel,

Thank you for getting back to me. However, Pleiades is currently undergoing its annual maintenance. I will update this request once Pleiades is back online.

Thanks!
Xinying

@Xinying331
Author

Hi, below is the section of the code where the issue occurred (lines 8411-8417):
[image]

I've also attached the run.log and allPEs.log files.

allPEs.log.txt
run-EDGARv50-20250205_1228.log.txt

Thank you!

@lizziel
Contributor

lizziel commented Feb 6, 2025

Thanks @Xinying331, the error is here:

 label = GCHPchem_INTERNAL_RESTART_FILE:
 val before call = 
                                                                                
                                                                                
                                                                                
                    
 status before call =           41

What value does the configuration file GCHP.rc have for GCHPchem_INTERNAL_RESTART_FILE? It seems to be empty. If it is set to the gchp restart symbolic link, check whether that link was actually created. Does it point to anything?
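
The check described above can be run before submitting a job. The sketch below is not a GCHP tool; it assumes that the entry in GCHP.rc has the form `LABEL: value` on one line, that `#` starts a comment line, and that the value is a plain path relative to the run directory. The helper names `read_rc_value` and `check_restart_entry` are hypothetical, and any prefix conventions on the value are not handled.

```python
import os


def read_rc_value(rc_path, label):
    """Return the text after 'label:' in an rc file, or None if the label is absent."""
    prefix = label + ":"
    with open(rc_path, "r", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("#"):
                continue  # assumed comment syntax
            if line.startswith(prefix):
                return line[len(prefix):].strip()
    return None


def check_restart_entry(rc_path, label="GCHPchem_INTERNAL_RESTART_FILE", run_dir="."):
    """Report whether the restart entry is missing, empty, unresolved, or ok."""
    value = read_rc_value(rc_path, label)
    if value is None:
        return "missing: label not found"
    if value == "":
        return "empty: label has no value"
    target = os.path.join(run_dir, value)
    # os.path.exists follows symbolic links, so a dangling link reports False.
    if not os.path.exists(target):
        return f"unresolved: {value} does not point to an existing file"
    return f"ok: {value}"
```

Calling `check_restart_entry("GCHP.rc")` from the run directory would report `empty` for the situation found in this thread, where the label was present but had no value.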

@Xinying331
Author

Thank you, @lizziel. The setting for GCHPchem_INTERNAL_RESTART_FILE in GCHP.rc was somehow missing, even though the gchp_restart symbolic link was correctly set and pointed to a valid file. I manually fixed GCHP.rc, and the model is now running successfully! Do you have any idea why the GCHPchem_INTERNAL_RESTART_FILE entry in GCHP.rc was empty? Thanks again!

[image]

@lizziel
Contributor

lizziel commented Feb 18, 2025

Hmm, I have never seen that before. If it happens again definitely let us know! I am glad it is working now. I will close out this issue.

@lizziel lizziel closed this as completed Feb 18, 2025