TOMAS 14.4.3 simulations failing without error message #2698

Open
reginaluu opened this issue Jan 23, 2025 · 4 comments
Labels
category: Debug Help (Request for assistance debugging GEOS-Chem)
topic: Aerosols (Related to aerosol species in GEOS-Chem)
topic: Runtime Error (Related to runtime issues, e.g. simulation stopped w/ error)

Comments

@reginaluu

Your name

Regina Luu

Your affiliation

UCI

What happened? What did you expect to happen?

I created a new run directory to run a TOMAS simulation in version 14.4.3. The dry run reported no missing files. However, when I started the simulation, it ran for 5 minutes and then failed without an error message in either GC.log or HEMCO.log. GC.log ends shortly after initializing the thermodynamics, and HEMCO.log stops in the middle of getting the emissions.

Since there is no error message, I am unsure how best to handle this issue and would appreciate help figuring out what may be going wrong. Thank you!

Regina

What are the steps to reproduce the bug?

This error occurs every time I start the simulation.

Please attach any relevant configuration and log files.

GC.log
geoschem_config.txt
HEMCO.log
HEMCO_Config.txt
HISTORY.txt
log.dryrun.txt
summarize_build.txt

What GEOS-Chem version were you using?

14.4.3

What environment were you running GEOS-Chem on?

Local cluster

What compiler and version were you using?

gnu 11.3

Will you be addressing this bug yourself?

No

In what configuration were you running GEOS-Chem?

GCClassic

What simulation were you running?

Full chemistry

At what resolution were you running GEOS-Chem?

4x5

What meteorology fields did you use?

MERRA-2

Additional information

No response

@reginaluu reginaluu added the category: Bug Something isn't working label Jan 23, 2025
@yantosca yantosca added topic: Runtime Error Related to runtime issues (e.g. simulation stopped w/ error) category: Debug Help Request for assistance debugging GEOS-Chem topic: Aerosols Related to aerosol species in GEOS-Chem and removed category: Bug Something isn't working labels Jan 27, 2025
@yantosca
Contributor

Thanks for writing @reginaluu and thanks for your patience, I was away on Friday.

Were you running on a cluster using a scheduler (e.g. SLURM, PBS, etc.)? If so, there may have been an error message written to a log file such as slurm-12345678.out (where 12345678 will be replaced by the job ID number).

It could be that the job exceeded the memory limit available to it in the computational queue. That could stop a run like that without much warning. In that case there should have been a message in the scheduler output log. If that's the case you could try to request more memory for the job.
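As a rough illustration of checking the scheduler output for a memory-related message, here is a minimal Python sketch. The search patterns are assumptions: the exact wording of out-of-memory messages varies between scheduler versions and sites, so treat the list as a starting point rather than a complete set. The function names are hypothetical helpers, not part of GEOS-Chem or SLURM.

```python
import re
from pathlib import Path

# Assumed phrases that often accompany memory kills; the wording differs
# across scheduler versions and cluster setups, so adjust as needed.
MEMORY_HINTS = re.compile(
    r"oom[-_ ]?kill|out of memory|exceeded .*memory limit", re.IGNORECASE
)


def find_memory_hints(log_text):
    """Return the lines of a scheduler log that look memory-related."""
    return [line for line in log_text.splitlines() if MEMORY_HINTS.search(line)]


def scan_scheduler_log(path):
    """Read a scheduler output file (e.g. slurm-<jobid>.out) and scan it."""
    return find_memory_hints(Path(path).read_text(errors="replace"))
```

An empty result only means that none of the assumed phrases appeared in the file; it does not rule out a memory problem.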

We do have some debugging tips on ReadTheDocs. A few things to try:

  • Turn off operations (transport, chemistry, etc.) one by one until you find the operation that causes the run to fail.
  • You can also try turning off diagnostic collections that you don't need.
  • Only request the species you need in diagnostic output instead of asking for ?ADV? or ?ALL? species. (More diagnostics means more memory use)
  • Recompile with the debugging flags turned on (build with -DCMAKE_BUILD_TYPE=Debug) and run a short simulation to see if that finds any errors.
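The "one by one" approach in the first tip can be sketched in Python as a way to enumerate the test runs. This is only an illustration: the dictionary below is a hypothetical in-memory stand-in for the operation switches, and the names and layout of the real settings in the run directory's configuration file may differ.

```python
# Hypothetical stand-in for the operation switches; the real names and
# layout in the GEOS-Chem configuration file may differ.
BASELINE = {
    "chemistry": True,
    "convection": True,
    "dry_deposition": True,
    "pbl_mixing": True,
    "transport": True,
    "wet_deposition": True,
}


def one_off_variants(baseline):
    """Yield (name, settings) pairs, each with exactly one operation off."""
    for name, enabled in baseline.items():
        if not enabled:
            continue  # already off, so there is nothing to test
        variant = dict(baseline)  # shallow copy keeps the baseline intact
        variant[name] = False
        yield name, variant


if __name__ == "__main__":
    for name, settings in one_off_variants(BASELINE):
        off = [key for key, value in settings.items() if not value]
        print(f"test run with {name} disabled: off={off}")
```

Each variant corresponds to one short test simulation; the run that completes points to the disabled operation as the place to look further.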

Please keep us posted as to what you find.

@reginaluu
Author

Hi @yantosca thank you for the help!

Indeed, I am running using a SLURM scheduler but the error message was as follows:
srun: error: c-13-41: task 0: Exited with exit code 11
srun: Terminating job step 8787822.0

I also tried some of the debugging solutions listed. The only lead I have been able to find is that turning off the chemistry operation allows the run to finish with outputs. I also tried only having 2 collections on and also recompiling with the debug flags but both of those attempts resulted in the same simulation failure with no descriptive error message.

The question that I have now is how do I further narrow down where the issue is within the chemistry operation?

@yantosca
Contributor

Thanks @reginaluu. You can try turning on the verbose output in geoschem_config.yml. That should print a little message after each subroutine. I'm not sure how much output you'll get from within TOMAS though. That should at least give you some indication.
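To see where each log stops once verbose output is on, a small helper along these lines can print the last few non-blank lines of each file. This is a minimal sketch that assumes GC.log and HEMCO.log sit in the current run directory; the function name is a hypothetical helper.

```python
from collections import deque
from pathlib import Path


def last_nonempty_lines(path, count=5):
    """Return the last `count` non-blank lines of a text file."""
    tail = deque(maxlen=count)  # keeps only the most recent `count` lines
    with Path(path).open(errors="replace") as handle:
        for line in handle:
            if line.strip():
                tail.append(line.rstrip("\n"))
    return list(tail)


if __name__ == "__main__":
    # Log file names are assumed to match those in the run directory.
    for name in ("GC.log", "HEMCO.log"):
        if Path(name).exists():
            print(f"--- {name} ---")
            print("\n".join(last_nonempty_lines(name)))
```

Keep in mind that buffered output may not be fully flushed when a run dies abruptly, so the last line in a log is a hint about where the failure occurred rather than proof.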

Also I wonder if you are coming up against a hard memory limit on your cluster. You could try to increase the amount of memory that you request in the SLURM run script.

@reginaluu
Author

Thank you @yantosca! I do believe that I did have the verbose output turned on when I ran my initial simulation but I still am not seeing any indications of where the simulation might be going wrong. Would it be in the thermodynamics because that's where the GC.log ends?

I also tried doubling the amount of memory requested, but unfortunately it made no difference.
