Show how to distribute the super simple simulation #4

glwagner · 2025-01-24T12:39:55Z

To distribute a simulation:

1. Change the architecture from GPU() (or CPU()) to Distributed(GPU()), or to Distributed(GPU(), partition=partition) to manually configure how the simulation is partitioned across GPUs.
2. For now, one aspect of the simulation must also change (this requirement may be eliminated in the future): fix the number of substeps of the free surface solver with free_surface = SplitExplicitFreeSurface(substeps=N). N=30 may work for the current configuration; the number of substeps required depends on both the resolution and the time step of the simulation.
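
As a rough sketch of the two changes above, the configuration fragment below assumes 4 MPI ranks (one per GPU) arranged 2 x 2 in (x, y). The `Partition(2, 2)` layout and the `Oceananigans.DistributedComputations` import are assumptions for illustration, not something stated in this thread, and the rest of the script (grid, model, simulation) is taken to be unchanged.

```julia
using Oceananigans
using Oceananigans.DistributedComputations  # assumed to provide Distributed and Partition

# Assumption: 4 MPI ranks (one per GPU), arranged 2 x 2 in (x, y).
partition = Partition(2, 2)
arch = Distributed(GPU(); partition)

# Fix the number of substeps of the free surface solver. 30 is the value
# suggested above and may need tuning for other resolutions or time steps.
free_surface = SplitExplicitFreeSurface(substeps = 30)

# `arch` then replaces GPU() in the grid constructor, and `free_surface` is
# passed to the model constructor, as in the single-GPU script.
```

The script would then be launched with one MPI rank per GPU through the cluster's usual MPI launcher.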

luraess · 2025-01-25T10:47:54Z

So you mostly distribute in (x, y) plane. May it also happen in some cases along the vertical dim or not?

glwagner · 2025-01-25T14:02:01Z

> So you mostly distribute in (x, y) plane. May it also happen in some cases along the vertical dim or not?

No, we don't distribute in the vertical. There are two reasons. One is that our algorithm involves several vertical integrals plus a tridiagonal solve, so a vertical decomposition would require more communication than a horizontal decomposition of the same simulation. The other is that the simulations usually have a thin aspect ratio, as thin as 1:50 for a high resolution simulation, so "horizontal slabs" have a lot of surface area relative to their volume.
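
To put rough numbers on the surface-area argument, here is a small back-of-the-envelope calculation. The grid size (1000 x 1000 x 20 cells, so 1:50 in cell counts) and the rank count are hypothetical values chosen for illustration, and the count only covers one layer of cells per interface.

```julia
# Hypothetical grid: 1000 x 1000 horizontal cells and 20 vertical levels.
Nx, Ny, Nz = 1000, 1000, 20
ranks = 4

# Cells on one interface between neighboring subdomains (one halo layer).
vertical_cut_cells   = Ny * Nz  # cut along x: a y-z plane
horizontal_cut_cells = Nx * Ny  # cut along z: an x-y plane ("horizontal slab")

# Splitting into `ranks` pieces along one direction creates ranks - 1
# interior interfaces (periodic wrap-around is ignored here).
halo_split_x = (ranks - 1) * vertical_cut_cells
halo_split_z = (ranks - 1) * horizontal_cut_cells

println("split along x: ", halo_split_x, " interface cells")
println("split along z: ", halo_split_z, " interface cells")
println("ratio z/x:     ", halo_split_z / halo_split_x)
```

Under these assumptions, slicing in the vertical exposes 50 times more interface cells than slicing along x, before even counting the extra communication from vertical integrals and the tridiagonal solve.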

luraess · 2025-01-27T13:58:23Z

It runs on 4 GH200 (single node on ALPS) - the CUDA / KA part:

[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: Initializing simulation...
[ Info: Initializing simulation...
[ Info: Initializing simulation...
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (1.157 minutes)
[ Info:     ... simulation initialization complete (1.162 minutes)
[ Info:     ... simulation initialization complete (1.167 minutes)
[ Info:     ... simulation initialization complete (1.168 minutes)
[ Info: Executing initial time step...
[ Info: Executing initial time step...
[ Info: Executing initial time step...
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (20.379 seconds).
[ Info:     ... initial time step complete (20.380 seconds).
[ Info:     ... initial time step complete (20.380 seconds).
[ Info:     ... initial time step complete (20.380 seconds).
[ Info: Simulation is stopping after running for 1.508 minutes.
[ Info: Simulation is stopping after running for 1.509 minutes.
[ Info: Simulation is stopping after running for 1.509 minutes.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
[ Info: Simulation is stopping after running for 1.508 minutes.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.

Is there a way to have output @info printed only by rank 0?

glwagner · 2025-01-27T15:45:38Z

You can pass verbose=false to the Simulation constructor to suppress the output completely. Then you can add a custom output that checks for rank 0. It does make sense to build this into Oceananigans though, @simone-silvestri.
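
A minimal sketch of that suggestion, assuming MPI.jl is available and that the rank in `MPI.COMM_WORLD` is the one to check. The function name `root_progress` is hypothetical, and the commented-out lines assume that `model` and `Δt` are already defined elsewhere in the script.

```julia
using MPI
using Oceananigans

MPI.Initialized() || MPI.Init()

# Hypothetical progress callback: only rank 0 prints.
function root_progress(sim)
    if MPI.Comm_rank(MPI.COMM_WORLD) == 0
        clock = sim.model.clock
        @info "Iteration $(clock.iteration), time $(prettytime(clock.time))"
    end
    return nothing
end

# Usage, assuming `model` and `Δt` exist:
# simulation = Simulation(model; Δt, stop_iteration = 2, verbose = false)
# add_callback!(simulation, root_progress, IterationInterval(1))
```

With `verbose = false` the built-in messages are silenced on every rank, so the callback is the only source of progress output.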

simone-silvestri · 2025-01-27T16:06:36Z

We have a macro in ClimaOcean that can be useful; you would have to depend on ClimaOcean though.
For example:

using ClimaOcean: @root
@root @info .....

does the trick

glwagner · 2025-01-27T19:10:52Z

> We have some macro in ClimaOcean that can be useful, you would have to depend on ClimaOcean though. For example:
> using ClimaOcean: @root
> @root @info .....
> does the trick

I think this shows the macro is useful at a lower level. Should it live in an independent package, or otherwise in Oceananigans?
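
For reference, a standalone macro of this kind can be quite small. The sketch below is a hypothetical `@onroot` that depends only on MPI.jl. It is not the ClimaOcean `@root` implementation, and it assumes rank 0 of `MPI.COMM_WORLD` is the root.

```julia
using MPI

# Hypothetical macro (not ClimaOcean's @root): evaluate `expr` only on
# rank 0 of MPI.COMM_WORLD, initializing MPI first if needed.
macro onroot(expr)
    return quote
        MPI.Initialized() || MPI.Init()
        if MPI.Comm_rank(MPI.COMM_WORLD) == 0
            $(esc(expr))
        end
    end
end

@onroot @info "Printed once, by rank 0 only"
```

On ranks other than 0 the wrapped expression is skipped and the macro call evaluates to `nothing`, so it should not wrap code that every rank must execute, such as collective MPI calls.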

Commit 1da4e77: Show how to distribute the super simple simulation
