Show how to distribute the super simple simulation #4
So you mostly distribute in the (x, y) plane. Does it ever happen along the vertical dimension as well?
No, we don't distribute in the vertical, for two reasons. First, our algorithm involves several vertical integrals plus a tridiagonal solve, so partitioning in the vertical would require more communication than partitioning the same simulation horizontally. Second, the simulations usually have a thin aspect ratio, as thin as 1:50 for a high-resolution simulation, so "horizontal slabs" have a lot of surface area.
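A back-of-the-envelope way to see the surface-area argument is to count the grid cells lying on the internal interfaces between ranks, which is roughly what a halo exchange has to send. The sketch below assumes a hypothetical 2500 × 2500 × 50 grid (a 50:1 horizontal-to-vertical cell count, chosen only for illustration), a single halo layer, and a 1D split into four slabs; `interface_cells` is a made-up helper, not an Oceananigans function, and it ignores the cost of the vertical integrals and tridiagonal solve mentioned above.

```julia
# Count cells on the internal interfaces of a 1D slab decomposition.
# Assumptions: one halo layer, equal slabs, split along x or along z only.

"""
    interface_cells(Nx, Ny, Nz, nranks, dim)

Total number of cells on the `nranks - 1` internal interfaces when a
Nx × Ny × Nz grid is split into `nranks` slabs along `dim` (`:x` or `:z`).
"""
function interface_cells(Nx, Ny, Nz, nranks, dim)
    ninterfaces = nranks - 1
    if dim === :x
        return ninterfaces * Ny * Nz   # each interface is a y-z plane
    elseif dim === :z
        return ninterfaces * Nx * Ny   # each interface is an x-y plane
    else
        throw(ArgumentError("dim must be :x or :z"))
    end
end

Nx, Ny, Nz = 2500, 2500, 50   # hypothetical thin grid
nranks = 4

x_split = interface_cells(Nx, Ny, Nz, nranks, :x)
z_split = interface_cells(Nx, Ny, Nz, nranks, :z)

println("x split: ", x_split, " cells, z split: ", z_split, " cells")
println("ratio: ", z_split / x_split)   # ratio: 50.0
```

For a thin grid the ratio is simply Nx / Nz, so the thinner the domain, the worse a vertical split looks.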
It runs on 4 GH200s (a single node on ALPS), including the CUDA / KA part:

```
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: MPI has not been initialized, so we are calling MPI.Init().
[ Info: Initializing simulation...
[ Info: Initializing simulation...
[ Info: Initializing simulation...
[ Info: Initializing simulation...
[ Info: ... simulation initialization complete (1.157 minutes)
[ Info: ... simulation initialization complete (1.162 minutes)
[ Info: ... simulation initialization complete (1.167 minutes)
[ Info: ... simulation initialization complete (1.168 minutes)
[ Info: Executing initial time step...
[ Info: Executing initial time step...
[ Info: Executing initial time step...
[ Info: Executing initial time step...
[ Info: ... initial time step complete (20.379 seconds).
[ Info: ... initial time step complete (20.380 seconds).
[ Info: ... initial time step complete (20.380 seconds).
[ Info: ... initial time step complete (20.380 seconds).
[ Info: Simulation is stopping after running for 1.508 minutes.
[ Info: Simulation is stopping after running for 1.509 minutes.
[ Info: Simulation is stopping after running for 1.509 minutes.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
[ Info: Simulation is stopping after running for 1.508 minutes.
[ Info: Model iteration 2 equals or exceeds stop iteration 2.
```

Is there a way to have output
you can pass
We have a macro in ClimaOcean that can be useful, though you would have to depend on ClimaOcean:

```julia
using ClimaOcean: @root
@root @info .....
```

does the trick.
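For readers who don't want the ClimaOcean dependency, here is a minimal sketch of a root-only macro. It is hypothetical and is NOT ClimaOcean's `@root` implementation: `@onroot` and `launcher_rank` are made-up names, and instead of calling MPI it reads the rank from environment variables that common launchers set (`OMPI_COMM_WORLD_RANK`, `PMI_RANK`, `SLURM_PROCID`), which is an assumption about how the job is launched.

```julia
# Hypothetical root-only logging macro (not ClimaOcean's `@root`).
# The rank is read from launcher environment variables (an assumption),
# so this runs without MPI.jl; without any of them we assume rank 0.

const RANK_ENV_VARS = ("OMPI_COMM_WORLD_RANK", "PMI_RANK", "SLURM_PROCID")

"""Return this process's rank from the launcher environment, or 0 if none is set."""
function launcher_rank()
    for var in RANK_ENV_VARS
        value = get(ENV, var, nothing)
        value === nothing || return parse(Int, value)
    end
    return 0
end

"""
    @onroot expr

Evaluate `expr` only on rank 0; return `nothing` on every other rank.
"""
macro onroot(expr)
    return quote
        if launcher_rank() == 0
            $(esc(expr))
        else
            nothing
        end
    end
end

# Only rank 0 prints this message.
@onroot @info "Initializing simulation..."
```

Inside an MPI program, `MPI.Comm_rank(MPI.COMM_WORLD)` from MPI.jl is a more reliable source of the rank than environment variables.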
I think this shows the macro would be useful at a lower level. Should it live in an independent package, or otherwise in Oceananigans?
To distribute a simulation, change the architecture from `GPU()` (or `CPU()`) to `Distributed(GPU())`, or to `Distributed(GPU(), partition=partition)` to manually configure how the simulation is partitioned across GPUs.

Right now we also need to change one aspect of the simulation (this may be eliminated in the future): we need to fix the number of substeps of the free surface solver with

```julia
free_surface = SplitExplicitFreeSurface(substeps=N)
```

`N=30` may work for the current configuration (the number of substeps depends on both the resolution and the time step of the simulation).
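To see why the substep count depends on both resolution and time step, a rough estimate is that each barotropic substep Δt / N must satisfy a gravity-wave CFL condition, sqrt(g H) (Δt / N) / Δx ≤ C. The sketch below is a simplified 1D lower bound under stated assumptions: `estimate_substeps` is a hypothetical helper, the target CFL `C = 0.7` is arbitrary, and real split-explicit schemes (for example ones that time-average the barotropic solution) may need more substeps. It is not how Oceananigans chooses `substeps`, and it does not reproduce the value 30 above.

```julia
# Rough lower bound on split-explicit free-surface substeps from a 1D
# barotropic gravity-wave CFL condition (simplified; assumptions above).

"""
    estimate_substeps(Δt, Δx, H; g = 9.80665, cfl = 0.7)

Smallest integer N such that sqrt(g * H) * (Δt / N) / Δx ≤ cfl.
Δt in seconds, Δx and H in meters.
"""
function estimate_substeps(Δt, Δx, H; g = 9.80665, cfl = 0.7)
    c = sqrt(g * H)                      # barotropic gravity-wave speed (m/s)
    return ceil(Int, c * Δt / (cfl * Δx))
end

# Hypothetical configuration: 10-minute time step, 4 km deep ocean.
println(estimate_substeps(600, 10e3, 4000))   # 10 km grid spacing
println(estimate_substeps(600, 5e3, 4000))    # halving Δx roughly doubles N
println(estimate_substeps(300, 10e3, 4000))   # halving Δt roughly halves N
```

Refining the grid or lengthening the time step both increase the required number of substeps, which is why a fixed `N` has to be revisited when either changes.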