Scripts for asQ manuscript #205

Merged
merged 9 commits on Sep 30, 2024
8 changes: 4 additions & 4 deletions README.md
@@ -8,10 +8,11 @@ asQ is designed to allow fast prototyping of new ParaDiag methods, while still b
This is achieved using the Firedrake and PETSc libraries.
The finite element models are defined by specifying the weak form using [Firedrake, "*an automated system for the portable solution of partial differential equations using the finite element method*](https://www.firedrakeproject.org/)", and the linear and nonlinear solvers required are provided by [PETSc, "*the Portable, Extensible Toolkit for Scientific Computation*"](https://petsc.org/release/).
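As a brief illustration of what specifying a weak form with Firedrake looks like, here is a minimal sketch (illustrative only, not code taken from asQ; the mesh and function space are arbitrary choices):

```python
# Minimal sketch of defining a weak form with Firedrake (illustrative only).
import firedrake as fd

mesh = fd.UnitSquareMesh(8, 8, quadrilateral=True)
V = fd.FunctionSpace(mesh, "DQ", 1)   # discontinuous Galerkin scalar space

q = fd.TrialFunction(V)
phi = fd.TestFunction(V)

# the mass form phi*q*dx as a UFL expression, the same kind of form
# that asQ expects a form_mass(q, phi) function to return
mass_form = phi*q*fd.dx
```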

+See the arXiv paper https://arxiv.org/abs/2409.18792 for a description and demonstration of asQ. The code used for this paper can be found in the [asq_manuscript_examples](https://github.com/firedrakeproject/asQ/tree/master/asq_manuscript_examples) directory.

## ParaDiag

-ParaDiag is a parallel-in-time method, meaning that is solves for multiple timesteps of a timeseries simultaneously, rather than one at a time like traditional serial-in-time methods.
+ParaDiag is a parallel-in-time method, meaning that it solves for multiple timesteps of a timeseries simultaneously, rather than one at a time like traditional serial-in-time methods.
This [review article](https://arxiv.org/abs/2005.09158) provides a good introduction to the method.
asQ implements the ParaDiag-II family of methods based on creating a block-circulant approximation to the all-at-once system which can be block-diagonalised with the FFT and solved efficiently in parallel.
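As a rough illustration of why this is efficient, the following NumPy sketch (scalar entries rather than the per-timestep blocks used by asQ, and without the alpha-weighting) shows that a circulant matrix is diagonalised by the FFT, so solving with it only needs two FFTs and a pointwise division:

```python
# Illustrative sketch: solving a circulant system with the FFT.
# In ParaDiag the entries of c are blocks (one per timestep) and the
# pointwise division becomes a set of independent block solves.
import numpy as np

n = 8
c = 1.0 + np.arange(n)                                    # first column of a circulant matrix C
C = np.stack([np.roll(c, k) for k in range(n)], axis=1)   # C[j, k] = c[(j - k) % n]
b = np.sin(np.arange(n))                                  # arbitrary right-hand side

# the eigenvalues of C are fft(c), so C y = b is solved by
y_fft = np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real
assert np.allclose(y_fft, np.linalg.solve(C, b))
```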

@@ -22,10 +23,9 @@ To install asQ, pass the arguments `--install asQ` to the `firedrake-install` sc

## Getting started

-The best place to start is the [examples directory](https://github.com/firedrakeproject/asQ/tree/master/examples).
-Annotated scripts for the linear advection equation and the heat equation show how to set up a problem with asQ and solve the timeseries using ParaDiag.
+The best place to start is the arXiv paper and associated examples in [this directory](https://github.com/firedrakeproject/asQ/tree/master/asq_manuscript_examples).

-More advanced scripts can be found in the [case studies directory](https://github.com/firedrakeproject/asQ/tree/master/case_studies), including scripts for the shallow water equations and a model for stratigraphic evolution of the sea floor.
+Other examples can be found in the [examples directory](https://github.com/firedrakeproject/asQ/tree/master/examples) and more advanced scripts can be found in the [case studies directory](https://github.com/firedrakeproject/asQ/tree/master/case_studies), including scripts for the shallow water equations and a model for stratigraphic evolution of the sea floor.

## Help and support

44 changes: 44 additions & 0 deletions asq_manuscript_examples/README.md
@@ -0,0 +1,44 @@
# Example scripts for "asQ: parallel-in-time finite element simulations using ParaDiag for geoscientific models and beyond"

These are the python scripts used to generate the data for the asQ library paper https://arxiv.org/abs/2409.18792.

## Section 3.2 "Heat equation example" script
`heat.py` is the example in Section 3.2 "A heat equation example".
It explicitly uses the `AllAtOnce*` objects to create each component of the all-at-once system.
Because it is hard-coded to four ensemble ranks (i.e. four MPI ranks in time), it must be run with a multiple of 4 MPI ranks, e.g.

```mpiexec -np 4 python heat.py```

will run with 4 ranks in time and in serial in space, whereas

```mpiexec -np 8 python heat.py```

will run with 4 ranks in time, and 2 ranks in each spatial communicator.
To change the time-parallelism, change the `time_partition` list in the script.
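For illustration, a partition of this shape (the exact values in `heat.py` may differ) looks like:

```python
# Hypothetical time partition: four time-slices (one per ensemble rank),
# each owning 4 timesteps, i.e. a 16-timestep all-at-once system.
# With 8 MPI ranks in total, each slice gets 2 ranks for the spatial mesh.
time_partition = [4, 4, 4, 4]
```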

## Section 4 "Numerical Examples" scripts

All scripts use `argparse` to process command line arguments, so they will print information on how to use them when run with the `-h` flag. They do not need to be run in parallel to do this.

`python <script_name.py> -h`

All scripts also accept a `--show_args` argument, which prints the values of all argparse arguments at the start of the script.
The default arguments for the `*_paradiag.py` scripts do not use time-parallelism, so these scripts can be run in serial.
To specify the time-parallelism, see the help for the `--nslices` and `--slice_length` command line arguments.
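For example, one possible parallel invocation of the advection script (the flag values here are purely illustrative) is

```mpiexec -np 8 python advection_paradiag.py --nslices 4 --slice_length 8 --show_args```

which uses 4 time-slices of 8 timesteps each, with 2 spatial ranks per slice.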

- The data in Section 4.1 "Advection equation" was generated with
- `advection_serial.py` for the serial-in-time results.
- `advection_paradiag.py` for the parallel-in-time results.
- The data in Section 4.2 "Linear shallow water equations" was generated with
- `linear_shallow_water_serial.py` for the serial-in-time results.
- `linear_shallow_water_paradiag.py` for the parallel-in-time results.
- The data in Section 4.3 "Nonlinear shallow water equations" was generated with
- `nonlinear_shallow_water_serial.py` for the serial-in-time results.
- `nonlinear_shallow_water_paradiag.py` for the parallel-in-time results.
- The data in Section 4.4 "Compressible Euler equations" was generated with
- `vertical_slice_serial.py` for the serial-in-time results.
- `vertical_slice_paradiag.py` for the parallel-in-time results.

The `*_serial.py` scripts all use the `SerialMiniApp` class to run the serial-in-time method.
The parallel-in-time shallow water equation scripts use the `ShallowWaterMiniApp` to set up the all-at-once system specifically for the shallow water equations.
The parallel-in-time advection and vertical slice scripts use the `Paradiag` class to construct the all-at-once system without having to manually create each `AllAtOnce*` object.
253 changes: 253 additions & 0 deletions asq_manuscript_examples/advection_paradiag.py
@@ -0,0 +1,253 @@
from math import pi, cos, sin
from utils.timing import SolverTimer
import firedrake as fd
from firedrake.petsc import PETSc
import asQ
from argparse import ArgumentParser
from argparse_formatter import DefaultsAndRawTextFormatter

parser = ArgumentParser(
    description='ParaDiag timestepping for scalar advection of a Gaussian bump in a periodic square with DG in space and implicit-theta in time.',
    epilog="""\
Optional PETSc command line arguments:

    -circulant_alpha :float: The circulant parameter to use in the preconditioner. Default 1e-4.
    -ksp_rtol :float: The relative residual drop required for convergence. Default 1e-11.
        See https://petsc.org/release/manualpages/KSP/KSPSetTolerances/
    -ksp_type :str: The Krylov method to use for the all-at-once iterations. Default 'richardson'.
        Alternatives include gmres or fgmres.
        See https://petsc.org/release/manualpages/KSP/KSPSetType/
""",
    formatter_class=DefaultsAndRawTextFormatter
)
parser.add_argument('--nx', type=int, default=16, help='Number of cells along each side of the square.')
parser.add_argument('--cfl', type=float, default=0.8, help='Convective CFL number.')
parser.add_argument('--angle', type=float, default=pi/6, help='Angle of the convective velocity to the horizontal.')
parser.add_argument('--degree', type=int, default=1, help='Degree of the scalar space.')
parser.add_argument('--theta', type=float, default=0.5, help='Parameter for the implicit theta timestepping method.')
parser.add_argument('--width', type=float, default=0.2, help='Width of the Gaussian bump.')
parser.add_argument('--nwindows', type=int, default=1, help='Number of time-windows to solve.')
parser.add_argument('--nslices', type=int, default=1, help='Number of time-slices in the all-at-once system. Must divide the number of MPI ranks exactly.')
parser.add_argument('--slice_length', type=int, default=4, help='Number of timesteps per time-slice. Total number of timesteps in the all-at-once system is nslices*slice_length.')
parser.add_argument('--metrics_dir', type=str, default='metrics/advection', help='Directory to save paradiag output metrics to.')
parser.add_argument('--show_args', action='store_true', help='Print all the arguments when the script starts.')

args = parser.parse_known_args()
args = args[0]

if args.show_args:
    PETSc.Sys.Print(args)

# The time partition describes how many timesteps
# are included on each time-slice of the ensemble

time_partition = tuple(args.slice_length for _ in range(args.nslices))
window_length = sum(time_partition)
nsteps = args.nwindows*window_length

# Calculate the timestep size dt from the CFL number
umax = 1.
dx = 1./args.nx
dt = args.cfl*dx/umax

# The Ensemble with the spatial and time communicators
ensemble = asQ.create_ensemble(time_partition)

# # # === --- domain --- === # # #

# The mesh needs to be created with the spatial communicator
mesh = fd.PeriodicUnitSquareMesh(args.nx, args.nx, quadrilateral=True, comm=ensemble.comm)

# We use a discontinuous Galerkin space for the advected scalar
V = fd.FunctionSpace(mesh, "DQ", args.degree)

# # # === --- initial conditions --- === # # #

x, y = fd.SpatialCoordinate(mesh)


# The scalar initial condition is a Gaussian bump centred at (0.5, 0.5)
def radius(x, y):
    return fd.sqrt(pow(x-0.5, 2) + pow(y-0.5, 2))


def gaussian(x, y):
    return fd.exp(-0.5*pow(radius(x, y)/args.width, 2))


q0 = fd.Function(V, name="scalar_initial")
q0.interpolate(1 + gaussian(x, y))

# The advecting velocity field is constant and directed at an angle to the x-axis
u = fd.Constant(fd.as_vector((umax*cos(args.angle), umax*sin(args.angle))))

# # # === --- finite element forms --- === # # #


# The time-derivative mass form for the scalar advection equation.
# asQ assumes that the mass form is linear so here
# q is a TrialFunction and phi is a TestFunction
def form_mass(q, phi):
    return phi*q*fd.dx


# The DG advection form for the scalar advection equation.
# asQ assumes that the function form may be nonlinear so
# here q is a Function and phi is a TestFunction
def form_function(q, phi, t):
    # upwind switch
    n = fd.FacetNormal(mesh)
    un = fd.Constant(0.5)*(fd.dot(u, n) + abs(fd.dot(u, n)))

    # integration over element volume
    int_cell = q*fd.div(phi*u)*fd.dx

    # integration over internal facets
    int_facet = (phi('+') - phi('-'))*(un('+')*q('+') - un('-')*q('-'))*fd.dS

    return int_facet - int_cell


# # # === --- PETSc solver parameters --- === # # #

# The PETSc solver parameters used to solve the
# blocks in step (b) of inverting the ParaDiag matrix.
# MUMPS is a parallel direct solver so spatial parallelism can be used
block_parameters = {
    'ksp_type': 'preonly',
    'pc_type': 'lu',
    'pc_factor_mat_solver_type': 'mumps',
}

# The PETSc solver parameters for solving the all-at-once system.
# The python preconditioner 'asQ.CirculantPC' applies the inverse of the block alpha-circulant ParaDiag matrix P.
#
# The equation is linear so we can either:
# a) Solve it using a preconditioned Krylov method:
# P^{-1}Au = P^{-1}b
# The solver option for this is:
# -ksp_type gmres
# b) Solve it with stationary iterations:
# Pu_{k+1} = (P - A)u_{k} + b
# The solver option for this is:
# -ksp_type richardson

paradiag_parameters = {
    'snes_type': 'ksponly',  # only solve 1 "Newton iteration" per window (i.e. a linear problem)
    'ksp_type': 'richardson',  # stationary iterations
    'ksp': {
        'monitor': None,  # show the residual at every iteration
        'converged_rate': None,  # show the contraction rate once the linear solve has converged
        'rtol': 1e-11,  # relative residual tolerance
    },
    'pc_type': 'python',
    'pc_python_type': 'asQ.CirculantPC',  # the alpha-circulant preconditioner
    'circulant_alpha': 1e-4,  # set other values from command line using: -circulant_alpha <value>
    'circulant_block': block_parameters,  # options dictionary for the inner solve
    'circulant_state': 'linear',  # system is linear so don't update the preconditioner reference state
    'aaos_jacobian_state': 'linear',  # system is linear so don't update the jacobian reference state
}


# # # === --- Setup ParaDiag --- === # # #

# Give everything to the Paradiag object which will build the all-at-once system.
paradiag = asQ.Paradiag(ensemble=ensemble,
                        form_function=form_function,
                        form_mass=form_mass,
                        ics=q0, dt=dt, theta=args.theta,
                        time_partition=time_partition,
                        solver_parameters=paradiag_parameters)

# create a timer to profile the calculations
timer = SolverTimer()


# This function will be called before paradiag solves each time-window. We can use
# this to make the output a bit easier to read, and to time the window calculation
def window_preproc(paradiag, wndw, rhs):
    PETSc.Sys.Print(f'### === --- Calculating time-window {wndw} --- === ###')
    PETSc.Sys.Print('')
    # we are interested in timing only the solve, so this barrier
    # makes sure we don't time any synchronisation after prints.
    with PETSc.Log.Event("window_preproc.Coll_Barrier"):
        paradiag.ensemble.ensemble_comm.Barrier()
    timer.start_timing()


# This function will be called after paradiag solves each time-window. We can use
# this to finish the window calculation timing and print the result.
def window_postproc(paradiag, wndw, rhs):
    timer.stop_timing()
    PETSc.Sys.Print('')
    PETSc.Sys.Print(f'Window solution time: {round(timer.times[-1], 5)}')
    PETSc.Sys.Print('')


# Set up all solver objects. The firedrake DM and options management
# makes it difficult to set up some preconditioners without actually
# calling `solve`, so we just run once to set everything up.
PETSc.Sys.Print('')
PETSc.Sys.Print('### === --- Setting up solver and prefactoring --- === ###')
PETSc.Sys.Print('')
with PETSc.Log.Event("warmup_solve"):
    paradiag.solve(1)
PETSc.Sys.Print('')

# reset solution and iteration counts for the timed solves
paradiag.reset_diagnostics()
aaofunc = paradiag.solver.aaofunc
aaofunc.bcast_field(-1, aaofunc.initial_condition)
aaofunc.assign(aaofunc.initial_condition)

PETSc.Sys.Print('### === --- Solving timeseries --- === ###')
PETSc.Sys.Print('')

# Solve nwindows of the all-at-once system
with PETSc.Log.Event("timed_solves"):
    paradiag.solve(args.nwindows,
                   preproc=window_preproc,
                   postproc=window_postproc)

# # # === --- Solver diagnostics --- === # # #

PETSc.Sys.Print('### === --- Iteration and timing results --- === ###')
PETSc.Sys.Print('')

asQ.write_paradiag_metrics(paradiag, directory=args.metrics_dir)

nw = paradiag.total_windows
nt = paradiag.total_timesteps
PETSc.Sys.Print(f'Total windows: {nw}')
PETSc.Sys.Print(f'Total timesteps: {nt}')
PETSc.Sys.Print('')

# Show the parallel partition sizes.
PETSc.Sys.Print(f'Total DoFs per window: {V.dim()*window_length}')
PETSc.Sys.Print(f'DoFs per timestep: {V.dim()}')
PETSc.Sys.Print(f'Total number of MPI ranks: {ensemble.global_comm.size}')
PETSc.Sys.Print(f'Number of MPI ranks per timestep: {mesh.comm.size}')
PETSc.Sys.Print(f'DoFs/rank: {V.dim()/mesh.comm.size}')
PETSc.Sys.Print(f'Complex block DoFs/rank: {2*V.dim()/mesh.comm.size}')
PETSc.Sys.Print('')

# paradiag collects a few iteration counts for us
lits = paradiag.linear_iterations
nlits = paradiag.nonlinear_iterations
blits = paradiag.block_iterations.data()

# Number of nonlinear iterations will be 1 per window for linear problems
PETSc.Sys.Print(f'Nonlinear iterations: {str(nlits).rjust(5)} | Iterations per window: {str(nlits/nw).rjust(5)}')

# Number of linear iterations of the all-at-once system, total and per window.
PETSc.Sys.Print(f'Linear iterations: {str(lits).rjust(5)} | Iterations per window: {str(lits/nw).rjust(5)}')

# Number of iterations needed for each block in step-(b), total and per block solve
PETSc.Sys.Print(f'Total block linear iterations: {blits}')
PETSc.Sys.Print(f'Iterations per block solve: {blits/lits}')
PETSc.Sys.Print('')

# Timing measurements
PETSc.Sys.Print(timer.string(timesteps_per_solve=window_length,
                             total_iterations=paradiag.linear_iterations, ndigits=5))
PETSc.Sys.Print('')