t8code - modular adaptive mesh refinement in the exascale era
10 June 2024
In this paper, we present our scalable dynamic adaptive mesh refinement (AMR)
library t8code, which was officially released in 2022 [@Holke_t8code_2022].
t8code is written in C/C++, open source, and readily available at
www.dlr-amr.github.io/t8code. It is developed
and maintained at the Institute for Software Technology
of the German Aerospace Center (DLR). The software library provides fast and
memory-efficient parallel algorithms for dynamic AMR to handle tasks such as mesh
adaptation, load balancing, ghost computation, feature search, and more.
t8code can manage meshes with over one trillion mesh elements
[@holke_optimized_2021] and scales up to one million parallel processes
[@holke_scalable_2018]. It is intended to be used as a mesh management backend in
scientific and engineering simulation codes, paving the way towards
high-performance applications in the upcoming exascale era.
Adaptive Mesh Refinement has been established as a successful approach
for scientific and engineering simulations over the past decades
[@TEUNISSEN2019106866; @10.1145/1268776.1268779; @doi:10.1137/0733054;
@doi:10.1137/0715049]. By modifying the mesh resolution locally according to
problem specific indicators, the computational power is efficiently
concentrated where needed and the overall memory usage is reduced by orders of
magnitude. However, managing adaptive meshes and associated data is a very
challenging task, especially for parallel codes. Implementing fast and scalable
AMR routines generally leads to a large development overhead, motivating the
need for external mesh management libraries like t8code.
Currently, t8code's AMR routines support a wide range of element types:
vertices, lines, quadrilaterals, triangles, hexahedra, tetrahedra, prisms, and
pyramids. Additionally, implementation of other refinement patterns and element
shapes is possible.
See \autoref{fig:visploremesh} for an exemplary adapted mesh managed by t8code
for visualizing earth mantle convection data.
t8code is based on the forest-of-trees approach. The starting point
for the usage of t8code is an unstructured input mesh, which
we call the coarse mesh. This coarse mesh describes the geometry of the
computational domain. Each of the coarse mesh cells is then viewed as the
root of a refinement tree. These trees are refined recursively in a structured
pattern, resulting in a collection of trees, which we call a forest. t8code
stores only a minimal amount of information about the finest elements of the mesh
(the leaves of the trees) in order to reconstruct the whole forest.
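The forest-of-trees idea described above can be illustrated with a minimal sketch. It is not t8code's data structure or API: the leaf representation (a tree id plus the path of child indices from the root), the function names, and the fixed count of four children per refinement are assumptions chosen for brevity. The sketch only shows that storing the leaves is enough to recover every coarser element.

```python
from typing import Callable, List, Tuple

# Hypothetical leaf representation: (tree id, child indices from the root).
# An empty path denotes the root element, i.e. the coarse mesh cell itself.
Leaf = Tuple[int, Tuple[int, ...]]


def refine(leaves: List[Leaf],
           should_refine: Callable[[int, Tuple[int, ...]], bool],
           num_children: int = 4) -> List[Leaf]:
    """One adaptation pass: replace every flagged leaf by its children."""
    new_leaves: List[Leaf] = []
    for tree_id, path in leaves:
        if should_refine(tree_id, path):
            new_leaves.extend((tree_id, path + (child,))
                              for child in range(num_children))
        else:
            new_leaves.append((tree_id, path))
    return new_leaves


def ancestors(leaf: Leaf) -> List[Leaf]:
    """Recover all coarser elements above a leaf from its path alone."""
    tree_id, path = leaf
    return [(tree_id, path[:depth]) for depth in range(len(path))]


# A coarse mesh with two cells gives a forest with two root elements.
forest: List[Leaf] = [(0, ()), (1, ())]
# Refine everything once, then refine only child 0 of tree 0 again.
forest = refine(forest, lambda tree_id, path: True)
forest = refine(forest, lambda tree_id, path: tree_id == 0 and path == (0,))
```

Only the leaves are kept in `forest`; interior elements are never stored, since `ancestors` rebuilds them on demand.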
By enumerating the leaves in a recursive refinement pattern we obtain a space-filling curve (SFC) logic. Via these SFCs, all elements in a refinement tree are assigned an integer-based index and are stored in linear order. Element coordinates or element neighbors do not need to be stored explicitly but can be reconstructed from the SFC index. Fast bitwise SFC operations ensure optimal runtimes and diminish the need for memory lookups. Moreover, the SFC is used to distribute the forest mesh across multiple processes, so that each process only stores a unique portion of the SFC. See \autoref{fig:SpaceFillingCurves}.
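To make the SFC logic concrete, the following sketch uses the classic Morton (Z-order) curve on a uniformly refined quadrilateral tree. This is an illustrative assumption: the curves t8code uses for other element shapes differ, and all function names here are hypothetical rather than part of the library. The sketch shows that coordinates are recovered from the index by bit operations alone, and that a partition amounts to cutting the linear index range into contiguous pieces.

```python
from typing import List, Tuple


def morton_encode(x: int, y: int, level: int) -> int:
    """Interleave the `level` low bits of x and y (x occupies the even bits)."""
    index = 0
    for bit in range(level):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index


def morton_decode(index: int, level: int) -> Tuple[int, int]:
    """Recover the integer coordinates of an element from its SFC index."""
    x = y = 0
    for bit in range(level):
        x |= ((index >> (2 * bit)) & 1) << bit
        y |= ((index >> (2 * bit + 1)) & 1) << bit
    return x, y


def partition_offsets(num_elements: int, num_procs: int) -> List[int]:
    """Cut the linear SFC order into contiguous, nearly equal pieces.

    Process p owns the indices offsets[p] .. offsets[p + 1] - 1.
    """
    return [(num_elements * p) // num_procs for p in range(num_procs + 1)]


level = 2                      # uniform 4 x 4 mesh inside one tree
num_elements = 4 ** level      # 16 elements, SFC indices 0 .. 15
offsets = partition_offsets(num_elements, 3)
```

With three processes, `offsets` splits the 16 elements into contiguous index ranges of sizes 5, 5, and 6, so each process holds one unbroken segment of the curve.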
While these SFC techniques have been applied successfully to quadrilateral
and hexahedral meshes [@burstedde_p4est_2011; @weinzierl_peano_2019],
t8code extends them in a modular fashion, such that arbitrary
element shapes are supported. We achieve this modularity through a novel
decoupling approach that separates high-level (mesh global) algorithms from
low-level (element local) implementations. All high-level algorithms can
be applied to different implementations of element shapes and refinement
patterns. A mix of different element shapes in the same mesh is also
supported. t8code supports distributed coarse meshes of arbitrary size and complexity,
which we tested for up to 370 million coarse mesh cells
[@burstedde_coarse_2017]. Moreover, we conducted various performance studies
on the JUQUEEN and the JUWELS supercomputers at the Jülich Supercomputing
Center. t8code's ghost and partition routines are fast and scale well
up to 1.1 trillion mesh elements; see
\autoref{tab:t8code_runtimes} and [@holke_optimized_2021]. Furthermore, in a
prototype code [@Dreyer2021] implementing a high-order discontinuous Galerkin
method (DG) for advection-diffusion equations on dynamically adaptive
hexahedral meshes, we observe a 12-fold speed-up compared to non-AMR meshes,
with t8code contributing only 15% of the overall runtime.
+----------------+-------------------+--------------------+--------+-----------+
| # Process      | # Elements        | # Elem. / process  | Ghost  | Partition |
+:==============:+:=================:+:==================:+:======:+:=========:+
| 49,152         | 1,099,511,627,776 | 22,369,621         | 2.08 s | 0.73 s    |
+----------------+-------------------+--------------------+--------+-----------+
| 98,304         | 1,099,511,627,776 | 11,184,811         | 1.43 s | 0.33 s    |
+================+===================+====================+========+===========+
| Table 1: Runtimes on JUQUEEN for the ghost layer and partitioning operations |
| for a distributed mesh consisting of 1.1 trillion elements.                  |
| \label{tab:t8code_runtimes}                                                  |
+================+===================+====================+========+===========+
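The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the table; the derived speed-up and parallel efficiency values are our own calculation for illustration, not results stated by the authors, and the variable names are hypothetical.

```python
# Values copied from the runtime table; 1,099,511,627,776 equals 2**40.
num_elements = 2 ** 40
runtimes = {
    49_152: {"ghost": 2.08, "partition": 0.73},   # seconds
    98_304: {"ghost": 1.43, "partition": 0.33},   # seconds
}

# Average number of elements per process, rounded to the nearest integer.
elements_per_process = {procs: round(num_elements / procs) for procs in runtimes}

# Strong scaling when doubling the process count: the ideal speed-up is 2.
process_ratio = 98_304 / 49_152
speedup = {
    op: runtimes[49_152][op] / runtimes[98_304][op]
    for op in ("ghost", "partition")
}
efficiency = {op: speedup[op] / process_ratio for op in speedup}

for op in speedup:
    print(f"{op}: speed-up {speedup[op]:.2f}, efficiency {efficiency[op]:.2f}")
```

The per-process element counts reproduce the third column of the table. An efficiency above 1 for a single pair of measurements should be read with care, since two data points do not establish a scaling trend.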
Even though t8code is a newcomer to the market, it is already in use as the
mesh management backend in various research projects, most notably in the earth
system modeling (ESM) community. In the ADAPTEX project, t8code is integrated
with the Trixi framework [@schlottkelakemper2020trixi], a modern computational
fluid dynamics code written in Julia. Over the next years, several ESM
applications are planned to couple to this combination, including
MPTrac and SERGHEI. Moreover, t8code
also plays an important role in several DLR funded research projects, e.g.,
(massive data visualization), HYTAZER (hydrogen tank certification), Greenstars
(additive rocket engine manufacturing) and PADME-AM (simulation assisted
additive manufacturing).
For further information beyond this short note and also for code examples, we
refer to our Documentation and Wiki, reachable via our homepage
dlr-amr.github.io/t8code, and to our technical
publications on t8code
[@holke_scalable_2018; @burstedde_coarse_2017;
@holke_optimized_2021; @burstedde_tetrahedral_2016; @Knapp20;
@Becker_hanging_faces; @elsweijer_curved_2021; @Dreyer2021;
@Lilikakis_removing; @Holke_t8code_2022; @Fussbroich_towards_2023].
Johannes Holke thanks the Bonn International Graduate School of
Mathematics (BIGS) for funding the initial development of t8code. Further
development work was funded by the German Research Foundation as part of
project 467255783, the European Union via NextGenerationEU and the German
Federal Ministry of Research and Education (BMBF) as part of the ADAPTEX and
PADME-AM projects. Development work was performed as part of the Helmholtz School
for Data Science in Life, Earth and Energy (HDS-LEE) and received funding from
the Helmholtz Association of German Research Centres. The development team of
t8code thanks the Institute for Software Technology and the German Aerospace
Center (DLR).
The authors state that there are no conflicts of interest.