Reading data, especially data that will not be utilized in a computation, can incur substantial overhead, particularly if the data is spread across multiple files on a networked filesystem, where metadata queries can dominate the cost of IO.
yt
takes the approach of building a coarse-grained index based on the discretization method of the data (particle, grid, octree, unstructured mesh), combining this with datapoint-level indexing for selection processes.
To supplement this, methods in yt
that process data utilize a system of data "chunking," whereby segments of data identified during coarse-grained indexing are subdivided by one of a few different schemes and yielded to the iterating function; these schemes can include a limited number of tuning parameters or arguments.
These three chunking methods are `all`, `spatial`, and `io`.
The `all` method simply returns a single, one-dimensional array, and the number of chunks is always exactly one; this enables both non-parallel algorithms and simple access to small datasets.
`spatial` chunking yields three-dimensional arrays.
For grid-based datasets, these are the grids, while for particle and octree datasets they are leaf-by-leaf collections of particles or mesh values.
Optionally, the `spatial` chunking method can return "ghost zones" around regions, for computation of stencils.
The final type of chunking, `io`, is designed to iterate over sets of data in a manner most conducive to pipelined IO. The returned chunks will not always be load-balanced in size, however.
In some cases, `io` chunking may return one file at a time (when data is spread across many different files), while in others it may return sub-components of a single file.
This chunking type is the most common strategy for parallel decomposition.
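As a concrete illustration, the chunking interface can be driven directly from a data object. The following is a minimal sketch, assuming one of yt's sample datasets; the `chunks` method is primarily an internal interface, and the specific field and dataset here are placeholders:

```python
import yt

ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")  # sample dataset (placeholder)
dd = ds.all_data()

# Iterate in IO-friendly order; inside the loop, field access returns only
# the data belonging to the current chunk.
total = 0.0
for chunk in dd.chunks([], "io"):
    total += float(chunk["gas", "mass"].sum().to("Msun"))
print(total, "Msun")
```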
Necessarily, both indexing and selection methods must be implemented to expose these different chunking interfaces; yt
utilizes specific methods for each of the primary data types that it can access.
We detail these below, specifically describing how they are implemented and how they can be improved in future iterations.
yt
was originally written to support the Enzo code, which is a patch-based Adaptive Mesh Refinement (AMR) simulation platform.
In Figure @fig:grid_organization, the grid structure of one of the standard yt example datasets, `IsolatedGalaxy`, can be seen.
Analysis of grid-based data is the most frequent application of yt.
While we discuss many of the techniques implemented for datasets consisting of multiple, potentially overlapping grids, yt
also supports single-grid datasets (such as FITS cubes) and is able to decompose them for parallel analysis.
yt
also supports other grid patch codes, listed in the section on frontends.
yt
supports several different "features" of patch-based codes.
These include grids that span multiple parent objects, grids that overlap with coarser data (i.e., AMR), grids that overlap with other grids providing data at the same level of resolution (i.e., grids at the same AMR level), refinement factors that vary by level, and edge- and vertex-centered data.
For cases of overlapping grids (either at the same or a higher refinement level), masks are generated that indicate which data is considered authoritative.
As noted in Data Objects, the process of selecting points is multi-step, starting with a coarse selection that may operate at the file level and proceeding to selection of the specific data points included in a selector.
For grid-based data, the coarse selection stage proceeds in an extremely simple fashion, by iterating over flat arrays of left and right grid edges and creating a bitmap of the selected grids.
Because this method -- while eschewing any data structures of even mild sophistication -- is able to take advantage of pipelining and cache optimization, we have found that it is sufficiently performant in most geometries up to approximately
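In schematic form, the coarse stage amounts to an axis-aligned overlap test over the arrays of grid edges. The function below is an illustrative NumPy sketch, not yt's code (the actual implementation is compiled and additionally handles periodicity and non-rectangular selectors):

```python
import numpy as np

def coarse_grid_select(grid_left_edges, grid_right_edges, region_left, region_right):
    """Flag grids whose bounding boxes overlap an axis-aligned region.

    grid_left_edges, grid_right_edges: (N, 3) arrays of grid edges.
    Returns a boolean mask ("bitmap") over the N grids.
    """
    # A grid overlaps the region unless it lies entirely to one side along some axis.
    return np.logical_and(
        (grid_right_edges > region_left).all(axis=1),
        (grid_left_edges < region_right).all(axis=1),
    )
```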
Indexing grid data in yt
is optimized for systems of grids that tend to have larger grid patches, rather than smaller; specifically, in yt
each grid patch consists of a Python object, which adds a bit of overhead.
In the limit of many more cells than grid objects, this overhead is small, but in cases where the number of grids is
To address both the memory overhead and the Python overhead, as well as more generally address potential scalability issues with grid selection, several tentative explorations have been made into an implementation of a more sophisticated "grid visitors" indexing and selection method, drawing on the approach used by the oct-visitors (described below in Section @sec:octree_analysis).
These were an attempt to unify the selection methods between octrees and grids, to reduce the overall code duplication and implementation overhead.
Each process -- selection, copying of data, generation of coordinates -- is represented by an instance of a `GridVisitor` object.
A spatial tree is constructed, wherein parent/child relationships are established between grids.
The tree is recursively traversed, and the visitor object is called for each selected point.
This allows grids, their relationships, and the data masks to be stored in structures and forms that are both optimized and compressed.
This method is essential for scaling to a large number of grid patches. The storage requirement of a single grid-patch Python object is around 1 KB (about one gigabyte per million grids), whereas the optimized storage reduces this to approximately 140 bytes (about one gigabyte per eight million grids), with further reductions possible. For selection operations, we are also able to reduce the number of temporary arrays and utilize a compressed mask representation, bringing peak memory usage down further.
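The arithmetic behind these figures is straightforward; the byte counts below are the approximate values quoted above, not measured constants:

```python
per_grid_python = 1024    # ~1 KB for a full grid-patch Python object
per_grid_compact = 140    # ~140 bytes per grid in the optimized representation

print(1_000_000 * per_grid_python / 1e9)   # ~1.0 GB for one million grid objects
print(8_000_000 * per_grid_compact / 1e9)  # ~1.1 GB for eight million compact records
```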
The spatial-tree optimization substantially increases performance for "wide and shallow" dataset selection.
However, while such an implementation may yet be possible, previous attempts were stymied by performance and maintenance considerations in the grid code, in particular the difficulty of masking "child" zones in an efficient and straightforward manner.
yt
supports octree-based AMR datasets (primarily RAMSES and ART, but also the output from the octree-based radiative transfer code Hyperion).
yt
stores a copy of the octree using a pointer-based approach, where each oct points to its eight children (if refined).
The octs living at the coarsest level of the simulation are stored as a uniform grid. For domain-decomposed datasets, each domain is represented as a sparse octree, where the root octs are stored as a list and efficiently accessed using a binary search, ensuring each root oct can be found in logarithmic time. Each oct stores its on-file location (`file_ind`), its in-memory location (`domain_ind`), the index of the domain it belongs to (`domain`), and a list of pointers to its children (up to eight in 3D). This requires at most 88 bytes per oct.
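The 88-byte figure follows from three 8-byte integers plus eight 8-byte child pointers. The `ctypes` layout below is an illustrative stand-in for the actual compiled struct, with field names following the description above:

```python
import ctypes

class Oct(ctypes.Structure):
    """Illustrative per-oct record; not yt's actual Cython definition."""
    _fields_ = [
        ("file_ind", ctypes.c_int64),       # location of the oct's data on file
        ("domain_ind", ctypes.c_int64),     # in-memory location
        ("domain", ctypes.c_int64),         # index of the owning domain
        ("children", ctypes.c_void_p * 8),  # pointers to up to eight children
    ]

print(ctypes.sizeof(Oct))  # 3 * 8 + 8 * 8 = 88 bytes on a 64-bit platform
```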
In order to load data within a given region, a two-step approach is followed.
First, the cells within the region of interest (as described in Data Objects) are selected; for this, yt relies on oct-visitor machinery combined with selection routines.
The tree is recursively traversed depth-first starting from the root grid, following only those branches that may intersect with the selected region.
At the tip of each branch, the up-to-eight leaf cells are visited.
In a first pass, the number of selected cells is computed; in a second pass, the on-file location of each selected cell's parent oct is stored.
Second, yt
relies on the on-file location obtained from the octree traversal to lazily read data from disk.
This ensures that only the minimal amount of data is read, and it is particularly efficient when accessing a region spanning a small number of domains and/or a small number of refinement levels.
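The traversal and two-pass bookkeeping can be sketched as follows. This is a simplified stand-in for yt's oct-visitor machinery, restricted to a rectangular selection region; the node layout and function names are all illustrative:

```python
from dataclasses import dataclass
from typing import List, Optional
import numpy as np

@dataclass
class OctNode:
    left: np.ndarray                            # lower corner of the oct
    size: float                                 # edge length of the oct
    file_ind: int                               # where this oct's cells live on file
    children: Optional[List["OctNode"]] = None  # eight children if refined

def overlaps(left, size, r_left, r_right):
    # Axis-aligned overlap test between an oct and the selection region.
    return bool(np.all(left < r_right) and np.all(left + size > r_left))

def select(oct, r_left, r_right, out=None):
    """Depth-first traversal, pruning branches that cannot intersect the region.

    Call once with out=None to count selected cells, then again with a list
    to record the on-file location of each selected cell's parent oct.
    """
    if not overlaps(oct.left, oct.size, r_left, r_right):
        return 0
    if oct.children is not None:
        return sum(select(c, r_left, r_right, out) for c in oct.children)
    n, half = 0, oct.size / 2.0
    for i in range(8):                          # visit the leaf oct's 2x2x2 cells
        offset = np.array([(i >> d) & 1 for d in range(3)]) * half
        center = oct.left + offset + half / 2.0
        if np.all(center >= r_left) and np.all(center < r_right):
            n += 1
            if out is not None:
                out.append(oct.file_ind)
    return n
```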
Recently, yt
has been extended to fully support accessing neighboring cells.
This is achieved by computing one-cell thick quantities around each oct, which emulates the "ghost zones" found in patch-based codes. This approach has the advantage of abstracting the octree structure and provides a common interface to create derived fields, as described in @sec:derived_fields.
The 56 neighboring cells correspond to the one-cell-thick layer surrounding each $2\times2\times2$ oct ($4^3 - 2^3 = 56$). Alternative approaches exist that trade computational time for memory, for example storing the tree as a fully-threaded structure (i.e., storing pointers to the 6 neighbors sharing a face with each oct), or starting at a central oct and searching "upwards and outwards."
Smoothed Particle Hydrodynamics (SPH) is a commonly-used method for solving the equations of hydrodynamics in astrophysics (as well as many other fields!) from a Lagrangian perspective.
This provides many advantages over grid-based discretizations, but also poses somewhat different challenges for analysis and visualization.
While a full description of SPH is outside the scope of this paper, there are a handful of crucial and important pieces of information that we will review.
For more information, we refer interested readers to this comprehensive review of Smoothed Particle Hydrodynamics by Daniel Price [@doi:10.1016/j.jcp.2010.12.011], the one by Volker Springel [@doi:10.1146/annurev-astro-081309-130914], or to the SPLASH method paper by Daniel Price [@doi:10.1071/AS07022].
SPH defines field quantities at a set of moving points, allowing generic field values (i.e., between the points) to be computed by integrating over all the elements in the domain using a special-purpose kernel; in effect, values are interpolated between the discretized points by applying the smoothing kernel.
Formally, this is represented as:
$$
A(\mathbf{r}) = \int A(\mathbf{r}')W(|\mathbf{r} - \mathbf{r}'|,h)\mathrm{d}V(\mathbf{r}')
$$
This is then reduced to a sum over the particles (the discretization points):
$$
A(\mathbf{r}, h) = \sum_j V_j A_j W(|\mathbf{r} - \mathbf{r}_j|, h)
$$
In these equations, $A$ is the field, $W$ is the weighting function (the 'kernel'), $V_j$ is the volume element associated with particle $j$ (typically $m_j / \rho_j$), and $h$ is the smoothing length.
This weighting function typically takes the shape of a Gaussian, approximated through various spline functions (truncated at some radius), but that is not strictly necessary [@doi:10.1111/j.1365-2966.2012.21439.x].
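As a concrete example, the sum above can be evaluated at a single point using the standard M4 cubic-spline kernel, one common choice; conventions for $h$ and the kernel's support radius vary between codes, and this sketch is not yt's internal implementation:

```python
import numpy as np

def cubic_spline_3d(r, h):
    """M4 cubic-spline kernel in 3D, normalized to integrate to one,
    with compact support of radius 2h (one common convention)."""
    q = r / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return w / (np.pi * h**3)

def sph_scatter(r, positions, masses, densities, field, hsml):
    """Evaluate A(r) = sum_j (m_j / rho_j) A_j W(|r - r_j|, h_j) at one point."""
    d = np.linalg.norm(positions - r, axis=1)
    return np.sum((masses / densities) * field * cubic_spline_3d(d, hsml))
```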
The quantity $h$, the smoothing length, has previously been referred to as the 'half-smoothing length' (hence the variable name `hsml` used in many contexts), but formally represents the full width at half maximum of the Gaussian approximated by the spline kernels, with the ratio $\ell_{\rm FWHM} / h = \sqrt{2\ln 2}$ [@doi:10.1093/mnras/stab1423].
The value of
In general, there are two approaches to defining the finite set of particles that contribute to a field at a given point. In "scatter" methods, computing a field at a given point is conducted by iterating over particles and identifying those whose smoothing length overlaps with a given point. In the alternate method "gather," the outer and inner loops of the algorithm are essentially swapped; for every point at which a field value is sought, the local smoothing length is computed and then all particles within that smoothing length are used in the computation.
For the purposes of post-processing analysis and visualization, the most important criteria for applying SPH to a set of particles are to ensure that the function computing field values at a given location is identical (or as close as possible) to that used inside the simulation code, and that the calculations are conducted in as short a time as possible. These dual requirements have led yt to implement a flexible system for defining the smoothing kernel used, controlling whether a normalization step is applied to SPH quantities, and choosing either "scatter" or "gather" methods for computing field values at fixed locations.
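A "gather"-style estimate can be sketched with a kD-tree; here SciPy's `cKDTree` stands in for yt's internal Cython kD-tree, and the neighbor count and smoothing-length convention are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def _w(q):  # M4 cubic-spline shape, support q < 2 (3D normalization applied below)
    return np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                    np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))

def sph_gather(points, positions, masses, densities, field, n_neighbors=64):
    """For each query point (points is an (M, 3) array), gather the n nearest
    particles and sum their kernel-weighted contributions, using the distance
    to the n-th neighbor to set the local smoothing length."""
    tree = cKDTree(positions)
    dists, idx = tree.query(points, k=n_neighbors)
    h = dists[:, -1:] / 2.0                    # kernel support 2h encloses the neighbors
    weights = _w(dists / h) / (np.pi * h**3)
    return np.sum((masses / densities)[idx] * field[idx] * weights, axis=1)
```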
Previous versions of yt
provided analysis of SPH data through a hybrid approach that mixed pure-SPH analysis with octree-based gridding and indexing that used particle density as a guide for the necessary resolution.
Although the present yt 4.0 series does not utilize octrees for particles, a description of the previous implementation is useful to provide both historical information and modern motivation for the "demeshening" initiative that led to the current code base.
In practice, what this meant was that when a dataset was loaded, the particle positions were converted into one-dimensional Morton index values and sorted; by identifying the longest common prefixes in their bitwise representations, an octree (mapping directly onto the Morton index values) was constructed.
This octree was controlled by two parameters: `n_ref`, which specified the number of particles in an octree leaf node necessary to refine that node into eight sub-octants, and `over_refine_factor`, which specified the number of cells that each leaf node represented.
For instance, if `n_ref` was set to 64 (the default), any octree node containing more than 64 particles would be refined into eight child nodes; if `over_refine_factor` was set to $N$, each leaf node would consist of a set of zones $2^N$ on a side (so the default value of 1 produced eight mesh elements in total).
Constructing these octrees using Morton indices, if the entire set of particles could be stored in memory simultaneously, was extremely efficient.
To do so, the particles merely needed to be converted into Morton indices via fast, bit-level operations, those index values sorted, and then processed in order to identify the greatest common bit-prefix.
Because two successive particles with identical index values would share an octree location, looking for sequences of identical prefix values (i.e., lower-level octree colocation) naturally produces an octree.
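The essential operation is bit interleaving. A scalar Python sketch follows; yt's implementation is vectorized, compiled code operating on particle arrays:

```python
def morton_index_3d(position, domain_left, domain_width, bits=21):
    """Quantize a 3D position onto a 2**bits grid per axis and interleave the
    bits of the three integer coordinates into a single Morton index."""
    code = 0
    for dim in range(3):
        # Integer coordinate in [0, 2**bits) along this axis.
        q = int((position[dim] - domain_left[dim]) / domain_width[dim] * (1 << bits))
        q = min(max(q, 0), (1 << bits) - 1)
        for b in range(bits):
            code |= ((q >> b) & 1) << (3 * b + dim)
    return code

# Nearby positions share long bit prefixes in their Morton indices.
print(bin(morton_index_3d((0.5, 0.5, 0.5), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))))
```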
When fluid quantities such as density were requested in the yt 3.0 series, the values were computed on the mesh defined by the octree; increasing the `over_refine_factor` and decreasing the `n_ref` would serve to increase the resolution.
While this produced mostly-acceptable visualizations, and in particular produced dynamically-resolved visualizations, it posed several problems for both visualization and analysis.
The first, and arguably the most important, is that the strict locality requirements for refinement produced artifacts at leaf node boundaries.
This resulted in incorrect and unphysical visualizations of hydrodynamic quantities, affecting most obviously those regions at the edges of clusters of gas particles.
These artifacts were mitigated in regions of highly-clustered gas particles, but they remained visible, as yt was applying a visualization technique suited to finite-volume elements to Lagrangian particles.
With the 4.0 series, yt
no longer utilizes octrees for analyzing, meshing or visualizing SPH data.
Indexing, for the purposes of fast, memory-efficient access to subsets of the data, is provided by a bitmap index using the Morton indices of the particles, as described in Bitmap Indexing.
For the purposes of visualization, any quantity requiring smoothing over nearest neighbors is computed on-demand at each pixel in the output image; this provides much higher resolution than the previous method, which was both subject to free parameters and required the construction of a 3D fluid field that was then collapsed to 2D for visualization.
In many cases, this is also considerably more performant, as constructing a full-domain 3D fluid field is avoided, thus reducing both memory requirements and the number of floating point calculations.
Development of this new method was referred to internally as "the demeshening," as it served to eliminate the global (octree) mesh.
In order to perform millions of type- and dimensionality-specific spatial queries as efficiently as possible, and with as little overhead as possible, yt packages a kD-tree written in Cython that can be called from either Cython or Python and that provides low-level APIs for querying from within tight loops.
Whereas previously, constructing a projection or a slice would slice through an octree mesh and provide the results from that variable resolution mesh, the current version of yt
's SPH machinery will instead construct a pixel plane and smooth the appropriately identified particles onto that pixel plane.
This produces much higher-fidelity results (see Figure {@fig:demesh-comparison}), but a current limitation is that whenever the pixel plane is changed, the particles must be re-deposited; this puts it at odds with the similar machinery for octree and patch-based datasets, which provide a "read-once-pixelize-many" approach.
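A much-simplified sketch of depositing SPH particles onto a slice's pixel plane follows; the real pixelizer is written in compiled code, supports normalization options, and is considerably more efficient, so every name below is illustrative:

```python
import numpy as np

def _w(q):  # M4 cubic-spline shape, support q < 2 (3D normalization applied below)
    return np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                    np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))

def sph_slice(px, py, pz, hsml, mass, dens, field, z_slice, extent, npix=512):
    """Scatter particles onto a fixed pixel plane at z = z_slice.

    extent = (x0, x1, y0, y1) describes the image region; every pixel touched
    by a particle's smoothing sphere receives a kernel-weighted contribution.
    Changing the pixel plane requires re-depositing the particles.
    """
    x0, x1, y0, y1 = extent
    xs = x0 + (np.arange(npix) + 0.5) * (x1 - x0) / npix   # pixel-center coordinates
    ys = y0 + (np.arange(npix) + 0.5) * (y1 - y0) / npix
    image = np.zeros((npix, npix))
    for j in range(len(px)):
        dz = pz[j] - z_slice
        if abs(dz) > 2.0 * hsml[j]:
            continue                                       # smoothing sphere misses the plane
        r = np.sqrt((xs[:, None] - px[j])**2 + (ys[None, :] - py[j])**2 + dz**2)
        kernel = _w(r / hsml[j]) / (np.pi * hsml[j]**3)
        image += (mass[j] / dens[j]) * field[j] * kernel
    return image
```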
The octree method -- while not incapable of utilizing different normalization and particle search methods -- was less flexible than the current, de-meshened approach. For instance, the method of SPH particle identification (i.e., so-called "scatter" or "gather" methods for correlating particles with positions) is now flexible and able to be set at runtime. The normalization (if used) can take into account global quantities, local quantities, and is flexible based on the field being smoothed.
Some additional differences between SPH analysis and the analysis of finite volume data are present when utilizing data selectors. For instance, 3D data selectors as applied to finite volume codes only select those cells whose centers fall within the data selector. 2D and 1D data selectors (such as slices and rays) also include those finite volume cells that the selector passes through. However, with SPH data, the selection methods in 2D and 3D will always include those particles whose spheres of influence, defined by the appropriate smoothing lengths, are within or overlapping with the data selector. This is somewhat counter to the expectations set by the grid codes, but aligns with the need to have a fully self-contained data-container for computing field values. For instance, this means that a "ray" object (often used to compute, for instance, the column density in a cosmological simulation) will in fact include a set of particles within a (potentially) varying impact parameter. This can be seen in diagram form in Figure @fig:sph_ray_tracing. We note that, as described in the SPLASH method paper [@doi:10.1071/AS07022], the kernel interpolation can be computed using the (dimensionless) ratio between the impact parameter of the ray and the smoothing length of the particle.
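The selection criterion for a ray can be sketched as a closest-approach computation. This illustrative helper (not yt's implementation) treats the ray as infinite and uses a support radius of $2h$, a convention that varies between SPH codes:

```python
import numpy as np

def particles_intersecting_ray(start, end, positions, hsml):
    """Return the indices of particles whose smoothing spheres overlap the ray,
    along with the dimensionless ratio b/h of impact parameter to smoothing
    length, which enters the SPLASH-style kernel line integral."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    u = (end - start) / np.linalg.norm(end - start)   # unit vector along the ray
    rel = np.asarray(positions, float) - start
    t = rel @ u                                       # distance to closest approach
    b = np.linalg.norm(rel - t[:, None] * u, axis=1)  # impact parameter
    hit = b < 2.0 * np.asarray(hsml)                  # sphere of influence overlaps ray
    return np.where(hit)[0], (b / hsml)[hit]
```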
<script type="text/paperscript" canvas="sph-ray-figure" src="images/sph_ray_tracing.js"></script>{#fig:sph_ray_tracing width="1px"}
Other than these differences, which have been intentionally made to align the results with the expected results from the underlying discretization method, the APIs for access to particle data and finite volume data are identical, and they provide broadly identical functionality, where the disparities are typically in functionality such as volume rendering. This allows a single analysis script, or package (such as Trident), to utilize a high-level API to address both gridded and Lagrangian data while still ensuring that the results are high-fidelity and representative of the underlying methods.
yt
has support for several different types of unstructured mesh elements.
Typically, these are supplied as a set of coordinate points (vertices) and connectivity between those vertices.
yt
is able to interpret three types of elements (and their 2D counterparts): tetrahedral elements (4 faces, 4 vertices), wedge elements (5 faces, 6 vertices) and hexahedral elements (6 faces, 8 vertices).
These vertices can serve as control points, where values are defined at those locations; in finite element simulations, there can be additional control points for higher-order solutions.
(For a deeper investigation of the way finite elements are defined and how this corresponds to real-space coordinates, we suggest starting with the periodic table of the finite elements, which provides both a visual reference and a set of citations for further exploration; further explanation can be found in the SIAM News article describing the table.)
Similar to how yt
manages data access for particle and finite volume datasets, for unstructured mesh datasets yt
identifies each element collection as a chunk.
This means that in situations with multiple meshes, each composed of individual elements, every mesh represents both its own chunk and its own mesh object.
For example, in MOOSE-based simulations with multiple connectivity arrays, each array will be a different "field type" -- typically named `connect1`, `connect2`, etc. These are then joined (similar to how particle unions, {@sec:particle_unions}, are defined) into collections that include all of the elements of the different types.
A few items are of particular note in the implementation of finite element mesh analysis in yt.
The first is that yt
supports direct, native higher-order finite element visualization.
Visualization of unstructured meshes and finite element data utilizes its own set of custom pixelization routines, which depend not only on the element type but also on the order of the calculation.
The second item that is of relevance is that yt
is able to apply "displacement" vectors to the elements; these displacement vectors can vary with time, and thus element position and shape can vary over the course of a simulation.
By providing appropriate arguments, yt
can scale these displacement vectors (either with scalars or vector values) to exaggerate or distort their application, and in addition a vector offset can be applied to the vertices in the dataset.
Scaling and offsetting are both applied on a per-mesh basis, enabling individual collections of elements to be scaled individually.
One of the most important optimizations that has yet to be applied to the unstructured mesh support in yt
is in the "coarse" indexing process of selection.
While fine-grained indexing and selection is applied, the process of checking which meshes (i.e., coarse chunks) may intersect a given selector currently passes everything through to the next stage; this is highly inefficient and an important target for future optimization.
The pixelization routines in yt for unstructured mesh elements rely on `ElementMapper` objects. All `ElementMapper` subclasses need to provide two functions: one to transform a "physical" position into the element's reference coordinates, and one to evaluate the solution at a given position within the element.
| Type | # Dims | # Vertices | Description |
|------|--------|------------|-------------|
| P1   | 1      | 2          | Linear |
| P1   | 2      | 3          | Linear Triangular |
| Q1   | 2      | 4          | Linear Quadrilateral |
| T2   | 2      | 6          | Quadratic Triangular |
| Q2   | 2      | 9          | Quadratic Quadrilateral |
| P1   | 3      | 4          | Linear Tetrahedral |
| Q1   | 3      | 8          | Linear Hexahedral |
| W1   | 3      | 6          | Linear Wedge |
| Tet2 | 3      | 10         | Quadratic Tetrahedral |
| S2   | 3      | 20         | Quadratic Hexahedral |

Table: Finite element types supported in yt. {#tbl:finite_element_types}
To conduct pixelization of a slice or to compute values for volume rendering, yt
first computes bounding boxes for the individual elements.
Once a pixel has been identified as being "within" a particular element (which also takes into account the shape of higher-order elements, rather than assuming a flat set of planes) the pixelizer has to compute the value at that location.
In order to compute intra-element values at a position, the pixelizer calls `sample_at_real_point`, which is defined for each `ElementMapper`.
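For the simplest case, a linear tetrahedron, the mapping and sampling reduce to barycentric coordinates. The sketch below is illustrative only; yt's `ElementMapper` routines are written in compiled code and handle the higher-order cases as well:

```python
import numpy as np

def sample_linear_tet(vertices, values, point):
    """Map a physical point into a linear tetrahedron and interpolate.

    vertices: (4, 3) array of corner coordinates; values: (4,) solution
    values at the corners.  Returns the interpolated value, or None if the
    point lies outside the element.
    """
    # Solve point = v0 + T @ [b1, b2, b3] for the barycentric coordinates.
    T = (vertices[1:] - vertices[0]).T
    b = np.linalg.solve(T, np.asarray(point, float) - vertices[0])
    bary = np.concatenate(([1.0 - b.sum()], b))
    if np.any(bary < -1e-12):      # outside the element
        return None
    return float(bary @ values)

# Example: sample at the centroid of a unit tetrahedron.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
print(sample_linear_tet(verts, np.array([1.0, 2.0, 3.0, 4.0]), verts.mean(axis=0)))
```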
{#fig:finite_element_higher_order}
Of particular note is that, as listed in Table @tbl:finite_element_types, yt
has support for higher-order element types.
In Figure @fig:finite_element_higher_order, an example of this is displayed.
On the left of the figure is a slice plot through a 2nd-order tetrahedral mesh.
On the right, we have zoomed in on the edge of the boundary of the element mesh.
In both, the mesh elements have been outlined in black.
As is clearly visible in the second plot, yt is applying higher-order methods for computing pixel values: not only through non-linear interpolation of field values, but also in the shape of the elements, which extend outside the linear boundaries of the tetrahedral elements.
In Section @sec:abstraction-coordinates, we describe the relationship between the internal 'index' space that yt
uses for referencing values and the natural 'data' space that the values represent.
The abstraction of the coordinate systems and the relationship between index-space and data-space provides the ability to convert between the two; however, constructing visualizations and annotations requires an additional level of complexity.
The single most important shortcoming in the analysis of non-cartesian datasets in yt
is that the data selection operators almost exclusively function on the coordinates in index space, rather than in data space.
As such, subselecting datasets by utilizing traditional geometric selectors in yt
is much less useful than it should be; for example, selecting a sphere (see @sec:dobj-sphere) applies spherical selections in index space, which result in a decidedly non-spherical object.
Selections of objects such as regions (see @sec:dobj-region) do make considerably more sense, however, as they are often thought of as sweeping data along coordinate axes; the region object itself will naturally select wedges in a spherical domain, for instance.
Future versions of yt
will likely introduce means of more clearly selecting objects in coordinate space, for more natural subsetting of data.
It is still possible to apply data selection based on field values, which can include the coordinate-space field coordinates (such as
Despite these weak spots, however, yt
does provide a number of routines that are specific to non-cartesian datasets, including pixelizers for cylindrical and spherical coordinate systems.
(See @sec:pixelization for more detail on this process.)
Pixelizers that take variable-resolution data along the yt
also provides access to Cartopy [@doi:10.5281/zenodo.7430317] for more advanced or featureful map projections, as well as overlaying continents and other geographic shapes.
In Figure @fig:spherical_data we demonstrate some of the native, built-in functionality yt
provides for non-cartesian data.
This dataset, a simulation of magnetically-driven winds in a protoplanetary disk, was conducted in spherical coordinates.
Here, we have used yt
's functionality for overplotting streamlines as well as line integral convolution on irregular meshes to display the data in its native resolution and as a slice along the azimuthal axis of the simulation domain.