Place all figures at the bottom of summary notebook

developmentseed · Nov 1, 2024 · 5b9a487
1 parent 5a78d05
commit 5b9a487
Showing 2 changed files with 53 additions and 25 deletions.
diff --git a/_quarto.yml b/_quarto.yml
@@ -125,6 +125,6 @@ format:
     code-overflow: wrap
     css: styles.css
     toc: true
-    toc-depth: 3
+    toc-depth: 4
 filters:
   - quarto
diff --git a/examples/summarize-results.ipynb b/examples/summarize-results.ipynb
@@ -23,6 +23,49 @@
     "- Rasterio was not included as a resampling method for GPM IMERG due to a lack of simple methods for handling the non-standard axis order (e.g., (time, x, y) instead of (time, y, x)). Non-standard data and metadata would likely be a barrier to use for many NetCDF files."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Variability related to I/O\n",
+    "\n",
+    "The figures below show the resampling duration and peak memory allocation for data stored as NetCDF and accessed through the H5NetCDF library, data stored as NetCDF but virtualized into Zarr and accessed via the Zarr and Icechunk libraries, and data transformed to Zarr and accessed via the Zarr and Icechunk libraries. Here are some key takeaways:\n",
+    "\n",
+    "- Virtualizing the data as Zarr gives a >2x performance improvement relative to loading with the H5NetCDF library.\n",
+    "- If the chunk sizes remain the same, virtualization gives the same performance benefit as conversion to a cloud-optimized data format like Zarr. Differences would be observed if the chunk configuration and size are optimized for the particular workflow."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Variability related to web-optimization\n",
+    "\n",
+    "The figures below show the resampling duration and peak memory allocation for tile generation from cloud-optimized GeoTIFF relative to virtualized NetCDF and \"web-optimized Zarr\" (WOZ), which in this case are Zarr data spoofed to contain overviews. Here are some key takeaways:\n",
+    "\n",
+    "- Overviews dramatically improve the performance of tile generation at all zoom levels. For example, tile generation was ~20x faster at zoom level 0 and ~3x faster at zoom level 6.\n",
+    "- Resampling from WOZ using rioxarray added overhead relative to resampling from Web-Optimized Zarr using rasterio, due to the increased import and object instantiation times in Xarray relative to using Zarr, Numpy, and Rasterio alone. While the performance differences between COG and WOZ resampling with rasterio could likely be eliminated with future development, rasterio will likely always be faster than rioxarray when using overviews."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Implications for future development\n",
+    "\n",
+    "- Virtualizing archival file formats greatly improves performance relative to archival file readers such as h5netcdf and motivates the generation of virtual references whenever possible.\n",
+    "- The Web-Optimized Zarr example shows the potential for Zarr overviews to enable highly performant visualization and motivates the development of the GeoZarr and multi-scales Zarr specifications.\n",
+    "- Pyinstrument showed a significant fraction of the total time when resampling Web-Optimized Zarr using rioxarray went towards Xarray importing Pandas and guessing the chunk manager. Both of these components could be improved or removed through future development.\n",
+    "- The dramatic difference between using XESMF with and without pre-generated weights raises the question of whether similar relative performance improvements could be gained by pre-generating weights for reprojection with GDAL. Given that pyinstrument shows only ~1/4 of the time is spent on the actual resampling operation when using COGs, building specifications for web-optimizing Zarr (i.e., GeoZarr and multi-scales), virtualizing existing datasets, and reducing import times would likely be much simpler and more fruitful activities."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Summary figures"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -162,6 +205,13 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Summary figures for comparing resampling methods"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -594,12 +644,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Variability related to I/O\n",
-    "\n",
-    "The figures below show the resampling duration and peak memory allocation for data stored as NetCDF and accessed through the H5NetCDF library, data stored as NetCDF but virtualized into Zarr and accessed via the Zarr and Icechunk libraries, and data transformed to Zarr and accessed via the Zarr and Icechunk libraries. Here are some key takeaways:\n",
-    "\n",
-    "- Virtualizing the data as Zarr gives a >2x performance improvement relative to loading with the H5NetCDF library.\n",
-    "- If the chunk sizes remain the same, virtualization gives the same performance benefit as conversion to a cloud-optimized data format like Zarr. Differences would be observed if the chunk configuration and size is optimized for the particular workflow."
+    "#### Summary figures for comparing storage formats and I/O libraries"
    ]
   },
   {
@@ -810,12 +855,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Variability related to web-optimization\n",
-    "\n",
-    "The figures below show the resampling duration and peak memory allocation for tile generation from cloud-optimized GeoTIFF relative to virtualized NetCDF and \"web-optimized Zarr\" (WOZ), which in this case are Zarr data spoofed to contain overviews. Here are some key takeaways:\n",
-    "\n",
-    "- Overviews dramatically improve the performance of tile generation at all zoom levels. For example, tile generation was ~20x faster at zoom level 0 and ~3x faster at zoom level 6.\n",
-    "- Resampling from WOZ using rioxarray added overhead relative to resampling from Web-Optimized Zarr using rasterio, due to the increased import and object instantiation times in Xarray relative to using Zarr, Numpy, and Rasterio alone. While the performance differences between COG and WOZ resampling with rasterio could likely be eliminated with future development, rasterio will likely always be raster than rioxarray when using overviews."
+    "#### Summary figures for exploring web-optimization"
    ]
   },
   {
@@ -1021,18 +1061,6 @@
    "source": [
     "plot_memory_by_weboptimization()"
    ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Implications for future development\n",
-    "\n",
-    "- Virtualizing archival file formats greatly improves performance relative to archival file readers such as h5netcdf and motivates the generation of virtual references whenever possible.\n",
-    "- The Web-Optimized Zarr example shows the potential for Zarr overviews to enable highly performant visualization and motivates the development of the GeoZarr and multi-scales Zarr specifications.\n",
-    "- Pyinstrument showed a significant fraction of the total time when resampling Web-Optimized Zarr using rioxarray went towards Xarray importing Pandas and guessing the chunk manager. Both of these components could be improved or removed through future development.\n",
-    "- The dramatic difference between using XESMF with and without pre-generated weights raises the question of whether similar relative performance improvements could be gained by pre-generating weights for reprojection with GDAL. Given that pyinstrument shows only ~1/4 of the time is spent on the actual resampling operation when using COGs, building specifications for web-optimizing Zarr (i.e., GeoZarr and multi-scales), virtualizing existing datasets, and reducing import times would likely be much simpler and more fruitful activities."
-   ]
   }
  ],
  "metadata": {