This is a work-in-progress guidebook on existing warp resampling / reprojection methods in Python, along with some memory and statistical wall-time profiling results.
This is a WIP guidebook. We are presenting it early in development to guide discussions and future work. The output from the different resampling methods has not yet been verified, important parameters (e.g., target no_data values) have not been set, and not all methods have been implemented. We encourage people to contribute by building on the notebooks in the examples directory or by participating in discussions on this repo or on the Pangeo Discourse.
Resampling and reprojection (i.e., warp resampling) are essential steps for generating raster tiles for browser-based visualization. Further, warp resampling is often one of the most time-consuming and memory-intensive portions of the tile generation process. The importance and complexity of this step motivate an exploration of different warp resampling options.
Compare memory and time performance for generating a zoom level 0 256 x 256 raster from one timestep and variable of the MUR SST dataset using the following approaches:
- osgeo.warp
- rasterio.warp.reproject
- rioxarray.reproject
- pyresample.resample_blocks
- xesmf.Regridder
- geoutils.Raster.reproject
- raster_tools.warp.reproject
- odc.geo.xr.xr_reproject
- xcube.resampling
- geowombat.config.update
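All of the approaches above target the same output grid, so it may help to spell that grid out. The following is a minimal sketch, assuming the zoom level 0 raster covers the full square Web Mercator (EPSG:3857) extent; the helper names `tile_grid` and `geotransform` are hypothetical and are not taken from the notebooks.

```python
import math

# Half-width of the square Web Mercator (EPSG:3857) extent in metres:
# pi times the WGS84 semi-major axis.
EARTH_RADIUS_M = 6378137.0
HALF_EXTENT_M = math.pi * EARTH_RADIUS_M


def tile_grid(zoom=0, tile_size=256):
    """Return (bounds, pixel_size, shape) for the full-world grid at `zoom`.

    Hypothetical helper: at zoom 0 the whole extent is a single tile.
    """
    n_tiles = 2 ** zoom
    width = height = tile_size * n_tiles
    pixel_size = 2 * HALF_EXTENT_M / width
    # (left, bottom, right, top) in metres
    bounds = (-HALF_EXTENT_M, -HALF_EXTENT_M, HALF_EXTENT_M, HALF_EXTENT_M)
    return bounds, pixel_size, (height, width)


def geotransform(bounds, pixel_size):
    """GDAL-style geotransform for a north-up grid with square pixels."""
    left, _, _, top = bounds
    return (left, pixel_size, 0.0, top, 0.0, -pixel_size)


bounds, pixel_size, shape = tile_grid(zoom=0)
gt = geotransform(bounds, pixel_size)
print(shape, pixel_size)
```

Each library takes this information in a different form (bounds plus shape, an affine transform, or a target geobox), so the sketch only fixes the numbers, not how any given package expects to receive them.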
Out-of-scope:
- xarray-regrid - only regrids within the same rectilinear coordinate system
- weatherbench2/regridding - only seems to regrid within the same rectilinear coordinate system
- dinosaur - only seems to regrid within the same rectilinear coordinate system
- pygmt.grdproject - web mercator not amongst supported projections
- verde - not used for raster -> raster resampling (only points -> raster)
These methods will be run on the full-resolution dataset. Nearest neighbor interpolation will be used for the first comparison. For simplicity, only two quantities will be measured: the wall time needed to generate a resampled array and the peak amount of heap memory allocated.
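The two measurements described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the profiling setup used in the notebooks: `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated directly by C libraries may go uncounted, and a dedicated heap profiler would be needed for those. `fake_resample` is a hypothetical stand-in for a real resampling call. Time and memory are measured in separate runs because tracing allocations slows execution.

```python
import statistics
import time
import tracemalloc


def time_runs(func, repeats=5):
    """Wall time of each of `repeats` calls to `func`, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def peak_traced_bytes(func):
    """Peak memory traced by tracemalloc during one call to `func`, in bytes."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def fake_resample():
    # Hypothetical stand-in for a resampling call: allocates a buffer the
    # size of a 256 x 256 float64 array.
    return bytearray(256 * 256 * 8)


timings = time_runs(fake_resample, repeats=5)
summary = {
    "median_s": statistics.median(timings),
    "min_s": min(timings),
    "peak_bytes": peak_traced_bytes(fake_resample),
}
print(summary)
```

Reporting a median over several runs, rather than a single timing, is one simple way to get the "statistical" wall-time figure; the number of repeats here is arbitrary.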
- Compare to results when using 2x and 4x downsampled versions of the dataset to better understand the time and memory complexity.
- Compare to results when using a virtual dataset (e.g., VRT, Kerchunk reference file).
- Compare results when reading from a dataset stored locally versus in cloud object storage.
- Compare to results when using a cloud-optimized dataset (Zarr).
- Compare other resampling methods (e.g., bilinear, conservative).
- Compare with methods that don't rely on existing packages (e.g., Conservative regridding with Xarray, GeoPandas, and Sparse and KDTree wrappers).
The notebooks can be run in a JupyterHub environment using the Docker image quay.io/developmentseed/warp-resample-profiling:latest, which is built with repo2docker from the Dockerfile in the binder directory.
This work was made possible through support from NASA IMPACT. Numerous people have guided development, especially Aimee Barciauskas (@abarciauskas-bgse), Justus Magin (@keewis), and Michael Sumner (@mdsumner). The resources page contains references for source information. The Quarto configuration is based on the Cloud Native Geospatial Formats Guide and Tile Benchmarking (https://developmentseed.org/tile-benchmarking/). All mistakes are my own.