xcube-stac is a Python package and xcube plugin that adds a data store named stac to xcube. The data store is used to access data from STAC (SpatioTemporal Asset Catalogs).
To install xcube-stac directly from the git repository, clone the repository, change into the xcube-stac directory, and follow the steps below:
conda env create -f environment.yml
conda activate xcube-stac
pip install .
This installs all the dependencies of xcube-stac into a fresh conda environment, then installs xcube-stac into this environment from the repository.
A SpatioTemporal Asset Catalog (STAC) consists of three main components: catalog, collection, and item. Each item can contain multiple assets, each linked to a data source. Items are associated with a timestamp or temporal range and a bounding box describing the spatial extent of the data.
Items within a collection generally exhibit similarities. For example, a STAC catalog might contain multiple collections corresponding to different space-borne instruments. Each item represents a measurement covering a specific spatial area at a particular timestamp. For a multi-spectral instrument, different bands can be stored as separate assets.
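To make the item and asset relationship concrete, here is a minimal sketch of what a STAC item looks like once its JSON is loaded into Python. The item below is hand-written for illustration: the ID, collection name, URLs, and values are all assumptions and do not refer to a real catalog. The helper `asset_hrefs` is hypothetical as well.

```python
# A hand-written, STAC-like item; every value here is made up for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    # Spatial extent as [west, south, east, north] in degrees.
    "bbox": [9.0, 53.0, 10.5, 54.0],
    # Timestamp of the measurement.
    "properties": {"datetime": "2020-07-05T10:30:00Z"},
    # One asset per spectral band, each linked to a data source.
    "assets": {
        "red": {"href": "https://example.com/data/red.tif"},
        "green": {"href": "https://example.com/data/green.tif"},
        "blue": {"href": "https://example.com/data/blue.tif"},
    },
}


def asset_hrefs(item):
    """Map each asset name to the location of its data source."""
    return {name: asset["href"] for name, asset in item["assets"].items()}


hrefs = asset_hrefs(item)
print(sorted(hrefs))
```

Each key of `hrefs` corresponds to one asset, which is the unit that becomes a data variable when an item is opened as a dataset.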
A STAC catalog can comply with the STAC API - Item Search conformance class, enabling server-side searches for items based on specific parameters. If this compliance is not met, only client-side searches are possible, which can be slow for large STAC catalogs.
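The difference between the two search paths can be sketched in plain Python. The snippet below is an illustration under stated assumptions, not the xcube-stac implementation: `supports_item_search` checks a catalog's `conformsTo` list for a URI ending in `/item-search` (a simplification of the conformance check), and `client_side_search` is a hypothetical helper that has to visit every item, which is why this path becomes slow for large catalogs. The sample items are made up.

```python
from datetime import datetime


def supports_item_search(conforms_to):
    """Return True if any conformance URI ends with '/item-search'."""
    return any(uri.rstrip("/").endswith("/item-search") for uri in conforms_to)


def parse_time(text):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def bbox_intersects(a, b):
    """Check whether two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def client_side_search(items, bbox, time_range):
    """Filter items locally; every item must be fetched and inspected."""
    start, end = (parse_time(t) for t in time_range)
    for item in items:
        timestamp = parse_time(item["properties"]["datetime"])
        if start <= timestamp <= end and bbox_intersects(item["bbox"], bbox):
            yield item


# Made-up items standing in for the content of a small catalog.
items = [
    {"id": "a", "bbox": [9.0, 53.0, 10.5, 54.0],
     "properties": {"datetime": "2020-07-05T10:30:00Z"}},
    {"id": "b", "bbox": [20.0, 40.0, 21.0, 41.0],
     "properties": {"datetime": "2020-07-05T10:30:00Z"}},
    {"id": "c", "bbox": [9.5, 53.5, 11.0, 54.5],
     "properties": {"datetime": "2021-01-01T10:30:00Z"}},
]

found = [
    item["id"]
    for item in client_side_search(
        items,
        bbox=[10.0, 53.5, 10.2, 53.8],
        time_range=["2020-07-01T00:00:00Z", "2020-07-31T23:59:59Z"],
    )
]
```

With a searchable catalog, the same bounding box and time range would instead be sent to the server, which returns only the matching items.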
The xcube-stac plugin reads the data sources from the STAC catalog and opens the data in an analysis-ready form following the xcube dataset convention. By default, a data ID represents one item, which is opened as a dataset, with each asset becoming a data variable within the dataset.
Additionally, a stack mode is available, enabling the stacking of items using the core functionality of xcube. This allows for mosaicking multiple tiles grouped by solar day, and concatenating the datacube along the temporal axis.
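As a rough illustration of what grouping by solar day means, the sketch below assigns each item to a local solar date by shifting its UTC timestamp by one hour per 15 degrees of longitude. This is a common approximation and only an assumption here; it is not taken from the xcube-stac source, and the helper names and sample items are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def solar_day(utc_time, longitude):
    """Approximate the local solar date: 15 degrees of longitude per hour."""
    return (utc_time + timedelta(hours=longitude / 15.0)).date()


def group_by_solar_day(items):
    """Group item IDs by the solar day at the centre longitude of their bbox."""
    groups = defaultdict(list)
    for item in items:
        west, _, east, _ = item["bbox"]
        timestamp = datetime.fromisoformat(
            item["properties"]["datetime"].replace("Z", "+00:00")
        )
        day = solar_day(timestamp, (west + east) / 2.0)
        groups[day.isoformat()].append(item["id"])
    return dict(groups)


# Made-up tiles: two neighbours observed together, one on the other side of the globe.
tiles = [
    {"id": "tile-a", "bbox": [9.0, 53.0, 10.5, 54.0],
     "properties": {"datetime": "2020-07-05T10:30:00Z"}},
    {"id": "tile-b", "bbox": [10.5, 53.0, 12.0, 54.0],
     "properties": {"datetime": "2020-07-05T10:30:10Z"}},
    {"id": "tile-c", "bbox": [149.0, -35.0, 151.0, -34.0],
     "properties": {"datetime": "2020-07-05T23:30:00Z"}},
]

groups = group_by_solar_day(tiles)
```

Tiles that end up in the same group are candidates for mosaicking into one time step; the groups are then concatenated along the time axis.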
odc-stac and stackstac were also considered during the evaluation of Python libraries that support stacking of STAC items. However, both stacking libraries depend on the GDAL driver for reading the data with rasterio.open, which prevents reading data from the CDSE S3 endpoint, due to blocking of the rasterio AWS environments.
Comparing odc-stac and stackstac, the benchmarking report shows that odc-stac outperforms stackstac. Furthermore, stackstac has an issue in making use of the overview levels of COG files. Still, stackstac is popular in the community and might be supported in the future.
The following Jupyter notebooks provide some examples:
example/notebooks/cdse_sentinel_2.ipynb: This notebook shows how to stack multiple tiles of Sentinel-2 L2A data using the CDSE STAC API. It shows stacking of individual tiles and mosaicking of multiple tiles measured on the same solar day.
example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb: This notebook shows how to stack multiple tiles of Sentinel-2 L2A data from the Earth Search by Element 84 STAC API. It shows stacking of individual tiles and mosaicking of multiple tiles measured on the same solar day.
example/notebooks/geotiff_nonsearchable_catalog.ipynb: This notebook shows how to load a GeoTIFF file from a non-searchable STAC catalog.
example/notebooks/geotiff_searchable_catalog.ipynb: This notebook shows how to load a GeoTIFF file from a searchable STAC catalog.
example/notebooks/netcdf_searchable_catalog.ipynb: This notebook shows how to load a NetCDF file from a searchable STAC catalog.
example/notebooks/xcube_server_stac_s3.ipynb: This notebook shows how to open data sources published by xcube server via the STAC API.
The xcube data store framework provides easy access to data in an analysis-ready format, using the few lines of code below.
from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1"
)
ds = store.open_data(
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
    data_type="dataset"
)
The data ID "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A" points to the STAC item's JSON and is specified by the segment of the URL that follows the catalog's URL. The data_type can be set to dataset or mldataset, which return an xarray.Dataset and an xcube multi-resolution dataset, respectively. Note that in the above example, if data_type is not assigned, an xarray.Dataset will be returned.
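The relation between an item's URL and its data ID can be sketched as a small string operation. The helper `data_id_from_item_url` below is hypothetical and not part of xcube-stac; it only illustrates the rule that the data ID is the part of the item URL that follows the catalog URL.

```python
def data_id_from_item_url(catalog_url, item_url):
    """Return the part of item_url that follows catalog_url."""
    prefix = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(prefix):
        raise ValueError(f"{item_url!r} is not located below {catalog_url!r}")
    return item_url[len(prefix):]


catalog_url = "https://earth-search.aws.element84.com/v1"
item_url = (
    "https://earth-search.aws.element84.com/v1/"
    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"
)

data_id = data_id_from_item_url(catalog_url, item_url)
print(data_id)
```

The resulting string is what is passed as the first argument to `store.open_data` in the example above.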
To use the stack mode, initialize a stac store with the argument stack_mode=True.
from xcube.core.store import new_data_store

store = new_data_store(
    "stac",
    url="https://earth-search.aws.element84.com/v1",
    stack_mode=True
)
ds = store.open_data(
    # In stack mode, the data ID is the collection ID.
    data_id="sentinel-2-l2a",
    bbox=[506700, 5883400, 611416, 5984840],
    time_range=["2020-07-15", "2020-08-01"],
    crs="EPSG:32632",
    spatial_res=20,
    asset_names=["red", "green", "blue"],
    apply_scaling=True,
)
In the stacking mode, the data IDs are the collection IDs within the STAC catalog. To get Sentinel-2 L2A data, we assign data_id to "sentinel-2-l2a". The bounding box and time range define the spatial and temporal extent of the data cube. The parameters crs and spatial_res are required as well and define the coordinate reference system (CRS) and the spatial resolution, respectively. Note that the bounding box and spatial resolution need to be given in the units of the respective CRS.
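A quick back-of-the-envelope check shows why the bounding box and the resolution must share the units of the CRS. The helper `grid_shape` below is a hypothetical sketch that estimates the size of the resulting grid by dividing the extent by the resolution; the shape actually produced by xcube-stac may differ slightly, for example through rounding or grid alignment.

```python
import math


def grid_shape(bbox, spatial_res):
    """Estimate (height, width) in pixels for a bbox given as
    [x_min, y_min, x_max, y_max] in the same units as spatial_res."""
    x_min, y_min, x_max, y_max = bbox
    width = math.ceil((x_max - x_min) / spatial_res)
    height = math.ceil((y_max - y_min) / spatial_res)
    return height, width


# Bounding box in metres (EPSG:32632) with a 20 m resolution, as in the example above.
utm_shape = grid_shape([506700, 5883400, 611416, 5984840], 20)

# A bounding box in degrees combined with a resolution meant as metres
# collapses to a single pixel, which signals mismatched units.
mismatched_shape = grid_shape([9.0, 53.0, 10.5, 54.0], 20)

print(utm_shape, mismatched_shape)
```

For the example above, the estimate is a grid of roughly 5072 by 5236 pixels per time step and asset.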
To run the unit test suite:
pytest
To analyze test coverage:
pytest --cov=xcube_stac
To produce an HTML coverage report:
pytest --cov-report html --cov=xcube_stac
The unit test suite uses pytest-recording to mock STAC catalogs. During development, an actual HTTP request is performed to a STAC catalog and the responses are saved in cassettes/**.yaml files. During testing, only the cassettes/**.yaml files are used without an actual HTTP request. During development, to save the responses to cassettes/**.yaml, run
pytest -v -s --record-mode new_episodes
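Note that --record-mode new_episodes replays the interactions already stored in a cassette and records any new requests alongside them, so existing cassette files can be modified. To record cassettes only where none have been saved yet, --record-mode once can be used.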
pytest-recording supports all record modes provided by VCR.py.
After recording the cassettes, testing can then be performed as usual.