Implement open_data and support tiff, netcdf and zarr for s3 and https protocol; Implement stack-mode which stacks items containing tiffs; #19

Merged
merged 24 commits into from
Aug 19, 2024
Changes from 21 commits
20 changes: 18 additions & 2 deletions .github/workflows/unittest-workflow.yml
@@ -9,17 +9,33 @@ jobs:
unittest:
runs-on: ubuntu-latest
steps:
- name: checkout xcube
uses: actions/checkout@v4
with:
repository: xcube-dev/xcube
path: xcube

- name: checkout xcube-stac
uses: actions/checkout@v4
with:
path: xcube-stac

- name: Set up MicroMamba
uses: mamba-org/setup-micromamba@v1
with:
environment-file: environment.yml
environment-file: xcube-stac/environment_workflow.yml

- name: Install xcube and start xcube server
shell: bash -l {0}
run: |
cd /home/runner/work/xcube-stac/xcube-stac/xcube
ls
pip install .
xcube serve -c examples/serve/demo/config.yml &

- name: Run unit tests
shell: bash -l {0}
run: |
cd /home/runner/work/xcube-stac/xcube-stac
cd /home/runner/work/xcube-stac/xcube-stac/xcube-stac
ls
pytest
13 changes: 13 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,13 @@
# Code of Conduct

We are committed to providing a friendly, safe, and welcoming environment
for all, regardless of gender, sexual orientation, ability, ethnicity,
religion, or any other characteristic.

We expect everyone to treat each other with respect and kindness. We do not
tolerate harassment or discrimination of any kind.

If you witness or experience any behavior that violates this code of conduct,
please report it to the project maintainers immediately.

Thank you for helping us create a welcoming community!
42 changes: 42 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,42 @@
# How to contribute

The `xcube-stac` project welcomes contributions of any form
as long as you respect our [code of conduct](CODE_OF_CONDUCT.md) and stay
in line with the following instructions and guidelines.

If you have suggestions, ideas, feature requests, or if you have identified
a malfunction or error, then please
[post an issue](https://github.com/xcube-dev/xcube-stac/issues).

If you'd like to submit code or documentation changes, we ask you to provide a
pull request (PR)
[here](https://github.com/xcube-dev/xcube-stac/pulls).
For code and configuration changes, your PR must be linked to a
corresponding issue.

To ensure that your code contributions are consistent with our project’s
coding guidelines, please make sure all applicable items of the following
checklist are addressed in your PR.

**PR checklist**

* Format code using [black](https://black.readthedocs.io/) with default settings.
Check also section [code style](#code-style) below.
* Your change shall not break existing unit tests.
`pytest` must run without errors.
* Add unit tests for any new code not yet covered by tests.
* Make sure test coverage is close to 100% for any change.
Use `pytest --cov=xcube_stac --cov-report=html` to verify.

## Code style <a name="code-style"></a>

The `xcube-stac` code is compliant with [PEP-8](https://pep8.org/) except for a line
length of 88 characters as recommended by [black](https://black.readthedocs.io/).
Since black is un-opinionated regarding the order of imports,
we use the following three import blocks separated by an empty
line:

1. Python standard library imports, e.g., `os`, `typing`, etc
2. 3rd-party imports, e.g., `xarray`, `zarr`, etc
3. Relative `xcube_stac` module imports using `.` prefix.

176 changes: 166 additions & 10 deletions README.md
@@ -9,9 +9,21 @@
named `stac` to xcube. The data store is used to access data from the
[STAC - SpatioTemporal Asset Catalogs](https://stacspec.org/en/).

## Setup

### Installing the xcube-stac plugin from the repository
## Table of contents
1. [Setup](#setup)
1. [Installing the xcube-stac plugin from the repository](#install_source)
2. [Overview](#overview)
1. [General structure of a STAC catalog](#stac_catalog)
2. [General functionality of xcube-stac](#func_xcube_stac)
3. [Introduction to xcube-stac](#intro_xcube_stac)
1. [Overview of Jupyter notebooks](#overview_notebooks)
2. [Getting started](#getting_started)
4. [Testing](#testing)
1. [Some notes on the strategy of unit-testing](#unittest_strategy)

## Setup <a name="setup"></a>

### Installing the xcube-stac plugin from the repository <a name="install_source"></a>

To install xcube-stac directly from the git repository, clone the repository,
change into the `xcube-stac` directory, and follow the steps below:
@@ -26,15 +38,158 @@ This installs all the dependencies of `xcube-stac` into a fresh conda
environment, then installs xcube-stac into this environment from the
repository.

## Testing
## Overview <a name="overview"></a>

### General structure of a STAC catalog <a name="stac_catalog"></a>
A SpatioTemporal Asset Catalog (STAC) consists of three main components: catalog,
collection, and item. Each item can contain multiple assets, each linked to a data
source. Items are associated with a timestamp or temporal range and a bounding box
describing the spatial extent of the data.

Items within a collection generally exhibit
similarities. For example, a STAC catalog might contain multiple collections
corresponding to different space-borne instruments. Each item represents a measurement
covering a specific spatial area at a particular timestamp. For a multi-spectral
instrument, different bands can be stored as separate assets.
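
As a rough illustration of this structure, the sketch below writes a minimal STAC item as a plain Python dictionary and lists its assets. The item ID, collection ID, asset names, and URLs are made up for illustration; only the field names (`bbox`, `properties.datetime`, `assets`) follow the STAC item specification. `list_assets` is a hypothetical helper and not part of xcube-stac.

```python
# Hypothetical STAC item; IDs and hrefs are placeholders, not real data.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-item",
    "collection": "example-collection",
    # Spatial extent as west, south, east, north.
    "bbox": [9.1, 53.1, 10.7, 54.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[9.1, 53.1], [10.7, 53.1], [10.7, 54.0], [9.1, 54.0], [9.1, 53.1]]
        ],
    },
    # An item carries a timestamp (or a temporal range).
    "properties": {"datetime": "2020-07-05T10:00:00Z"},
    # One asset per spectral band, each linked to a data source.
    "assets": {
        "red": {"href": "https://example.com/data/red.tif"},
        "nir": {"href": "https://example.com/data/nir.tif"},
    },
    "links": [],
}


def list_assets(item: dict) -> dict[str, str]:
    """Map each asset name to the href of its data source."""
    return {name: asset["href"] for name, asset in item["assets"].items()}


for name, href in list_assets(item).items():
    print(name, "->", href)
```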

A STAC catalog can comply with the [STAC API - Item Search](https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search#stac-api---item-search)
conformance class, enabling server-side searches for items based on specific
parameters. If this compliance is not met, only client-side searches are possible,
which can be slow for large STAC catalogs.
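
A STAC API advertises its conformance classes in the `conformsTo` list of its landing page. The following minimal sketch checks such a list for the Item Search class. The function name `supports_item_search` and the example lists are assumptions made for illustration; this is not necessarily how xcube-stac performs the check.

```python
ITEM_SEARCH_SUFFIX = "/item-search"


def supports_item_search(conforms_to: list[str]) -> bool:
    """Return True if any conformance URI names the Item Search class."""
    return any(
        uri.rstrip("/").endswith(ITEM_SEARCH_SUFFIX) for uri in conforms_to
    )


# Example `conformsTo` list as it could appear on a STAC API landing page.
searchable = [
    "https://api.stacspec.org/v1.0.0/core",
    "https://api.stacspec.org/v1.0.0/item-search",
]
# A static catalog typically declares no conformance classes at all.
non_searchable: list[str] = []

print(supports_item_search(searchable))
print(supports_item_search(non_searchable))
```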

### General functionality of xcube-stac <a name="func_xcube_stac"></a>
The xcube-stac plugin reads the data sources from the STAC catalog and opens the data
in an analysis-ready form following the [xcube dataset convention](https://xcube.readthedocs.io/en/latest/cubespec.html).
By default, a data ID represents one item, which is opened as a dataset, with each
asset becoming a data variable within the dataset.

Additionally, a stack mode is
available, enabling the stacking of items using [odc-stac](https://odc-stac.readthedocs.io/en/latest/).
This allows for mosaicking multiple tiles and concatenating the datacube along the
temporal axis.

[stackstac](https://stackstac.readthedocs.io/en/latest/) was also
considered during the evaluation of Python libraries supporting stacking of STAC items.
However, the [benchmarking report](https://benchmark-odc-stac-vs-stackstac.netlify.app/)
comparing stackstac and odc-stac shows that odc-stac outperforms stackstac. Furthermore,
stackstac shows an [issue](https://github.com/gjoseph92/stackstac/issues/196) in making
use of the overview levels of COG files. Still, stackstac is popular in the
community and might be supported in the future.

## Introduction to xcube-stac <a name="intro_xcube_stac"></a>

### Overview of Jupyter notebooks <a name="overview_notebooks"></a>
The following Jupyter notebooks provide some examples:

* `example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb`:
This notebook shows an example of how to stack multiple tiles of Sentinel-2 L2A data
from the Earth Search STAC API by Element 84. It shows stacking of individual tiles and
mosaicking of multiple tiles measured on the same solar day.
* `example/notebooks/geotiff_nonsearchable_catalog.ipynb`:
This notebook shows an example of how to load a GeoTIFF file from a non-searchable
STAC catalog.
* `example/notebooks/geotiff_searchable_catalog.ipynb`:
This notebook shows an example of how to load a GeoTIFF file from a searchable
STAC catalog.
* `example/notebooks/netcdf_searchable_catalog.ipynb`:
This notebook shows an example of how to load a NetCDF file from a searchable
STAC catalog.
* `example/notebooks/xcube_server_stac_s3.ipynb`:
This notebook shows an example of how to open data sources published by xcube server
via the STAC API.

### Getting started <a name="getting_started"></a>

The xcube [data store framework](https://xcube.readthedocs.io/en/latest/dataaccess.html#data-store-framework)
allows easy access to data in an analysis-ready format with the few lines of
code below.

```python
from xcube.core.store import new_data_store

store = new_data_store(
"stac",
url="https://earth-search.aws.element84.com/v1"
)
ds = store.open_data(
"collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
data_type="dataset"
)
```
The data ID `"collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"` points to the
[STAC item's JSON](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md)
and is specified by the segment of the URL that follows the catalog's URL. The
`data_type` can be set to `dataset` or `mldataset`, which returns an `xr.Dataset` or
an [xcube multi-resolution dataset](https://xcube.readthedocs.io/en/latest/mldatasets.html),
respectively. Note that in the above example, if `data_type` is not assigned,
a multi-resolution dataset will be returned. This is because the item's assets link to
GeoTIFFs, which are opened as multi-resolution datasets by default.
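
The relation between an item's URL and its data ID can be sketched as follows. The helper `data_id_from_item_url` is hypothetical and only illustrates the rule stated above (the data ID is the URL segment after the catalog's URL); it is not part of the xcube-stac API.

```python
def data_id_from_item_url(catalog_url: str, item_url: str) -> str:
    """Return the part of `item_url` that follows `catalog_url`."""
    base = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(base):
        raise ValueError(f"{item_url!r} is not located below {catalog_url!r}")
    return item_url[len(base):]


catalog_url = "https://earth-search.aws.element84.com/v1"
item_url = (
    catalog_url + "/collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"
)
print(data_id_from_item_url(catalog_url, item_url))
```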

To use the stack mode, initialize a stac store with the argument `stack_mode=True`.

```python
from xcube.core.store import new_data_store

store = new_data_store(
"stac",
url="https://earth-search.aws.element84.com/v1",
stack_mode=True
)
ds = store.open_data(
"sentinel-2-l2a",
data_type="dataset",
bbox=[9.1, 53.1, 10.7, 54],
time_range=["2020-07-01", "2020-08-01"],
query={"s2:processing_baseline": {"eq": "02.14"}},
)
```

In the stacking mode, the data IDs are the collection IDs within the STAC catalog. To
get Sentinel-2 L2A data, we assign `data_id` to `"sentinel-2-l2a"`. The bounding box and
time range are assigned to define the temporal and spatial extent of the data cube.
Additionally, for this example, we need to set a query argument to select a specific
[Sentinel-2 processing baseline](https://sentiwiki.copernicus.eu/web/s2-processing#S2Processing-L2Aprocessingbaseline),
as the collection contains multiple items for the same tile with different processing
procedures. Note that this requirement can vary between collections and must be
specified by the user. To set query arguments, the STAC catalog needs to conform to
the [query extension](https://github.com/stac-api-extensions/query).

The stacking is performed using [odc-stac](https://odc-stac.readthedocs.io/en/latest/).
All arguments of [odc.stac.load](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html)
can be passed into the `open_data(...)` method, which forwards them to the
`odc.stac.load` function.

To apply mosaicking, we need to assign `groupby="solar_day"`, as shown in the
[documentation of `odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html).
The following few lines of code show a small example including mosaicking.

```python
from xcube.core.store import new_data_store

store = new_data_store(
"stac",
url="https://earth-search.aws.element84.com/v1",
stack_mode=True
)
ds = store.open_data(
"sentinel-2-l2a",
data_type="dataset",
bbox=[9.1, 53.1, 10.7, 54],
time_range=["2020-07-01", "2020-08-01"],
query={"s2:processing_baseline": {"eq": "02.14"}},
groupby="solar_day",
)
```
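
The idea behind grouping by solar day can be sketched as follows: the UTC acquisition time is shifted by the approximate local solar offset (one hour per 15 degrees of longitude), and items that fall on the same shifted date end up in one group. This is a conceptual sketch only, not odc-stac's exact implementation; the item tuples and both function names are made up for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone


def solar_day(utc_time: datetime, longitude: float) -> date:
    """Approximate local solar date: shift UTC by one hour per 15 degrees."""
    offset = timedelta(hours=longitude / 15.0)
    return (utc_time + offset).date()


def group_by_solar_day(
    items: list[tuple[str, datetime, float]]
) -> dict[date, list[str]]:
    """Group item IDs by the solar day of their acquisition."""
    groups: dict[date, list[str]] = defaultdict(list)
    for item_id, utc_time, longitude in items:
        groups[solar_day(utc_time, longitude)].append(item_id)
    return dict(groups)


# Hypothetical items: (ID, acquisition time in UTC, centre longitude).
items = [
    ("tile-a", datetime(2020, 7, 5, 10, 30, tzinfo=timezone.utc), 9.5),
    ("tile-b", datetime(2020, 7, 5, 10, 31, tzinfo=timezone.utc), 10.5),
    ("tile-c", datetime(2020, 7, 7, 10, 20, tzinfo=timezone.utc), 9.5),
]
for day, ids in group_by_solar_day(items).items():
    print(day, ids)
```

Items in the same group (here `tile-a` and `tile-b`) are the candidates to be mosaicked into a single time step.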

## Testing <a name="testing"></a>

To run the unit test suite:

```bash
pytest
```

To analyze test coverage (after installing pytest as above):
To analyze test coverage:

```bash
pytest --cov=xcube_stac
@@ -47,17 +202,18 @@ To produce an HTML
pytest --cov-report html --cov=xcube_stac
```

### Some notes on the strategy of unittesting
### Some notes on the strategy of unit-testing <a name="unittest_strategy"></a>

The unit test suite uses [pytest-recording](https://pypi.org/project/pytest-recording/)
to mock STAC catalogs. During development an actual HTTP request is performed
to a STAC catalog and the responses are saved in `cassettes/**.yaml` files.
During testing, only the `cassettes/**.yaml` files are used without an actual
HTTP request. During development run
HTTP request. During development, to save the responses to `cassettes/**.yaml`, run

```bash
pytest -v -s --record-mode new_episodes
```

which saves the responses to `cassettes/**.yaml`. The testing can be then
performed as usual.
Note that `--record-mode new_episodes` replays the interactions already stored in a
cassette and records any new ones. If cassettes should only be written when they do
not exist yet, `--record-mode once` can be used.
[pytest-recording](https://pypi.org/project/pytest-recording/) supports all record modes given by [VCR.py](https://vcrpy.readthedocs.io/en/latest/usage.html#record-modes).
After recording the cassettes, testing can then be performed as usual.
13 changes: 13 additions & 0 deletions SECURITY.md
@@ -0,0 +1,13 @@
# Security Policy

## Supported Versions

| Version | Supported |
|---------| ------------------ |
| 0.1.x | :white_check_mark: |

## Reporting a Vulnerability

To report a vulnerability, please post an [issue](https://github.com/xcube-dev/xcube-stac/issues)
and use the prefix `[SECURITY]` for the title. Security issues will be treated
with high priority.
2 changes: 2 additions & 0 deletions environment.yml
@@ -5,6 +5,8 @@ channels:
dependencies:
# Required
- python>=3.10
- odc-geo
- odc-stac
- pandas
- pystac
- pystac-client
56 changes: 56 additions & 0 deletions environment_workflow.yml
@@ -0,0 +1,56 @@
name: xcube-stac
channels:
- conda-forge
- defaults
dependencies:
# Python
- python >=3.9
# Required xcube
Member

Suggested change
# Required xcube
# Required by xcube

Collaborator Author

Done

- affine >=2.2
- botocore >=1.34.51
- cftime >=1.6.3
- click >=8.0
- cmocean >=2.0
- dask >=2021.6
- dask-image >=0.6
- deprecated >=1.2
- distributed >=2021.6
- fiona >=1.8
- fsspec >=2021.6
- gdal >=3.0
- geopandas >=0.8
- jdcal >=1.4
- jsonschema >=3.2
- mashumaro
- matplotlib-base >=3.8.3
- netcdf4 >=1.5
- numba >=0.52
- numcodecs >=0.12.1
- numpy >=1.16
- pandas >=1.3
- pillow >=6.0
- pyjwt >=1.7
- pyproj >=3.0
- pyyaml >=5.4
- rasterio >=1.2
- requests >=2.25
- rfc3339-validator >=0.1 # for python-jsonschema date-time format validation
- rioxarray >=0.11
- s3fs >=2021.6
- setuptools >=41.0
- shapely >=1.6
- tornado >=6.0
- urllib3 >=1.26
- xarray >=2022.6
- zarr >=2.11
# Required xcube-stac
Member

Suggested change
# Required xcube-stac
# Required by xcube-stac

Collaborator Author

Done

- odc-geo
- odc-stac
- pystac
- pystac-client
# for testing
- black
- flake8
- pytest
- pytest-cov
- pytest-recording