xcube-dev · konstntokas · Aug 19, 2024 · Jun 12, 2024 · Jun 18, 2024 · Jul 10, 2024
diff --git a/.github/workflows/unittest-workflow.yml b/.github/workflows/unittest-workflow.yml
@@ -9,17 +9,33 @@ jobs:
   unittest:
     runs-on: ubuntu-latest
     steps:
+      - name: checkout xcube
+        uses: actions/checkout@v4
+        with:
+          repository: xcube-dev/xcube
+          path: xcube
+
       - name: checkout xcube-stac
         uses: actions/checkout@v4
+        with:
+          path: xcube-stac
 
       - name: Set up MicroMamba
         uses: mamba-org/setup-micromamba@v1
         with:
-          environment-file: environment.yml
+          environment-file: xcube-stac/environment_workflow.yml
+
+      - name: Install xcube and start xcube server
+        shell: bash -l {0}
+        run: |
+          cd /home/runner/work/xcube-stac/xcube-stac/xcube
+          ls
+          pip install .
+          xcube serve -c examples/serve/demo/config.yml &
 
       - name: Run unit tests
         shell: bash -l {0}
         run: |
-          cd /home/runner/work/xcube-stac/xcube-stac
+          cd /home/runner/work/xcube-stac/xcube-stac/xcube-stac
           ls
           pytest 
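Note on the server step above: `xcube serve ... &` returns immediately, so `pytest` may start before the server accepts connections. A readiness check could close that gap. The following is a minimal Python sketch under stated assumptions: `wait_for_port` is a hypothetical helper (not part of xcube or xcube-stac), and port 8080 in the usage comment is an assumption about where the demo server listens.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 30.0, interval: float = 0.5
) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.

    Hypothetical helper for illustration; returns True once the port accepts
    a connection and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait briefly, then retry.
            time.sleep(interval)
    return False


# Possible usage before running the tests (port 8080 is an assumption):
# if not wait_for_port("localhost", 8080):
#     raise SystemExit("xcube server did not start in time")
```

Polling the TCP port only confirms that a process is listening, not that every endpoint is ready; a request against a known endpoint would be a stricter check.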
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,13 @@
+# Code of Conduct
+
+We are committed to providing a friendly, safe, and welcoming environment 
+for all, regardless of gender, sexual orientation, ability, ethnicity, 
+religion, or any other characteristic.
+
+We expect everyone to treat each other with respect and kindness. We do not 
+tolerate harassment or discrimination of any kind.
+
+If you witness or experience any behavior that violates this code of conduct, 
+please report it to the project maintainers immediately.
+
+Thank you for helping us create a welcoming community!
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,42 @@
+# How to contribute
+
+The `xcube-stac` project welcomes contributions of any form
+as long as you respect our [code of conduct](CODE_OF_CONDUCT.md) and stay 
+in line with the following instructions and guidelines.
+
+If you have suggestions, ideas, feature requests, or if you have identified
+a malfunction or error, then please 
+[post an issue](https://github.com/xcube-dev/xcube-stac/issues). 
+
+If you'd like to submit code or documentation changes, we ask you to provide a 
+pull request (PR) 
+[here](https://github.com/xcube-dev/xcube-stac/pulls). 
+For code and configuration changes, your PR must be linked to a 
+corresponding issue. 
+
+To ensure that your code contributions are consistent with our project’s
+coding guidelines, please make sure all applicable items of the following 
+checklist are addressed in your PR.  
+
+**PR checklist**
+
+* Format code using [black](https://black.readthedocs.io/) with default settings.
+  See also the [code style](#code-style) section below.
+* Your change shall not break existing unit tests.
+  `pytest` must run without errors.
+* Add unit tests for any new code not yet covered by tests.
+* Make sure test coverage is close to 100% for any change.
+  Use `pytest --cov=xcube_stac --cov-report=html` to verify.
+
+## Code style <a name="code-style"></a> 
+
+The `xcube-stac` code complies with [PEP-8](https://pep8.org/) except for a line 
+length of 88 characters as recommended by [black](https://black.readthedocs.io/).
+Since black is un-opinionated regarding the order of imports, 
+we use the following three import blocks separated by an empty 
+line:
+
+1. Python standard library imports, e.g., `os`, `typing`, etc.
+2. Third-party imports, e.g., `xarray`, `zarr`, etc.
+3. Relative `xcube_stac` module imports using the `.` prefix.
+
diff --git a/README.md b/README.md
@@ -9,9 +9,21 @@
 named `stac` to xcube. The data store is used to access data from the
 [STAC - SpatioTemporal Asset Catalogs](https://stacspec.org/en/).
 
-## Setup
-
-### Installing the xcube-stac plugin from the repository
+## Table of contents
+1. [Setup](#setup)
+   1. [Installing the xcube-stac plugin from the repository](#install_source)
+2. [Overview](#overview)
+   1. [General structure of a STAC catalog](#stac_catalog)
+   2. [General functionality of xcube-stac](#func_xcube_stac)
+3. [Introduction to xcube-stac](#intro_xcube_stac)
+   1. [Overview of Jupyter notebooks](#overview_notebooks)
+   2. [Getting started](#getting_started)
+4. [Testing](#testing)
+   1. [Some notes on the strategy of unit-testing](#unittest_strategy)
+
+## Setup <a name="setup"></a>
+
+### Installing the xcube-stac plugin from the repository <a name="install_source"></a>
 
 Installing xcube-stac directly from the git repository, clone the repository,
 direct into `xcube-stac`, and follow the steps below:
@@ -26,15 +38,158 @@ This installs all the dependencies of `xcube-stac` into a fresh conda
 environment, then installs xcube-stac into this environment from the
 repository.
 
-## Testing
+## Overview <a name="overview"></a>
+
+### General structure of a STAC catalog <a name="stac_catalog"></a>
+A SpatioTemporal Asset Catalog (STAC) consists of three main components: catalog,
+collection, and item. Each item can contain multiple assets, each linked to a data
+source. Items are associated with a timestamp or temporal range and a bounding box
+describing the spatial extent of the data. 
+
+Items within a collection generally exhibit
+similarities. For example, a STAC catalog might contain multiple collections
+corresponding to different space-borne instruments. Each item represents a measurement
+covering a specific spatial area at a particular timestamp. For a multi-spectral
+instrument, different bands can be stored as separate assets.
+
+A STAC catalog can comply with the [STAC API - Item Search](https://github.com/radiantearth/stac-api-spec/tree/release/v1.0.0/item-search#stac-api---item-search)
+conformance class, enabling server-side searches for items based on specific
+parameters. If this compliance is not met, only client-side searches are possible,
+which can be slow for large STAC catalogs.
+
+### General functionality of xcube-stac <a name="func_xcube_stac"></a>
+The xcube-stac plugin reads the data sources from the STAC catalog and opens the data
+in an analysis-ready form following the [xcube dataset convention](https://xcube.readthedocs.io/en/latest/cubespec.html).
+By default, a data ID represents one item, which is opened as a dataset, with each
+asset becoming a data variable within the dataset. 
+
+Additionally, a stack mode is
+available, enabling the stacking of items using [odc-stac](https://odc-stac.readthedocs.io/en/latest/).
+This allows for mosaicking multiple tiles and concatenating the datacube along the
+temporal axis.
+
+[stackstac](https://stackstac.readthedocs.io/en/latest/) was also
+considered during the evaluation of Python libraries that support stacking STAC items.
+However, the [benchmarking report](https://benchmark-odc-stac-vs-stackstac.netlify.app/)
+comparing stackstac and odc-stac shows that odc-stac outperforms stackstac. Furthermore,
+stackstac has an [issue](https://github.com/gjoseph92/stackstac/issues/196) in making
+use of the overview levels of COG files. Still, stackstac is popular in the
+community and might be supported in the future. 
+
+## Introduction to xcube-stac <a name="intro_xcube_stac"></a> 
+
+### Overview of Jupyter notebooks <a name="overview_notebooks"></a> 
+The following Jupyter notebooks provide some examples: 
+
+* `example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb`:
+  This notebook shows how to stack multiple tiles of Sentinel-2 L2A data
+  from the Earth Search STAC API by Element 84. It covers stacking of individual tiles and
+  mosaicking of multiple tiles measured on the same solar day.
+* `example/notebooks/geotiff_nonsearchable_catalog.ipynb`:
+  This notebook shows how to load a GeoTIFF file from a non-searchable
+  STAC catalog.
+* `example/notebooks/geotiff_searchable_catalog.ipynb`:
+  This notebook shows how to load a GeoTIFF file from a searchable
+  STAC catalog.
+* `example/notebooks/netcdf_searchable_catalog.ipynb`:
+  This notebook shows how to load a NetCDF file from a searchable
+  STAC catalog.
+* `example/notebooks/xcube_server_stac_s3.ipynb`:
+  This notebook shows how to open data sources published by xcube server
+  via the STAC API.
+
+### Getting started <a name="getting_started"></a> 
+
+The xcube [data store framework](https://xcube.readthedocs.io/en/latest/dataaccess.html#data-store-framework)
+provides easy access to data in an analysis-ready format, as the few lines of
+code below show. 
+
+```python
+from xcube.core.store import new_data_store
+
+store = new_data_store(
+    "stac",
+    url="https://earth-search.aws.element84.com/v1"
+)
+ds = store.open_data(
+    "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A",
+    data_type="dataset"
+)
+```
+The data ID `"collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"` points to the
+[STAC item's JSON](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md)
+and is specified by the segment of the URL that follows the catalog's URL. The
+`data_type` can be set to `dataset` or `mldataset`, which returns an `xr.Dataset` or
+an [xcube multi-resolution dataset](https://xcube.readthedocs.io/en/latest/mldatasets.html),
+respectively. Note that in the above example, if `data_type` is not assigned,
+a multi-resolution dataset will be returned. This is because the item's assets link to
+GeoTIFFs, which are opened as multi-resolution datasets by default.
+
+To use the stack mode, initiate a STAC store with the argument `stack_mode=True`.
+
+```python
+from xcube.core.store import new_data_store
+
+store = new_data_store(
+    "stac",
+    url="https://earth-search.aws.element84.com/v1",
+    stack_mode=True
+)
+ds = store.open_data(
+    "sentinel-2-l2a",
+    data_type="dataset",
+    bbox=[9.1, 53.1, 10.7, 54],
+    time_range=["2020-07-01", "2020-08-01"],
+    query={"s2:processing_baseline": {"eq": "02.14"}},
+)
+```
+
+In stack mode, the data IDs are the collection IDs within the STAC catalog. To
+get Sentinel-2 L2A data, we set `data_id` to `"sentinel-2-l2a"`. The bounding box and
+time range define the spatial and temporal extent of the data cube. 
+Additionally, for this example, we need to set a query argument to select a specific
+[Sentinel-2 processing baseline](https://sentiwiki.copernicus.eu/web/s2-processing#S2Processing-L2Aprocessingbaseline),
+as the collection contains multiple items for the same tile with different processing
+procedures. Note that this requirement can vary between collections and must be
+specified by the user. To set query arguments, the STAC catalog must conform to
+the [query extension](https://github.com/stac-api-extensions/query).
+
+The stacking is performed using [odc-stac](https://odc-stac.readthedocs.io/en/latest/).
+All arguments of [odc.stac.load](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html)
+can be passed into the `open_data(...)` method, which forwards them to the
+`odc.stac.load` function.
+
+To apply mosaicking, we need to assign `groupby="solar_day"`, as shown in the
+[documentation of `odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html).
+The following few lines of code show a small example including mosaicking.  
+
+```python
+from xcube.core.store import new_data_store
+
+store = new_data_store(
+    "stac",
+    url="https://earth-search.aws.element84.com/v1",
+    stack_mode=True
+)
+ds = store.open_data(
+    "sentinel-2-l2a",
+    data_type="dataset",
+    bbox=[9.1, 53.1, 10.7, 54],
+    time_range=["2020-07-01", "2020-08-01"],
+    query={"s2:processing_baseline": {"eq": "02.14"}},
+    groupby="solar_day",
+)
+```
+
+## Testing <a name="testing"></a>
 
 To run the unit test suite:
 
 ```bash
 pytest
 ```
 
-To analyze test coverage (after installing pytest as above):
+To analyze test coverage:
 
 ```bash
 pytest --cov=xcube_stac
@@ -47,17 +202,18 @@ To produce an HTML
 pytest --cov-report html --cov=xcube_stac
 ```
 
-### Some notes on the strategy of unittesting
+### Some notes on the strategy of unit-testing <a name="unittest_strategy"></a>
 
 The unit test suite uses [pytest-recording](https://pypi.org/project/pytest-recording/)
 to mock STAC catalogs. During development an actual HTTP request is performed
 to a STAC catalog and the responses are saved in `cassettes/**.yaml` files.
 During testing, only the `cassettes/**.yaml` files are used without an actual
-HTTP request. During development run
+HTTP request. During development, to save the responses to `cassettes/**.yaml`, run
 
 ```bash
 pytest -v -s --record-mode new_episodes
 ```
-
-which saves the responses to `cassettes/**.yaml`. The testing can be then
-performed as usual.
+Note that `--record-mode new_episodes` replays existing cassettes and appends any
+requests not yet recorded. To record only when no cassette file exists yet, use `--record-mode once`.
+[pytest-recording](https://pypi.org/project/pytest-recording/) supports all record modes provided by [VCR.py](https://vcrpy.readthedocs.io/en/latest/usage.html#record-modes).
+After recording the cassettes, testing can then be performed as usual.
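Note on the data ID rule in the README's "Getting started" section: the data ID is described as the segment of the item URL that follows the catalog URL. A minimal Python sketch of that rule follows. `data_id_from_item_url` is a hypothetical helper for illustration, not part of the xcube-stac API, and the URL layout is taken from the README example.

```python
def data_id_from_item_url(catalog_url: str, item_url: str) -> str:
    """Return the part of item_url that follows catalog_url.

    Hypothetical helper; it only does string handling and performs no request.
    """
    # Normalize so that catalog URLs with or without a trailing slash both work.
    prefix = catalog_url.rstrip("/") + "/"
    if not item_url.startswith(prefix):
        raise ValueError(f"{item_url!r} is not located under {catalog_url!r}")
    return item_url[len(prefix):]


catalog = "https://earth-search.aws.element84.com/v1"
item = catalog + "/collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"
data_id = data_id_from_item_url(catalog, item)
# data_id == "collections/sentinel-2-l2a/items/S2B_32TNT_20200705_0_L2A"
```

The resulting string is what the README passes as the first argument to `store.open_data(...)` when stack mode is not used.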
diff --git a/SECURITY.md b/SECURITY.md
@@ -0,0 +1,13 @@
+# Security Policy
+
+## Supported Versions
+
+| Version | Supported          |
+|---------| ------------------ |
+| 0.1.x   | :white_check_mark: |
+
+## Reporting a Vulnerability
+
+To report a vulnerability, please post an [issue](https://github.com/xcube-dev/xcube-stac/issues)
+and use prefix `[SECURITY]` for the title. Security issues will be treated 
+with high priority.
diff --git a/environment.yml b/environment.yml
@@ -5,6 +5,8 @@ channels:
 dependencies:
   # Required
   - python>=3.10
+  - odc-geo
+  - odc-stac
   - pandas
   - pystac
   - pystac-client

diff --git a/environment_workflow.yml b/environment_workflow.yml
@@ -0,0 +1,56 @@
+name: xcube-stac
+channels:
+  - conda-forge
+  - defaults
+dependencies:
+  # Python
+  - python >=3.9
+  # Required by xcube
+  - affine >=2.2
+  - botocore >=1.34.51
+  - cftime >=1.6.3
+  - click >=8.0
+  - cmocean >=2.0
+  - dask >=2021.6
+  - dask-image >=0.6
+  - deprecated >=1.2
+  - distributed >=2021.6
+  - fiona >=1.8
+  - fsspec >=2021.6
+  - gdal >=3.0
+  - geopandas >=0.8
+  - jdcal >=1.4
+  - jsonschema >=3.2
+  - mashumaro
+  - matplotlib-base >=3.8.3
+  - netcdf4 >=1.5
+  - numba >=0.52
+  - numcodecs >=0.12.1
+  - numpy >=1.16
+  - pandas >=1.3
+  - pillow >=6.0
+  - pyjwt >=1.7
+  - pyproj >=3.0
+  - pyyaml >=5.4
+  - rasterio >=1.2
+  - requests >=2.25
+  - rfc3339-validator >=0.1  # for python-jsonschema date-time format validation
+  - rioxarray >=0.11
+  - s3fs >=2021.6
+  - setuptools >=41.0
+  - shapely >=1.6
+  - tornado >=6.0
+  - urllib3 >=1.26
+  - xarray >=2022.6
+  - zarr >=2.11
+  # Required by xcube-stac
+  - odc-geo
+  - odc-stac
+  - pystac
+  - pystac-client
+  # for testing
+  - black
+  - flake8
+  - pytest
+  - pytest-cov
+  - pytest-recording