Commit
ready for merge
konstntokas committed Dec 8, 2024
1 parent d1045b5 commit e95b878
Showing 6 changed files with 229 additions and 187 deletions.
78 changes: 30 additions & 48 deletions README.md
Expand Up @@ -66,15 +66,22 @@ By default, a data ID represents one item, which is opened as a dataset, with ea
asset becoming a data variable within the dataset.

Additionally, a stack mode is
available, enabling the stacking of items using [odc-stac](https://odc-stac.readthedocs.io/en/latest/).
This allows for mosaicking multiple tiles and concatenating the datacube along the
temporal axis.
available, enabling the stacking of items using the core functionality of [xcube](https://xcube.readthedocs.io/en/latest/).
This allows for mosaicking multiple tiles grouped by solar day, and concatenating
the datacube along the temporal axis.
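
The following is a minimal sketch of what grouping tiles by solar day can look like. It assumes the common approximation that shifts the UTC acquisition time by one hour per 15 degrees of longitude. The function names `solar_day` and `group_by_solar_day` and the sample items are hypothetical, and the grouping logic actually used by xcube may differ.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone


def solar_day(utc_time: datetime, longitude: float) -> date:
    """Approximate local solar date: shift UTC by 1 hour per 15 degrees of longitude."""
    offset = timedelta(hours=longitude / 15.0)
    return (utc_time + offset).date()


def group_by_solar_day(items):
    """Group (item_id, utc_time, longitude) tuples by approximate solar day."""
    groups = defaultdict(list)
    for item_id, utc_time, longitude in items:
        groups[solar_day(utc_time, longitude)].append(item_id)
    return dict(groups)


# Hypothetical items: two neighbouring tiles from one overpass, one from a later day.
items = [
    ("tile_a", datetime(2020, 7, 15, 10, 30, tzinfo=timezone.utc), 9.5),
    ("tile_b", datetime(2020, 7, 15, 10, 31, tzinfo=timezone.utc), 10.4),
    ("tile_c", datetime(2020, 7, 18, 10, 40, tzinfo=timezone.utc), 9.5),
]
groups = group_by_solar_day(items)
```

Tiles that fall into the same group are candidates for mosaicking into one time step, while the groups themselves form the temporal axis.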

Also, [stackstac](https://stackstac.readthedocs.io/en/latest/) has been
Also, [odc-stac](https://odc-stac.readthedocs.io/en/latest/) and
[stackstac](https://stackstac.readthedocs.io/en/latest/) have been
considered during the evaluation of Python libraries supporting stacking of STAC items.
However, the [benchmarking report](https://benchmark-odc-stac-vs-stackstac.netlify.app/)
comparing stackstac and odc-stac shows that odc-stac outperforms stackstac. Furthermore,
stackstac shows an [issue](https://github.com/gjoseph92/stackstac/issues/196) in making
However, both stacking libraries depend on the GDAL driver for reading the data with
`rasterio.open`, which prevents reading the data from the
[CDSE S3 endpoint](https://documentation.dataspace.copernicus.eu/APIs/S3.html), due to
blocking of the rasterio AWS environments.
Comparing [odc-stac](https://odc-stac.readthedocs.io/en/latest/) and
[stackstac](https://stackstac.readthedocs.io/en/latest/),
the [benchmarking report](https://benchmark-odc-stac-vs-stackstac.netlify.app/) shows
that odc-stac outperforms stackstac. Furthermore, stackstac shows an
[issue](https://github.com/gjoseph92/stackstac/issues/196) in making
use of the overview levels of COG files. Still, stackstac shows high popularity in the
community and might be supported in the future.

Expand All @@ -83,6 +90,11 @@ community and might be supported in the future.
### Overview of Jupyter notebooks <a name="overview_notebooks"></a>
The following Jupyter notebooks provide some examples:

* `example/notebooks/cdse_sentinel_2.ipynb`:
This notebook shows an example of how to stack multiple tiles of Sentinel-2 L2A data
using the [CDSE STAC API](https://documentation.dataspace.copernicus.eu/APIs/STAC.html).
It shows stacking of individual tiles and mosaicking of multiple tiles measured on
the same solar day.
* `example/notebooks/earth_search_sentinel2_l2a_stack_mode.ipynb`:
This notebook shows an example of how to stack multiple tiles of Sentinel-2 L2A data
from Earth Search by Element 84 STAC API. It shows stacking of individual tiles and
Expand Down Expand Up @@ -124,8 +136,7 @@ and is specified by the segment of the URL that follows the catalog's URL. The
`data_type` can be set to `dataset` or `mldataset`, which return an `xr.Dataset` and
an [xcube multi-resolution dataset](https://xcube.readthedocs.io/en/latest/mldatasets.html),
respectively. Note that in the above example, if `data_type` is not assigned,
a multi-resolution dataset will be returned. This is because the item's asset links to
GeoTIFFs, which are opened as multi-resolution datasets by default.
an `xarray.Dataset` will be returned.

To use the stack mode, initialize a STAC store with the argument `stack_mode=True`.

Expand All @@ -138,50 +149,21 @@ store = new_data_store(
stack_mode=True
)
ds = store.open_data(
"sentinel-2-l2a",
data_type="dataset",
bbox=[9.1, 53.1, 10.7, 54],
time_range= ["2020-07-01", "2020-08-01"],
query={"s2:processing_baseline": {"eq": "02.14"}},
bbox=[506700, 5883400, 611416, 5984840],
time_range=["2020-07-15", "2020-08-01"],
crs="EPSG:32632",
spatial_res=20,
asset_names=["red", "green", "blue"],
apply_scaling=True,
)
```

In the stacking mode, the data IDs are the collection IDs within the STAC catalog. To
get Sentinel-2 L2A data, we assign `data_id` to `"sentinel-2-l2a"`. The bounding box and
time range are assigned to define the temporal and spatial extent of the data cube.
Additionally, for this example, we need to set a query argument to select a specific
[Sentinel-2 processing baseline](https://sentiwiki.copernicus.eu/web/s2-processing#S2Processing-L2Aprocessingbaseline),
as the collection contains multiple items for the same tile with different processing
procedures. Note that this requirement can vary between collections and must be
specified by the user. To set query arguments, the STAC catalog needs to be conform with
the [query extension](https://github.com/stac-api-extensions/query).

The stacking is performed using [odc-stac](https://odc-stac.readthedocs.io/en/latest/).
All arguments of [odc.stac.load](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html)
can be passed into the `open_data(...)` method, which forwards them to the
`odc.stac.load` function.

To apply mosaicking, we need to assign `groupby="solar_day"`, as shown in the
[documentation of `odc.stac.load`](https://odc-stac.readthedocs.io/en/latest/_api/odc.stac.load.html).
The following few lines of code show a small example including mosaicking.

```python
from xcube.core.store import new_data_store

store = new_data_store(
"stac",
url="https://earth-search.aws.element84.com/v1",
stack_mode=True
)
ds = store.open_data(
"sentinel-2-l2a",
data_type="dataset",
bbox=[9.1, 53.1, 10.7, 54],
time_range= ["2020-07-01", "2020-08-01"],
query={"s2:processing_baseline": {"eq": "02.14"}},
groupby="solar_day",
)
```
time range are assigned to define the temporal and spatial extent of the data cube. The
parameters `crs` and `spatial_res` are required as well and define the coordinate
reference system (CRS) and the spatial resolution, respectively. Note that the bounding
box and spatial resolution need to be given in the respective CRS.
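
To illustrate why the bounding box and the spatial resolution must share the units of the CRS, the sketch below estimates the pixel grid size of the resulting cube. It assumes a simple "extent divided by resolution, rounded up" rule; the helper `grid_size` is hypothetical and the grid computed by the store may differ in detail.

```python
import math


def grid_size(bbox, spatial_res):
    """Return (width, height) in pixels for a bbox given in CRS units."""
    x_min, y_min, x_max, y_max = bbox
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("bbox must be [x_min, y_min, x_max, y_max]")
    width = math.ceil((x_max - x_min) / spatial_res)
    height = math.ceil((y_max - y_min) / spatial_res)
    return width, height


# bbox in metres (EPSG:32632) with a 20 m resolution, as in the example above
utm_size = grid_size([506700, 5883400, 611416, 5984840], 20)

# bbox in degrees combined with a resolution meant as metres: units do not match
mismatched_size = grid_size([9.1, 53.1, 10.7, 54], 20)
```

With matching units the example yields a grid of several thousand pixels per side, whereas the mismatched call collapses to a single pixel.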

## Testing <a name="testing"></a>

Expand Down
20 changes: 15 additions & 5 deletions test/test_accessor.py
Expand Up @@ -21,6 +21,7 @@

import unittest
from unittest.mock import patch
from unittest.mock import MagicMock

import dask
import dask.array as da
Expand Down Expand Up @@ -59,18 +60,25 @@ def test_del(self):
def test_root(self):
self.assertEqual("eodata", self.accessor.root)

@patch("rasterio.open")
@patch("rioxarray.open_rasterio")
def test_open_data(self, mock_open_rasterio):
# set-up mock
def test_open_data(self, mock_rioxarray_open, mock_rasterio_open):
# set-up mock for rioxarray.open_rasterio
mock_data = {
"band_1": (("y", "x"), da.ones((2048, 2048), chunks=(1024, 1024))),
}
mock_ds = xr.Dataset(mock_data)
mock_open_rasterio.return_value = mock_ds
mock_rioxarray_open.return_value = mock_ds

# set-up mock for rasterio.open
mock_rio_dataset = MagicMock()
mock_rio_dataset.overviews.return_value = [2, 4, 8]
mock_rasterio_open.return_value.__enter__.return_value = mock_rio_dataset

# start tests
access_params = dict(protocol="s3", root="eodata", fs_path="test.tif")
ds = self.accessor.open_data(access_params)
mock_open_rasterio.assert_called_once_with(
mock_rioxarray_open.assert_called_once_with(
"s3://eodata/test.tif",
chunks=dict(x=1024, y=1024),
band_as_variable=True,
Expand All @@ -97,8 +105,10 @@ def test_open_data(self, mock_open_rasterio):

mlds = self.accessor.open_data(access_params, data_type="mldataset")
self.assertIsInstance(mlds, MultiLevelDataset)
self.assertEqual(4, mlds.num_levels)
mock_rasterio_open.assert_called_once_with("s3://eodata/test.tif")
ds = mlds.base_dataset
mock_open_rasterio.assert_called_with(
mock_rioxarray_open.assert_called_with(
"s3://eodata/test.tif",
overview_level=None,
chunks=dict(x=1024, y=1024),
Expand Down
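
The mocking pattern used in `test_open_data` can be tried in isolation. The sketch below uses only `unittest.mock` and shows how a mocked opener that is used as a context manager can report overview factors, and why three overviews correspond to four levels. The helper `count_levels` is hypothetical and only stands in for whatever the accessor does internally.

```python
from unittest.mock import MagicMock


def count_levels(path, opener):
    """Hypothetical helper: base level plus one level per overview of band 1."""
    with opener(path) as src:
        return len(src.overviews(1)) + 1


# Stand-in for rasterio.open: calling it returns a context manager whose
# __enter__ yields the mocked dataset.
mock_dataset = MagicMock()
mock_dataset.overviews.return_value = [2, 4, 8]
mock_open = MagicMock()
mock_open.return_value.__enter__.return_value = mock_dataset

num_levels = count_levels("s3://eodata/test.tif", mock_open)
```

Setting `return_value.__enter__.return_value` is what makes the object bound by `with ... as src` the configured mock rather than a fresh one.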
145 changes: 145 additions & 0 deletions test/test_helper.py
@@ -0,0 +1,145 @@
# The MIT License (MIT)
# Copyright (c) 2024 by the xcube development team and contributors
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import unittest
from unittest.mock import patch
import datetime

import pystac

from xcube_stac.helper import HelperCdse


class HelperCdseTest(unittest.TestCase):

    def setUp(self):
        self.asset = pystac.Asset(
            href="test_href",
            media_type="dummy",
            roles=["data"],
            extra_fields=dict(
                alternate=dict(
                    s3=dict(
                        href=(
                            "/eodata/Sentinel-2/MSI/L2A/2024/11/07/S2A_MSIL2A_20241107"
                            "T113311_N0511_R080_T31VDG_20241107T123948.SAFE"
                        )
                    )
                )
            ),
        )
        self.item = pystac.Item(
            id="cdse_item_parts",
            geometry={
                "type": "Polygon",
                "coordinates": [
                    [
                        [100.0, 0.0],
                        [101.0, 0.0],
                        [101.0, 1.0],
                        [100.0, 1.0],
                        [100.0, 0.0],
                    ]
                ],
            },
            bbox=[100.0, 0.0, 101.0, 1.0],
            datetime=datetime.datetime(2023, 1, 1, 0, 0, 0),
            properties=dict(
                tileId="title_id",
                orbitNumber=0,
            ),
        )
        self.item.add_asset("PRODUCT", self.asset)

    @patch("s3fs.S3FileSystem.glob")
    def test_parse_item(self, mock_glob):
        mock_glob.return_value = [
            "eodata/Sentinel-2/MSI/L2A/2024/11/07/S2A_MSIL2A_20241107T113311_N0511"
            "_R080_T31VDG_20241107T123948.SAFE/GRANULE/L2A_T32TMT_A017394_"
            "20200705T101917/IMG_DATA/dummy.jp2"
        ]

        helper = HelperCdse(
            client_kwargs=dict(endpoint_url="https://eodata.dataspace.copernicus.eu"),
            key="xxx",
            secret="xxx",
        )

        item = self.item
        item.properties["processorVersion"] = "02.14"
        item_parsed = helper.parse_item(
            self.item, asset_names=["B01", "B02"], crs="EPSG:4326", spatial_res=0.001
        )
        self.assertIn("B01", item_parsed.assets)
        self.assertEqual(
            0, item_parsed.assets["B01"].extra_fields["raster:bands"][0]["offset"]
        )
        self.assertIn("B02", item_parsed.assets)
        self.assertEqual(
            (
                "eodata/Sentinel-2/MSI/L2A/2024/11/07/S2A_MSIL2A_20241107T113311_N0511"
                "_R080_T31VDG_20241107T123948.SAFE/GRANULE/L2A_Ttitle_id_A000000_"
                "20200705T101917/IMG_DATA/R60m/Ttitle_id_parts_B02_60m.jp2"
            ),
            item_parsed.assets["B02"].href,
        )
        item = self.item
        item.properties["processorVersion"] = "05.00"
        item_parsed = helper.parse_item(
            self.item, asset_names=["B01", "B02"], crs="EPSG:4326", spatial_res=0.001
        )
        self.assertIn("B01", item_parsed.assets)
        self.assertEqual(
            -0.1, item_parsed.assets["B01"].extra_fields["raster:bands"][0]["offset"]
        )
        self.assertIn("B02", item_parsed.assets)

    @patch("s3fs.S3FileSystem.glob")
    def test_get_data_access_params(self, mock_glob):
        mock_glob.return_value = [
            "eodata/Sentinel-2/MSI/L2A/2024/11/07/S2A_MSIL2A_20241107T113311_N0511"
            "_R080_T31VDG_20241107T123948.SAFE/GRANULE/L2A_T32TMT_A017394_"
            "20200705T101917/IMG_DATA/dummy.jp2"
        ]
        helper = HelperCdse(
            client_kwargs=dict(endpoint_url="https://eodata.dataspace.copernicus.eu"),
            key="xxx",
            secret="xxx",
        )
        item = self.item
        item.properties["processorVersion"] = "05.00"
        item_parsed = helper.parse_item(
            self.item, asset_names=["B01", "B02"], crs="EPSG:3035", spatial_res=20
        )
        data_access_params = helper.get_data_access_params(
            item_parsed, asset_names=["B01", "B02"], crs="EPSG:3035", spatial_res=20
        )
        self.assertEqual("B01", data_access_params["B01"]["name"])
        self.assertEqual("s3", data_access_params["B01"]["protocol"])
        self.assertEqual("eodata", data_access_params["B01"]["root"])
        self.assertEqual(
            (
                "Sentinel-2/MSI/L2A/2024/11/07/S2A_MSIL2A_20241107T113311_N0511_R080_"
                "T31VDG_20241107T123948.SAFE/GRANULE/L2A_T32TMT_A017394_20200705T101917"
                "/IMG_DATA/dummy.jp2"
            ),
            data_access_params["B01"]["fs_path"],
        )