
Implement open_data and support tiff, netcdf and zarr for s3 and https protocol; Implement stack-mode which stacks items containing tiffs; #19

Merged
merged 24 commits into main on Aug 19, 2024

Conversation

@konstntokas (Collaborator) commented Jul 31, 2024

Closes #3, #8

1. Implement open_data and support tiff, netcdf and zarr for s3 and https protocol

I suggest starting with the notebooks in the examples folder as an introduction.

Here I encountered a problem with the levels file in the xcube server demo. When I try to open the levels file, I get an mldataset instance back. When I then try to open one level, an error occurs in xcube_server_stac_s3.ipynb cell 12. I suspect that when starting the server, the .levels file with Identifier local is used to calculate local_1w in resample_in_time.py, which adds <dim>_bnds data variables to the xr.Dataset, which in turn causes problems in levels. Any idea how to mitigate this problem without changing the xcube server demo example?

Furthermore, I changed the GitHub Actions workflow so that the current xcube repo is checked out, and I now run the xcube server to achieve higher test coverage. The unit tests use decorators and are only run if the server is running, so for later development the server can easily be turned off if needed.

Filtering of assets is not implemented yet; it is documented in issue #18. I see this as a nice add-on for later. Note also that the STAC spec requires only href for an asset.

2. Implement stack-mode which stacks items containing tiffs

The data store parameter <stack_mode> is added. If True, the collection ID becomes the <data_id>. Items matching a given bounding box and time range are selected and stacked using odc-stac. So far, only tiffs are supported. Reading data in odc-stac relies on rasterio, which is based on GDAL, so any format for which a GDAL driver exists should be supported. Examples of other formats will be added in the future.
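The selection step described above can be sketched as follows. This is a simplified, hypothetical illustration: the `Item` dataclass and `item_matches` helper stand in for the pystac items and the store's internal filtering, and are not part of the xcube-stac API.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical, simplified stand-in for a pystac.Item: a bounding box
# (west, south, east, north) and an acquisition datetime.
@dataclass
class Item:
    bbox: tuple
    datetime: datetime

def item_matches(item, bbox, time_range):
    """Return True if the item's bbox intersects `bbox` and its
    datetime falls within `time_range` (start, end)."""
    w, s, e, n = item.bbox
    qw, qs, qe, qn = bbox
    intersects = not (e < qw or qe < w or n < qs or qn < s)
    start, end = time_range
    return intersects and start <= item.datetime <= end

items = [
    Item((10.0, 50.0, 11.0, 51.0), datetime(2024, 6, 1)),
    Item((30.0, 60.0, 31.0, 61.0), datetime(2024, 6, 1)),  # outside bbox
    Item((10.5, 50.5, 11.5, 51.5), datetime(2023, 1, 1)),  # outside time range
]
selected = [
    it for it in items
    if item_matches(it, (10.0, 50.0, 12.0, 52.0),
                    (datetime(2024, 1, 1), datetime(2024, 12, 31)))
]
print(len(selected))  # 1
```

In the actual implementation, the matching items are then passed to odc-stac for loading and stacking along the time dimension.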

@konstntokas konstntokas changed the title Implement open_data and support tiff, netcdf and zarr for s3 and https protocol #17; Implement stack-mode which stacks items containing tiffs; Implement open_data and support tiff, netcdf and zarr for s3 and https protocol; Implement stack-mode which stacks items containing tiffs; Jul 31, 2024
@konstntokas konstntokas requested a review from forman August 1, 2024 08:26
@konstntokas konstntokas marked this pull request as ready for review August 1, 2024 08:26
@forman (Member) left a comment


Very good! I actually have no serious complaints. Almost all my comments and suggestions concern style & design; only the one regarding the store ctor is mandatory.

You should add a coverage badge.

dependencies:
# Python
- python >=3.9
# Required xcube
Member:

Suggested change
# Required xcube
# Required by xcube

Collaborator Author:

Done

- urllib3 >=1.26
- xarray >=2022.6
- zarr >=2.11
# Required xcube-stac
Member:

Suggested change
# Required xcube-stac
# Required by xcube-stac

Collaborator Author:

Done

from .constants import DATA_OPENER_IDS, FloatInt, MAP_MIME_TYP_FORMAT


def _get_assets_from_item(
Member:

I understand you are using underscore prefixes to indicate that the functions in this module are implementation details and not API.

However, underscored variables, functions, and classes should not be imported, even by your own modules (exceptions exist, e.g. testing). You can keep the underscore convention by renaming utils.py to _utils.py.

Collaborator Author:

Done

self._datasets.append(_apply_scaling_nodata(ds, self._items))

def _get_num_levels_lazily(self) -> int:
return len(self._datasets)
Member:

Suggested change
return len(self._datasets)
return len(self._resolutions)

Collaborator Author:

Done

return len(self._datasets)

def _get_dataset_lazily(self, index: int, parameters) -> xr.Dataset:
return self._datasets[index]
Member:

Suggested change
return self._datasets[index]
ds = odc.stac.load(
self._items,
resolution=self._resolutions[index],
**self._open_params,
)
return _apply_scaling_nodata(ds, self._items)

Collaborator Author:

see below.

}


class SingleImplementation:
Member:

Single implementation of what? I recommend finding a better name for what is implemented here, for example, the store mode: StoreMode would then describe the purely abstract interface, and SingleStoreMode its implementation for the single mode.

Suggested change
class SingleImplementation:
class SingleStoreMode:

Collaborator Author:

Changed the name to SingleStoreMode. To be consistent, I also renamed StackImplementation to StackStoreMode.

Comment on lines 289 to 290
f"Only 's3' and 'https' protocols are supported, not {protocol!r}. "
f"The asset {asset.extra_fields['id']!r} has a href {asset.href!r}."
Member:

Suggested change
f"Only 's3' and 'https' protocols are supported, not {protocol!r}. "
f"The asset {asset.extra_fields['id']!r} has a href {asset.href!r}."
f"Only 's3' and 'https' protocols are supported, not {protocol!r}."
f" The asset {asset.extra_fields['id']!r} has a href {asset.href!r}."

Collaborator Author:

Done

Comment on lines 342 to 351
else:
if not self._s3_accessor.root == root:
LOG.debug(
f"The bucket {self._s3_accessor.root!r} of the "
f"S3 object storage changed to {root!r}. "
"A new s3 data opener will be initialized."
)
self._s3_accessor = S3DataAccessor(
root, storage_options=storage_options
)
Member:

Suggested change
else:
if not self._s3_accessor.root == root:
LOG.debug(
f"The bucket {self._s3_accessor.root!r} of the "
f"S3 object storage changed to {root!r}. "
"A new s3 data opener will be initialized."
)
self._s3_accessor = S3DataAccessor(
root, storage_options=storage_options
)
elif self._s3_accessor.root != root:
LOG.debug(
f"The bucket {self._s3_accessor.root!r} of the "
f"S3 object storage changed to {root!r}. "
"A new s3 data opener will be initialized."
)
self._s3_accessor = S3DataAccessor(
root, storage_options=storage_options
)

Collaborator Author:

Done

Comment on lines 367 to 374
else:
if not self._https_accessor.root == root:
LOG.debug(
f"The root {self._https_accessor.root!r} of the "
f"https data opener changed to {root!r}. "
"A new https data opener will be initialized."
)
self._https_accessor = HttpsDataAccessor(root)
Member:

Suggested change
else:
if not self._https_accessor.root == root:
LOG.debug(
f"The root {self._https_accessor.root!r} of the "
f"https data opener changed to {root!r}. "
"A new https data opener will be initialized."
)
self._https_accessor = HttpsDataAccessor(root)
elif self._https_accessor.root != root:
LOG.debug(
f"The root {self._https_accessor.root!r} of the "
f"https data opener changed to {root!r}. "
"A new https data opener will be initialized."
)
self._https_accessor = HttpsDataAccessor(root)

Collaborator Author:

Done

elif stack_mode is True or stack_mode == "odc-stac":
self._impl = StackImplementation(
self._catalog, self._url_mod, self._searchable, self._storage_options_s3
)
Member:

Else? Since it is possible to instantiate the store via its ctor, you should raise a ValueError() in the else case; otherwise _impl will not be initialised.

Collaborator Author:

The data store parameters are validated against the schema in the xcube data store framework. The schema is defined in constants.py, and this is tested in [test_store.py](https://github.com/xcube-dev/xcube-stac/blob/konstntokas-xxx-implement_stac_mode/test/test_store.py#L90).

However, I agree: if a user initializes a STAC data store directly via the StacDataStore class, this is not covered. I therefore added an else case.
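The resulting dispatch can be sketched like this. The class names follow the renaming discussed above, but `select_impl` and the stripped-down classes are simplified stand-ins for the actual StacDataStore internals, not the real implementation.

```python
class SingleStoreMode:
    """Stand-in for the single-item store mode."""

class StackStoreMode:
    """Stand-in for the odc-stac based stacking store mode."""

def select_impl(stack_mode):
    # Validate here as well, since the store can be instantiated
    # directly via its constructor, bypassing the schema validation
    # of the xcube data store framework.
    if stack_mode is False:
        return SingleStoreMode()
    elif stack_mode is True or stack_mode == "odc-stac":
        return StackStoreMode()
    else:
        raise ValueError(f"Unknown stack_mode {stack_mode!r}")

print(type(select_impl("odc-stac")).__name__)  # StackStoreMode
```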

@forman (Member) commented Aug 5, 2024

I suspect that when starting the server, the .levels file with Identifier local is used to calculate local_1w in resample_in_time.py, which adds <dim>_bnds data variables to the xr.Dataset, which in turn brings problems in levels. Any idea to mitigate that problem without changing the xcube server demo example?

Let's address this separately.

@konstntokas (Collaborator Author):

I suspect that when starting the server, the .levels file with Identifier local is used to calculate local_1w in resample_in_time.py, which adds <dim>_bnds data variables to the xr.Dataset, which in turn brings problems in levels. Any idea to mitigate that problem without changing the xcube server demo example?

Let's address this separately.

I now understand the problem better; it is on the xcube-stac side. I agree with addressing it separately. It is documented in #20.


codecov bot commented Aug 6, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@konstntokas (Collaborator Author):

You should add a coverage badge.

The coverage badge is added. See https://app.codecov.io/gh/xcube-dev/xcube-stac/tree/konstntokas-xxx-implement_stac_mode. I need to double-check the links once this is merged.

@konstntokas konstntokas requested a review from forman August 6, 2024 11:32
@konstntokas konstntokas merged commit a54b3de into main Aug 19, 2024
3 checks passed
@konstntokas konstntokas deleted the konstntokas-xxx-implement_stac_mode branch August 19, 2024 09:20
Successfully merging this pull request may close these issues.

Implement StacDataStore.open_data()