Refactoring Preload API #12

Merged 153 commits on Jan 21, 2025
4082cab
Implement initial methods in clms.py
b-yogesh Nov 4, 2024
a55d133
Refactor code
b-yogesh Nov 7, 2024
52b1e2a
Refactor code again
b-yogesh Nov 7, 2024
31ccd70
Implement get_data_ids generator
b-yogesh Nov 7, 2024
b124ef1
Tiny typo fix
b-yogesh Nov 7, 2024
39d8b37
Implement has_data
b-yogesh Nov 7, 2024
3b34c3a
Initial version of describe_data (not working entirely)
b-yogesh Nov 7, 2024
70a0456
Implement describe_data
b-yogesh Nov 8, 2024
c075479
Implement get_open_data_params_schema
b-yogesh Nov 8, 2024
49cd69b
Implemented access token class
b-yogesh Nov 8, 2024
4ffa9ab
Implemented access token class
b-yogesh Nov 8, 2024
f8ccbfb
[In progress] - open_data implementation
b-yogesh Nov 11, 2024
94ff10d
[In progress] - open_data implementation - implemented _prepare_downl…
b-yogesh Nov 12, 2024
26a6322
[In progress] - open_data implementation - more impl.
b-yogesh Nov 12, 2024
f6645f2
[In progress] - open_data implementation - more impl.
b-yogesh Nov 14, 2024
7951b2e
Update make_api_request
b-yogesh Nov 15, 2024
e950bb7
temporary bbox and crs handling
b-yogesh Nov 15, 2024
5ccede3
fix condition
b-yogesh Nov 15, 2024
24e93d1
fix get_data_store_params_schema
b-yogesh Nov 15, 2024
2e5f46f
Add constants
b-yogesh Nov 15, 2024
fc938bc
Refactoring
b-yogesh Nov 15, 2024
76169d4
Add schema for preload_data
b-yogesh Nov 15, 2024
d680129
Remove error message truncation
b-yogesh Nov 15, 2024
a91bd5c
Update get_metadata method
b-yogesh Nov 15, 2024
8aa6014
Add initial unsupported datasets check
b-yogesh Nov 15, 2024
13febb6
Raise exceptions for unsupported data and add preload params schema
b-yogesh Nov 18, 2024
63ba00c
Update schema methods
b-yogesh Nov 18, 2024
a049218
Add TODOs
b-yogesh Nov 19, 2024
a6ccac0
Implement first TODO: change data_id def
b-yogesh Nov 19, 2024
3a1c329
Implement first TODO: add include_attr bool impl
b-yogesh Nov 19, 2024
efc0c16
Fix has_data and describe_data
b-yogesh Nov 19, 2024
d4810bd
[WIP] queue download refactor
b-yogesh Nov 19, 2024
e821dd2
[WIP] refactor existing code into preload and clms classes
b-yogesh Nov 20, 2024
6d640f2
[WIP] implement queue downloads.
b-yogesh Nov 22, 2024
5fbd5aa
[WIP] Further impl. preload
b-yogesh Nov 25, 2024
7926ce4
[WIP] Further impl. preload
b-yogesh Nov 26, 2024
d064097
[WIP] Working download impl.
b-yogesh Nov 27, 2024
50b4c52
[WIP] Add initial cancel handler
b-yogesh Nov 27, 2024
faaec47
[WIP] Add initial cancel handler
b-yogesh Nov 29, 2024
20e44b4
[WIP] Refactoring
b-yogesh Dec 2, 2024
167baf7
[WIP] Improved download data
b-yogesh Dec 2, 2024
3ea8e7f
[WIP] Improved download data + merging
b-yogesh Dec 3, 2024
0be046b
Finish implementation
b-yogesh Dec 5, 2024
16ee337
Added tests for utils.py
b-yogesh Dec 5, 2024
e793038
Added tests for api_token.py
b-yogesh Dec 5, 2024
3abef8e
Refactor clms.py
b-yogesh Dec 6, 2024
425780d
Add docstrings clms.py
b-yogesh Dec 6, 2024
db902a3
Add tests for clms.py
b-yogesh Dec 6, 2024
3e3ce73
add preload const
b-yogesh Dec 6, 2024
5d7125d
Refactor
b-yogesh Dec 6, 2024
dceb47d
Add Docstrings and Type Annotations
b-yogesh Dec 6, 2024
7dbdb4b
Add example notebook
b-yogesh Dec 6, 2024
0d4018d
Minor fixes
b-yogesh Dec 6, 2024
865ead4
Add missing docs
b-yogesh Dec 6, 2024
3e1b545
Fix tests
b-yogesh Dec 10, 2024
975e498
Update error message
b-yogesh Dec 10, 2024
2176db7
Modify list to iterator in get_data_ids
b-yogesh Dec 10, 2024
715d1cf
Update CLMSDataStoreTutorial.ipynb
b-yogesh Dec 10, 2024
c7c79a8
Update README.md
b-yogesh Dec 10, 2024
a88eaf9
Create unittest.yml
b-yogesh Dec 10, 2024
70188bd
Add test_cache_manager.py
b-yogesh Dec 10, 2024
56fcf42
Add test_token_handler.py
b-yogesh Dec 10, 2024
65a271f
Rename .github/unittest.yml to .github/workflows/unittest.yml
b-yogesh Dec 10, 2024
7a36783
Add test_processor.py
b-yogesh Dec 10, 2024
d98c712
Merge remote-tracking branch 'origin/yogesh_preload-data' into yogesh…
b-yogesh Dec 10, 2024
ef40d9a
Update env
b-yogesh Dec 10, 2024
1c7888b
Remove redundant api class
b-yogesh Dec 10, 2024
4546279
Update pyproject.toml
b-yogesh Dec 10, 2024
f91ecd1
add ipywidgets
b-yogesh Dec 10, 2024
2e2250a
fix test_clms.py
b-yogesh Dec 10, 2024
dd3229a
Add CLMS url as constant
b-yogesh Dec 10, 2024
5b0eda2
Fix tests
b-yogesh Dec 10, 2024
a7a612d
Update README.md
b-yogesh Dec 10, 2024
b9c9e03
Update README.md
b-yogesh Dec 11, 2024
dc622e9
Update CLMSDataStoreTutorial.ipynb
b-yogesh Dec 11, 2024
f3efb86
Update .gitignore
b-yogesh Dec 11, 2024
0850fb6
Add numpy for tests
b-yogesh Dec 11, 2024
8ebf8f7
Add missing license text
b-yogesh Dec 11, 2024
7243361
Apply suggestions from code review
b-yogesh Dec 12, 2024
36583c4
Rename classes and fix tests
b-yogesh Dec 12, 2024
78c8943
Convert file_store and cache to properties
b-yogesh Dec 12, 2024
da3928e
Remove test_store.py
b-yogesh Dec 12, 2024
54e2f7f
Remove None return doc
b-yogesh Dec 12, 2024
02f0261
Improve docstrings
b-yogesh Dec 12, 2024
bdb362d
Update README.md
b-yogesh Dec 12, 2024
b637c77
Move functions away from utils to respective files
b-yogesh Dec 12, 2024
cc44bdb
Move constants to their respective classes
b-yogesh Dec 12, 2024
f178dce
Improve make_api_request
b-yogesh Dec 13, 2024
b30329c
Improve make_api_request #2
b-yogesh Dec 13, 2024
bb57381
Improve tests
b-yogesh Dec 13, 2024
706c6df
Datastore to MutableDataStore
b-yogesh Dec 13, 2024
0c49099
WIP [New preload API refactoring]
b-yogesh Dec 30, 2024
96be05b
WIP [New preload API refactoring - part 2]
b-yogesh Jan 2, 2025
26c030a
WIP [New preload API refactoring - update tests]
b-yogesh Jan 2, 2025
e58be16
update xcube version
b-yogesh Jan 2, 2025
16feebf
Add further tests
b-yogesh Jan 3, 2025
27ab2ef
Remove init comments
b-yogesh Jan 6, 2025
eba6439
Ready for release
b-yogesh Jan 6, 2025
8840570
Update download_manager.py and add more tests
b-yogesh Jan 6, 2025
def671b
Merge branch 'refs/heads/yogesh_preload-data' into yogesh_preload-data-2
b-yogesh Jan 6, 2025
0f654ab
[WIP] Further Refactoring
b-yogesh Jan 6, 2025
836c34f
[WIP] Update tests
b-yogesh Jan 6, 2025
725ffeb
[WIP] Delete CacheManager
b-yogesh Jan 6, 2025
e71f845
[WIP] Update tests
b-yogesh Jan 7, 2025
4c55eb0
[WIP] Update store.py
b-yogesh Jan 7, 2025
5dad841
[WIP] Update tests - cov 92
b-yogesh Jan 7, 2025
2ff4818
[WIP] Add cassettes for test_store.py
b-yogesh Jan 8, 2025
f34808e
[WIP] Update and add more tests cov-96
b-yogesh Jan 8, 2025
febae11
[WIP] Add more cassettes
b-yogesh Jan 8, 2025
df7d512
[WIP] More refactoring
b-yogesh Jan 8, 2025
b5ef6a4
[WIP] More refactoring
b-yogesh Jan 9, 2025
ed5fb0a
[WIP] Update workflow to include xcube repo install
b-yogesh Jan 9, 2025
6f59ab6
[WIP] Update notebook
b-yogesh Jan 9, 2025
125c390
[WIP] Add --no-deps
b-yogesh Jan 9, 2025
90b0236
[WIP] update unittest.yml
b-yogesh Jan 9, 2025
74bd076
[WIP] update unittest.yml
b-yogesh Jan 9, 2025
7b41350
[WIP] debug unittest.yml
b-yogesh Jan 9, 2025
7ad24c1
[WIP] debug unittest.yml
b-yogesh Jan 9, 2025
a27810f
[WIP] debug unittest.yml
b-yogesh Jan 9, 2025
4a59a54
[WIP] debug unittest.yml
b-yogesh Jan 9, 2025
084a3ba
[WIP] debug unittest.yml
b-yogesh Jan 9, 2025
ea5299f
[WIP] add requests in deps
b-yogesh Jan 9, 2025
687da65
[WIP] add fsspec, pyproj in deps
b-yogesh Jan 9, 2025
b694e48
[WIP] update env.yml
b-yogesh Jan 9, 2025
efa0002
[WIP]update unittest.yml to include all xcube deps in the env...yml
b-yogesh Jan 10, 2025
99cb1cd
[WIP] update env.yml and pyproject.toml
b-yogesh Jan 10, 2025
c91130e
[WIP] add ipywidgets
b-yogesh Jan 10, 2025
970bca6
[WIP] update CHANGES.md and README.md
b-yogesh Jan 10, 2025
ed8a8c3
[WIP] minor update to download_manager.py
b-yogesh Jan 10, 2025
0809ecb
[WIP] remove debug statements from workflow
b-yogesh Jan 10, 2025
958d766
Final refactoring
b-yogesh Jan 10, 2025
b894586
Add xcube back in workflow
b-yogesh Jan 10, 2025
e8056d9
copyright 2024 -> 2025
b-yogesh Jan 10, 2025
c782891
copyright 2024 -> 2025
b-yogesh Jan 10, 2025
d5e50b5
Update based on reviewer's comments
b-yogesh Jan 15, 2025
15fe1ea
Update based on reviewer's comments, pt. 2
b-yogesh Jan 16, 2025
0173b25
Add missing cassette
b-yogesh Jan 16, 2025
3aa8cfa
Update notebook + use protocol for data opener ids
b-yogesh Jan 16, 2025
c8994d2
Update for better notifications, and removing close() when user cance…
b-yogesh Jan 16, 2025
1a2914d
Multiple imports -> single imports per line
b-yogesh Jan 17, 2025
268f2ff
update based on reviewers comments
b-yogesh Jan 17, 2025
4eeb990
Remove tqdm
b-yogesh Jan 17, 2025
51504a4
add dynamic chunk sizes
b-yogesh Jan 17, 2025
9edb5e4
Update get_data_ids method args type
b-yogesh Jan 20, 2025
bc484f9
Fix tests
b-yogesh Jan 20, 2025
702003a
Update notebook
b-yogesh Jan 20, 2025
b1a5241
Remove redundant method
b-yogesh Jan 20, 2025
14946d2
Add spaces after the header
b-yogesh Jan 20, 2025
39bd4bf
Update notebook
b-yogesh Jan 20, 2025
5f61c45
Update notebook
b-yogesh Jan 20, 2025
d02292f
Remove tqdm dep
b-yogesh Jan 20, 2025
4f8f885
Update CHANGES.md
b-yogesh Jan 20, 2025
cfda06c
Update CHANGES.md
b-yogesh Jan 20, 2025
49 changes: 49 additions & 0 deletions .github/workflows/unittest.yml
@@ -0,0 +1,49 @@
name: Unittest xcube-clms

on:
  push:
  release:
    types: [ published ]

jobs:
  unittest:
    runs-on: ubuntu-latest
    steps:
      - name: checkout xcube-clms
        uses: actions/checkout@v4

      - name: Create modified environment file
        shell: bash -l {0}
        run: |
          grep -v "^ - xcube" environment.yml > env_no_xcube.yml
          echo "Generated env file without xcube:"
          cat env_no_xcube.yml
          curl https://raw.githubusercontent.com/xcube-dev/xcube/refs/heads/main/environment.yml | awk 's==1 {print $0} /^dependencies:/ {s=1}' >> env_no_xcube.yml
          echo "Generated env file with xcube dependencies manually installed:"
          cat env_no_xcube.yml

      - name: Set up MicroMamba
        uses: mamba-org/setup-micromamba@v1
        with:
          environment-file: env_no_xcube.yml

      - name: Checkout xcube
        uses: actions/checkout@v4
        with:
          repository: xcube-dev/xcube
          path: xcube

      - name: Install and Test
        shell: bash -l {0}
        run: |
          cd xcube
          python -m pip install -e . --no-deps
          cd ..
          pytest test/ --cov=xcube_clms --cov-report=xml

      - name: Upload coverage reports to Codecov
        uses: codecov/codecov-action@v4
        with:
          verbose: true
          token: ${{ secrets.CODECOV_TOKEN }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -160,3 +160,5 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

examples/notebooks/preload_cache/
19 changes: 17 additions & 2 deletions CHANGES.md
@@ -1,3 +1,18 @@
## Changes in 0.1.0 (in development)
## Changes in 0.2.0 (in development)

Initial version of CLMS Data Store.
### Enhancements

* Implemented the new experimental `Preload` API in xcube for improved
performance.
* Preload progress is now displayed in a user-friendly table format. This
display can be disabled.
* All preloaded data is now stored in the `.zarr` format.

### Fixes

### Other changes

## Changes in 0.1.0

* Initial version of CLMS Data Store with a new experimental Preload API for
preloading the datasets.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 Brockmann Consult GmbH
Copyright (c) 2025 Brockmann Consult GmbH

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
140 changes: 139 additions & 1 deletion README.md
@@ -1 +1,139 @@
# xcube-clms

[![Unittest xcube-clms](https://github.com/xcube-dev/xcube-clms/actions/workflows/unittest.yml/badge.svg)](https://github.com/xcube-dev/xcube-clms/actions/workflows/unittest.yml)
[![Codecov xcube-clms](https://codecov.io/gh/xcube-dev/xcube-clms/graph/badge.svg?token=n6X9zQIkXb)](https://codecov.io/gh/xcube-dev/xcube-clms)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/github/license/dcs4cop/xcube-smos)](https://github.com/xcube-dev/xcube-clms/blob/main/LICENSE)

The `xcube-clms` Python package provides an
[xcube data store](https://xcube.readthedocs.io/en/latest/api.html#data-store-framework)
that enables access to datasets hosted by the
[Copernicus Land Monitoring Service (CLMS)](https://land.copernicus.eu/en).
The data store is called `"clms"` and is implemented as
an [xcube plugin](https://xcube.readthedocs.io/en/latest/plugins.html).
It uses the [CLMS API](https://eea.github.io/clms-api-docs/introduction.html)
under the hood.

## Setup <a name="setup"></a>

### Installing the xcube-clms plugin from the repository <a name="install_source"></a>

To install xcube-clms directly from the git repository, clone the repository,
`cd` into `xcube-clms`, and follow the steps below:

```bash
conda env create -f environment.yml
conda activate xcube-clms
pip install .
```

This sets up a new conda environment, installs all the dependencies required
for `xcube-clms`, and then installs `xcube-clms` directly from the repository
into the environment.

### Create credentials to access the CLMS API

Create a `json` credentials file for the CLMS API by following
the [documentation](https://eea.github.io/clms-api-docs/authentication.html).
These credentials are required when initializing the CLMS data store. See
`examples/notebooks/CLMSDataStoreTutorial.ipynb` for instructions on how to
pass the credentials from the `json` file to the store.
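A minimal sketch of reading and sanity-checking the credentials file before handing it to the store. The helper `load_clms_credentials` is hypothetical (not part of `xcube-clms`), and the required field names (`client_id`, `user_id`, `token_uri`, `private_key`) are assumptions based on the CLMS authentication documentation; check them against your own key file and the tutorial notebook.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumed field names of a CLMS service-key file; verify them against the
# key file you actually downloaded from the CLMS portal.
REQUIRED_KEYS = ("client_id", "user_id", "token_uri", "private_key")


def load_clms_credentials(path: str | Path) -> dict:
    """Hypothetical helper: read a credentials JSON file and check its fields."""
    with open(path, encoding="utf-8") as f:
        credentials = json.load(f)
    if not isinstance(credentials, dict):
        raise ValueError("the credentials file must contain a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise ValueError(f"the credentials file is missing fields: {missing}")
    return credentials
```

For example, `credentials = load_clms_credentials("clms_credentials.json")`, where the file name is a placeholder.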

## Testing <a name="testing"></a>

To run the unit test suite:

```bash
pytest
```

## Additional notes about the data store

This data store introduces an initial mechanism for preloading data, covering
cache management, downloading, and file processing.
It is built on the experimental Preload API of the xcube data store framework.

The preload interface is needed because of how the CLMS API works: the user
creates a data request, which waits in a queue for an undetermined amount of
time before it is processed. The data is then delivered as zip files, which
must be downloaded, unzipped, extracted into a cache, and processed before they
can finally be opened using a cache data store.
By default, the cache is a `file` data store located at `/clms_cache` in your
current working directory (`cwd`), but users are free to choose any data store
they like.

Preloading lets the data store request datasets from the CLMS API in either a
blocking or a non-blocking way. In both cases, it handles sending the download
request, waiting in the queue while periodically checking the request status,
downloading the data, and extracting and post-processing it.
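The request lifecycle described above can be sketched as a polling loop, with the non-blocking variant running the same loop in a background thread. This is a simplified, self-contained illustration under stated assumptions, not the `xcube-clms` implementation: `submit`, `get_status`, and `fetch_and_process` are hypothetical callables standing in for the real API calls, and the status strings are assumed.

```python
from __future__ import annotations

import threading
import time
from typing import Callable


def preload_one(
    submit: Callable[[str], str],
    get_status: Callable[[str], str],
    fetch_and_process: Callable[[str], str],
    data_id: str,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> str:
    """Blocking flow: submit a request, poll its status, then download and process."""
    task_id = submit(data_id)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)  # assumed values: QUEUED, IN_PROGRESS, ...
        if status == "FINISHED":
            return fetch_and_process(task_id)
        if status in ("CANCELLED", "REJECTED"):
            raise RuntimeError(f"request {task_id} ended with status {status}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"request {task_id} still {status} after {timeout}s")
        time.sleep(poll_interval)  # wait before checking the status again


def preload_in_background(*args, **kwargs) -> tuple[threading.Thread, dict]:
    """Non-blocking flow: run preload_one in a thread; results land in a dict."""
    result: dict = {}

    def run() -> None:
        try:
            result["value"] = preload_one(*args, **kwargs)
        except Exception as exc:  # hand errors back to the caller
            result["error"] = exc

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, result


if __name__ == "__main__":
    # Fake API: the request is queued, then in progress, then finished.
    statuses = iter(["QUEUED", "IN_PROGRESS", "FINISHED"])
    thread, result = preload_in_background(
        lambda data_id: "task-1",
        lambda task_id: next(statuses),
        lambda task_id: f"processed {task_id}",
        "some-dataset",
        poll_interval=0.01,
    )
    thread.join()  # a real caller could keep working instead of joining
    print(result)  # {'value': 'processed task-1'}
```

Returning the thread together with a result dict keeps the sketch dependency-free; a real preload handle would more likely expose status and cancellation through its own object.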

To use the preload mechanism, call
`.preload_data(*data_ids, **preload_params)` on a CLMS data store instance.

The following classes (components) are responsible for this mechanism:

**Clms**

- Serves as the main interface to interact with the CLMS API. This class
coordinates with the `ClmsPreloadHandle` class to preload the data into a
cache data store.

**DownloadTaskManager**

- Handles the download process, including managing download requests and
checking their statuses.
- Retrieves task statuses based on dataset and file IDs or task IDs, determining
whether the download is pending, completed, or cancelled.
- Downloads data in chunks and manages zip file extraction, looking
specifically for geo data. What counts as geo data is defined in the Notes
section of the function's docstring.
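A minimal standard-library sketch of the two download steps named above: copying a response stream to disk in fixed-size chunks, and extracting only geo files from the resulting zip archive. Treating `.tif`, `.tiff`, and `.nc` as geo data is an assumption for illustration only (the real definition is in the function docstring), and an in-memory stream stands in for the HTTP response.

```python
from __future__ import annotations

import io
import shutil
import zipfile
from pathlib import Path
from typing import BinaryIO

# Assumed "geo data" extensions, for illustration only.
GEO_EXTENSIONS = {".tif", ".tiff", ".nc"}


def download_in_chunks(stream: BinaryIO, target: Path, chunk_size: int = 1 << 20) -> int:
    """Copy a binary stream to `target` chunk by chunk and return the bytes written."""
    written = 0
    with open(target, "wb") as out:
        while chunk := stream.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written


def extract_geo_files(zip_path: Path, out_dir: Path) -> list[Path]:
    """Extract only the archive members whose extension marks them as geo data."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.infolist():
            name = Path(member.filename)
            if member.is_dir() or name.suffix.lower() not in GEO_EXTENSIONS:
                continue
            # Keep only the base name so members cannot escape out_dir.
            target = out_dir / name.name
            with zf.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return extracted


if __name__ == "__main__":
    import tempfile

    # An in-memory zip stands in for the HTTP response body of a download.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("data/tile.tif", b"fake raster bytes")
        zf.writestr("README.txt", b"not geo data")
    buffer.seek(0)

    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "download.zip"
        download_in_chunks(buffer, zip_path, chunk_size=16)
        print([p.name for p in extract_geo_files(zip_path, Path(tmp) / "geo")])
        # ['tile.tif']
```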

**ClmsApiTokenHandler**

- Handles creating and refreshing the CLMS API token from the credentials,
which can be obtained by following the
steps [here](https://eea.github.io/clms-api-docs/authentication.html).
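The create-and-refresh behaviour can be pictured as a small cache around a token-fetching function. This is an illustrative pattern, not the `ClmsApiTokenHandler` code: `CachedToken` and `fetch_token` are hypothetical, `fetch_token` is assumed to exchange the credentials for an access token and report its lifetime in seconds, and the 60-second safety margin is an arbitrary choice.

```python
from __future__ import annotations

import time
from typing import Callable


class CachedToken:
    """Hypothetical helper: return a cached token, refetching it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], tuple[str, float]],
        margin_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token  # returns (token, lifetime in seconds)
        self._margin_s = margin_s
        self._clock = clock  # injectable so tests can control time
        self._token: str | None = None
        self._expires_at = 0.0

    @property
    def token(self) -> str:
        # Fetch a new token if there is none yet or it is about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin_s:
            token, lifetime_s = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + lifetime_s
        return self._token


if __name__ == "__main__":
    calls = []

    def fake_fetch() -> tuple[str, float]:
        calls.append(1)
        return f"token-{len(calls)}", 3600.0

    handler = CachedToken(fake_fetch)
    print(handler.token, handler.token)  # token-1 token-1
```

Refreshing slightly before the reported expiry avoids sending a request with a token that expires in flight.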

**FileProcessor**

- Handles processing of the downloaded data: extracting geo files from the
downloaded zip files, stacking them, and storing them.
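One way to picture the stacking step: order the extracted files by an acquisition date parsed from their names, then stack equally shaped 2-D grids along a new leading time axis. The `YYYYMMDD` filename pattern and the plain list-of-lists grids are assumptions that keep the sketch dependency-free; the real `FileProcessor` operates on actual geo files.

```python
from __future__ import annotations

import re
from datetime import date
from typing import Dict, List, Tuple

# Assumed filename convention: an 8-digit YYYYMMDD date somewhere in the name.
DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")

Grid = List[List[float]]  # one 2-D band, indexed as grid[y][x]


def acquisition_date(filename: str) -> date:
    """Parse the (assumed) YYYYMMDD date from a file name."""
    match = DATE_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no YYYYMMDD date in {filename!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def stack_by_time(grids: Dict[str, Grid]) -> Tuple[List[date], List[Grid]]:
    """Sort grids by the date in their file name and stack them along time.

    Assumes each grid is rectangular; only rows x first-row width is compared.
    """
    shapes = {(len(g), len(g[0]) if g else 0) for g in grids.values()}
    if len(shapes) > 1:
        raise ValueError(f"cannot stack grids with different shapes: {shapes}")
    names = sorted(grids, key=acquisition_date)
    times = [acquisition_date(name) for name in names]
    cube = [grids[name] for name in names]  # indexed as cube[t][y][x]
    return times, cube


if __name__ == "__main__":
    grids = {
        "lc_20240301.tif": [[3.0, 3.0]],
        "lc_20240101.tif": [[1.0, 1.0]],
    }
    times, cube = stack_by_time(grids)
    print(times)  # [datetime.date(2024, 1, 1), datetime.date(2024, 3, 1)]
    print(cube)  # [[[1.0, 1.0]], [[3.0, 3.0]]]
```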

**ClmsPreloadHandle**

- The main class responsible for orchestrating the preloading of datasets.
- It coordinates the _DownloadTaskManager_, _ClmsApiTokenHandler_, and
_FileProcessor_ classes to handle the complete process: caching, downloading
data, keeping the token valid, and processing the downloaded data.

## CLMS API

- Registering on the CLMS site requires an EU account.
- Once registered, the user should create an access token `json` file as
described [here](https://eea.github.io/clms-api-docs/authentication.html).

## CLMS API issues

This API has the following issues:

- Datasets made available via requests contain a download link to a zip file
that is supposed to be valid for 3 days. However, we found that links can
expire earlier, so we cannot rely on this period to know whether a download
link still works. As a workaround, we manage our own expiry times. This issue
has been raised with the CLMS service desk. Quoting their reply:
`For the first issue mentioned by you: The status is completed and there is indicated that there are 2 days for expiring, but the download link is already expired, we are going to investigate this bug.`
- We use the API to check whether a certain data_id has already been requested
from the CLMS server and what its status is, so that we can reuse its download
link directly, or submit a new request if it has not been requested yet or has
expired. However, this is not possible either: although expired downloads no
longer appear in the web UI, the API still returns expired requests as
completed, without any information that they have expired or when they will
expire. Quoting the CLMS helpdesk replies:
`For the second issue mentioned by you: the @datarequest_search endpoint does not seem to be working as expected, we are going to consult the API experts so to check its functioning and in case an improvement is feasible in our side, we´ll let you know.`
and its follow-up after a week:
`After having analysed the possibility to improve the status of the downloads, our team answers the following: Currently, our download system is not able to extract information on whether the link has expired or not, therefore our API does not provide this information.`
Due to this, we had to create workarounds to figure out whether a certain
dataset's link had expired.
- The API's cancel endpoint does not work; this issue was also raised with the
helpdesk team. Quoting their reply:
`Recently a new firewall of the CLMS Portal machine has been setup. This new firewall is blocking some of the process cancelation request. We've detected the issue and working with the IT team to solve it`.
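Because the API cannot be relied on to report whether a link has expired, one possible workaround is to record locally when each download link was obtained and treat it as stale after a conservative window shorter than the advertised 3 days. This sketch is hypothetical: `LinkExpiryTracker` and its 24-hour default are assumptions for illustration, not the names or values `xcube-clms` uses.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


class LinkExpiryTracker:
    """Hypothetical helper: remember when links were obtained, expire them locally."""

    def __init__(self, max_age: timedelta = timedelta(hours=24)):
        self._max_age = max_age  # assumed window, shorter than the advertised 3 days
        self._obtained: dict[str, tuple[str, datetime]] = {}

    def record(self, data_id: str, url: str, when: datetime | None = None) -> None:
        """Store the download link for data_id together with the time it was obtained."""
        self._obtained[data_id] = (url, when or datetime.now(timezone.utc))

    def usable_link(self, data_id: str, now: datetime | None = None) -> str | None:
        """Return the cached link if it is still fresh, else None (re-request it)."""
        entry = self._obtained.get(data_id)
        if entry is None:
            return None
        url, obtained_at = entry
        now = now or datetime.now(timezone.utc)
        if now - obtained_at >= self._max_age:
            del self._obtained[data_id]  # stale: force a new data request
            return None
        return url


if __name__ == "__main__":
    tracker = LinkExpiryTracker()
    t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
    tracker.record("dataset-a", "https://example.com/a.zip", when=t0)
    print(tracker.usable_link("dataset-a", now=t0 + timedelta(hours=1)))
    # https://example.com/a.zip
    print(tracker.usable_link("dataset-a", now=t0 + timedelta(days=2)))
    # None
```

The trade-off is occasionally re-requesting data whose link was still valid, in exchange for never handing out a link the server has already expired within the assumed window.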
11 changes: 9 additions & 2 deletions environment.yml
@@ -5,8 +5,15 @@ dependencies:
# Required
- python>=3.10
- xarray
- xcube >= 1.7.0
# for testing
- cryptography
- ipywidgets
- requests
- fsspec
- xcube
# Testing
- numpy
- black
- flake8
- pytest
- pytest-cov
- pytest-recording