-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new CDSE STAC API and speed up of stacking SEN2 tiles #28
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #28 +/- ##
==========================================
+ Coverage 98.68% 99.31% +0.63%
==========================================
Files 14 13 -1
Lines 985 1029 +44
==========================================
+ Hits 972 1022 +50
+ Misses 13 7 -6 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. You removed the sen2
module, but actually I'd prefer we had a separate sen2
module that contains sentinel-2 specific logic.
xcube_stac/_utils.py
Outdated
else: | ||
format_id = MAP_MIME_TYP_FORMAT.get(asset.media_type.split("; ")[0]) | ||
if format_id is None: | ||
LOG.debug(f"No format_id found for asset {asset.title!r}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, such logging calls can slow down performance if called very often. If this is a quite likely case, I would suggest removing it or use a DEBUG
constant / env var which you as a developer would switch on to allow for debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it.
xcube_stac/_utils.py
Outdated
return float( | ||
item.properties.get( | ||
"processing:version", | ||
item.properties.get("s2:processing_baseline", "1.0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange default as it is S2 specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the defaults it not S2 specific. If no processing version is given, then 1.0 returned. We need a string which can be turned into a float, because the code looks for the highest processing version. I suggest to move it to the stack module.
xcube_stac/constants.py
Outdated
description="Names of assets (bands) which will be included in the data cube.", | ||
description="Names of assets which will be included in the data cube.", | ||
) | ||
SCEMA_APPLY_SCALING = JsonBooleanSchema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCEMA_APPLY_SCALING = JsonBooleanSchema( | |
SCHEMA_APPLY_SCALING = JsonBooleanSchema( |
xcube_stac/constants.py
Outdated
apply_scaling=JsonBooleanSchema( | ||
title="Apply scaling, offset, and no-data values to data" | ||
), | ||
apply_scaling=SCEMA_APPLY_SCALING, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apply_scaling=SCEMA_APPLY_SCALING, | |
apply_scaling=SCHEMA_APPLY_SCALING, |
xcube_stac/stack.py
Outdated
if "crs" in list_ds[0]: | ||
ds_mosaic["crs"] = list_ds[0].crs | ||
if "spatial_ref" in list_ds[0]: | ||
ds_mosaic.coords["spatial_ref"] = list_ds[0].spatial_ref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember that crs
and spatial_ref
are not names that are put in stone. These can vary. The meaning of the grid mapping variable only comes from referring it from other (data) variables using attribute grid_mapping
. You can also detect grid mapping variables by their own required attribute grid_mapping_name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already addressed xcube-dev/xcube#1013 for xcube stac. I added a function normalize_grid_mapping
in the _utils
module, which is only called once at the end once.
xcube_stac/stack.py
Outdated
final_slices.append(ds_mosaic) | ||
final_ds = xr.concat(final_slices, dim="time", join="exact") | ||
final_ds = final_ds.assign_coords(coords=dict(time=dts)) | ||
if "crs" in final_ds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above.
for item in grouped_items.sel(time=dt).values.flatten() | ||
if item is not None | ||
] | ||
for dt in grouped_items.time.values | ||
} | ||
) | ||
return ds | ||
|
||
def stack_items( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method's logic is not easy grasp, hence its body would benefit from explaining comments or refactoring out methods with expressive names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment.
f"A{item.properties["orbitNumber"]:06}_{time_end}/IMG_DATA/" | ||
f"R{res_select}m/T{item.properties["tileId"]}_" | ||
f"{id_parts[2]}_{asset_name}_{res_select}m.jp2" | ||
def list_assets_from_item( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider putting sentinel-2 (/CDSE) related logic into a separate module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do so in the next PR
spatial_res=SCHEMA_SPATIAL_RES, | ||
query=SCHEMA_ADDITIONAL_QUERY, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please condier moving Sentinel-2 related stuff into separate module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do so in the next PR.
format_ids = self._helper.supported_format_ids | ||
return self._select_opener_id(protocols, format_ids, data_type=data_type) | ||
self._protocols = ["s3"] | ||
self._format_ids = ["zarr", "levels"] | ||
|
||
|
||
class StacCdseDataStore(StacDataStore): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move into separate submodule.
I felt the same regarding the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my minor suggestions. Or just run black.
Since we use Python 3.10 consider using |
instead of Union
and optionally other goodies from typing
such TypeAlias
. Next time maybe.
f"{access_params['protocol']}://{access_params['root']}/" | ||
f"{access_params['fs_path']}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double quotes are used everywhere else. Run black.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
I changed it using |
This PR contains the following points:
rectify_dataset
on a 3D dataset. This reduces the number of reprojection and therefore speeds up the stacking/mosaicking process.