Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new CDSE STAC API and speed up of stacking SEN2 tiles #28

Merged
merged 10 commits into from
Jan 22, 2025

Conversation

konstntokas
Copy link
Collaborator

@konstntokas konstntokas commented Jan 7, 2025

This PR contains the following points:

  • This PR implements the new CDSE STAC API with the endpoint https://stac.dataspace.copernicus.eu/v1
  • Tiles are stacked along the time axis, subsequently reprojected to a common grid, and stitched together (mosaicking) where first value is taken. This procedure can be used after Improve resample_in_space and rectify_dataset xcube#1098, which allows to perform rectify_dataset on a 3D dataset. This reduces the number of reprojection and therefore speeds up the stacking/mosaicking process.

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.31%. Comparing base (5ba6b6f) to head (4b02012).
Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
+ Coverage   98.68%   99.31%   +0.63%     
==========================================
  Files          14       13       -1     
  Lines         985     1029      +44     
==========================================
+ Hits          972     1022      +50     
+ Misses         13        7       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@konstntokas konstntokas requested a review from forman January 10, 2025 15:21
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. You removed the sen2 module, but actually I'd prefer we had a separate sen2 module that contains sentinel-2 specific logic.

else:
format_id = MAP_MIME_TYP_FORMAT.get(asset.media_type.split("; ")[0])
if format_id is None:
LOG.debug(f"No format_id found for asset {asset.title!r}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, such logging calls can slow down performance if called very often. If this is a quite likely case, I would suggest removing it or use a DEBUG constant / env var which you as a developer would switch on to allow for debugging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it.

return float(
item.properties.get(
"processing:version",
item.properties.get("s2:processing_baseline", "1.0"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange default as it is S2 specific.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the defaults it not S2 specific. If no processing version is given, then 1.0 returned. We need a string which can be turned into a float, because the code looks for the highest processing version. I suggest to move it to the stack module.

description="Names of assets (bands) which will be included in the data cube.",
description="Names of assets which will be included in the data cube.",
)
SCEMA_APPLY_SCALING = JsonBooleanSchema(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SCEMA_APPLY_SCALING = JsonBooleanSchema(
SCHEMA_APPLY_SCALING = JsonBooleanSchema(

apply_scaling=JsonBooleanSchema(
title="Apply scaling, offset, and no-data values to data"
),
apply_scaling=SCEMA_APPLY_SCALING,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
apply_scaling=SCEMA_APPLY_SCALING,
apply_scaling=SCHEMA_APPLY_SCALING,

Comment on lines 135 to 138
if "crs" in list_ds[0]:
ds_mosaic["crs"] = list_ds[0].crs
if "spatial_ref" in list_ds[0]:
ds_mosaic.coords["spatial_ref"] = list_ds[0].spatial_ref
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that crs and spatial_ref are not names that are put in stone. These can vary. The meaning of the grid mapping variable only comes from referring it from other (data) variables using attribute grid_mapping. You can also detect grid mapping variables by their own required attribute grid_mapping_name.

Copy link
Collaborator Author

@konstntokas konstntokas Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already addressed xcube-dev/xcube#1013 for xcube stac. I added a function normalize_grid_mapping in the _utils module, which is only called once at the end once.

final_slices.append(ds_mosaic)
final_ds = xr.concat(final_slices, dim="time", join="exact")
final_ds = final_ds.assign_coords(coords=dict(time=dts))
if "crs" in final_ds:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

for item in grouped_items.sel(time=dt).values.flatten()
if item is not None
]
for dt in grouped_items.time.values
}
)
return ds

def stack_items(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method's logic is not easy grasp, hence its body would benefit from explaining comments or refactoring out methods with expressive names.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment.

f"A{item.properties["orbitNumber"]:06}_{time_end}/IMG_DATA/"
f"R{res_select}m/T{item.properties["tileId"]}_"
f"{id_parts[2]}_{asset_name}_{res_select}m.jp2"
def list_assets_from_item(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider putting sentinel-2 (/CDSE) related logic into a separate module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do so in the next PR

spatial_res=SCHEMA_SPATIAL_RES,
query=SCHEMA_ADDITIONAL_QUERY,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please condier moving Sentinel-2 related stuff into separate module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do so in the next PR.

format_ids = self._helper.supported_format_ids
return self._select_opener_id(protocols, format_ids, data_type=data_type)
self._protocols = ["s3"]
self._format_ids = ["zarr", "levels"]


class StacCdseDataStore(StacDataStore):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move into separate submodule.

@konstntokas
Copy link
Collaborator Author

Looks good overall. You removed the sen2 module, but actually I'd prefer we had a separate sen2 module that contains sentinel-2 specific logic.

I felt the same regarding the sen2 module, after I implemented more Sentinel-2 specific functionalities for reading out the angles. In the next PR there will be a package containing all accessors, and I implemented a module for the https, s3 and sen2 accessor.

@konstntokas konstntokas requested a review from forman January 21, 2025 10:46
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my minor suggestions. Or just run black.
Since we use Python 3.10 consider using | instead of Union and optionally other goodies from typing such TypeAlias. Next time maybe.

Comment on lines +160 to +161
f"{access_params['protocol']}://{access_params['root']}/"
f"{access_params['fs_path']}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double quotes are used everywhere else. Run black.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@konstntokas
Copy link
Collaborator Author

Please see my minor suggestions. Or just run black. Since we use Python 3.10 consider using | instead of Union and optionally other goodies from typing such TypeAlias. Next time maybe.

I changed it using pyupgrade, which closes #16

@konstntokas konstntokas merged commit 6f7fed7 into main Jan 22, 2025
3 checks passed
@konstntokas konstntokas deleted the konstntokas-xxx-speed_up_mosaicking branch January 22, 2025 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants