Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to add generic per-band attributes #147

Open
caitlinadams opened this issue Mar 15, 2024 · 4 comments
Open

Ability to add generic per-band attributes #147

caitlinadams opened this issue Mar 15, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@caitlinadams
Copy link
Contributor

The product definition document used by the Open Data Cube (e.g. Sentinel-2 in Digital Earth Africa) may contain additional per-band attributes. In the case of Sentinel-2, the key attribute is the flags_definition for the SCL band, which contains pixel-quality information such as the mapping between SCL values and classes (e.g. '3': cloud shadows). This is critical to add to the returned xarray to enable cloud masking at later stages.

I am not currently able to add the band-level attribute through the configuration dictionary. My workaround is to manually add the desired attribute to the specific band after loading:

ds_stac = odc.stac.load(
    ...,
    bands=(... 'SCL'),
    ...
)

ds_stac["SCL"] = ds_stac.SCL.assign_attrs(flags_definition = {'qa': {'bits': [...], 'values': {...}, 'description': ...}})

I am curious whether:

  • the configuration dictionary handling could be updated to also add any generic attribute for individual bands
  • alternatively, whether there's a way to provide the load with an ODC product definition and have the metadata come from that rather than a user-defined config.

The second idea would be my preference, but I don't have a good feel as to whether that's in scope for this package. My ultimate goal is to make the xarray returned from odc.stac.load for an ODC STAC as similar as possible to the xarray returned from datacube.load for the same ODC data.

@caitlinadams caitlinadams added the enhancement New feature or request label Mar 15, 2024
@Kirill888
Copy link
Member

Thanks @caitlinadams, tangentially related issue opendatacube/odc-geo#117

This won't be a huge addition if we stick with "user-supplied only" interface. I guess config dictionary is the right spot for this, although we have been moving away from relying on this parameter, for example per band dtype and nodata can be configured via a more direct dtype= and nodata= arguments. Maybe attrs: dict[band_name, dict[str, Any]] would fit better to match dtype= behaviour.

@caitlinadams
Copy link
Contributor Author

caitlinadams commented Mar 18, 2024

Thanks @Kirill888 -- I agree that this is related to the odc-geo issue on opinionated masking.

I hadn't realised that odc-stack can handle both dtype and nodata as keyword args. Looking at the source code, my current understanding is that dtype is an explicitly listed kwarg, whereas nodata can be supplied through **kw and accessed later in the resolve_load_cfg function. Is that correct?

I think it's a reasonable suggestion to have a keyword argument for additional attributes, matching the dtype= behaviour. Is this something that you think is worth implementing in the short term? Does it have flow-on effects for the odc-geo opinionated masking conversation? I'm wondering if you would ever make it a requirement for a band to have the flag information as a band attribute to be able to use a generic or opinionated masking function.

@Kirill888
Copy link
Member

Looking at the source code, my current understanding is that dtype is an explicitly listed kwarg, whereas nodata can be supplied through **kw and accessed later in the resolve_load_cfg function. Is that correct?

looks like it, inconsistent for sure, also nodata= only supports "apply to all bands", whereas dtype can be applied per band. This needs to be harmonized. Not sure about reasons for it, probably just "happened" that way.

def resolve_load_cfg(
bands: dict[str, RasterBandMetadata],
resampling: str | dict[str, str] | None = None,
dtype: DTypeLike | dict[str, DTypeLike] | None = None,
use_overviews: bool = True,
nodata: float | None = None,
fail_on_error: bool = True,
) -> dict[str, RasterLoadParams]:

We should probably apply this to most user supplied settings:

  • dtype
  • resampling
  • nodata should be expanded to allow override per band
  • use_overviews probably global setting is ok for this one
  • attrs to be added

@caitlinadams
Copy link
Contributor Author

That sounds good to me, thanks. Once you've implemented this, I would be happy to take a look at updating the docs/examples to make this functionality clearer, and perhaps move away from the config approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants