Add support to TiledDataset for missing, irregular or overlapping tiles #487

Open
SolarDrew wants to merge 15 commits into main from SolarDrew:tiledds-improvements

SolarDrew
Contributor

Changes the array of datasets underlying TiledDataset into a masked array, with the mask defaulting to False everywhere (all tiles are valid).
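As a rough illustration of what a masked object array gives you, here is a minimal sketch using only NumPy. The string tiles are placeholders for real Dataset objects, and the variable names are made up for this example rather than taken from the PR.

```python
import numpy as np

# Placeholder tiles: plain strings stand in for real Dataset objects.
tiles = [["tile00", "tile01"], ["tile10", "tile11"]]

# True marks a tile as masked (missing or invalid); False marks it as valid.
tile_mask = [[False, True], [False, False]]

data = np.ma.masked_array(tiles, dtype=object, mask=tile_mask)

# compressed() returns a 1-D array holding only the unmasked elements.
valid_tiles = list(data.compressed())

# getmaskarray() always returns a full boolean array with the data's shape.
mask_grid = np.ma.getmaskarray(data)
```

Passing mask=False instead of tile_mask would leave every tile valid, which matches the default described above.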

codspeed-hq bot commented Jan 14, 2025

CodSpeed Performance Report

Merging #487 will not alter performance

Comparing SolarDrew:tiledds-improvements (c8ef224) with main (d9dcb3b)

Summary

✅ 10 untouched benchmarks
🆕 2 new benchmarks
⁉️ 1 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| ⁉️ test_tileddataset_repr | 1.7 ms | N/A | N/A |
| 🆕 test_tileddataset_repr[simple-masked] | N/A | 1.8 ms | N/A |
| 🆕 test_tileddataset_repr[simple-nomask] | N/A | 1.8 ms | N/A |

@Cadair Cadair marked this pull request as draft January 14, 2025 13:02
-    def __init__(self, dataset_array, inventory=None):
-        self._data = np.array(dataset_array, dtype=object)
+    def __init__(self, dataset_array, inventory=None, mask=False):
+        self._data = np.ma.masked_array(dataset_array, dtype=object, mask=mask)
Member

Can we auto-generate the mask based on the elements of dataset_array?

Contributor Author

I expect so. We can readily assume anything that isn't a Dataset should be masked out, but we should probably also have some way of flagging invalid Datasets for masking as well.
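One way the "mask anything that isn't a Dataset" idea could look is sketched below. The Dataset class here is a hypothetical stand-in and build_tile_array is an invented helper name, not part of the PR. It does not cover flagging invalid Datasets, which would need some extra criterion.

```python
import numpy as np


class Dataset:
    """Hypothetical stand-in for the real Dataset class."""

    def __init__(self, name):
        self.name = name


def build_tile_array(dataset_array):
    """Return a masked object array, masking every non-Dataset element."""
    data = np.array(dataset_array, dtype=object)
    # Build the mask element by element, then restore the tile grid shape.
    flat_mask = [not isinstance(tile, Dataset) for tile in data.flat]
    mask = np.array(flat_mask, dtype=bool).reshape(data.shape)
    return np.ma.masked_array(data, mask=mask)


# A 2x2 grid with one missing tile represented by None.
tiles = [[Dataset("a"), None], [Dataset("b"), Dataset("c")]]
tile_array = build_tile_array(tiles)
valid_names = [tile.name for tile in tile_array.compressed()]
```

An explicit mask argument could still be combined with the generated one (for example with a logical OR) if callers need to mask out Datasets that exist but are known to be bad.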

Member

Shouldn't the default value of mask be None?

Contributor Author

Either works fine. False sets the mask to False (i.e. not masked) everywhere, which is the default if you pass None into masked_array anyway, but explicit is better than implicit.
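If a None default were preferred, one option would be to normalise it to an explicit False before calling masked_array, so the behaviour does not depend on how NumPy interprets None. This is only a sketch: make_tile_array is a hypothetical helper and the string tiles are placeholders.

```python
import numpy as np


def make_tile_array(dataset_array, mask=None):
    """Hypothetical helper: treat mask=None as 'no tiles masked'."""
    if mask is None:
        mask = False  # explicit all-valid mask
    return np.ma.masked_array(dataset_array, dtype=object, mask=mask)


# Both calls should produce the same all-False mask.
explicit = make_tile_array([["a", "b"]], mask=False)
implicit = make_tile_array([["a", "b"]])

explicit_mask = np.ma.getmaskarray(explicit)
implicit_mask = np.ma.getmaskarray(implicit)
```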

@SolarDrew SolarDrew marked this pull request as ready for review January 21, 2025 16:24
@SolarDrew
Contributor Author

There are a few places where this can still be improved - if we're happy to make issues for those and punt them then this is good to go pending review. Otherwise I'll come back to it on Monday.

Member

@Cadair Cadair left a comment

👍 This looks good.

I think we should postpone merging this until we have made the corresponding changes to dkist-inventory just to make sure we haven't missed anything.
