Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chunk grid logic #141

Merged
merged 16 commits into from
Aug 26, 2021
Merged

Add chunk grid logic #141

merged 16 commits into from
Aug 26, 2021

Conversation

rabernat
Copy link
Contributor

This is a first start on closing #140 and #98.

Let's create a new abstract class called ChunkGrid that understands about the mapping between chunk indexes and array slices and implement methods to find the overlap / intersection between different grids. Then we can refactor XarrayZarrRecipe to use this rather than its current impenetrable logic.

This is definitely WIP; just wanted to push something to get the conversation started.

@alxmrs
Copy link
Contributor

alxmrs commented Jul 30, 2021

Hey Ryan! I noticed that this approach seems a lot like XArray-Beams's ChunkKeys (source). I've used them on an internal parallel version of the XarrayToZarr recipe that supports using multiple ConcatDims (it uses XArrayBeam under the hood, but supports the FilePattern interface).

I'd love to talk more about the approach you're taking here, and how we can generalize this pattern.

Copy link
Contributor Author

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the overall effect of this is to shift the reasoning about chunks, overlaps, subsets, etc. into an abstract, standalone class where the logic can be clearly implemented and thoroughly tested.

dim_idx.index
for dim_idx in chunk_key
if dim_idx.operation == CombineOp.SUBSET and dim_idx.name == concat_dim
][0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note how we no longer need this opaque logic in this function. It now lives in ChunkGrid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and the new chunk_position function, which translates a ChunkKey to a chunk index)

for conflict_index in conflict:
this_chunk_conflicts.add(conflict_index)

return {concat_dim: region_slice}, this_chunk_conflicts
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. All this code goes away.

@rabernat rabernat marked this pull request as ready for review August 21, 2021 02:11
@rabernat rabernat merged commit a23cb82 into pangeo-forge:master Aug 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants