Skip to content

Use meta correctly in map_blocks to prevent dask from passing a 0d array through the regridder #4598

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wjbenfold opened this issue Feb 23, 2022 · 5 comments
Labels
Stale A stale issue/pull-request

Comments

@wjbenfold
Copy link
Contributor

wjbenfold commented Feb 23, 2022

Context

dask.array.map_blocks will under some circumstances pass a 0d array through the provided function when initialised, as documented in https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html
In

iris/lib/iris/_lazy_data.py

Lines 355 to 388 in 4abaa8f

def map_complete_blocks(src, func, dims, out_sizes):
"""Apply a function to complete blocks.
Complete means that the data is not chunked along the chosen dimensions.
Args:
* src (:class:`~iris.cube.Cube`):
Source cube that function is applied to.
* func:
Function to apply.
* dims (tuple of int):
Dimensions that cannot be chunked.
* out_sizes (tuple of int):
Output size of dimensions that cannot be chunked.
"""
if not src.has_lazy_data():
return func(src.data)
data = src.lazy_data()
# Ensure dims are not chunked
in_chunks = list(data.chunks)
for dim in dims:
in_chunks[dim] = src.shape[dim]
data = data.rechunk(in_chunks)
# Determine output chunks
out_chunks = list(data.chunks)
for dim, size in zip(dims, out_sizes):
out_chunks[dim] = size
return data.map_blocks(func, chunks=out_chunks, dtype=src.dtype)
we call map_blocks without meta, including when handing it an area weighted regridding function (and presumably other times) that won't pass through a 0d array. We do the same elsewhere in the same file too.

Issues arising

  • DeprecationWarnings #4574 documents a deprecation warning seen when the Iris tests are run as the 0d array is passed in by dask and then indexed.
  • I don't know if we get performance or safety improvements by adding this in, it's more that we're "doing it properly" / using dask as designed. Seems like a good way to make it easier to understand the codebase though.

Suggestions

  • Work out how to choose what the meta kwarg should be set to, and set it
  • Consider whether the dtype argument should also be provided
@rcomer rcomer mentioned this issue Feb 24, 2022
2 tasks
@wjbenfold wjbenfold self-assigned this Mar 4, 2022
@wjbenfold
Copy link
Contributor Author

Turns out I was wrong the other day when I told @trexfeathers that all this needed was working out, and that it didn't need deciding first...

Investigation

map_complete_blocks is called twice within Iris:

_area_weighted

new_data = map_complete_blocks(
src_cube, regrid, (src_y_dim, src_x_dim), meshgrid_x.shape
)

The return dtype is set within area weighted regridding using np.promote_types(src_data.dtype, np.float16). We could:

  • have _regrid_area_weighted_rectilinear_src_and_grid__perform calculate the dtype it wants and pass this as an argument to _regrid_area_weighted_array.
  • tightly couple _regrid_area_weighted_rectilinear_src_and_grid__perform and _regrid_area_weighted_array by computing the promotion twice, and then passing it as a kwarg to map_complete_blocks.
  • have _regrid_area_weighted_array catch a 0d input array as obviously from dask and return the correct type of array to satisfy it.

_regrid

data = map_complete_blocks(
src, regrid, (y_dim, x_dim), sample_grid_x.shape
)

The return dtype of _regrid is found through a similar promotion method, though with more caveats, and looks like it can change depending on what the _RegularGridInterpolator does. This gives us similar options:

  • have RectilinearRegridder.__call__ calculate the dtype it wants and pass this as an argument to _regrid.
  • couple RectilinearRegridder.__call__ and _regrid by computing the dtype twice.
  • have _regrid catch a 0d input array as obviously from dask and return the correct type of array to satisfy it.

N.B. Where I've referred to "dtype" above, we'd also have to think about the type of the array (whether it's masked etc.) too, but that's easier as I think it's just whatever the source data was.

Question

We should make the same choice in both places, and I don't think the coupling two functions choice is a good one so do we:

  1. Pre-choose the dtype in the functions that call map_complete_blocks and have them pass a meta argument to map_complete_blocks (which can then pass it to map_blocks?
  2. Let map_blocks throw a 0d array through the regridders, and spot it in there then return a 0d array of the right type?

Copy link
Contributor

github-actions bot commented Nov 4, 2023

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

@github-actions github-actions bot added the Stale A stale issue/pull-request label Nov 4, 2023
Copy link
Contributor

github-actions bot commented Dec 2, 2023

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either:

  • Re-open this issue, if you have sufficient permissions, or
  • Add a comment stating that this is still relevant and someone will re-open it on your behalf.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 2, 2023
@trexfeathers trexfeathers removed the Stale A stale issue/pull-request label Dec 6, 2023
@trexfeathers trexfeathers reopened this Dec 6, 2023
@scitools-ci scitools-ci bot removed this from 🚴 Peloton Dec 15, 2023
Copy link
Contributor

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

@github-actions github-actions bot added the Stale A stale issue/pull-request label Apr 20, 2025
@bjlittle
Copy link
Member

Closed by #5989

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale A stale issue/pull-request
Projects
Status: Done
Development

No branches or pull requests

3 participants