Use meta correctly in map_blocks to prevent dask from passing a 0d array through the regridder #4598

Closed
@wjbenfold

Context

When no meta is supplied, dask.array.map_blocks tries to infer the output array type by calling the provided function on a 0d version of the inputs while the graph is being set up, as documented in https://docs.dask.org/en/latest/generated/dask.array.map_blocks.html
In iris/lib/iris/_lazy_data.py (lines 355 to 388 in 4abaa8f):

def map_complete_blocks(src, func, dims, out_sizes):
    """Apply a function to complete blocks.

    Complete means that the data is not chunked along the chosen dimensions.

    Args:

    * src (:class:`~iris.cube.Cube`):
        Source cube that function is applied to.
    * func:
        Function to apply.
    * dims (tuple of int):
        Dimensions that cannot be chunked.
    * out_sizes (tuple of int):
        Output size of dimensions that cannot be chunked.

    """
    if not src.has_lazy_data():
        return func(src.data)

    data = src.lazy_data()

    # Ensure dims are not chunked
    in_chunks = list(data.chunks)
    for dim in dims:
        in_chunks[dim] = src.shape[dim]
    data = data.rechunk(in_chunks)

    # Determine output chunks
    out_chunks = list(data.chunks)
    for dim, size in zip(dims, out_sizes):
        out_chunks[dim] = size

    return data.map_blocks(func, chunks=out_chunks, dtype=src.dtype)

we call map_blocks without meta, including when handing it an area-weighted regridding function (and presumably other functions) that cannot handle a 0d array. We do the same elsewhere in the same file.
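
The behaviour described above can be observed with a small experiment. This is a minimal sketch, not Iris code: `record_shape` and `seen_shapes` are hypothetical names, and the explicit `meta` is assumed to be an empty NumPy array with the same number of dimensions as the input. The sketch only records which shapes the function sees while the graph is built, because the exact placeholder dask passes may differ between dask versions.

```python
import dask.array as da
import numpy as np

seen_shapes = []  # shapes of every array handed to the mapped function


def record_shape(block):
    """Hypothetical stand-in for a regridding function."""
    seen_shapes.append(block.shape)
    return block * 2


x = da.ones((4, 6), chunks=(2, 3))

# No meta: dask infers the output type by calling the function on a
# placeholder array while the graph is built, before any compute().
without_meta = x.map_blocks(record_shape, dtype=x.dtype)
calls_without_meta = list(seen_shapes)

seen_shapes.clear()

# Explicit meta (assumed here to be an empty NumPy array): dask uses the
# placeholder we supply and does not call the function until compute().
meta = np.empty((0,) * x.ndim, dtype=x.dtype)
with_meta = x.map_blocks(record_shape, dtype=x.dtype, meta=meta)
calls_with_meta = list(seen_shapes)

print("placeholder shapes seen without meta:", calls_without_meta)
print("placeholder shapes seen with meta:", calls_with_meta)
```

If the second list is empty, the function was never run on a placeholder, which is the outcome this issue is asking for.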

Issues arising

  • DeprecationWarnings #4574 documents a deprecation warning seen when the Iris tests are run: dask passes in the 0d array, which is then indexed.
  • I don't know whether adding this brings performance or safety improvements; it's more that we'd be "doing it properly", i.e. using dask as designed. It seems like a good way to make the codebase easier to understand, though.

Suggestions

  • Work out how to choose what the meta kwarg should be set to, and set it
  • Consider whether the dtype argument should also be provided
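
One possible way to act on these suggestions is sketched below. It is not the Iris implementation: `map_complete_blocks_with_meta` is a hypothetical variant that takes a dask array rather than a cube, and it assumes that `func` returns plain NumPy arrays with the same number of dimensions as its input, so a zero-size NumPy array of the output dtype is a suitable `meta`. If `func` returns masked arrays, a masked-array placeholder would probably be the better choice.

```python
import dask.array as da
import numpy as np


def map_complete_blocks_with_meta(data, func, dims, out_sizes, dtype=None):
    """Hypothetical variant of map_complete_blocks that passes meta."""
    dtype = np.dtype(dtype if dtype is not None else data.dtype)

    # Ensure dims are not chunked
    in_chunks = list(data.chunks)
    for dim in dims:
        in_chunks[dim] = data.shape[dim]
    data = data.rechunk(in_chunks)

    # Determine output chunks
    out_chunks = list(data.chunks)
    for dim, size in zip(dims, out_sizes):
        out_chunks[dim] = size

    # Assumption: the output is a NumPy array with the input's dimensionality,
    # so a zero-size array of the right dtype describes it well enough.
    meta = np.empty((0,) * data.ndim, dtype=dtype)
    return data.map_blocks(func, chunks=out_chunks, dtype=dtype, meta=meta)


def fussy_subsample(block):
    """Hypothetical function that refuses placeholder inputs."""
    if block.ndim == 0 or block.size == 0:
        raise ValueError("placeholder array passed to fussy_subsample")
    return block[..., ::2]


expected = np.arange(24.0).reshape(4, 6)
source = da.from_array(expected, chunks=(2, 3))
result = map_complete_blocks_with_meta(
    source, fussy_subsample, dims=(1,), out_sizes=(3,)
)
print(result.shape, result.chunks)
print(result.compute())
```

Passing `dtype` alongside `meta` keeps the two consistent; whether both are needed in every call site in `_lazy_data.py` is still the open question raised above.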


    Labels

    Stale: A stale issue/pull-request
