map_blocks

UnitsAwareDataArray.map_blocks(func: Callable[..., T_Xarray], args: Sequence[Any] = (), kwargs: Mapping[str, Any] | None = None, template: DataArray | Dataset | None = None) T_Xarray

Apply a function to each block of this DataArray.

Warning

This method is experimental and its signature may change.

Parameters:
  • func (callable) –

    User-provided function that accepts a DataArray as its first parameter. The function will receive a subset or ‘block’ of this DataArray (see below), corresponding to one chunk along each chunked dimension. func will be executed as func(subset_dataarray, *subset_args, **kwargs).

    This function must return either a single DataArray or a single Dataset.

    This function cannot add a new chunked dimension.

  • args (sequence) – Passed to func after unpacking and subsetting any xarray objects by blocks. xarray objects in args must be aligned with this object, otherwise an error is raised.

  • kwargs (mapping) – Passed verbatim to func after unpacking. xarray objects, if any, will not be subset to blocks. Passing dask collections in kwargs is not allowed.

  • template (DataArray or Dataset, optional) – xarray object representing the final result after compute is called. If not provided, the function will be first run on mocked-up data, that looks like this object but has sizes 0, to determine properties of the returned object such as dtype, variable names, attributes, new dimensions and new indexes (if any). template must be provided if the function changes the size of existing dimensions. When provided, attrs on variables in template are copied over to the result. Any attrs set by func will be ignored.

Returns:

  • A single DataArray or Dataset with dask backend, reassembled from the outputs of the

  • function.

Notes

This function is designed for when func needs to manipulate a whole xarray object subset to each block. Each block is loaded into memory. In the more common case where func can work on numpy arrays, it is recommended to use apply_ufunc.

If none of the variables in this object is backed by dask arrays, calling this function is equivalent to calling func(obj, *args, **kwargs).

See also

dask.array.map_blocks, xarray.apply_ufunc, xarray.Dataset.map_blocks xarray.DataArray.map_blocks

xarray-tutorial:advanced/map_blocks/map_blocks

Advanced Tutorial on map_blocks with dask

Examples

Calculate an anomaly from climatology using .groupby(). Using xr.map_blocks() allows for parallel operations with knowledge of xarray, its indices, and its methods like .groupby().

>>> def calculate_anomaly(da, groupby_type="time.month"):
...     gb = da.groupby(groupby_type)
...     clim = gb.mean(dim="time")
...     return gb - clim
...
>>> time = xr.cftime_range("1990-01", "1992-01", freq="ME")
>>> month = xr.DataArray(time.month, coords={"time": time}, dims=["time"])
>>> np.random.seed(123)
>>> array = xr.DataArray(
...     np.random.rand(len(time)),
...     dims=["time"],
...     coords={"time": time, "month": month},
... ).chunk()
>>> array.map_blocks(calculate_anomaly, template=array).compute()
<xarray.DataArray (time: 24)> Size: 192B
array([ 0.12894847,  0.11323072, -0.0855964 , -0.09334032,  0.26848862,
        0.12382735,  0.22460641,  0.07650108, -0.07673453, -0.22865714,
       -0.19063865,  0.0590131 , -0.12894847, -0.11323072,  0.0855964 ,
        0.09334032, -0.26848862, -0.12382735, -0.22460641, -0.07650108,
        0.07673453,  0.22865714,  0.19063865, -0.0590131 ])
Coordinates:
  * time     (time) object 192B 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
    month    (time) int64 192B 1 2 3 4 5 6 7 8 9 10 ... 3 4 5 6 7 8 9 10 11 12

Note that one must explicitly use args=[] and kwargs={} to pass arguments to the function being applied in xr.map_blocks():

>>> array.map_blocks(
...     calculate_anomaly, kwargs={"groupby_type": "time.year"}, template=array
... )  
<xarray.DataArray (time: 24)> Size: 192B
dask.array<<this-array>-calculate_anomaly, shape=(24,), dtype=float64, chunksize=(24,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object 192B 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
    month    (time) int64 192B dask.array<chunksize=(24,), meta=np.ndarray>