map_blocks
- UnitsAwareDataArray.map_blocks(func: Callable[..., T_Xarray], args: Sequence[Any] = (), kwargs: Mapping[str, Any] | None = None, template: DataArray | Dataset | None = None) → T_Xarray
Apply a function to each block of this DataArray.
Warning
This method is experimental and its signature may change.
- Parameters:
func (callable) –
User-provided function that accepts a DataArray as its first parameter. The function will receive a subset or ‘block’ of this DataArray (see below), corresponding to one chunk along each chunked dimension. func will be executed as func(subset_dataarray, *subset_args, **kwargs).
This function must return either a single DataArray or a single Dataset.
This function cannot add a new chunked dimension.
args (sequence) – Passed to func after unpacking and subsetting any xarray objects by blocks. xarray objects in args must be aligned with this object, otherwise an error is raised.
kwargs (mapping) – Passed verbatim to func after unpacking. xarray objects, if any, will not be subset to blocks. Passing dask collections in kwargs is not allowed.
template (DataArray or Dataset, optional) – xarray object representing the final result after compute is called. If not provided, the function will first be run on mocked-up data that looks like this object but has sizes 0, to determine properties of the returned object such as dtype, variable names, attributes, new dimensions and new indexes (if any).
template must be provided if the function changes the size of existing dimensions. When provided, attrs on variables in template are copied over to the result. Any attrs set by func will be ignored.
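The sketch below illustrates how func, args and kwargs interact. It assumes numpy, xarray and dask are installed and, like the examples further down, uses a plain xarray.DataArray in place of UnitsAwareDataArray; the functions block_length and add_scaled and the data are hypothetical.

```python
import numpy as np
import xarray as xr

# Six values split into blocks of 4 and 2 along "x" (chunking requires dask).
x = np.arange(6)
da = xr.DataArray(np.arange(6.0), dims="x", coords={"x": x}).chunk({"x": 4})


def block_length(block):
    # `block` is an in-memory DataArray holding one chunk of `da`;
    # fill it with its own length so the chunk layout shows up in the result.
    return xr.full_like(block, block.sizes["x"])


lengths = da.map_blocks(block_length).compute()
print(lengths.values.tolist())  # [4.0, 4.0, 4.0, 4.0, 2.0, 2.0]

# xarray objects in `args` must be aligned with `da` (here: same index, same
# chunks); each call receives the matching block. `kwargs` values pass through.
other = xr.DataArray(np.full(6, 100.0), dims="x", coords={"x": x}).chunk({"x": 4})


def add_scaled(block, other_block, scale=1.0):
    return block * scale + other_block


summed = da.map_blocks(add_scaled, args=[other], kwargs={"scale": 10.0}).compute()
print(summed.values.tolist())  # [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

# Dask-backed objects are rejected when passed through `kwargs`.
rejected = False
try:
    da.map_blocks(add_scaled, kwargs={"other_block": other})
except TypeError as err:
    rejected = True
    print("TypeError:", err)
```

Because block_length only sees one chunk at a time, its output records the chunk layout (4, 2) rather than the full length 6.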
- Returns:
A single DataArray or Dataset with dask backend, reassembled from the outputs of the function.
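A minimal sketch of the template requirement and of the returned object, under the same assumptions (numpy, xarray and dask installed, a plain xarray.DataArray standing in for this class). The hypothetical every_other function keeps every second element of each block, so the length of "x" changes and a template is needed; in this sketch the template is itself dask-backed and chunked like the expected output (2, 1).

```python
import numpy as np
import xarray as xr

# Six values in blocks of 4 and 2 along "x" (requires dask).
da = xr.DataArray(np.arange(6.0), dims="x", coords={"x": np.arange(6)}).chunk({"x": 4})


def every_other(block):
    # Changes the length of "x" within each block: 4 -> 2 and 2 -> 1.
    out = block.isel(x=slice(None, None, 2))
    # attrs set here are ignored because a template is supplied.
    return out.assign_attrs(note="set inside func")


# Template describing the final result, chunked like the output blocks (2, 1).
template = da.isel(x=slice(None, None, 2)).chunk({"x": 2})
template.attrs = {"units": "m"}

result = da.map_blocks(every_other, template=template)
print(result.chunks)  # ((2, 1),): still lazy and dask-backed
computed = result.compute()
print(computed.values.tolist())       # [0.0, 2.0, 4.0]
print(computed["x"].values.tolist())  # [0, 2, 4]
print(computed.attrs)                 # {'units': 'm'}
```

The attrs of the result come from template, not from the note added inside every_other.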
Notes
This function is designed for when func needs to manipulate a whole xarray object subset to each block. Each block is loaded into memory. In the more common case where func can work on numpy arrays, it is recommended to use apply_ufunc. If none of the variables in this object is backed by dask arrays, calling this function is equivalent to calling func(obj, *args, **kwargs).
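To illustrate these notes, here is a sketch assuming numpy, xarray and dask are available; demean is a hypothetical function. Without dask, func receives the whole object; with dask, func sees each block separately, which changes the answer for operations that are not elementwise. For an elementwise NumPy function, xarray.apply_ufunc with dask="parallelized" is the usual alternative.

```python
import numpy as np
import xarray as xr


def demean(block):
    # Subtract the mean of whatever object func receives.
    return block - block.mean()


plain = xr.DataArray(np.arange(4.0), dims="x")  # numpy-backed, no dask
# No dask variables: the result matches demean(plain) on the whole array.
direct = plain.map_blocks(demean)
print(direct.values.tolist())  # [-1.5, -0.5, 0.5, 1.5]

# With dask, each block ([0, 1] and [2, 3]) is demeaned separately.
chunked = plain.chunk({"x": 2})
per_block = chunked.map_blocks(demean, template=chunked).compute()
print(per_block.values.tolist())  # [-0.5, 0.5, -0.5, 0.5]

# Elementwise NumPy function: apply_ufunc is the simpler route.
squared = xr.apply_ufunc(
    np.square, chunked, dask="parallelized", output_dtypes=[float]
).compute()
print(squared.values.tolist())  # [0.0, 1.0, 4.0, 9.0]
```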
See also
dask.array.map_blocks, xarray.apply_ufunc, xarray.Dataset.map_blocks, xarray.DataArray.map_blocks
- xarray-tutorial:advanced/map_blocks/map_blocks
Advanced Tutorial on map_blocks with dask
Examples
Calculate an anomaly from climatology using .groupby(). Using xr.map_blocks() allows for parallel operations with knowledge of xarray, its indices, and its methods like .groupby().

>>> import numpy as np
>>> import xarray as xr
>>> def calculate_anomaly(da, groupby_type="time.month"):
...     gb = da.groupby(groupby_type)
...     clim = gb.mean(dim="time")
...     return gb - clim
...
>>> time = xr.cftime_range("1990-01", "1992-01", freq="ME")
>>> month = xr.DataArray(time.month, coords={"time": time}, dims=["time"])
>>> np.random.seed(123)
>>> array = xr.DataArray(
...     np.random.rand(len(time)),
...     dims=["time"],
...     coords={"time": time, "month": month},
... ).chunk()
>>> array.map_blocks(calculate_anomaly, template=array).compute()
<xarray.DataArray (time: 24)> Size: 192B
array([ 0.12894847,  0.11323072, -0.0855964 , -0.09334032,  0.26848862,
        0.12382735,  0.22460641,  0.07650108, -0.07673453, -0.22865714,
       -0.19063865,  0.0590131 , -0.12894847, -0.11323072,  0.0855964 ,
        0.09334032, -0.26848862, -0.12382735, -0.22460641, -0.07650108,
        0.07673453,  0.22865714,  0.19063865, -0.0590131 ])
Coordinates:
  * time     (time) object 192B 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
    month    (time) int64 192B 1 2 3 4 5 6 7 8 9 10 ... 3 4 5 6 7 8 9 10 11 12
Note that one must explicitly use args=[] and kwargs={} to pass arguments to the function being applied in xr.map_blocks():

>>> array.map_blocks(
...     calculate_anomaly, kwargs={"groupby_type": "time.year"}, template=array
... )
<xarray.DataArray (time: 24)> Size: 192B
dask.array<<this-array>-calculate_anomaly, shape=(24,), dtype=float64, chunksize=(24,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object 192B 1990-01-31 00:00:00 ... 1991-12-31 00:00:00
    month    (time) int64 192B dask.array<chunksize=(24,), meta=np.ndarray>
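Since map_blocks has no **kwargs of its own, extra arguments for func cannot be handed to it directly as keywords. A minimal sketch, assuming numpy, xarray and dask are installed and using a hypothetical shift_and_scale function, showing the positional route (args), the keyword route (kwargs), and the failure when a keyword is passed straight to map_blocks:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x").chunk({"x": 2})


def shift_and_scale(block, offset, factor=1.0):
    return (block + offset) * factor


# `offset` is passed positionally via args, `factor` by keyword via kwargs.
out = da.map_blocks(shift_and_scale, args=[1.0], kwargs={"factor": 3.0})
print(out.compute().values.tolist())  # [3.0, 6.0, 9.0, 12.0]

# `factor` is not a map_blocks parameter, so passing it directly is a TypeError.
try:
    da.map_blocks(shift_and_scale, args=[1.0], factor=3.0)
    direct_keyword_ok = True
except TypeError:
    direct_keyword_ok = False
print(direct_keyword_ok)  # False
```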