implement interp() #2104

Merged
merged 56 commits into from
Jun 8, 2018
Changes from 5 commits
91e6723
Start working
fujiisoup Apr 27, 2018
db89669
interp1d for numpy backed array.
fujiisoup May 3, 2018
921ecdc
interp1d for dask backed array.
fujiisoup May 3, 2018
6b198bd
Support scalar interpolation.
fujiisoup May 4, 2018
c4961b0
more docs
fujiisoup May 4, 2018
14404c9
flake8. Remove an unnecessary file.
fujiisoup May 4, 2018
78144e9
Remove non-unicode characters
fujiisoup May 4, 2018
7004f75
Merge branch 'master' into interp_at
fujiisoup May 5, 2018
b1360ee
refactoring...
fujiisoup May 7, 2018
642e6b3
flake8. whats new
fujiisoup May 7, 2018
3328128
Make tests skip if scipy is not installed
fujiisoup May 7, 2018
dfc347e
skipif -> skip
fujiisoup May 7, 2018
3284ad2
move skip into every function
fujiisoup May 7, 2018
c19e9dd
remove reuires_scipy
fujiisoup May 7, 2018
39a0005
refactoring exceptions.
fujiisoup May 7, 2018
6c77873
assert_equal -> assert_allclose
fujiisoup May 8, 2018
4ff8477
Remove unintended word.
fujiisoup May 8, 2018
0807652
More tests. More docs.
fujiisoup May 8, 2018
230aada
More docs.
fujiisoup May 8, 2018
0a4a196
Added a benchmark
fujiisoup May 8, 2018
359412a
doc. Remove *.png file.
fujiisoup May 9, 2018
281dc7f
add .load to benchmark with dask.
fujiisoup May 9, 2018
03ed045
add assume_sorted kwarg.
fujiisoup May 10, 2018
2530b24
Support dimension without coordinate
fujiisoup May 10, 2018
01243f1
flake8
fujiisoup May 10, 2018
b3c76d7
More docs. test for attrs.
fujiisoup May 10, 2018
7cfa56b
Merge branch 'master' into interp_at
fujiisoup May 11, 2018
82e04c5
Merge branch 'master' into interp_at
fujiisoup May 12, 2018
8c29a4b
Updates based on comments
fujiisoup May 17, 2018
d89a1bb
rename test
fujiisoup May 17, 2018
ed718d9
update docs
fujiisoup May 17, 2018
aec3bbc
Add transpose for python 2
fujiisoup May 17, 2018
0f17044
More strict ordering
fujiisoup May 17, 2018
d361508
Cleanup
fujiisoup May 19, 2018
05b4c8f
Update doc
fujiisoup May 19, 2018
d8ca99f
Add skipif in tests
fujiisoup May 20, 2018
7cf370f
Merge branch 'master' into interp_at
fujiisoup May 22, 2018
c0d796a
minor grammar/language edits in docs
shoyer May 25, 2018
7ab6eec
Merge branch 'master' into interp_at
fujiisoup May 29, 2018
f9a819a
Support dict arguments for interp.
fujiisoup May 29, 2018
21d4390
update based on comments
fujiisoup May 30, 2018
6b8f05e
Remove unused if-block
fujiisoup May 30, 2018
58b4c13
ValueError -> NotImpletedError. Doc improvement
fujiisoup May 30, 2018
63aa0b3
Using OrderedSet
fujiisoup May 31, 2018
92c4d27
Merge branch 'master' into interp_at
shoyer Jun 1, 2018
193bb88
Merge branch 'master' into interp_at
fujiisoup Jun 1, 2018
b671257
Drop object array after interpolation.
fujiisoup Jun 4, 2018
cf9351b
Merge remote-tracking branch 'origin/interp_at' into interp_at
fujiisoup Jun 4, 2018
f2dc499
Merge branch 'master' into interp_at
fujiisoup Jun 4, 2018
6e00999
flake8
fujiisoup Jun 4, 2018
91d92f6
Add keep_attrs keyword
fujiisoup Jun 6, 2018
86a3823
flake8 (reverted from commit 6e0099963a50dc622204a690a0058b4db527b8ef)
fujiisoup Jun 6, 2018
9512d13
flake8
fujiisoup Jun 7, 2018
4df36da
Remove keep_attrs keywords
fujiisoup Jun 7, 2018
ec8e709
Returns copy for not-interpolated variable.
fujiisoup Jun 7, 2018
60e2ca3
Fix docs
fujiisoup Jun 7, 2018
33 changes: 32 additions & 1 deletion xarray/core/dataarray.py
@@ -6,7 +6,8 @@
import numpy as np
import pandas as pd

from . import computation, groupby, indexing, ops, resample, rolling, utils
from . import (computation, dtypes, groupby, indexing, ops, resample, rolling,
Contributor

F401 '.dtypes' imported but unused

utils)
from ..plot.plot import _PlotMethods
from .accessors import DatetimeAccessor
from .alignment import align, reindex_like_indexers
@@ -882,6 +883,36 @@ def reindex(self, method=None, tolerance=None, copy=True, **indexers):
method=method, tolerance=tolerance, copy=copy, **indexers)
return self._from_temp_dataset(ds)

def interpolate_at(self, method='linear', fill_value=np.nan, kwargs={},
**coords):
""" Multidimensional interpolation of variables.

Parameters
----------
**coords : {dim: new_coordinate, ...}
Keyword arguments with names matching dimensions and values.
coords can be an integer, array-like or DataArray.
If DataArrays are passed as coords, xarray-style indexing will be
carried out.
method: {'linear', 'RectBivariateSpline', 'NdPPoly'} for
multidimensional array,
{'linear', 'barycentric', 'krogh', 'pchip', 'akima',
'ppoly', 'bpoly'} for 1-dimensional array.

Returns
-------
interpolated: xr.DataArray
New DataArray on the new coordinates.

Note
----
scipy is required. If NaN is in the array, ValueError will be raised.
"""
ds = self._to_temp_dataset().interpolate_at(
method=method, fill_value=fill_value, kwargs=kwargs, **coords)
return self._from_temp_dataset(ds)


def rename(self, new_name_or_name_dict):
"""Returns a new DataArray with renamed coordinates or a new name.

132 changes: 130 additions & 2 deletions xarray/core/dataset.py
@@ -13,8 +13,8 @@
import xarray as xr

from . import (
alignment, duck_array_ops, formatting, groupby, indexing, ops, resample,
rolling, utils)
alignment, dtypes, duck_array_ops, formatting, groupby, indexing, ops,
resample, rolling, utils)
from .. import conventions
from .alignment import align
from .common import DataWithCoords, ImplementsDatasetReduce
@@ -1312,6 +1312,14 @@ def _validate_indexers(self, indexers):
raise TypeError('cannot use a Dataset as an indexer')
else:
v = np.asarray(v)
if v.ndim == 0:
v = as_variable(v)
elif v.ndim == 1:
v = as_variable((k, v))
else:
raise IndexError(
"Unlabeled multi-dimensional array cannot be "
"used for indexing: {}".format(k))
indexers_list.append((k, v))
return indexers_list

@@ -1322,6 +1330,9 @@ def _get_indexers_coordinates(self, indexers):

Only coordinate with a name different from any of self.variables will
be attached.

If remove_dimensional_coord is True, the dimensional coordinate of
indexers will be removed.
"""
from .dataarray import DataArray

@@ -1775,6 +1786,123 @@ def reindex(self, indexers=None, method=None, tolerance=None, copy=True,
coord_names.update(indexers)
return self._replace_vars_and_dims(variables, coord_names)

def interpolate_at(self, method='linear', fill_value=np.nan, kwargs={},
**coords):
""" Multidimensional interpolation of Dataset.

Parameters
----------
**coords : {dim: new_coordinate, ...}
Keyword arguments with names matching dimensions and values.
coords can be an integer, array-like or DataArray.
If DataArrays are passed as coords, their dimensions are used
for the broadcasting.
method: {'linear', 'nearest'} for multidimensional array,
{'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic'}
for 1-dimensional array.

Returns
-------
interpolated: xr.Dataset
New dataset on the new coordinates.

Note
----
scipy is required. If NaN is in the array, ValueError will be raised.

See Also
--------
scipy.interpolate.interp1d
scipy.interpolate.RegularGridInterpolator

Examples
--------
>>> da = xr.DataArray([0, 0.1, 0.2, 0.1], dims='x',
... coords={'x': [0, 1, 2, 3]})
>>>
>>> da.interpolate_at(x=[0.5, 1.5]) # simple linear interpolation
<xarray.DataArray (x: 2)>
array([0.05, 0.15])
Coordinates:
* x (x) float64 0.5 1.5
>>>
>>> # with cubic spline interpolation
... da.interpolate_at(x=[0.5, 1.5], method='cubic')
<xarray.DataArray (x: 2)>
array([0.0375, 0.1625])
Coordinates:
* x (x) float64 0.5 1.5
>>>
>>> # interpolation at one single position
... da.interpolate_at(x=0.5)
<xarray.DataArray ()>
array(0.05)
Coordinates:
x float64 0.5
>>>
>>> # interpolation with broadcasting
... da.interpolate_at(x=xr.DataArray([[0.5, 1.0], [1.5, 2.0]],
... dims=['y', 'z']))
<xarray.DataArray (y: 2, z: 2)>
array([[0.05, 0.1 ],
[0.15, 0.2 ]])
Coordinates:
x (y, z) float64 0.5 1.0 1.5 2.0
Dimensions without coordinates: y, z
>>>
>>> da = xr.DataArray([[0, 0.1, 0.2], [1.0, 1.1, 1.2]],
... dims=['x', 'y'],
... coords={'x': [0, 1], 'y': [0, 10, 20]})
>>>
>>> # multidimensional interpolation
... da.interpolate_at(x=[0.5, 1.5], y=[5, 15])
<xarray.DataArray (x: 2, y: 2)>
array([[0.55, 0.65],
[ nan, nan]])
Coordinates:
* x (x) float64 0.5 1.5
* y (y) int64 5 15
>>>
>>> # multidimensional interpolation with broadcasting
... da.interpolate_at(x=xr.DataArray([0.5, 1.5], dims='z'),
... y=xr.DataArray([5, 15], dims='z'))
<xarray.DataArray (z: 2)>
array([0.55, nan])
Coordinates:
x (z) float64 0.5 1.5
y (z) int64 5 15
Dimensions without coordinates: z
"""
from . import interp

indexers_list = self._validate_indexers(coords)

variables = OrderedDict()
for name, var in iteritems(self._variables):
var_indexers = {k: (self._variables[k], v) for k, v
in indexers_list if k in var.dims}
if name not in [k for k, v in indexers_list]:
Member

Given that you already need indexers_list as a dict down below, why not do so before this loop so you can use if name not in indexers_dict instead of the comprehension? (The comprehension inside the if clause looks ugly to me.)

if duck_array_ops.count(var.data) != var.size:
raise ValueError(
'interpolate_at can not be used for an array with '
'nan. {} has {} nans.'.format(
name, var.size - var.count()))
variables[name] = interp.interpolate(
var, var_indexers, method, fill_value, kwargs)

coord_names = set(variables).intersection(self._coord_names)
selected = self._replace_vars_and_dims(variables,
coord_names=coord_names)
# attach indexer as coordinate
variables.update({k: v for k, v in indexers_list})
Member

Note that dict(indexers_list) is equivalent to {k: v for k, v in indexers_list}.

If you call it inside update, you don't even need to call dict() on it.

# Extract coordinates from indexers
coord_vars = selected._get_indexers_coordinates(coords)
variables.update(coord_vars)
coord_names = (set(variables)
.intersection(self._coord_names)
.union(coord_vars))
return self._replace_vars_and_dims(variables, coord_names=coord_names)
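
The two review suggestions on this method (build the indexer mapping once before the loop, and pass the list of pairs straight to `update`) can be sketched in plain Python. The names and values below are hypothetical stand-ins for illustration, not the code that was eventually merged:

```python
# Hypothetical stand-ins for the (name, indexer) pairs and the dataset variables.
indexers_list = [("x", [0.5, 1.5]), ("y", [5, 15])]
variables = {"temperature": "var-a", "x": "coord-x", "y": "coord-y"}

# Build the mapping once, before looping, so membership tests are dict lookups
# instead of a list comprehension rebuilt on every iteration.
indexers = dict(indexers_list)

# Names that are not themselves indexers are the ones to interpolate.
to_interpolate = [name for name in variables if name not in indexers]

# dict.update accepts an iterable of (key, value) pairs directly,
# so no intermediate dict or comprehension is needed here.
result = {name: variables[name] for name in to_interpolate}
result.update(indexers_list)
```

Whether the real loop can be reshaped exactly this way depends on how `var_indexers` is built, which this sketch leaves out.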

def rename(self, name_dict, inplace=False):
"""Returns a new object with renamed variables and dimensions.

152 changes: 152 additions & 0 deletions xarray/core/interp.py
@@ -0,0 +1,152 @@
from __future__ import absolute_import, division, print_function
from functools import partial

import numpy as np
from .computation import apply_ufunc
from .pycompat import (OrderedDict, dask_array_type)
Contributor

F401 '.pycompat.OrderedDict' imported but unused

from .variable import broadcast_variables


def _localize(obj, index_coord):
""" Speed up for linear and nearest neighbor method.
Only consider a subspace that is needed for the interpolation
"""
for dim, [x, new_x] in index_coord.items():
try:
imin = x.to_index().get_loc(np.min(new_x), method='ffill')
imax = x.to_index().get_loc(np.max(new_x), method='bfill')

idx = slice(np.maximum(imin-1, 0), imax+1)
Contributor

E226 missing whitespace around arithmetic operator

index_coord[dim] = (x[idx], new_x)
obj = obj.isel(**{dim: idx})
Member Author

Currently, with method='linear' or 'nearest', only a small portion of the array is used for the interpolation. So if there are no NaNs in that region, the array can still be interpolated.

except:
pass
return obj, index_coord
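
The subsetting idea in `_localize` can be illustrated without pandas. The sketch below is an approximation under stated assumptions: the grid `x` is a sorted, increasing Python list, `bisect` stands in for `get_loc(..., method='ffill')` and `get_loc(..., method='bfill')`, and `local_slice` is a hypothetical helper name. When the requested points fall outside the grid it returns the full range, loosely mirroring how the PR code leaves the array untouched when the lookup fails.

```python
from bisect import bisect_left, bisect_right


def local_slice(x, new_x):
    """Return a slice of the sorted grid ``x`` that brackets all of ``new_x``."""
    lo, hi = min(new_x), max(new_x)
    if lo < x[0] or hi > x[-1]:
        # Lookup would fail; keep the whole grid (no localization).
        return slice(0, len(x))
    imin = bisect_right(x, lo) - 1  # last index with x[i] <= lo ('ffill'-like)
    imax = bisect_left(x, hi)       # first index with x[i] >= hi ('bfill'-like)
    # Same slice arithmetic as in the diff: one extra point on the left.
    return slice(max(imin - 1, 0), imax + 1)


grid = [0, 1, 2, 3, 4, 5]
inner = local_slice(grid, [2.2, 2.7])
outer = local_slice(grid, [-1.0, 2.0])
```

Only `grid[inner]` needs to be free of NaN for a linear or nearest-neighbor interpolation at those points.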


def interpolate(obj, indexes_coords, method, fill_value, kwargs):
""" Make an interpolation of Variable

Parameters
----------
obj: Variable
indexes_coords:
mapping from dimension name to a pair of original and new coordinates.
method: string
One of {'linear', 'nearest', 'zero', 'slinear', 'quadratic',
'cubic'}. For multidimensional interpolation, only
{'linear', 'nearest'} can be used.
fill_value:
fill value for extrapolation
kwargs:
keyword arguments to be passed to scipy.interpolate

Returns
-------
Interpolated Variable
"""
try:
import scipy.interpolate
except ImportError:
raise ImportError(
'Interpolation with method `%s` requires scipy' % method)

if len(indexes_coords) == 0:
return obj

# simple speed up for the local interpolation
if method in ['linear', 'nearest']:
obj, indexes_coords = _localize(obj, indexes_coords)

# target dimensions
dims = list(indexes_coords)
x = [indexes_coords[d][0] for d in dims]
new_x = [indexes_coords[d][1] for d in dims]
destination = broadcast_variables(*new_x)

if len(indexes_coords) == 1:
if method in ['linear', 'nearest', 'zero', 'slinear', 'quadratic',
'cubic']:
func = partial(scipy.interpolate.interp1d, kind=method, axis=-1,
bounds_error=False, fill_value=fill_value)
else:
raise NotImplementedError

rslt = apply_ufunc(_interpolate_1d, obj,
input_core_dims=[dims],
output_core_dims=[destination[0].dims],
output_dtypes=[obj.dtype], dask='allowed',
kwargs={'x': x, 'new_x': destination, 'func': func},
keep_attrs=True)
else:
if method in ['linear', 'nearest']:
func = partial(scipy.interpolate.RegularGridInterpolator,
method=method, bounds_error=False,
fill_value=fill_value)
else:
raise NotImplementedError

rslt = apply_ufunc(_interpolate_nd, obj,
input_core_dims=[dims],
output_core_dims=[destination[0].dims],
output_dtypes=[obj.dtype], dask='allowed',
kwargs={'x': x, 'new_x': destination, 'func': func},
keep_attrs=True)
if all(x1.dims == new_x1.dims for x1, new_x1 in zip(x, new_x)):
return rslt.transpose(*obj.dims)
return rslt


def _interpolate_1d(obj, x, new_x, func):
if isinstance(obj, dask_array_type):
import dask.array as da

_assert_single_chunks(obj, [-1])
chunks = obj.chunks[:-len(x)] + new_x[0].shape
drop_axis = range(obj.ndim-len(x), obj.ndim)
Contributor

E226 missing whitespace around arithmetic operator

new_axis = range(obj.ndim-len(x), obj.ndim-len(x)+new_x[0].ndim)
Contributor

E226 missing whitespace around arithmetic operator

# call this function recursively
return da.map_blocks(_interpolate_1d, obj, x, new_x, func,
dtype=obj.dtype, chunks=chunks,
new_axis=new_axis, drop_axis=drop_axis)

# x, new_x are tuples of size 1.
x, new_x = x[0], new_x[0]
rslt = func(x, obj)(np.ravel(new_x))
if new_x.ndim > 1:
return rslt.reshape(obj.shape[:-1] + new_x.shape)
if new_x.ndim == 0:
return rslt[..., -1]
return rslt
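
The NumPy branch of `_interpolate_1d` (flatten the target points, interpolate along the last axis, then restore the target shape) can be sketched directly with `scipy.interpolate.interp1d`. This is a minimal illustration, not the PR's code: `interp_last_axis` is a hypothetical helper, and the data reuses the values from the docstring examples.

```python
import numpy as np
from scipy.interpolate import interp1d


def interp_last_axis(values, x, new_x, kind="linear", fill_value=np.nan):
    """Interpolate ``values`` along its last axis at ``new_x`` (any shape)."""
    values = np.asarray(values, dtype=float)
    new_x = np.asarray(new_x, dtype=float)
    # bounds_error=False plus fill_value gives NaN outside the original range.
    f = interp1d(x, values, kind=kind, axis=-1,
                 bounds_error=False, fill_value=fill_value)
    out = f(new_x.ravel())
    # The leading axes are untouched; the last axis takes the shape of new_x.
    return out.reshape(values.shape[:-1] + new_x.shape)


data = [0, 0.1, 0.2, 0.1]
x = [0, 1, 2, 3]
grid_result = interp_last_axis(data, x, [[0.5, 1.0], [1.5, 2.0]])
scalar_result = interp_last_axis(data, x, 0.5)
outside = interp_last_axis(data, x, [4.0])
```

A single `reshape` covers the scalar, 1-d and multi-dimensional target cases here, whereas the diff handles them with separate branches.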


def _interpolate_nd(obj, x, new_x, func):
""" dask compatible interpolation function.
The last len(x) dimensions are used for the interpolation
"""
if isinstance(obj, dask_array_type):
import dask.array as da

_assert_single_chunks(obj, range(-len(x), 0))
chunks = obj.chunks[:-len(x)] + new_x[0].shape
drop_axis = range(obj.ndim-len(x), obj.ndim)
Contributor

E226 missing whitespace around arithmetic operator

new_axis = range(obj.ndim-len(x), obj.ndim-len(x)+new_x[0].ndim)
Contributor

E226 missing whitespace around arithmetic operator

return da.map_blocks(_interpolate_nd, obj, x, new_x, func,
dtype=obj.dtype, chunks=chunks,
new_axis=new_axis, drop_axis=drop_axis)

# move the interpolation axes to the start position
obj = obj.transpose(range(-len(x), obj.ndim - len(x)))
# stack new_x to 1 vector, with reshape
xi = np.stack([x1.values.ravel() for x1 in new_x], axis=-1)
rslt = func(x, obj)(xi)
# move back the interpolation axes to the last position
rslt = rslt.transpose(range(-rslt.ndim+1, 1))
Contributor

E226 missing whitespace around arithmetic operator

return rslt.reshape(rslt.shape[:-1] + new_x[0].shape)
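
The core of the multidimensional path (stack the raveled target coordinates into an `(npoints, ndim)` array and hand it to `scipy.interpolate.RegularGridInterpolator`) can be sketched as follows. This simplified version assumes the array has no leading non-interpolated axes, so the transposes in the diff are not needed; `interp_points` is a hypothetical helper, and the data comes from the docstring example.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def interp_points(values, grid, new_coords, method="linear"):
    """Interpolate ``values`` on ``grid`` at pointwise target coordinates."""
    new_coords = [np.asarray(c, dtype=float) for c in new_coords]
    # One row per target point, one column per interpolated dimension.
    xi = np.stack([c.ravel() for c in new_coords], axis=-1)
    f = RegularGridInterpolator(grid, values, method=method,
                                bounds_error=False, fill_value=np.nan)
    return f(xi).reshape(new_coords[0].shape)


values = np.array([[0.0, 0.1, 0.2], [1.0, 1.1, 1.2]])
grid = (np.array([0.0, 1.0]), np.array([0.0, 10.0, 20.0]))
out = interp_points(values, grid, ([0.5, 1.5], [5.0, 15.0]))
```

The second point lies outside the `x` range, so it comes back as NaN, matching the `array([0.55, nan])` shown in the docstring's broadcasting example.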


def _assert_single_chunks(obj, axes):
for axis in axes:
if len(obj.chunks[axis]) > 1:
raise ValueError('Chunk along the dimension to be interpolated '
'({}) is not allowed.'.format(axis))
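
For illustration, the check above can be exercised on a plain `chunks` tuple shaped like the one dask arrays expose (a tuple of per-axis chunk sizes). The standalone function below takes the tuple directly rather than an array, which is an adaptation for this sketch:

```python
def assert_single_chunks(chunks, axes):
    """Raise if any of ``axes`` is split into more than one chunk."""
    for axis in axes:
        if len(chunks[axis]) > 1:
            raise ValueError('Chunk along the dimension to be interpolated '
                             '({}) is not allowed.'.format(axis))


# Axis 0 is split into two chunks of 2; the last axis is a single chunk of 4.
chunks = ((2, 2), (4,))

assert_single_chunks(chunks, [-1])  # passes: interpolating along the last axis

message = None
try:
    assert_single_chunks(chunks, [0])  # fails: axis 0 has two chunks
except ValueError as err:
    message = str(err)
```

Rechunking so that the interpolated axis forms a single chunk would be needed before interpolating along it.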