-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
implement interp() #2104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement interp() #2104
Conversation
xarray/core/dataarray.py
Outdated
@@ -6,7 +6,8 @@ | |||
import numpy as np | |||
import pandas as pd | |||
|
|||
from . import computation, groupby, indexing, ops, resample, rolling, utils | |||
from . import (computation, dtypes, groupby, indexing, ops, resample, rolling, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 '.dtypes' imported but unused
xarray/core/interp.py
Outdated
|
||
import numpy as np | ||
from .computation import apply_ufunc | ||
from .pycompat import (OrderedDict, dask_array_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 '.pycompat.OrderedDict' imported but unused
xarray/core/interp.py
Outdated
imin = x.to_index().get_loc(np.min(new_x), method='ffill') | ||
imax = x.to_index().get_loc(np.max(new_x), method='bfill') | ||
|
||
idx = slice(np.maximum(imin-1, 0), imax+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
xarray/core/interp.py
Outdated
|
||
_assert_single_chunks(obj, [-1]) | ||
chunks = obj.chunks[:-len(x)] + new_x[0].shape | ||
drop_axis = range(obj.ndim-len(x), obj.ndim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
xarray/core/interp.py
Outdated
_assert_single_chunks(obj, [-1]) | ||
chunks = obj.chunks[:-len(x)] + new_x[0].shape | ||
drop_axis = range(obj.ndim-len(x), obj.ndim) | ||
new_axis = range(obj.ndim-len(x), obj.ndim-len(x)+new_x[0].ndim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
xarray/interp.py
Outdated
array: np.ndarray or da.ndarray. Should not contain nan. | ||
old_coords, new_coords: list of np.ndarrays. | ||
""" | ||
from scipy import interpolate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 'scipy.interpolate' imported but unused
xarray/interp.py
Outdated
|
||
|
||
|
||
def ghosted_interp_func(array, old_coords, new_coords, method): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E303 too many blank lines (3)
xarray/tests/test_interp.py
Outdated
x = np.linspace(0, 1, 100) | ||
y = np.linspace(0, 0.1, 30) | ||
data = xr.DataArray( | ||
np.sin(x[:, np.newaxis])*np.cos(y), dims=['x', 'y'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
xarray/tests/test_interp.py
Outdated
y = np.linspace(0, 0.1, 30) | ||
z = np.linspace(0.1, 0.2, 10) | ||
return xr.DataArray( | ||
np.sin(x[:, np.newaxis, np.newaxis])*np.cos(y[:, np.newaxis])*z, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
xarray/tests/test_interp.py
Outdated
|
||
if dim == 'y' and case == 1: | ||
with pytest.raises(ValueError): | ||
actual = da.interpolate_at(**{dim: xdest}, method=method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E999 SyntaxError: invalid syntax
@fujiisoup this is awesome! So glad you are looking into this. To point 1: I think that your implementation is a very good first step. From the geosciences point of view I think that Spline interpolation would be a very good candidate too, but it only works in the 2-dimensional case which makes it a no-go for the general case... |
@fujiisoup - This is great. I wonder if we can draw more from the interpolator adapters I built in Lines 284 to 323 in b9f40cc
I just gave your implementation a quick skim and I think there is currently some duplication of efforts so maybe we look there to see what can be combined. |
First of all, this is awesome! One question: since I think this new interpolation method will be used quite frequently (more often than We should also consider adding
Yes, I agree. Interpolation should be vectorized similarly to In particular, in the long term I think we should aim to make
I think this is fine to start (we can always add more later).
Which NaN are you referring to?
Case (1) should definitely be supported. Especially if data is stored in dask arrays, we cannot necessarily check if there are NaNs. Cases (2) and (3) are not important, because there are relatively few use-cases for NaN in coordinate arrays.
Currently we don't support this for
I think the new coordinates should take priority, and the dimension coordinates on the new coordinate should be dropped. This is similar to what we do for
|
Agreed. I will try to generalize your interpolator-adaptor more.
Agreed.
I meant 1.
|
xarray/core/interp.py
Outdated
|
||
idx = slice(np.maximum(imin - 1, 0), imax + 1) | ||
index_coord[dim] = (x[idx], new_x) | ||
obj = obj.isel(**{dim: idx}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, only a small portion of arrays will be used for the interpolation with method='linear' / 'nearest '
. So if there are no NaN in this region, we can interpolate this array.
Maybe the possible API would be da.interp(x=[0, 1, 2], skipna=True) for nan-skipping interpolation and da.interp(x=[0, 1, 2], skipna=False) for faster interpolation. |
I still need to look at the code here, but I don't think we need separate code paths for NaN vs no-NaNs. For example, suppose we have a DataArray with a coordinate |
@shoyer , In [1]: x = np.arange(4)
...: y = np.array([0, 1, np.nan, 3])
...:
In [2]: # 'linear' works as expected
...: interp1d(x, y, kind='linear')([0.5, 1.5])
...:
Out[2]: array([0.5, nan])
In [3]: # 'cubic' returns an all nan array
...: interp1d(x, y, kind='cubic')([0.5, 1.5])
...:
Out[3]: array([nan, nan]) |
Why don't leave the NaN handling part to the actual function doing the interpolation? RegularGridInterpolator handles NaNs pretty much the same way @shoyer describes it. |
I did not certainly realize how RegularGridInterpolator works, but it seems to work with nan including array too as you pointed out.
It is the original behavior of I think the addition of the linear interpolation is still nice, but it would be nicer if we could also implement nan-skipping interpolations, which will enable us to use higher order interpolation with nan-including arrays. |
Do you mean you want to use unstructured data interpolation routines? (https://docs.scipy.org/doc/scipy/reference/interpolate.html#multivariate-interpolation) |
Yes, I am thinking this for multidimensional interpolation of nan-including arrays. Is |
My inclination would be to limit this method to interpolation on regular grids, i.e., the methods supported by scipy.interpolate.interpn or interp1d. I think a |
xarray/core/missing.py
Outdated
if isinstance(obj, dask_array_type): | ||
import dask.array as da | ||
|
||
_assert_single_chunk(obj, range(obj.ndim-len(x), obj.ndim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E226 missing whitespace around arithmetic operator
|
||
import xarray as xr | ||
from xarray.tests import assert_allclose, assert_equal, requires_scipy | ||
from . import has_dask, has_scipy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 '.has_scipy' imported but unused
Done! But, I am slightly worrying if silently dropping non-numeric array is surprising. Or maybe we can raise a warning? |
We do the same thing for aggregations like We could add a |
Thanks, @shoyer
Thanks. I have not been surprised when Maybe this PR is ready, then. BTW, I noticed that our |
Hmm. I guess numeric_only may currently only be an argument for the reduce method. We probably should add it for methods like mean, too. |
xarray/core/dataset.py
Outdated
@@ -1815,24 +1815,29 @@ def reindex(self, indexers=None, method=None, tolerance=None, copy=True, | |||
return self._replace_vars_and_dims(variables, coord_names) | |||
|
|||
def interp(self, coords=None, method='linear', assume_sorted=False, | |||
kwargs={}, **coords_kwargs): | |||
keep_attrs=False, kwargs={}, **coords_kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My inclination would be to have keep_attrs=True
by default, and maybe not even bother to include the argument at all. This would be consistent with reindex
/sel
/isel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Then, I would remove this argument.
Do you want to add |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last thing to do here is remove keep_attrs
?
xarray/core/missing.py
Outdated
if len(var.chunks[axis]) > 1 or var.chunks[axis][0] < var.shape[axis]: | ||
raise NotImplementedError( | ||
'Chunking along the dimension to be interpolated ' | ||
'({}) is not supported.'.format(axis)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"not supported" -> "not yet supported"
xarray/core/missing.py
Outdated
scipy.interpolate.interp1d | ||
""" | ||
if not x: | ||
return var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency on views/copies, it's better to return var.copy()
here
xarray/core/missing.py
Outdated
kwargs.update(method=method) | ||
interp_class = interpolate.interpn | ||
else: | ||
raise ValueError('%s is not a valid interpolator' % method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add: "for interpolating over multiple dimensions"
Thanks, @shoyer.
Done! |
OK, let's merge this when CI passes. |
Merged! Thanks, @shoyer, @fmaussion, @jhamman, for the detailed review. |
Congratulations on this great new feature! |
@fujiisoup Great feature! thanks. |
whats-new.rst
for all changes andapi.rst
for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)I started working to add
interpolate_at
to xarray, as discussed in issue #2079 (but without caching).I think I need to take care of more edge cases, but before finishing up this PR, I want to discuss what the best API is.
I would like to this method working similar to
isel
, which may support vectorized interpolation.Currently, this works as follwos
Design question
How many interpolate methods should we support?
Currently, I only implemented scipy.interpolate.interp1d for 1dimensional interpolation and scipy.interpolate.RegularGridInterpolator for multidimensional interpolation.
I think 90% usecases are linear, but there are more methods in scipy.
How do we handle
nan
?Currently this raises ValueError if nan is present. It may be possible to carry out the interpolation with skipping nan, but in this case the performance would be significantly drops because it cannot be vectorized.
Do we support interpolation along dimension without coordinate?
In that case, do we attach new coordinate to the object?
How should we do if new coordinate has the dimensional coordinate for the dimension to be interpolated?
e.g. in the following case,
what would be
rslt['x']
?I appreciate any comments.