-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Adding mask to open_rasterio #2618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jsignell! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2019-04-26 15:27:42 UTC |
@jsignell, this looks like it would work - I like the implementation. Some things I would recommend to add is checks for when the nodata value varies across bands (maybe skip the mask and raise a warning) and when the nodata value is already NaN. Also, you should probably set the nodatavals to NaN if masked to resemble the behavior of xarray elsewhere. I think @fmaussion would be a good reviewer for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mask=True
to be the default? That's how we handle netCDF files.
xarray/backends/rasterio_.py
Outdated
@@ -298,6 +300,10 @@ def open_rasterio(filename, parse_coordinates=None, chunks=None, cache=None, | |||
result = DataArray(data=data, dims=('band', 'y', 'x'), | |||
coords=coords, attrs=attrs) | |||
|
|||
if mask: | |||
for nodataval in attrs.get('nodatavals', ()): | |||
result = result.where(result != nodataval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of this approach is that unlike how to decode metadata for netCDF files, this won't be lazy.
See CFMaskCoder
in xarray/coding/variables.py for who we do this with netCDF files. In particular, it probably makes sense to use lazy_elemwise_func
and to move nodatavals
from attrs
to encoding
.
I wonder if |
I was wondering if we should just pass the mask option to rasterio like that. I'll try it out. |
Hmm. Do we really want a np.masked_array or do we just want an array with np.Nans in it? I prefer nans, but I guess the downside is type conversion for ints? |
I did as @snowman2 suggested but filled the mask with nans at the end. Since it is now using rasterio directly this should address @fmaussion's concern about |
np.nan if nodataval is None else nodataval | ||
for nodataval in riods.nodatavals) | ||
if mask: | ||
encoding['nodatavals'] = nodatavals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thing setting the encoding should inline with the other if statements and should be something like:
if mask and riods.nodata is not None:
encoding['_FillValue'] = riods.nodata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masked arrays come with huge performance drops, I don't think we should use these. |
xarray will autoconvert a masked array into a numpy array with NaNs. It
probably doesn't make a difference where this happens.
…On Thu, Dec 20, 2018 at 9:04 AM Fabien Maussion ***@***.***> wrote:
Do we really want a np.masked_array or do we just want an array with
np.Nans in it? I prefer nans, but I guess the downside is type conversion
for ints?
Masked arrays come with huge performance drops, I don't think we should
use these.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2618 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1uztASAzUF-7fNanXXJvRWxELml3ks5u68MegaJpZM4ZZMJn>
.
|
I'm sorry this has stalled now - I think this is a nice addition. @jsignell : would you have time to finish this up? |
Yeah I have some time to finish it up. Looking back at it now it seems like |
I think it would be good to write the dataset to a netcdf, and read it back in with xarray.open_dataset to ensure everything encodes/decides properly. |
Do you mean write a test to do this, or just check it locally? |
Good question, I was thinking adding tests. |
@@ -39,6 +39,7 @@ def __init__(self, manager, lock, vrt_params=None): | |||
if not np.all(np.asarray(dtypes) == dtypes[0]): | |||
raise ValueError('All bands should have the same dtype') | |||
self._dtype = np.dtype(dtypes[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._dtype = np.dtype(dtypes[0]) | |
self._dtype = np.float64 if self.masked else np.dtype(dtypes[0]) |
out = riods.read(band_key, window=window) | ||
out = riods.read(band_key, window=window, masked=self.masked) | ||
if self.masked: | ||
out = np.ma.filled(out, np.nan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out = np.ma.filled(out, np.nan) | |
out = np.ma.filled(out.astype(np.float64), np.nan) |
|
||
# this lets you write arrays loaded with rasterio | ||
data = indexing.CopyOnWriteArray(data) | ||
if cache and chunks is None: | ||
data = indexing.MemoryCachedArray(data) | ||
|
||
result = DataArray(data=data, dims=('band', 'y', 'x'), | ||
coords=coords, attrs=attrs) | ||
coords=coords, attrs=attrs, encoding=encoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coords=coords, attrs=attrs, encoding=encoding) | |
coords=coords, attrs=attrs) | |
result.encoding = encoding |
@jsignell, the |
this has been unintentionally closed when renaming the branch to merge into to |
I'm fine with this being closed. TBH I had totally forgotten about it 😬 |
whats-new.rst
for all changes andapi.rst
for new APINot sure if this is the right approach @snowman2