-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Avoid unsafe casts from float to unsigned int #9964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6b366de
to
fefab07
Compare
xarray/core/duck_array_ops.py
Outdated
if xp == np: | ||
# numpy currently doesn't have a view: | ||
return data.view(*args, **kwargs) | ||
return xp.view(data, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which package supports this? I can't find xp.view in the standard https://data-apis.org/array-api/draft/API_specification/index.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't realize this was implementing a standard when @kmuehlbauer brought it up in #9815. As in #9815, it works for NumPy and Dask. I don't know about any other packages other than tests don't seem to fail about it specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check this fully, but to be honest, I also tried it the same way locally.
@Illviljan can you suggest a way forward here, so that it is usable for numpy and dask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also try view
and fall back to astype(copy=False)
if that's more portable. And also not expose view
so publicly if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hasattr(__array_namespace__)
implies the method is included in the array api standard. But the standard does not include a .view.
That's why I simply don't think the __array_namespace__
should be used in this case.
But using .astype(copy=False)
does indeed seem like a nice way forward, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
view and astype are very different things no? view simply reinterprets the bytes as a different type, astype preserves the value and may change number of bytes to do so, for example. I'm not really sure what to do here. Perhaps just xfail? Or else you'll have to track down what netcdf-java does, which is the ultimate source of this _Unsigned convention IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking in here again to see if you had an idea which way to move forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @QuLogic for the delay. I'm not sure what's the best way forward. I've consulted GDAL's netcdf driver and they use some construct like static_cast<int>
/static_cast<uint>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@QuLogic According to the suggestions of @Illviljan and @dcherian (and while looking again numpy/numpy#22951) I've pushed a fix which I believe might fix the issue, or at least avoid the undefined cast. Would you mind checking this on your workflows? Thanks!
If this works we could clean up the PR and go ahead and merge.
@@ -345,7 +345,14 @@ def encode(self, variable: Variable, name: T_Name = None): | |||
if fill_value is not None and has_unsigned: | |||
pop_to(encoding, attrs, "_Unsigned") | |||
# XXX: Is this actually needed? Doesn't the backend handle this? | |||
data = duck_array_ops.astype(duck_array_ops.around(data), dtype) | |||
signed_dtype = np.dtype(f"i{data.itemsize}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dcherian @Illviljan Does this agree with your suggestion? This first casts the rounded float to int (of same itemsize) and in a second step the int to the final intN (where N is the wanted itemsize).
@QuLogic Does this work on your machine type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this passes tests on all architectures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@QuLogic Great! Thanks for checking. How should we take it from here? From my perspective, we can drop the changes in duck_array_ops
and add a code comment why there is the two stage casting. An entry to whats-new.rst
would be good for visibility. Let me know, if you have the bandwidth atm? Otherwise I can take car of this.
Thanks @QuLogic, thanks all! |
Thanks for finishing this up; I didn't have much chance to look at it in depth the past week. |
Note that this adds.view
toduck_array_ops
; I'm not certain if this is new user API or not.This works across all non-x86_64 architectures in Fedora. I did not test RISC-V, as that is not a primary architecture, but it is likely the same issue and maybe @U2FsdGVkX1 can give it a try.
whats-new.rst
api.rst