Skip to content

Avoid unsafe casts from float to unsigned int #9964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 3, 2025

Conversation

QuLogic
Copy link
Contributor

@QuLogic QuLogic commented Jan 19, 2025

Note that this adds .view to duck_array_ops; I'm not certain if this is new user API or not.

This works across all non-x86_64 architectures in Fedora. I did not test RISC-V, as that is not a primary architecture, but it is likely the same issue and maybe @U2FsdGVkX1 can give it a try.

if xp == np:
# numpy currently doesn't have a view:
return data.view(*args, **kwargs)
return xp.view(data, *args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which package supports this? I can't find xp.view in the standard https://data-apis.org/array-api/draft/API_specification/index.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't realize this was implementing a standard when @kmuehlbauer brought it up in #9815. As in #9815, it works for NumPy and Dask. I don't know about any other packages other than tests don't seem to fail about it specifically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check this fully, but to be honest, I also tried it the same way locally.

@Illviljan can you suggest a way forward here, so that it is usable for numpy and dask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also try view and fall back to astype(copy=False) if that's more portable. And also not expose view so publicly if desired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian Could you have a look here, too? This would unblock @QuLogic for some Fedora builds. Is the implementation of view the right way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hasattr(__array_namespace__) implies the method is included in the array api standard. But the standard does not include a .view. That's why I simply don't think the __array_namespace__ should be used in this case.

But using .astype(copy=False) does indeed seem like a nice way forward, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view and astype are very different things no? view simply reinterprets the bytes as a different type, astype preserves the value and may change number of bytes to do so, for example. I'm not really sure what to do here. Perhaps just xfail? Or else you'll have to track down what netcdf-java does, which is the ultimate source of this _Unsigned convention IIRC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking in here again to see if you had an idea which way to move forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @QuLogic for the delay. I'm not sure what's the best way forward. I've consulted GDAL's netcdf driver and they use some construct like static_cast<int>/static_cast<uint>.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuLogic According to the suggestions of @Illviljan and @dcherian (and while looking again numpy/numpy#22951) I've pushed a fix which I believe might fix the issue, or at least avoid the undefined cast. Would you mind checking this on your workflows? Thanks!

If this works we could clean up the PR and go ahead and merge.

@@ -345,7 +345,14 @@ def encode(self, variable: Variable, name: T_Name = None):
if fill_value is not None and has_unsigned:
pop_to(encoding, attrs, "_Unsigned")
# XXX: Is this actually needed? Doesn't the backend handle this?
data = duck_array_ops.astype(duck_array_ops.around(data), dtype)
signed_dtype = np.dtype(f"i{data.itemsize}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian @Illviljan Does this agree with your suggestion? This first casts the rounded float to int (of same itemsize) and in a second step the int to the final intN (where N is the wanted itemsize).

@QuLogic Does this work on your machine type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this passes tests on all architectures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuLogic Great! Thanks for checking. How should we take it from here? From my perspective, we can drop the changes in duck_array_ops and add a code comment why there is the two stage casting. An entry to whats-new.rst would be good for visibility. Let me know, if you have the bandwidth atm? Otherwise I can take car of this.

@kmuehlbauer kmuehlbauer added the plan to merge Final call for comments label Jun 2, 2025
@kmuehlbauer kmuehlbauer merged commit d1fee09 into pydata:main Jun 3, 2025
38 checks passed
@kmuehlbauer
Copy link
Contributor

Thanks @QuLogic, thanks all!

@QuLogic QuLogic deleted the float-uint-casts branch June 4, 2025 04:04
@QuLogic
Copy link
Contributor Author

QuLogic commented Jun 4, 2025

Thanks for finishing this up; I didn't have much chance to look at it in depth the past week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-arrays related to flexible array support topic-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure on RISC-V platform
5 participants