Description
XREFs
- RFC: Consider updating copy semantics in astype to False/None/True #788
- Disambiguate what copy=True means for dask #866
- BUG: astype(copy=False) and asarray deep-copy the buffer when changing signedness numpy/numpy#27509
astype specifies:
copy (bool) – specifies whether to copy an array when the specified dtype matches the data type of the input array x. If True, a newly allocated array must always be returned. If False and the specified dtype matches the data type of the input array, the input array must be returned; otherwise, a newly allocated array must be returned. Default: True.
I think that the definition of copy=False is unclear when two dtypes differ only in signedness (e.g. int64 vs. uint64), so that one could be a view of the other. This is particularly true given that astype offers no way to choose between elementwise validation and silent over/underflow.
"A newly allocated array" could be interpreted either as
- new python object around a new (deep-copied) buffer, OR
- new python object, possibly pointing to the same buffer (a "view" in numpy parlance).
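To make the two readings concrete, here is a minimal NumPy sketch. It is only an illustration: `deep` and `view` are names chosen for this example, and `ndarray.view` stands in for the second interpretation, which the standard's `astype` does not itself expose.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int64)

# Interpretation 1: a new Python object around a new (deep-copied) buffer.
deep = x.astype(np.uint64, copy=True)

# Interpretation 2: a new Python object pointing at the same buffer.
view = x.view(np.uint64)

print(deep is x, np.shares_memory(x, deep))  # distinct object, distinct buffer
print(view is x, np.shares_memory(x, view))  # distinct object, shared buffer

# Writing through the view is visible in x; writing to the deep copy is not.
view[0] = 99
deep[1] = 77
print(x)
```

Both results are "new arrays" in the sense of being new Python objects, which is why the wording of the standard admits either reading.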
In numpy, astype unnecessarily deep-copies in this case. I suggested changing its behaviour at numpy/numpy#27509, but the feedback was that numpy is unlikely to change, as doing so would be a breaking change. However, there is no reason why other libraries adhering to the array API would need to replicate numpy's behaviour.
My proposal here is to add a clause to the end of the copy=False sentence, so that it reads:
If False and the specified dtype matches the data type of the input array, the input array must be returned; otherwise, a newly allocated array must be returned, which may or may not share memory with the input array.
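As a rough sketch of what a library could do under the proposed wording, the hypothetical helper below (`astype_sketch` is not part of NumPy or of the standard) returns the input itself when the dtypes match, a view when the dtypes are integers of the same width and byte order, and a deep copy otherwise. It is written on top of NumPy purely for illustration, and assumes that reinterpreting the bits is an acceptable result for a signedness change.

```python
import numpy as np

def astype_sketch(x, dtype, copy=True):
    """Hypothetical astype allowing copy=False to share memory on a signedness change."""
    dtype = np.dtype(dtype)
    if not copy:
        if dtype == x.dtype:
            # Matching dtype: the input array itself must be returned.
            return x
        same_layout_ints = (
            x.dtype.kind in "iu"
            and dtype.kind in "iu"
            and x.dtype.itemsize == dtype.itemsize
            and x.dtype.byteorder == dtype.byteorder
        )
        if same_layout_ints:
            # New array object that shares the input's buffer.
            return x.view(dtype)
    # Every other case: a newly allocated buffer.
    return x.astype(dtype, copy=True)

x = np.arange(4, dtype=np.int64)
same = astype_sketch(x, np.int64, copy=False)
shared = astype_sketch(x, np.uint64, copy=False)
converted = astype_sketch(x, np.float64, copy=False)
copied = astype_sketch(x, np.uint64, copy=True)
```

Here `same` is `x` itself, `shared` is a new object over the same buffer, and `converted` and `copied` own fresh buffers. All four outcomes are compatible with the proposed text, since sharing memory is permitted but not required.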