Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure on RISC-V platform #9815

Open
3 of 5 tasks
U2FsdGVkX1 opened this issue Nov 24, 2024 · 13 comments · May be fixed by #9964
Open
3 of 5 tasks

Test failure on RISC-V platform #9815

U2FsdGVkX1 opened this issue Nov 24, 2024 · 13 comments · May be fixed by #9964
Labels
bug needs triage Issue that has not been reviewed by xarray team member

Comments

@U2FsdGVkX1
Copy link

U2FsdGVkX1 commented Nov 24, 2024

What happened?

I am a Fedora packager
When I run pytest on the RISC-V platform, I got four failures

What did you expect to happen?

No response

Minimal Complete Verifiable Example

pytest xarray/tests/test_backends.py::TestNetCDF4ViaDaskData::test_roundtrip_mask_and_scale
pytest xarray/tests/test_backends.py::TestNetCDF4Data::test_roundtrip_mask_and_scale

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

XFAIL tests/test_units.py::TestDataArray::test_searchsorted[int64-function_searchsorted-identical_unit] - xarray does not implement __array_function__
XFAIL tests/test_units.py::TestDataArray::test_missing_value_filling[int64-method_bfill] - ffill and bfill lose units in data
XFAIL tests/test_units.py::TestPlots::test_units_in_slice_line_plot_labels_isel[coord_unit1-coord_attrs1] - pint.errors.UnitStrippedWarning
XFAIL tests/test_units.py::TestPlots::test_units_in_line_plot_labels[coord_unit1-coord_attrs1] - indexes don't support units
XFAIL tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg2-np_arg2-median] - median is not implemented by Dask
XFAIL tests/test_units.py::TestDataset::test_missing_value_filling[int64-method_bfill] - ffill and bfill lose the unit
XFAIL tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg0-np_arg0-median] - median is not implemented by Dask
XFAIL tests/test_units.py::TestDataArray::test_interp_reindex_like[int64-method_interp_like-data] - uses scipy
XFAIL tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg2-np_arg2-reflect] - dask.array.pad bug
XFAIL tests/test_units.py::TestDataset::test_interp_reindex_like[float64-method_interp_like-coords] - uses scipy
XFAIL tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg3-np_arg3-median] - median is not implemented by Dask
XFAIL tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg4-np_arg4-median] - median is not implemented by Dask
XFAIL tests/test_units.py::TestDataArray::test_numpy_methods_with_args[float64-no_unit-function_clip] - xarray does not implement __array_function__
XFAIL tests/test_units.py::TestDataset::test_interp_reindex[int64-method_interp-data] - uses scipy
XFAIL tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg3-np_arg3-reflect] - dask.array.pad bug
XFAIL tests/test_units.py::TestDataArray::test_interp_reindex_like[int64-method_interp_like-coords] - uses scipy
XFAIL tests/test_units.py::TestDataset::test_interp_reindex_like[int64-method_interp_like-data] - uses scipy
XFAIL tests/test_units.py::TestDataset::test_interp_reindex[int64-method_interp-coords] - uses scipy
XFAIL tests/test_units.py::TestDataset::test_interp_reindex_like[int64-method_interp_like-coords] - uses scipy
XPASS tests/test_computation.py::test_cross[a6-b6-ae6-be6-cartesian--1-False]
XPASS tests/test_dataarray.py::TestDataArray::test_copy_coords[True-expected_orig0]
XPASS tests/test_dataarray.py::TestDataArray::test_to_dask_dataframe - dask-expr is broken
XPASS tests/test_dataset.py::TestDataset::test_copy_coords[True-expected_orig0]
XPASS tests/test_computation.py::test_cross[a5-b5-ae5-be5-cartesian--1-False]
XPASS tests/test_plot.py::TestImshow::test_dates_are_concise - Failing inside matplotlib. Should probably be fixed upstream because other plot functions can handle it. Remove this test when it works, already in Common2dMixin
XPASS tests/test_units.py::TestVariable::test_computation[int64-method_rolling_window-numbagg] - converts to ndarray
XPASS tests/test_units.py::TestVariable::test_computation[int64-method_rolling_window-None] - converts to ndarray
XPASS tests/test_units.py::TestVariable::test_computation[float64-method_rolling_window-numbagg] - converts to ndarray
XPASS tests/test_units.py::TestVariable::test_computation[float64-method_rolling_window-None] - converts to ndarray
XPASS tests/test_units.py::TestDataset::test_computation_objects[float64-coords-method_rolling] - strips units
XPASS tests/test_units.py::TestDataset::test_computation_objects[int64-coords-method_rolling] - strips units
XPASS tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg0-np_arg0-reflect] - dask.array.pad bug
XPASS tests/test_variable.py::TestVariableWithDask::test_pad[xr_arg1-np_arg1-reflect] - dask.array.pad bug
FAILED tests/test_backends.py::TestNetCDF4ViaDaskData::test_roundtrip_mask_and_scale[dtype0-create_unsigned_false_masked_scaled_data-create_encoded_unsigned_false_masked_scaled_data]
FAILED tests/test_backends.py::TestNetCDF4Data::test_roundtrip_mask_and_scale[dtype0-create_unsigned_false_masked_scaled_data-create_encoded_unsigned_false_masked_scaled_data]
FAILED tests/test_backends.py::TestNetCDF4ViaDaskData::test_roundtrip_mask_and_scale[dtype1-create_unsigned_false_masked_scaled_data-create_encoded_unsigned_false_masked_scaled_data]
FAILED tests/test_backends.py::TestNetCDF4Data::test_roundtrip_mask_and_scale[dtype1-create_unsigned_false_masked_scaled_data-create_encoded_unsigned_false_masked_scaled_data]

Anything else we need to know?

TestNetCDF4ViaDaskData::test_roundtrip_mask_and_scale
image

TestNetCDF4Data::test_roundtrip_mask_and_scale
image

Environment

commit: 1bb867d
python: 3.13.0 (main, Oct 8 2024, 00:00:00) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)]
python-bits: 64
OS: Linux
OS-release: 6.1.55
machine: riscv64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.9.2

xarray: 2024.11.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.11.3
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2024.9.1
distributed: None
matplotlib: 3.9.1
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.24.4
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.2
conda: None
pytest: 8.3.3
mypy: None
IPython: None
sphinx: 7.3.7

@U2FsdGVkX1 U2FsdGVkX1 added bug needs triage Issue that has not been reviewed by xarray team member labels Nov 24, 2024
Copy link

welcome bot commented Nov 24, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@max-sixty
Copy link
Collaborator

Thanks.

I suspect this is an issue with the underlying dependency, given it only occurs in netcdf tests. We could skip the tests when running under that arch?

@U2FsdGVkX1
Copy link
Author

Thanks.

I suspect this is an issue with the underlying dependency, given it only occurs in netcdf tests. We could skip the tests when running under that arch?

I'm not sure what test this is, but xarray passes all other tests in riscv64.
if it is related to other modules, I think it can be skipped.

@max-sixty
Copy link
Collaborator

if it is related to other modules, I think it can be skipped.

Yes great, we would take a PR to skip them (doesn't have to be from you tbc, I know you're already being generous by packaging this...)

@QuLogic
Copy link
Contributor

QuLogic commented Jan 14, 2025

These four tests actually fail on all non-x86_64 arches: aarch64, ppc64le, s390x. I don't know if it's a problem in netCDF4 or not.

It seems like a -1 is replaced by a 10; is that some kind of fill value, or something else?

@QuLogic
Copy link
Contributor

QuLogic commented Jan 14, 2025

If I've understood this roundtrip test correctly, it starts with a dataset of [-1.0, 10.1, 22.7, np.nan], defines an encoding that adds an offset of 10 and scaling of 0.1 (though I'm a bit confused why the offset seems to be defined in terms of data->encoded while scale is encoded->data) to fit in a u1 dtype (aka unsigned char), then saves and reads the file back.

My guess is that somewhere is using plain char for this, which is why the result is positive. In C, char's signedness is implementation defined (though that doesn't apply to other integer types), and on x86_64 it is signed, but on ARM it is unsigned. I don't know about the other architectures, but I'd guess they are also unsigned.

Is this encoding done in xarray or in netCDF?

@QuLogic
Copy link
Contributor

QuLogic commented Jan 14, 2025

Ah, looking through more of the code, I've found the encoding in CFScaleOffsetCoder. In encode, the data is cast to np.float64 and offset/scaled to [-110., 1., 127., nan] (before returning a Variable with it.) I didn't work through the entirety of the how this ends up in the netCDF4 file, but if I run ncdump /path/to/temp.nc, I get:

netcdf temp-9 {
dimensions:
	t = 4 ;
variables:
	ubyte x(t) ;
		x:_FillValue = 255UB ;
		x:add_offset = 10. ;
		x:scale_factor = 0.1 ;
		x:_Unsigned = "false" ;
data:

 x = 0, 1, 127, _ ;
}

So the error occurs during encoding, not decoding necessarily.

Given that the test has in the encoding 'dtype': 'u1', I assume that at some point the data is cast from np.float64 to np.uint8, though I didn't find whether that was done by xarray internally or within netCDF4. Unfortunately, this casting is undefined behaviour: numpy/numpy#16073 numpy/numpy#19881 (comment)

For example, on x86_64:

>>> import numpy as np
>>> np.float64(-110).astype(np.uint8)
np.uint8(146)
>>> np.float64(-110).astype(np.uint16)
np.uint16(65426)
>>> np.float64(-110).astype(np.uint32)
np.uint32(4294967186)

but on s390x:

>>> np.float64(-110).astype(np.uint8)
<python-input-1>:1: RuntimeWarning: invalid value encountered in cast
  np.float64(-110).astype(np.uint8)
np.uint8(0)
>>> np.float64(-110).astype(np.uint16)
<python-input-5>:1: RuntimeWarning: invalid value encountered in cast
  np.float64(-110).astype(np.uint16)
np.uint16(0)
>>> np.float64(-110).astype(np.uint32)
<python-input-6>:1: RuntimeWarning: invalid value encountered in cast
  np.float64(-110).astype(np.uint32)
np.uint32(0)

So either this test, or xarray itself is doing something incorrect in this encoding mode.

@QuLogic
Copy link
Contributor

QuLogic commented Jan 14, 2025

PS, here is the backtrace if I make the RuntimeWarning into an error:

________ TestNetCDF4Data.test_roundtrip_mask_and_scale[dtype0-create_unsigned_false_masked_scaled_data-create_encoded_unsigned_false_masked_scaled_data] _______

self = <xarray.tests.test_backends.TestNetCDF4Data object at 0x200453eef80>, decoded_fn = <function create_unsigned_false_masked_scaled_data at 0x200453a7920>
encoded_fn = <function create_encoded_unsigned_false_masked_scaled_data at 0x200453a79c0>, dtype = dtype('float64')

    @pytest.mark.parametrize(
        "decoded_fn, encoded_fn",
        [
            (
                create_unsigned_masked_scaled_data,
                create_encoded_unsigned_masked_scaled_data,
            ),
            pytest.param(
                create_bad_unsigned_masked_scaled_data,
                create_bad_encoded_unsigned_masked_scaled_data,
                marks=pytest.mark.xfail(reason="Bad _Unsigned attribute."),
            ),
            (
                create_signed_masked_scaled_data,
                create_encoded_signed_masked_scaled_data,
            ),
            (
                create_unsigned_false_masked_scaled_data,
                create_encoded_unsigned_false_masked_scaled_data,
            ),
            (create_masked_and_scaled_data, create_encoded_masked_and_scaled_data),
        ],
    )
    @pytest.mark.parametrize("dtype", [np.dtype("float64"), np.dtype("float32")])
    def test_roundtrip_mask_and_scale(self, decoded_fn, encoded_fn, dtype) -> None:
        if hasattr(self, "zarr_version") and dtype == np.float32:
            pytest.skip("float32 will be treated as float64 in zarr")
        decoded = decoded_fn(dtype)
        encoded = encoded_fn(dtype)
        if decoded["x"].encoding["dtype"] == "u1" and not (
            (self.engine == "netcdf4" and self.file_format is None)
            or self.file_format == "NETCDF4"
        ):
            pytest.skip("uint8 data can't be written to non-NetCDF4 data")
    
>       with self.roundtrip(decoded) as actual:

tests/test_backends.py:974: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.13/contextlib.py:141: in __enter__
    return next(self.gen)
tests/test_backends.py:374: in roundtrip
    self.save(data, path, **save_kwargs)
tests/test_backends.py:395: in save
    return dataset.to_netcdf(
core/dataset.py:2373: in to_netcdf
    return to_netcdf(  # type: ignore[return-value]  # mypy cannot resolve the overloads:(
backends/api.py:1905: in to_netcdf
    dump_to_store(
backends/api.py:1952: in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
backends/common.py:454: in store
    variables, attributes = self.encode(variables, attributes)
backends/common.py:638: in encode
    variables, attributes = cf_encoder(variables, attributes)
conventions.py:759: in cf_encoder
    new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
conventions.py:101: in encode_cf_variable
    var = coder.encode(var, name=name)
coding/variables.py:429: in encode
    data = duck_array_ops.astype(duck_array_ops.around(data), dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = array([-110.,    1.,  127.,  255.]), dtype = dtype('uint8'), kwargs = {}, xp = <module 'numpy' from '/usr/lib64/python3.13/site-packages/numpy/__init__.py'>

    def astype(data, dtype, **kwargs):
        if hasattr(data, "__array_namespace__"):
            xp = get_array_namespace(data)
            if xp == np:
                # numpy currently doesn't have a astype:
>               return data.astype(dtype, **kwargs)
E               RuntimeWarning: invalid value encountered in cast

core/duck_array_ops.py:234: RuntimeWarning

@kmuehlbauer
Copy link
Contributor

kmuehlbauer commented Jan 14, 2025

@QuLogic Thanks for diving into this. From the stack trace and your examples that looks like something like this:

numpy/numpy#22951

So, if we change this line:

coding/variables.py:429: in encode
    data = duck_array_ops.astype(duck_array_ops.around(data), dtype)

to

coding/variables.py:429: in encode
    data = duck_array_ops.astype(duck_array_ops.around(data), dtype, casting="safe")

We will also see this issue in our test suite.

@QuLogic In the linked issue they suggested to do something like this:

coding/variables.py:429: in encode
    signed_dtype = np.dtype(f"i{dtype.itemsize}")
    data = duck_array_ops.astype(duck_array_ops.astype(duck_array_ops.around(data), signed_dtype), dtype, casting="safe")

Will this help in your setup?

Update: Nevermind, this might still break. Can we somehow use .view with duck arrays?

@QuLogic
Copy link
Contributor

QuLogic commented Jan 15, 2025

Unfortunately, that won't work with casting='safe', since -110 cannot be represented as an unsigned value. The tests do pass with that change if you don't require safe casting.

But then, I have to ask, is this test actually correct? If you want to encode some floating-point data that might include negative values to an unsigned type, don't you want to shift things to be all positive and not rely on casting rules that may differ between types? In other words, should add_offset not be -10?

Though actually trying that, the 22.7 would overflow, so something like add_offset=-2 would work without invoking undefined behaviour. This would encode the data from [-1.0, 10.1, 22.7, np.nan] to [ 10, 121, 247, nan] which would fit in a uint8.

However, this breaks in a different way because the reader appears to incorrectly read as int8, and produces -2.9 for the 22.7. So there's a different bug hidden there due to the test's setup.

@kmuehlbauer
Copy link
Contributor

@QuLogic
Copy link
Contributor

QuLogic commented Jan 15, 2025

OK, so netCDF3 didn't support unsigned data types, and some people hacked around it with an attribute _Unsigned=true. Then netCDF4 supported both signed&unsigned, but some other software didn't support signed, and other people hacked around that with _Unsigned=false. So the encoded data is supposedly int8 even though the dtype is uint8.

OK, I think you can only have that work with unsafe casting then. Namely, your suggestion with unsafe:

coding/variables.py:429: in encode
    signed_dtype = np.dtype(f"i{dtype.itemsize}")
    data = duck_array_ops.astype(duck_array_ops.astype(duck_array_ops.around(data), signed_dtype), dtype, casting="unsafe")

or relying on the default:

coding/variables.py:429: in encode
    signed_dtype = np.dtype(f"i{dtype.itemsize}")
    data = duck_array_ops.astype(duck_array_ops.astype(duck_array_ops.around(data), signed_dtype), dtype)

or, if you do view on a duck array to not copy (untested):

coding/variables.py:429: in encode
    signed_dtype = np.dtype(f"i{dtype.itemsize}")
    data = duck_array_ops.view(duck_array_ops.astype(duck_array_ops.around(data), signed_dtype), dtype)

@kmuehlbauer
Copy link
Contributor

@QuLogic Yes, that's how it is, unfortunately.

QuLogic added a commit to QuLogic/xarray that referenced this issue Jan 19, 2025
@QuLogic QuLogic linked a pull request Jan 19, 2025 that will close this issue
4 tasks
QuLogic added a commit to QuLogic/xarray that referenced this issue Jan 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs triage Issue that has not been reviewed by xarray team member
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants