Commit 55173e8

warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (#8874)
* warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend
* add whats-new.rst entry
* merge maybe_decode_bytes function into _read_attributes, add attribute and variable name to warning
1 parent ee02113 commit 55173e8

3 files changed: 27 additions and 12 deletions

doc/whats-new.rst

Lines changed: 6 additions & 4 deletions
@@ -60,15 +60,17 @@ Bug fixes
   `CFMaskCoder`/`CFScaleOffsetCoder` (:issue:`2304`, :issue:`5597`,
   :issue:`7691`, :pull:`8713`, see also discussion in :pull:`7654`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
-- do not cast `_FillValue`/`missing_value` in `CFMaskCoder` if `_Unsigned` is provided
+- Do not cast `_FillValue`/`missing_value` in `CFMaskCoder` if `_Unsigned` is provided
   (:issue:`8844`, :pull:`8852`).
 - Adapt handling of copy keyword argument for numpy >= 2.0dev
-  (:issue:`8844`, :pull:`8851`, :pull:`8865``).
+  (:issue:`8844`, :pull:`8851`, :pull:`8865`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
-- import trapz/trapezoid depending on numpy version.
+- Import trapz/trapezoid depending on numpy version
   (:issue:`8844`, :pull:`8865`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
-
+- Warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend
+  (:issue:`5563`, :pull:`8874`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
 
 
 Documentation

xarray/backends/h5netcdf_.py

Lines changed: 11 additions & 8 deletions
@@ -28,6 +28,7 @@
 from xarray.core import indexing
 from xarray.core.utils import (
     FrozenDict,
+    emit_user_level_warning,
     is_remote_uri,
     read_magic_number_from_file,
     try_read_magic_number_from_file_or_path,
@@ -58,21 +59,23 @@ def _getitem(self, key):
             return array[key]
 
 
-def maybe_decode_bytes(txt):
-    if isinstance(txt, bytes):
-        return txt.decode("utf-8")
-    else:
-        return txt
-
-
 def _read_attributes(h5netcdf_var):
     # GH451
     # to ensure conventions decoding works properly on Python 3, decode all
     # bytes attributes to strings
     attrs = {}
     for k, v in h5netcdf_var.attrs.items():
         if k not in ["_FillValue", "missing_value"]:
-            v = maybe_decode_bytes(v)
+            if isinstance(v, bytes):
+                try:
+                    v = v.decode("utf-8")
+                except UnicodeDecodeError:
+                    emit_user_level_warning(
+                        f"'utf-8' codec can't decode bytes for attribute "
+                        f"{k!r} of h5netcdf object {h5netcdf_var.name!r}, "
+                        f"returning bytes undecoded.",
+                        UnicodeWarning,
+                    )
         attrs[k] = v
     return attrs
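
The behaviour introduced in `_read_attributes` can be tried without xarray or h5netcdf installed. The following is a minimal standalone sketch, not the library code: `read_attributes_sketch` and its `obj_name` parameter are hypothetical names, a plain dict stands in for `h5netcdf_var.attrs`, and the standard library's `warnings.warn` stands in for xarray's `emit_user_level_warning`.

```python
import warnings


def read_attributes_sketch(raw_attrs, obj_name="/"):
    """Decode bytes attributes to str, keeping undecodable bytes as-is.

    Hypothetical stand-in for the patched _read_attributes; raw_attrs is a
    plain dict instead of an h5netcdf attribute mapping.
    """
    attrs = {}
    for k, v in raw_attrs.items():
        # Fill-value attributes are never decoded, as in the diff.
        if k not in ["_FillValue", "missing_value"]:
            if isinstance(v, bytes):
                try:
                    v = v.decode("utf-8")
                except UnicodeDecodeError:
                    # warnings.warn replaces emit_user_level_warning here.
                    warnings.warn(
                        f"'utf-8' codec can't decode bytes for attribute "
                        f"{k!r} of h5netcdf object {obj_name!r}, "
                        f"returning bytes undecoded.",
                        UnicodeWarning,
                        stacklevel=2,
                    )
        attrs[k] = v
    return attrs


# b"\xc3" is a truncated two-byte UTF-8 sequence, so decoding it fails.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_attributes_sketch(
        {"title": b"\xc3", "units": b"m", "_FillValue": b"\x00"}
    )
```

In this sketch the valid `units` value is decoded to `str`, `_FillValue` is skipped, and the invalid `title` bytes are passed through unchanged with a single `UnicodeWarning` instead of an exception.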

xarray/tests/test_backends.py

Lines changed: 10 additions & 0 deletions
@@ -3560,6 +3560,16 @@ def test_dump_encodings_h5py(self) -> None:
             assert actual.x.encoding["compression"] == "lzf"
             assert actual.x.encoding["compression_opts"] is None
 
+    def test_decode_utf8_warning(self) -> None:
+        title = b"\xc3"
+        with create_tmp_file() as tmp_file:
+            with nc4.Dataset(tmp_file, "w") as f:
+                f.title = title
+            with pytest.warns(UnicodeWarning, match="returning bytes undecoded") as w:
+                ds = xr.load_dataset(tmp_file, engine="h5netcdf")
+                assert ds.title == title
+                assert "attribute 'title' of h5netcdf object '/'" in str(w[0].message)
+
 
 @requires_h5netcdf
 @requires_netCDF4
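
Since an attribute such as `ds.title` may now come back as `bytes`, calling code may want to normalise it itself. A possible sketch follows, assuming the caller is willing to fall back to a single-byte encoding; `decode_attr_lenient` and its `fallback_encoding` parameter are hypothetical and not part of xarray or h5netcdf.

```python
def decode_attr_lenient(value, fallback_encoding="latin-1"):
    """Return str for bytes input, trying UTF-8 first, then a fallback.

    Non-bytes values are returned unchanged. Hypothetical helper for
    post-processing attributes that were returned undecoded.
    """
    if not isinstance(value, bytes):
        return value
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail,
        # but the result is only meaningful if the guess is right.
        return value.decode(fallback_encoding)


# The same invalid UTF-8 value used in test_decode_utf8_warning.
title = decode_attr_lenient(b"\xc3")
```

Whether `latin-1` is a sensible fallback depends on how the file was written; keeping the raw bytes, as the commit does, avoids guessing.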
