UnicodeDecodeError: 'utf-8' codec can't decode byte ..... json.decode() can't handle the 'plus minus' symbol #432

Closed
svikolev opened this issue Sep 8, 2022 · 2 comments
Labels: bug

svikolev commented Sep 8, 2022

Bug report

Bug summary

The Zeiss Colibri 2 driver has information fields (property names and values grabbed using pycromanager):

{'Description': 'Zeiss Colibri adapter',
'Info LED-445nm': '445nm ±24nm, 1000m',
'Info LED-505nm': '505nm ±30nm, 1000m',
'Info LED-555nm': '555nm ±150nm, 850m',
'Intensity LED-445nm': '0',
...etc.}
Note the plus-minus (±) symbol.

When acquiring events with an Acquisition, this metadata is saved, but when it is read back for display the following error is encountered:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 412: invalid start byte

Using a debugger, I determined that the offending byte is the plus-minus symbol, which the JSON decoder (UTF-8) apparently does not recognize.

Previously this was encountered in data.py, but with the updated pycromanager I encountered it in ndtiff\nd_tiff_v2.py while trying to open a dataset:

File "C:\ProgramData\Anaconda3\envs\Pumps38\lib\site-packages\ndtiff\nd_tiff_v2.py", line 87, in read_metadata
return json.loads(
File "C:\ProgramData\Anaconda3\envs\Pumps38\lib\json_init_.py", line 343, in loads
s = s.decode(detect_encoding(s), 'surrogatepass')

I would expect the 'surrogatepass' error handler to just skip it, but apparently it does not.
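
For reference, here is a minimal sketch of the failure (the property string is assumed, not copied from an actual file): the single byte 0xb1 is '±' in ISO-8859-1 but an invalid start byte in UTF-8, and 'surrogatepass' only permits lone surrogate code points rather than skipping arbitrary invalid bytes.

raw = b'{"Info LED-445nm": "445nm \xb124nm, 1000m"}'

print(raw.decode("iso-8859-1"))   # works: {"Info LED-445nm": "445nm ±24nm, 1000m"}
try:
    raw.decode("utf-8")           # 0xb1 is not valid as a UTF-8 start byte
except UnicodeDecodeError as e:
    print(e)                      # 'utf-8' codec can't decode byte 0xb1 ...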

In both an older version of pycromanager and the current one, I was able to fix this by taking inspiration from bridge.py line 165:

message = json.loads(reply[0].decode("iso-8859-1"))

and adding .decode("iso-8859-1") in the read_metadata function in ndtiff\nd_tiff_v2.py (or previously in data.py) before the bytes are passed to json.loads():

def read_metadata(self, index):
    return json.loads(
        self._read(
            index["metadata_offset"], index["metadata_offset"] + index["metadata_length"]
        ).decode("iso-8859-1")
    )
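
I assume this workaround never raises because ISO-8859-1 assigns a character to every possible byte value, so the decode cannot fail (though it could silently produce the wrong characters for bytes that really are UTF-8):

# ISO-8859-1 ("latin-1") maps every byte 0x00-0xFF to a code point,
# so decoding arbitrary bytes with it always succeeds.
bytes(range(256)).decode("iso-8859-1")  # never raises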

I am not sure what the best fix would be. Please let me know if I can be more clear, and I will respond promptly.

Version Info

  • Operating system: Windows 10
  • pycromanager version: pycromanager 0.18.3 pyhd8ed1ab_0 conda-forge
  • MicroManager version: 2.0.1 20220720
  • Python version: 3.8
  • Python environment: Jupyter notebook and PyCharm IDE

PS: Thank you so much for the great work on micro and pycro. I wish I had switched from ZEN Pro a long time ago. I will try to contribute a use case soon.

@henrypinkard (Member)

It's not entirely clear to me why this is happening. I think it would be helpful if you dug in a bit more and reported back.

Here is what's supposed to happen:

  1. NDTiff Java library is saving JSON metadata in encoding UTF-8 (The default encoding for String.getBytes())

  2. ndtiff python package is loading that metadata and interpreting it as UTF-8. Essentially it calls:

with open("path/to/file") as file:
    file.seek(start)
    string_read = file.read(end - start)
metadata = json.loads(string_read)

I believe open should have already figured out things are encoded in UTF-8. When I do a quick test and type file into the interpreter, I get:

<_io.TextIOWrapper name='tmp.txt' mode='r' encoding='UTF-8'>

So maybe something is going wrong there because Windows doesn't think it's UTF-8 (I tested on a Mac).
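
One hedged way to check this: when no encoding is given, open() in text mode falls back to the locale's preferred encoding, which on Windows is often cp1252 rather than UTF-8.

import locale

# On many Windows setups this prints 'cp1252'; on Mac/Linux it is usually 'UTF-8'.
print(locale.getpreferredencoding(False))

If that turns out to be the culprit, passing encoding="utf-8" explicitly to open() would be the telltale test.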

I can run another test of this using:

import json
import numpy as np
raw = np.array([123, 34, -62, -79, 34, 58, 49, 125], dtype=np.int8)
json.loads(raw.tobytes().decode("utf-8"))

This string of bytes is valid JSON in UTF-8 with the ± character. Running it gives me:

{'±': 1}

It behaves just like the same thing without special characters:

raw = np.array([123, 34, 97, 34, 58, 49, 125], dtype=np.int8)
json.loads(raw.tobytes().decode("utf-8"))
{'a': 1}

So I'm not sure when in the chain of encoding-saving-loading-decoding this goes wrong on your system.

I don't think that the solution you propose above will work, because NDTiff is saving as UTF-8, so decoding as ISO won't always give the same thing. This contrasts with the bridge, which encodes in ISO and decodes in ISO. I don't remember exactly why I chose to do this for the bridge, though I do remember it being confusing.
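
A quick illustrative sketch of that mismatch: decoding UTF-8 bytes as ISO-8859-1 does not raise, but it silently turns each multi-byte character into two wrong ones.

import json

utf8_bytes = '{"±": 1}'.encode("utf-8")             # b'{"\xc2\xb1": 1}'
print(json.loads(utf8_bytes.decode("iso-8859-1")))  # {'Â±': 1} -- mojibake, no error
print(json.loads(utf8_bytes.decode("utf-8")))       # {'±': 1}  -- correct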

Try playing around with these and see if you can figure out where it goes wrong on your system.

By the way, here are useful references for UTF-8 and ISO-8859-1 that I was using to interpret those numbers.

> PS: Thank you so much for the great work on micro and pycro. I wish I had switched from ZEN Pro a long time ago. I will try to contribute a use case soon.

Awesome, thanks!

@henrypinkard (Member)

#467
