Encoding issues with `vcvarsall` #62

Holt59 · 2020-04-19T15:30:33Z

I know there is already #60 on encoding, but I don't think it's the same issue.

When I run python unimake.py, I get the following error:

2020-04-19 17:09:50,324   ==== Unimake.py started ===
Traceback (most recent call last):
  File "unimake.py", line 210, in <module>
    exitcode = main()
  File "unimake.py", line 115, in main
    init_config(args)
  File "I:\Projects\CPP\modorganizer-umbrella\unibuild\utility\config_setup.py", line 92, in init_config
    config['__environment'] = visual_studio_environment()
  File "I:\Projects\CPP\modorganizer-umbrella\unibuild\utility\visualstudio.py", line 116, in visual_studio_environment
    vcenv[key] = value
  File "I:\Projects\CPP\modorganizer-umbrella\unibuild\utility\case_insensitive_dict.py", line 43, in __setitem__
    new_value = value.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 13: invalid start byte

The issue is that the Popen process output is not UTF-8 encoded, so when we update the vcenv dictionary that tries to .decode("utf-8"), it fails.

My system language is English but (unfortunately) I have a special character in my username, which appears in some variables returned by SET.

I found a quite simple workaround, that I may send as a PR if it works for you, which is to retrieve the os stdout encoding using os.device_encoding(1):

for line in stdout.splitlines():
    if b"=" in line:
        line = line.decode(os.device_encoding(1))
        key, value = line.split("=", 1)
        vcenv[key] = value

The text was updated successfully, but these errors were encountered:

AnyOldName3 · 2020-04-19T20:46:49Z

A few things (some of which are things for you to try and others of which are just notes for developers trying to fix this):

Does mbcs work? Explicitly querying the encoding for this process' stdout doesn't guarantee that it's the same for every other program or that everything we put in a dict is from stdout.
It looks like when Silarn ported this from Python 2 to 3, he made it try UTF-8, then fall back to copying. I think the code is supposed to catch an exception if you try it on something that's already a str instead of still bytes, and it's not succeeding when the encoding doesn't match as that's a different exception.
We should probably just let case_insensitive_dict catch fire if someone tries putting bytes in it instead of a str and deal with encoding when we generate the data (as some things will be UTF-8, and some won't). Alternatively, checking the type and only decoding if it's bytes will be more explicit about what we're currently doing.
In this case, I think we should be converting to str with mbcs around here https://github.com/ModOrganizer2/modorganizer-umbrella/blob/master/unibuild/utility/visualstudio.py#L113.

Holt59 · 2020-04-20T16:17:25Z

@AnyOldName3

Does mbcs work? Explicitly querying the encoding for this process' stdout doesn't guarantee that it's the same for every other program or that everything we put in a dict is from stdout.

mbcs does work, but I did not suggest putting the specific decoding in the dict.

If you look at the snippet I provided, I am not decoding with os.device_encoding(1) in case_insensitive_dict, but in the visual_studio_environment function since, as you said, it is possible to put str directly in case_insensitive_dict.

In this case, I think we should be converting to str with mbcs around here https://github.com/ModOrganizer2/modorganizer-umbrella/blob/master/unibuild/utility/visualstudio.py#L113.

Yes, that's about what I did except that I used os.device_encoding(1) here instead of mbcs. I can make a PR with mbcs if you want.

isanae · 2020-07-10T17:13:06Z

Fixed in 1c270d6.

isanae closed this as completed Jul 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Encoding issues with `vcvarsall` #62

Encoding issues with `vcvarsall` #62

Holt59 commented Apr 19, 2020 •

edited

Loading

AnyOldName3 commented Apr 19, 2020 •

edited

Loading

Holt59 commented Apr 20, 2020

isanae commented Jul 10, 2020

Encoding issues with vcvarsall #62

Encoding issues with vcvarsall #62

Comments

Holt59 commented Apr 19, 2020 • edited Loading

AnyOldName3 commented Apr 19, 2020 • edited Loading

Holt59 commented Apr 20, 2020

isanae commented Jul 10, 2020

Encoding issues with `vcvarsall` #62

Encoding issues with `vcvarsall` #62

Holt59 commented Apr 19, 2020 •

edited

Loading

AnyOldName3 commented Apr 19, 2020 •

edited

Loading