Skip to content
This repository has been archived by the owner on Jun 19, 2024. It is now read-only.

Encoding issues with vcvarsall #62

Closed
Holt59 opened this issue Apr 19, 2020 · 3 comments
Closed

Encoding issues with vcvarsall #62

Holt59 opened this issue Apr 19, 2020 · 3 comments

Comments

@Holt59
Copy link
Member

Holt59 commented Apr 19, 2020

I know there is already #60 on encoding, but I don't think it's the same issue.

When I run python unimake.py, I get the following error:

2020-04-19 17:09:50,324   ==== Unimake.py started ===
Traceback (most recent call last):
  File "unimake.py", line 210, in <module>
    exitcode = main()
  File "unimake.py", line 115, in main
    init_config(args)
  File "I:\Projects\CPP\modorganizer-umbrella\unibuild\utility\config_setup.py", line 92, in init_config
    config['__environment'] = visual_studio_environment()
  File "I:\Projects\CPP\modorganizer-umbrella\unibuild\utility\visualstudio.py", line 116, in visual_studio_environment
    vcenv[key] = value
  File "I:\Projects\CPP\modorganizer-umbrella\unibuild\utility\case_insensitive_dict.py", line 43, in __setitem__
    new_value = value.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 13: invalid start byte

The issue is that the Popen process output is not UTF-8 encoded, so when we update the vcenv dictionary that tries to .decode("utf-8"), it fails.

My system language is English but (unfortunately) I have a special character in my username, which appears in some variables returned by SET.

I found a quite simple workaround, that I may send as a PR if it works for you, which is to retrieve the os stdout encoding using os.device_encoding(1):

for line in stdout.splitlines():
    if b"=" in line:
        line = line.decode(os.device_encoding(1))
        key, value = line.split("=", 1)
        vcenv[key] = value
@AnyOldName3
Copy link
Member

AnyOldName3 commented Apr 19, 2020

A few things (some of which are things for you to try and others of which are just notes for developers trying to fix this):

  • Does mbcs work? Explicitly querying the encoding for this process' stdout doesn't guarantee that it's the same for every other program or that everything we put in a dict is from stdout.
  • It looks like when Silarn ported this from Python 2 to 3, he made it try UTF-8, then fall back to copying. I think the code is supposed to catch an exception if you try it on something that's already a str instead of still bytes, and it's not succeeding when the encoding doesn't match as that's a different exception.
  • We should probably just let case_insensitive_dict catch fire if someone tries putting bytes in it instead of a str and deal with encoding when we generate the data (as some things will be UTF-8, and some won't). Alternatively, checking the type and only decoding if it's bytes will be more explicit about what we're currently doing.
  • In this case, I think we should be converting to str with mbcs around here https://github.com/ModOrganizer2/modorganizer-umbrella/blob/master/unibuild/utility/visualstudio.py#L113.

@Holt59
Copy link
Member Author

Holt59 commented Apr 20, 2020

@AnyOldName3

Does mbcs work? Explicitly querying the encoding for this process' stdout doesn't guarantee that it's the same for every other program or that everything we put in a dict is from stdout.

mbcs does work, but I did not suggest putting the specific decoding in the dict.

If you look at the snippet I provided, I am not decoding with os.device_encoding(1) in case_insensitive_dict, but in the visual_studio_environment function since, as you said, it is possible to put str directly in case_insensitive_dict.

In this case, I think we should be converting to str with mbcs around here https://github.com/ModOrganizer2/modorganizer-umbrella/blob/master/unibuild/utility/visualstudio.py#L113.

Yes, that's about what I did except that I used os.device_encoding(1) here instead of mbcs. I can make a PR with mbcs if you want.

@isanae
Copy link
Contributor

isanae commented Jul 10, 2020

Fixed in 1c270d6.

@isanae isanae closed this as completed Jul 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants