Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding issues on MSWin32 #52

Open
karenetheridge opened this issue Aug 8, 2014 · 16 comments
Open

encoding issues on MSWin32 #52

karenetheridge opened this issue Aug 8, 2014 · 16 comments
Labels

Comments

@karenetheridge
Copy link

here's an example test: http://www.cpantesters.org/cpan/report/12ed2af8-6bf4-1014-8695-9e662ad63fef

the test: https://metacpan.org/source/ETHER/Dist-Zilla-Plugin-Git-Contributors-0.004/t/01-basic.t#L50
and the prep work done for MSWin32: https://metacpan.org/source/ETHER/Dist-Zilla-Plugin-Git-Contributors-0.004/t/lib/GitSetup.pm#L37

sadly, this doesn't seem to be enough. But I'm not sure what's missing.

@karenetheridge
Copy link
Author

10:30 @haarg ether: i'd recommend testing output separate from arguments
10:32 @haarg i mean test the output based on a known good git repo
10:32 @haarg not based on something generated locally
10:32 @haarg honestly, sending characters to open3 is wrong on every platform
10:32 @haarg which is what you are doing

@karenetheridge
Copy link
Author

For this script:

use strict;
use warnings;

use Git::Wrapper;
use Path::Tiny;

my $root = path(...);  # your blargh root
my $git = Git::Wrapper->new($root);

my $changes = $root->child('Changes');
$changes->spew("Release history for my dist\n\n");
$git->add('Changes');
my $ilmari = 'Dagfinn Ilmari Mannsåker <[email protected]>';
$git->commit({ message => 'first commit - not encoded name', author => $ilmari });

$changes->append("- a changelog entry\n");
$git->add('Changes');
utf8::encode($ilmari);
$git->commit({ message => 'second commit - encoded name', author => $ilmari });

12:20 @Mithaldu ether: looks like the first one gets double, the second one triple-encoded
12:21 < ether> awesmoe :(
12:22 < ether> I wonder if IPC::Open3 is doing that

EDIT:
when adding utf8 to the top of the script, now both commits appear double-encoded, identically.

@karenetheridge
Copy link
Author

12:24 < ether> you've got the configs as described here? http://ox.no/posts/how-to-combine-git-windows-and-non-ascii-letters -- as utf-8 ?
12:24 <+dipsy> [ How to combine Git, Windows, and non-ASCII letters | Circles and Crosses ]
12:25 @Mithaldu ether: i don't know
12:25 @Mithaldu i got none of those set
12:27 @Mithaldu ether: note that my doing it manually works without setting those
12:30 < ether> can you try again, with these lines added after calling Git::Wrapper->new ?
12:30 < ether> $git->config('i18n.logoutputencoding', 'utf-8');
12:30 < ether> $git->config('i18n.commitencoding', 'utf-8');
12:30 < ether> $ENV{LESSCHARSET} = 'utf-8';
12:31 < ether> git might be doing something special if the input is coming from perl? not sure what IPC::Open3 might be doing with layers...
12:31 < ether> whee this is so much fun :D
12:31 < ether> (sarcasm)
12:31 < ether> thanks a lot for your help here :)
12:31 < ether> if we can figure this out, then all users of Git::Wrapper will benefit
12:31 @Mithaldu ether: no improvement
12:32 @Mithaldu in fact, no change at all

@wchristian
Copy link

D:\cpan\Dist-Zilla-Plugin-Git-Contributors\t\blargh>perl meep.pl
git config i18n.logoutputencoding utf-8
status: 0
git config i18n.commitencoding utf-8
status: 0
git add Changes
status: 0
git commit --author=Dagfinn Ilmari Manns├Ñker <[email protected]> --message=first commit - not encoded name
status: 0
git add Changes
status: 0
git commit --author=Dagfinn Ilmari Manns├â┬Ñker <[email protected]> --message=second commit - encoded name
status: 0

D:\cpan\Dist-Zilla-Plugin-Git-Contributors\t\blargh>

@karenetheridge
Copy link
Author

12:49 @Mithaldu i stepped through it
12:49 @Mithaldu and gotta say
12:49 @Mithaldu wow open3 is fucked up
12:50 @Mithaldu also, at the end of the day it does this:
12:50 @Mithaldu I $pid = eval { system 1, @_ }; # 1 == P_NOWAIT
12:50 @Mithaldu with @_ looking like this:
12:50 @Mithaldu https://gist.github.com/wchristian/7712bed072a42bc8d828
12:50 <+dipsy> [ gist:7712bed072a42bc8d828 ]
12:51 @Mithaldu i have no fucking idea how that could possibly work
12:52 < ether> wat
12:53 < ether> maybe time to simply switch to Capture::Tiny
12:53 @Mithaldu yeah, that was my reaction
12:53 @Mithaldu also temp files

@wchristian
Copy link

@wchristian
Copy link

I used procmon to see what is actually being executed, and it turns out that perl does this:

git commit "--author=Dagfinn Ilmari MannsÃ¥ker <[email protected]>" "--message=first commit - not encoded name"

While running a git commit via cmd directly runs this:

git  commit -m"commit message" --author="鮎川タウル <[email protected]>" --amend

@wchristian
Copy link

To expand in an explanatory manner on the last post: It proves fairly clearly that the issue lies in how Perl creates the process, not in how git itself processes the input.

karenetheridge added a commit to karenetheridge/Dist-Zilla-Plugin-Git-Contributors that referenced this issue Aug 13, 2014
see genehack/Git-Wrapper#52 for the
rending of sackcloth, gnashing of teeth, and otherwise lengthy tale of woe
karenetheridge added a commit to karenetheridge/Dist-Zilla-Plugin-Git-Contributors that referenced this issue Aug 13, 2014
          - basic tests are now ascii-only; unicode tests are marked as TODO
            on MSWin32 for now (see (github) genehack/Git-Wrapper#52 for the
            gory details)
          - now properly sorting names using case-insensitive unicode collation
@genehack
Copy link
Owner

@karenetheridge could you try the fix in the https://github.com/genehack/Git-Wrapper/tree/unicrap branch?

I could replicate the encoding error (i.e., t/unicrap.t would fail) but setting the explicit encoding on all the open3 filehandles works.

@karenetheridge
Copy link
Author

@genehack yup, your change in the unicrap branch accounts for the mojibake... but it doesn't explain the strangeness I was seeing: in perl 5.12 and above, I get back sane results from Git::Wrapper no matter whether the string passed is encoded or not! (The good/bad versions of IPC::Open3 here are 1.04 adn 1.05, and there are very minimal differences between these versions, so I surmise it must be something in perl itself that changed.)

I'm also uncertain whether GW should be automatically binmode'ing stdin/stdout/stderr automatically -- e.g. there are config settings such as i18n.logoutputencoding which can be set to another encoding.

@genehack
Copy link
Owner

genehack commented May 6, 2015

I would really like to close this issue.

@karenetheridge : I don't completely understand your previous comment about 5.12 and above.

I'm happy to port over to Capture::Tiny in a branch if that seems like it would solve the problem (or even if it just seems like it's worth testing against), but that's going to require bumping the min Perl up to 5.8.1 (by my reading of Capture::Tiny's docs, anyway). G::W currently works all the way back to 5.6. Maybe it's time ...

@genehack
Copy link
Owner

@karenetheridge @wchristian I just pushed a branch that uses (a pretty rapid and naïve port to) Capture::Tiny instead of IPC::Open3. If either of you wanted to see how it works for you, that would be appreciated. Otherwise, I'll try to see about re-animating a Windows VM soon-ish...

@karenetheridge
Copy link
Author

would that be..?

18:16  * GumbyNET7 CPAN Upload: Git-Wrapper-0.044_91-TRIAL by GENEHACK http://metacpan.org/release/GENEHACK/Git-Wrapper-0.044_91-TRIAL

We should see some smoke results by tomorrow. whee!

@genehack
Copy link
Owner

I pulled the "author=ilmari" test out of the old unicrap branch and threw it into the capture-tiny branch. I'd still like to see a manual test on Windows first before putting it into a test release. Hopefully I can get to that tomorrow, or maybe somebody will beat me to it ...

@haarg
Copy link

haarg commented Jul 14, 2015

I don't think there's any good reason to switch to Capture::Tiny. It offers no real advantages over IPC::Open3 for this use case, and it won't fix the problem anyway.

From what I can see, it is impossible for perl to provide unicode command line arguments in any way that Git will understand without using something like Win32::Unicode::Process.

@haarg haarg mentioned this issue Jul 14, 2015
@genehack
Copy link
Owner

12:58:13  genehack | haarg: is it a problem with how perl is spawning the subprocess on win32, or is it that win32 git only speaks latin-1 or whatever?
12:58:16  genehack | (or both?)
12:58:44     haarg | perl on win32 doesn't use any unicode apis
12:59:10     haarg | win32 has A and W variants of pretty much all its APIs
13:00:01     haarg | perl on win32 uses the A variant because that's what existed in the 9x era
13:00:02  genehack | ok, so really the only option would be using something like Win32::Unicode when on Win32
13:00:12     haarg | and win32 perl is practically unmaintained
13:00:18     haarg | correct
13:00:45     haarg | normal perl on win32 doesn't have access to the APIs needed to pass uncode properly
13:00:52         * | genehack nods
13:01:11  genehack | This is where using Capture::Tiny would help
13:01:28  genehack | because I don't really fancy melding IPC::Open3 with Win32::Unicode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants