Add support for East Asian Width #375

knu · 2013-01-16T20:49:57Z

Mosh currently uses wcwidth(3) for computing the width of each character, but the somewhat outdated standard function does not take East Asian Width (http://www.unicode.org/reports/tr11/) into account.

Characters marked as having an "ambiguous" width should be displayed as wide character under East Asian (CJK) locales while they should be narrow under non-CJK locales.

Most major terminal emulators, terminal multiplexers and terminal-based text editors support this option, so I believe mosh should too. Here's an incomplete list of those that support it:

xterm
GNOME terminal
Konsole + unofficial patch
PuTTY + unofficial patch
iTerm2
GNU screen (4.01.00+)
tmux + unofficial patch
GNU Emacs
Vim
…

My patch is a quick hack that simply does the job with a least effort and hopefully with a least overhead.
To do it a cleaner way, you may want to consider using ICU (http://site.icu-project.org) which offers a native C++ API to the Unicode database, though I'm not sure how it impacts on performance.

glance- · 2013-01-21T08:16:07Z

It would be nice to be able to switch to using this implementation of wcwidth by default at compiletime.
That would solve #361.

sirainen · 2013-01-25T10:43:17Z

This looks like the same wcwidth() implementation as in my "unichar" branch that gets rid of wchar_t. I think better to just use this new wcwidth() implementation unconditionally. https://github.com/sirainen/mosh/tree/unichar

About ICU: I don't think it implements wcwidth() directly either, or does much else that is helpful in implementing this in a cleaner way? At least I didn't notice anything obvious when I looked into it, but I didn't look very hard. A dependency on ICU just for this small functionality seems like an overkill anyway.

knu · 2013-01-25T11:25:56Z

The reason I mentioned ICU was that once you use it you are free from maintaining the character width database updated in future releases of Unicode, which ICU has been and will be following in a timely manner. The wcwidth() implementation in question is already so outdated; it is based on Unicode 2.0 when we have 6.2 by now (although I haven't looked closely into the differences that may affect this feature).

However, I have to admit that it would be an overkill to adopt ICU just for this. Maybe we should just live with the wcswidth() solution for the moment and revisit ICU in future when we end up with a desperate need for normalization, combining, bi-di and so on.

andersk · 2013-02-21T22:19:58Z

Why doesn’t glibc support this natively? Is this http://sourceware.org/bugzilla/show_bug.cgi?id=4335?

ghost · 2013-02-21T23:15:05Z

http://gitorious.org/enca#more

knu · 2013-05-17T07:16:16Z

Rebased.

keithw · 2013-05-17T12:11:58Z

Is there any hope of just getting glibc to fix the CJK locales, or make alternate CJK locales in which wcwidth() reports 2 (instead of 1) for ambiguous characters?

I'm reluctant to include our own implementation of a standard library function (wcwidth() in this case), which we will then have to keep updating as Unicode evolves, especially if this is really a bug in the locale. Either way, we still have no guarantee that wcwidth() (whether internal or from the locale) matches what the outer terminal is doing.

keithw · 2013-05-17T12:36:19Z

One crazy possibility would be to allow the user to build their own wcwidth() table, based on what their outer terminal actually does, and have the client upload it to the server. For example, using the following. (We'd have to pull the real ranges of assigned characters from the latest Unicode spec instead of just using 0..65535.)

I honestly think this may be more the "right thing" than hardcoding our own wcwidth table.

#!/usr/bin/perl -w

use strict;
use Term::ReadKey;
use bytes;

ReadMode 5;

my %wc;
for ( my $query = 0; $query <= 65535; $query++ ) {
  print qq{\033[Hx}, (pack q{U}, $query), qq{\033[6n};
  my $str; while ( 1 ) { $str .= ReadKey 0; last if $str =~ m{R$} }
  my ( $row, $col ) = $str =~ m{\[(\d+);(\d+)R};
  $wc{ $query } = $col - 2;
}

ReadMode 0;

print qq{\033[H\033[2J};

open OUT, q{>wcwidth.txt} or die qq{$!};
printf OUT qq{U+%04X %d\n}, $_, $wc{ $_ } for sort { $a <=> $b } keys %wc;

knu · 2013-05-18T14:19:01Z

I think allowing user to inject a width table is a good idea.

As I mentioned in the first post, most major terminal based applications already include their own alternatives to the "broken" wcwidth(), and the fact itself is part of the negative cycle, keeping the vendors of libc reluctant to fix wcwidth() on their side. So, even if glibc decided to fix the bug, portable apps like mosh could not depend on it unless the same decision is made in OS X, *BSD, SunOS, etc. etc.

keithw · 2013-05-18T14:27:05Z

Yes, I took your post to heart and am grateful for your attention to these issues. (Any Unicode language lawyer is a friend of ours here at Mosh HQ.) I guess tentatively let's go forward with the plan to allow the user to supply an external wcwidth table to mosh-client! And figure out a graceful way to upload this to the server, and supply an executable (possible external to mosh or mosh-client) that builds the table.

knu · 2013-05-18T14:54:39Z

Great! Let me help test it when it's getting ready.

This is a real quick hack to replace wcwidth(3) with a version that takes East Asian (CJK) ambiguous characters into account. A new option -w is added to mosh(1), which is enabled by default if run under a CJK locale.

knu force-pushed the east-asian-width branch from d0442e9 to dfe82d2 Compare March 13, 2015 06:21

floam mentioned this pull request Mar 11, 2016

Issue rendering completion when longer than terminal size fish-shell/fish-shell#2484

Closed

Add support for East Asian Width.

21e8df6

This is a real quick hack to replace wcwidth(3) with a version that takes East Asian (CJK) ambiguous characters into account. A new option -w is added to mosh(1), which is enabled by default if run under a CJK locale.

knu force-pushed the east-asian-width branch from dfe82d2 to 21e8df6 Compare April 25, 2018 21:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for East Asian Width #375

Add support for East Asian Width #375

knu commented Jan 16, 2013

glance- commented Jan 21, 2013

sirainen commented Jan 25, 2013

knu commented Jan 25, 2013

andersk commented Feb 21, 2013

ghost commented Feb 21, 2013

knu commented May 17, 2013

keithw commented May 17, 2013

keithw commented May 17, 2013

knu commented May 18, 2013

keithw commented May 18, 2013

knu commented May 18, 2013

Add support for East Asian Width #375

Are you sure you want to change the base?

Add support for East Asian Width #375

Conversation

knu commented Jan 16, 2013

glance- commented Jan 21, 2013

sirainen commented Jan 25, 2013

knu commented Jan 25, 2013

andersk commented Feb 21, 2013

ghost commented Feb 21, 2013

knu commented May 17, 2013

keithw commented May 17, 2013

keithw commented May 17, 2013

knu commented May 18, 2013

keithw commented May 18, 2013

knu commented May 18, 2013