
Test failure with Korean #38

Open
thierry-FreeBSD opened this issue May 29, 2018 · 36 comments
Labels
bug in other software A bug in some other software having an effect on ibus-typing-booster

Comments

@thierry-FreeBSD
Contributor

On FreeBSD, the test suite fails with this message:

======================================================================
FAIL: test_korean (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.0.0/tests/test_itb.py", line 378, in test_korean
    self.assertEqual(self.engine.mock_preedit_text, '안녕하세이')
AssertionError: '안녕세이' != '안녕하세이'
- 안녕세이
+ 안녕하세이
?   +


----------------------------------------------------------------------
Ran 17 tests in 156.142s

FAILED (failures=1)
FAIL run_tests (exit status: 1)

Any idea?

@mike-fabian
Owner

On Fedora, the Korean hunspell dictionary was recently updated and therefore I had to adapt the test.
Probably, FreeBSD still has the old Korean dictionary.

@mike-fabian
Owner

See:

ca0ecc3

@mike-fabian
Owner

That change was already in ibus-typing-booster 1.5.37 though. Did the test case still
work for you in ibus-typing-booster 1.5.37?

@mike-fabian
Owner

The comment at the start of the Korean test case says that there is no Korean hunspell dictionary
on FreeBSD and therefore the test is skipped:

def test_korean(self):
    if not itb_util.get_hunspell_dictionary_wordlist('ko_KR')[0]:
        # No Korean dictionary file could be found, skip this
        # test. On some systems, like 'Arch' or 'FreeBSD', there
        # is no ko_KR.dic hunspell dictionary available, therefore
        # there is no way to run this test on these systems.
        # On systems where a Korean hunspell dictionary is available,
        # make sure it is installed to make this test case run.
        # In the ibus-typing-booster.spec file for Fedora,
        # I have a “BuildRequires: hunspell-ko” for that purpose
        # to make sure this test runs when building the rpm package.
        return

Did you recently get a Korean hunspell dictionary on FreeBSD?
So is
itb_util.get_hunspell_dictionary_wordlist('ko_KR')[0]
now successful in loading a Korean hunspell dictionary?
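A side note on the skip logic quoted above: a bare return makes unittest count the test as passed, so a missing dictionary is easy to overlook. A minimal sketch of an alternative, assuming a hypothetical find_dic() helper in place of itb_util.get_hunspell_dictionary_wordlist() and an assumed list of dictionary directories: calling self.skipTest() makes the run report the test as skipped instead.

```python
import os
import unittest

# Assumed search path (hypothetical): /usr/local/share/hunspell is where the
# FreeBSD port keeps fr_FR.dic later in this thread; the /usr/share
# directories are common Linux locations.
HUNSPELL_DIRS = (
    '/usr/share/hunspell',
    '/usr/share/myspell',
    '/usr/local/share/hunspell',
)


def find_dic(language, dirs=HUNSPELL_DIRS):
    """Hypothetical helper: path of <language>.dic in the first dir that has it, else None."""
    for directory in dirs:
        path = os.path.join(directory, language + '.dic')
        if os.path.isfile(path):
            return path
    return None


class KoreanTestCase(unittest.TestCase):
    dic_dirs = HUNSPELL_DIRS  # overridable, so the skip logic itself can be tested

    def test_korean(self):
        if find_dic('ko_KR', self.dic_dirs) is None:
            # Reported as "skipped" by unittest instead of silently passing.
            self.skipTest('no ko_KR.dic hunspell dictionary found')
        # ... the actual ko-romaja assertions would follow here ...
```

With this, a run on a system without ko_KR.dic shows the test in the skipped count, which makes it obvious whether the Korean test ran at all.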

@mike-fabian
Owner

Also, the test case uses the 'ko-romaja' input method. Do you have it available? On Fedora
it is in this package:

$ rpm -qf /usr/share/m17n/ko-romaja.mim
m17n-db-1.8.0-3.fc28.noarch

@thierry-FreeBSD
Contributor Author

Thanks for all these ideas!

  1. The Korean hunspell dictionary was old => I've just upgraded it, but still the same error
  2. You are right: v. 1.5.37 produces the same error
  3. I have a /usr/local/share/m17n/ko-romaja.mim file, installed by m17n-db-1.7.0, but I cannot tell whether it works (I do not speak Korean) => I'll upgrade the port to the latest 1.8.0.

I shall keep investigating and let you know.

@mike-fabian
Owner

mike-fabian commented May 31, 2018

If you add this test case to m17n_translit.py:

mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)
$ git diff 
diff --git a/engine/m17n_translit.py b/engine/m17n_translit.py
index dea78a2..8dfab5c 100644
--- a/engine/m17n_translit.py
+++ b/engine/m17n_translit.py
@@ -257,6 +257,10 @@ class Transliterator:
     >>> trans.transliterate(['n', 'i', '3', 'h', 'a', 'o', '3'])
     '你好'
 
+    >>> trans = Transliterator('ko-romaja')
+    >>> trans.transliterate(list('annyeonghaseyo'))
+    '안녕하세요'
+
     If initializing the transliterator fails, for example
     because a non-existing input method was given as the argument,
     a ValueError is raised:
mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)

Does this work? You can test like this:

mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)
$ python3 m17n_translit.py 
mfabian@taka:/local/mfabian/src/ibus-typing-booster/engine (release-2.0.1 *$)

If you see no output from “python3 m17n_translit.py”, then it works.
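Since python3 m17n_translit.py only prints something when an example fails, it can be handy to get an explicit count instead. A minimal sketch using the standard doctest classes DocTestFinder and DocTestRunner; fake_transliterate() is a hypothetical stand-in, because the real Transliterator needs m17n-lib and the ko-romaja input method.

```python
import doctest


def fake_transliterate(keys):
    """Hypothetical stand-in for Transliterator('ko-romaja').transliterate().

    >>> fake_transliterate(list('annyeonghaseyo'))
    '안녕하세요'
    """
    # Hard-coded lookup; the real work is done by m17n-lib, which this
    # sketch does not call.
    return {'annyeonghaseyo': '안녕하세요'}.get(''.join(keys), '')


def run_doctests(obj):
    """Run the doctests in obj's docstring; return TestResults(failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False: do not look up the defining module, use the given globals.
    for test in finder.find(obj, obj.__name__, module=False,
                            globs={obj.__name__: obj}):
        runner.run(test)  # prints a report only for failing examples
    return runner.summarize(verbose=False)


results = run_doctests(fake_transliterate)
print(results.failed, results.attempted)  # 0 1
```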

@thierry-FreeBSD
Contributor Author

I have upgraded libofx, m17n-db and m17n-lib to be sure that they are not the source of the problem, and the examples in m17n_translit.py fail, not only the Korean one. It produces a NULL pointer access, see the attached log.
The problem seems to be located in m17n-lib, I'm going to check it.
[Attachment: m17n_translit.log]

@mike-fabian
Owner

Did you find anything? Does ibus-m17n work for you?

@thierry-FreeBSD
Contributor Author

(Sorry for the delay...)
ibus-m17n seems working, at least for the languages that I can understand.
But I have noticed a strange thing: m17n-db installs 166 .mim files and 108 .lnm files, but ibus does not see most of them:

  • right click on the language switcher icon, and click preferences
  • go to the "Input Method" tab and click Add
  • the initial list only contains 7 languages
  • pressing the "..." button at the bottom shows some more categories of languages (33), but many are still missing!
    Among the missing ones, I cannot find anything for Korean.
    Do I need to install another package?

@mike-fabian
Owner

Apparently you are using ibus-setup to look for the m17n input methods. That is OK
but it is probably easier to see what is available from the command line:

ibus list-engine

lists all engines ibus offers (the same list you see in ibus-setup), and you can easily grep its output.
For example:

$ ibus list-engine | grep m17n: | wc
    163     653    5917

shows that ibus offers 163 engines from ibus-m17n.

Korean is not among them:

$ ibus list-engine | grep m17n:ko
  m17n:kok:inscript2 - inscript2 (m17n)

The reason is that the m17n:ko:* engines are not considered useful as there is also ibus-hangul specialized for Korean:

$ /usr/libexec/ibus-engine-m17n --xml | grep rank | wc
ibus-m17n-Message: 09:40:33.779: skipped m17n:ja:anthy since its rank is lower than 0
ibus-m17n-Message: 09:40:33.785: skipped m17n:zh:py since its rank is lower than 0
ibus-m17n-Message: 09:40:33.785: skipped m17n:ru:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:he:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:ko:romaja since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:ko:han2 since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:sk:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.786: skipped m17n:sr:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.787: skipped m17n:kk:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.787: skipped m17n:hr:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:cmc:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:hy:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:uk:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.790: skipped m17n:el:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:lo:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:my:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:ug:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:cs:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.791: skipped m17n:ka:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.792: skipped m17n:uz:kbd since its rank is lower than 0
ibus-m17n-Message: 09:40:33.792: skipped m17n:be:kbd since its rank is lower than 0
    163     163    4401

These engines, which rank lower than 0, are omitted on purpose: either a better separate engine
seems to be available for them (m17n:ja:anthy, m17n:zh:py, m17n:ko:romaja, ...), or they are just
simulations of keyboard layouts. For example, m17n:cs:kbd simulates a Czech
keyboard layout on top of a US keyboard layout.
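To see which engines ibus-m17n hides, the skip messages can be collected programmatically. A minimal sketch; the helper names are made up, the binary path is the Fedora one used above and may differ elsewhere, and reading stderr is an inference from the output above, where these messages were not consumed by “| grep rank | wc”.

```python
import re
import subprocess

# Matches the ibus-m17n messages shown above.
SKIPPED = re.compile(r'skipped (m17n:\S+) since its rank is lower than 0')


def skipped_engines(log_text):
    """Return the engine names from the 'rank is lower than 0' messages."""
    return SKIPPED.findall(log_text)


def split_kbd(names):
    """Split names into keyboard-layout simulations (':kbd') and the rest."""
    kbd = [name for name in names if name.endswith(':kbd')]
    other = [name for name in names if not name.endswith(':kbd')]
    return kbd, other


def m17n_engine_messages(binary='/usr/libexec/ibus-engine-m17n'):
    """Run '<binary> --xml' and return its stderr.

    Assumption: the messages go to stderr, because they were not consumed
    by '| grep rank | wc' in the output above. The path varies by system.
    """
    return subprocess.run([binary, '--xml'], capture_output=True,
                          text=True).stderr
```

The non-kbd names from split_kbd() are the candidates, like m17n:ko:romaja, that may still be worth adding in ibus-typing-booster.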

But m17n input methods which are not considered useful to offer via ibus-m17n might
still be useful for ibus-typing-booster. ibus-typing-booster cannot use ibus-hangul but it can
use /usr/share/m17n/ko-romaja.mim. And the keyboard layout simulations can be useful
with ibus-typing-booster as well, for example if you want to type Czech and English at the same
time with ibus-typing-booster, it makes sense to set a US English keyboard layout and add cs-kbd
in ibus-typing-booster so you can type both languages and get completions without having to
change keyboard layouts.

@mike-fabian
Owner

You write: “ibus-m17n seems working, at least for the languages that I can understand.”

That makes it a bit mysterious why the examples in m17n_translit.py fail.

Could you find out more why these examples are failing?

@mike-fabian
Owner

I checked on FreeBSD now, and not all tests in m17n_translit.py fail for me.

Actually only those using inscript2 and the Korean one fail.

The inscript2 tests fail because you don’t have the inscript2 input methods for m17n
installed. They are included in the m17n-db package in Fedora, but apparently not
in most other distributions. They are available here:

https://releases.pagure.org/inscript2/inscript2-20160423.tar.gz
$ tar xvf inscript2-20160423.tar.gz
x inscript2/
x inscript2/icons/
...

And then:

root@freebee:/usr/home/mfabian/inscript2 # cp  icons/* /usr/local/share/m17n/icons/
root@freebee:/usr/home/mfabian/inscript2 # cp IM/* /usr/local/share/m17n/
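The two cp commands above can also be scripted, for example in a port's install step. A minimal sketch; install_inscript2() is a hypothetical helper, and the default target /usr/local/share/m17n is the FreeBSD location used above.

```python
import glob
import os
import shutil


def install_inscript2(src_dir, m17n_dir='/usr/local/share/m17n'):
    """Copy icons/* and IM/* from an unpacked inscript2 tarball into m17n_dir.

    Hypothetical helper mirroring the two cp commands; returns the copied paths.
    """
    icons_dir = os.path.join(m17n_dir, 'icons')
    os.makedirs(icons_dir, exist_ok=True)
    copied = []
    for subdir, destination in (('icons', icons_dir), ('IM', m17n_dir)):
        for path in sorted(glob.glob(os.path.join(src_dir, subdir, '*'))):
            if os.path.isfile(path):  # cp without -R does not copy directories either
                copied.append(shutil.copy2(path, destination))
    return copied
```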

After doing that, only the Korean test still fails:

$ pwd
/usr/home/mfabian/ibus-typing-booster/engine
$ python3 m17n_translit.py
**********************************************************************
File "m17n_translit.py", line 261, in __main__.Transliterator
Failed example:
    trans.transliterate(list('annyeonghaseyo'))
Expected:
    '\uc548\ub155\ud558\uc138\uc694'
Got:
    '\uc548\ub155\uc138\u315b'
**********************************************************************
1 items had failures:
   1 of  31 in __main__.Transliterator
***Test Failed*** 1 failures.
$ 

@mike-fabian
Owner

mike-fabian commented Jun 11, 2018

That failure of the Korean test on FreeBSD is quite weird:
Expected:
'\uc548\ub155\ud558\uc138\uc694'
Got:
'\uc548\ub155\uc138\u315b'

Translating the hex codes into the real characters, this is:

Expected: 안녕하세요
Got: 안녕세ㅛ
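Python can do that translation directly and also name each code point. A small sketch using only the standard unicodedata module:

```python
import unicodedata

# Escaped strings copied from the doctest output above.
expected = '\uc548\ub155\ud558\uc138\uc694'
got = '\uc548\ub155\uc138\u315b'

print(expected)  # 안녕하세요
print(got)       # 안녕세ㅛ

# Show each code point of the wrong result with its Unicode name.
for ch in got:
    print(f'U+{ord(ch):04X} {ch} {unicodedata.name(ch)}')
```

The wrong result ends with U+315B HANGUL LETTER YO, a standalone letter rather than a syllable, and the syllable 하 (HANGUL SYLLABLE HA) is missing entirely.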

@mike-fabian
Owner

[Screenshot: ko-romaja available in GNOME on FreeBSD]

As this screenshot shows, I do see the ko-romaja m17n input method in the gnome-control-center in FreeBSD.

@mike-fabian
Owner

[Screenshot: ko-romaja from ibus-m17n does not work correctly on FreeBSD]

I added "Korean (romaja (m17n))" in the gnome-control-center, selected this
input method and typed "annyeonghaseyo" into gedit.

The result is the same as in the m17n_translit test case, i.e. one gets

안녕세ㅛ

instead of

안녕하세요

This is clearly wrong, but as the same error occurs both when using ibus-m17n and
when executing the m17n_translit.py test cases, I think it has nothing
to do with ibus-typing-booster. It looks like an error in m17n-lib and/or m17n-db.
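For reference, the precomposed syllables in the expected string follow the standard Unicode Hangul composition arithmetic, syllable = 0xAC00 + (L * 21 + V) * 28 + T, where L, V and T are the indices of the initial consonant, the vowel and the optional final consonant. A minimal sketch that only reproduces the expected characters; it makes no claim about how m17n-lib or ko-romaja.mim compose syllables internally.

```python
import unicodedata

S_BASE = 0xAC00  # U+AC00 '가', the first precomposed Hangul syllable
V_COUNT = 21     # number of medial vowels
T_COUNT = 28     # number of finals, counting "no final" as index 0


def compose(l_index, v_index, t_index=0):
    """Return the precomposed syllable for lead/vowel/trail jamo indices."""
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# ㅎ (L=18) + ㅏ (V=0) -> 하 ; ㅇ (L=11) + ㅛ (V=12) -> 요
print(compose(18, 0), compose(11, 12))  # 하 요
```

NFC normalization of conjoining jamo (U+1100 + L, U+1161 + V) yields the same syllables, which makes a handy cross-check.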

@mike-fabian
Owner

[Screenshot: ibus-typing-booster 2.0.1 works on FreeBSD]

I ignored the error in the Korean test case for the moment and tried instead whether I can
successfully install ibus-typing-booster 2.0.1 on FreeBSD and make it work.

I had to make two small fixes to get it working:

commit cde1d57ad70158b1e01f1a681f5d863a39bc7379
Author: Mike FABIAN
Date: Mon Jun 11 17:39:58 2018 +0200

Fix some bugs in the usage of “prefix”

To make ./configure --prefix=... actually work for prefixes other than "usr".

I found that it didn’t work for --prefix=/usr/local which is used on FreeBSD.

commit eeeb2a7
Author: Mike FABIAN
Date: Mon Jun 11 15:32:38 2018 +0200

Make itb_util.get_ime_help() work on FreeBSD

The .mim files are in '/usr/local/share/m17n' on FreeBSD, they
are in '/usr/share/m17n' on Fedora and openSUSE.

This will be in the 2.0.1 release.
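The get_ime_help() fix boils down to not hard-coding a single m17n directory. A minimal sketch of the same idea, deriving the directory from the install prefix; mim_path() and its engine-name-to-file mapping are assumptions for illustration (based on m17n:ko:romaja corresponding to ko-romaja.mim in this thread), not the actual itb_util code.

```python
import os


def mim_path(engine, prefixes=('/usr', '/usr/local')):
    """Return the .mim file for an engine name such as 'm17n:ko:romaja', or None.

    Hypothetical helper. Assumes the '<lang>-<name>.mim' naming seen with
    ko-romaja.mim and looks in '<prefix>/share/m17n' for each prefix:
    '/usr' as on Fedora and openSUSE, '/usr/local' as on FreeBSD.
    """
    _, lang, name = engine.split(':', 2)
    filename = f'{lang}-{name}.mim'
    for prefix in prefixes:
        path = os.path.join(prefix, 'share', 'm17n', filename)
        if os.path.isfile(path):
            return path
    return None
```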

@thierry-FreeBSD
Contributor Author

Many points!

  1. Your message about ibus list-engine
    On my desktop, ibus list-engine only lists lines beginning with "xkb:", so ibus list-engine | grep m17n returns nothing.
    Am I missing a configuration step?

  2. inscript2
    I did not know about this one! Your link is related to a Red Hat package, and the links in its README are dead; do you know if there is a homepage? I'm going to make an official port for FreeBSD, which will solve the problem with Hindi.

  3. About the Korean test
    Did you install m17n from the ports or the packages? ATM they install version 1.7.0.
    I have submitted a patch to upgrade them to 1.8.0
    (available at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228648 )
    but I'm waiting for the maintainer's approval before committing it.
    And reading the changelog, it seems that 1.8.0 should fix some points (anyway, 1.8.0 is installed on my desktop, and the ITB tests still fail).

  4. About gnome-control-center
    ATM I have no GNOME installed, only KDE, and I am doing my tests with the IBus preferences. KDE Keyboard Settings offers 3 different ways for Hangul, but I do not know if they are related to m17n:
    [Screenshot: capture_keyboard]

  5. About the prefix / localbase issues
    They were already handled by the port, but thanks for that: it will simplify it!

Many thanks for your time and all these ideas. I was too busy to check it today, but I shall work on it again ASAP!

@mike-fabian
Owner

You write: “On my desktop, ibus list-engine only lists lines beginning with "xkb:", then ibus list-engine | grep m17n is empty.”

I just did “pkg install m17n-lib”, “pkg install m17n-db” and “pkg install ibus-m17n”.

Then I restarted the GNOME session to restart ibus.

(“ibus restart” should also work, and then there is no need to restart the GNOME session. But
“ibus restart” seemed not to work correctly on the FreeBSD version where I tried it: input
into gnome-terminal stopped working after “ibus restart”, so I restarted the whole GNOME session instead.)

@mike-fabian
Owner

You write: “inscript2
I did not know about this one! Your link is related to a Red Hat package, and the links in their README are dead; do you know if there is a homepage? I'm going to make an official port for FreeBSD, that will solve the problem with Hindi.”

I don’t know of any other home page than:

https://releases.pagure.org/inscript2/

That work was done by Red Hat. Today I pinged the person who did it and asked him to
get it upstreamed so it will be included in the next m17n-db release.
Apparently he had not upstreamed it yet only because there were still problems with the inscript2
standard; the Indian government was a bit slow in releasing it.
But now it seems mature enough; he said there is no reason against upstreaming it and
he will do it soon.

In my ibus-typing-booster package for openSUSE, I just include that inscript2-20160423.tar.gz
tar ball, see:
https://build.opensuse.org/package/show/M17N/ibus-typing-booster
As soon as this is included in a future release of m17n-db, I’ll remove that.

Of course ibus-typing-booster still works without that, but then you can only use inscript, not inscript2, for Indian languages.

@mike-fabian
Owner

You wrote: “Did you install m17n from the ports or the packages? ATM they install version 1.7.0.”

I used “pkg install m17n-lib”, “pkg install m17n-db”, and “pkg install ibus-m17n”.
This gave me the following versions:

ibus-m17n-1.3.4.16
m17n-db-1.7.0
m17n-lib-1.7.0_2

@mike-fabian
Owner

The Korean settings you describe in KDE look like keyboard settings. They seem
related to whether you use a "real" Korean keyboard, which is almost like the US English
keyboard but has some extra "Hanja/Hangul" keys. If one uses a regular US English keyboard
to write Korean, that should work fine, but you probably have to choose one of the
alternatives to the real "Hanja/Hangul" keys.

This has nothing to do with the "ko-romaja" input method from m17n.

@thierry-FreeBSD
Contributor Author

Oops... I just noticed that after upgrading m17n-lib and m17n-db, I forgot to reinstall ibus-m17n!
Now the problem with ibus list-engine is solved.

@mike-fabian
Owner

Could you find anything which causes the problem?
The Korean test case works fine for me on Fedora 28, so I guess this is a problem specific to FreeBSD.

@thierry-FreeBSD
Contributor Author

Yes, it is surely specific to FreeBSD.
I have tried many things, without success for the moment.
Besides this test, everything seems OK, and ibus-typing-booster is working fine.
(sorry for the delay)

@mike-fabian
Owner

Is the Korean test the only one which fails when running

python3 m17n_translit.py

?

@thierry-FreeBSD
Contributor Author

Its output is:

**********************************************************************
File "m17n_translit.py", line 261, in __main__.Transliterator
Failed example:
    trans.transliterate(list('annyeonghaseyo'))
Expected:
    '\uc548\ub155\ud558\uc138\uc694'
Got:
    '\uc548\ub155\uc138\u315b'
**********************************************************************
1 items had failures:
   1 of  31 in __main__.Transliterator
***Test Failed*** 1 failures.

@mike-fabian
Owner

Did you find anything new here?

@thierry-FreeBSD
Contributor Author

I've just upgraded ITB to 2.3.1, and launched the tests again. Meanwhile, many dependencies (ibus, etc.) have been upgraded.
But unfortunately, the Korean test still ends with the message:

======================================================================
FAIL: test_korean (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.3.1/tests/test_itb.py", line 407, in test_korean
    self.assertEqual(self.engine.mock_preedit_text, '안녕하세이')
AssertionError: '안녕세이' != '안녕하세이'
- 안녕세이
+ 안녕하세이
?   +


----------------------------------------------------------------------
Ran 20 tests in 183.269s

FAILED (failures=1)
FAIL run_tests (exit status: 1)

@mike-fabian mike-fabian added the bug in other software A bug in some other software having an effect on ibus-typing-booster label Dec 13, 2018
@thierry-FreeBSD
Contributor Author

I just upgraded to 2.6.5, and the above error with Korean is still there, but there is also one more failure:

======================================================================
FAIL: test_accent_insensitive_matching_french_dictionary (test_itb.ItbTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/ports/textproc/ibus-typing-booster/work/ibus-typing-booster-2.6.5/tests/test_itb.py", line 577, in test_accent_insensitive_matching_french_dictionary
    'différemment')
AssertionError: 'différemment po:adv' != 'différemment'
- différemment po:adv
?             -------
+ différemment

@mike-fabian
Owner

mike-fabian commented Aug 30, 2019 via email

@thierry-FreeBSD
Contributor Author

OK, this is the reason: I don't have myspell, but hunspell, and

$ grep différemment /usr/local/share/hunspell/fr_FR.dic
différemment po:adv
indifféremment/D'Q' po:adv

@mike-fabian
Owner

That looks like a bug in the fr_FR.dic to me, because each line in such a dictionary should
contain a word, optionally followed by / and some flags used to generate additional inflected forms of that word. I am not sure what po:adv means; "adv" might mean "adverb". Anyway,
“différemment po:adv” doesn’t seem to be a word; the “po:adv” part should not be part of the word.
So it looks like the / which should separate the word from the extra information is missing on that line.

@thierry-FreeBSD
Contributor Author

This is part of hunspell's specification: see pp. 9 and 10 of https://grammalecte.net/_misc/hunspell4.pdf
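For illustration, a minimal sketch of how such a line splits up, assuming this layout: the word, an optional /flags part, then whitespace-separated morphological fields of the form xx:value (po: is hunspell's part-of-speech field, so po:adv marks an adverb; the French dictionary also uses is:). parse_dic_line() is a hypothetical helper, not part of hunspell or ibus-typing-booster, and it ignores escaped slashes inside words.

```python
def parse_dic_line(line):
    """Split a hunspell .dic line into (word, flags, fields).

    Hypothetical helper. Assumes: word[/flags] first, then whitespace-
    separated "xx:value" morphological fields; other tokens (such as a
    trailing number) are ignored. Escaped slashes in words are not handled.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError('empty dictionary line')
    word, _, flags = tokens[0].partition('/')
    fields = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(':')
        if sep and len(key) == 2:
            fields.setdefault(key, []).append(value)
    return word, flags, fields


print(parse_dic_line('différence/S.() po:nom is:fem'))
# ('différence', 'S.()', {'po': ['nom'], 'is': ['fem']})
```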

@mike-fabian
Owner

You are right, I’ll make the following change in the 2.6.6 release
to make it work correctly with the newer French dictionaries:

diff --git a/engine/itb_util.py b/engine/itb_util.py
index 500471a..14d3832 100755
--- a/engine/itb_util.py
+++ b/engine/itb_util.py
@@ -2986,7 +2986,6 @@ def find_hunspell_dictionary(language):
     '''
     Find the hunspell dictionary file for a language
 
-
     :param language: The language of the dictionary to search for
     :type language: String
     :rtype: tuple of the form (dic_path, aff_path) where
@@ -3136,14 +3135,30 @@ def get_hunspell_dictionary_wordlist(language):
     # différemment     8
     # différence/1     2
     #
-    # Therefore, remove everthing following a '/' or a tab from a line
-    # to make the memory use of the word list a bit smaller and the
-    # regular expressions we use later to match words in the
+    # Newer French dictionaries downloaded from
+    #
+    # http://grammalecte.net/download/fr/hunspell-french-dictionaries-v6.4.1.zip
+    #
+    # even contain stuff like:
+    #
+    # différemment po:adv
+    # différence/S.() po:nom is:fem
+    #
+    # i.e. the separator between the word and the extra stuff
+    # can be a space instead of a tab.
+    #
+    # As far as I know, hunspell dictionaries never contain whitespace
+    # within the words themselves.
+    #
+    # Therefore, remove everything following a '/', ' ', or a tab from
+    # a line to make the memory use of the word list a bit smaller and
+    # the regular expressions we use later to match words in the
     # dictionary slightly simpler and maybe a tiny bit faster:
+    #
     word_list = [
         unicodedata.normalize(
             NORMALIZATION_FORM_INTERNAL,
-            re.sub(r'[/\t].*', '', x.replace('\n', '')))
+            re.sub(r'[/\t ].*', '', x.replace('\n', '')))
         for x in dic_buffer
     ]
     return (dic_path, dictionary_encoding, word_list)
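To see the effect of the one-character change in the regular expression, here is a minimal sketch applying the old and the new pattern to the sample lines from the code comment in the diff. The first two lines assume a tab separator, as described in the original comment (which displays it as spaces).

```python
import re

OLD = re.compile(r'[/\t].*')   # pattern before the change
NEW = re.compile(r'[/\t ].*')  # after the change: a space also ends the word

# Sample lines: the first two assume the older tab-separated format, the
# last two are the newer grammalecte format quoted in the diff.
lines = [
    'différemment\t8',
    'différence/1\t2',
    'différemment po:adv',
    'différence/S.() po:nom is:fem',
]

for line in lines:
    print(repr(OLD.sub('', line)), '->', repr(NEW.sub('', line)))
```

Only the third line differs: the old pattern keeps 'différemment po:adv', which is exactly the value the failing test_accent_insensitive_matching_french_dictionary reported.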

mike-fabian added a commit that referenced this issue Sep 11, 2019
Newer French dictionaries downloaded from

http://grammalecte.net/download/fr/hunspell-french-dictionaries-v6.4.1.zip

contain spaces to separate words from extra information.

Improve the regexp to get the words correctly from these dictionaries
as well.

These new French dictionaries seem to be available already on FreeBSD,
see:

#38 (comment)
@mike-fabian
Owner

The fix for the new French dictionaries is included here:

https://github.com/mike-fabian/ibus-typing-booster/releases/tag/2.6.6


2 participants