[BUG] wrong flag for emoji #655

dseomn · 2025-02-22T02:29:18Z

ibus-typing-booster version

2.27.27-1

Distribution and Version

Debian testing/unstable

Desktop Environment and Version

GNOME 47.3-1

Session Type

Wayland
X11

Application and Version

ptyxis 47.8-1

Summary of the bug

With dictionary='en_US,nl_NL', I get the US flag for Dutch emoji:

How to reproduce the bug?

Use en_US and nl_NL dictionaries.
Type grappig to search for emoji in Dutch.

Always reproducible?

Yes
No

Which Typing Booster options/settings do you use?

[/]
addspaceoncommit=false
autosettings=[['prefercommit', 'false', '^SDL2_Application:']]
candidatesdelaymilliseconds=uint32 0
dictionary='en_US,nl_NL'
emojipredictions=true
emojitriggercharacters=''
flagdictionary=true
inputmethod='t-rfc1345-plus'
keybindings={'commit': <['Return']>, 'commit_and_forward_key': <['Left']>, 'commit_candidate_1': <['KP_1', 'F1']>, 'commit_candidate_1_plus_space': <@as []>, 'commit_candidate_2': <['KP_2', 'F2']>, 'commit_candidate_2_plus_space': <@as []>, 'commit_candidate_3': <['KP_3', 'F3']>, 'commit_candidate_3_plus_space': <@as []>, 'commit_candidate_4': <['KP_4', 'F4']>, 'commit_candidate_4_plus_space': <@as []>, 'commit_candidate_5': <['KP_5', 'F5']>, 'commit_candidate_5_plus_space': <@as []>, 'commit_candidate_6': <['KP_6', 'F6']>, 'commit_candidate_6_plus_space': <@as []>, 'commit_candidate_7': <['KP_7', 'F7']>, 'commit_candidate_7_plus_space': <@as []>, 'commit_candidate_8': <['KP_8', 'F8']>, 'commit_candidate_8_plus_space': <@as []>, 'commit_candidate_9': <['KP_9', 'F9']>, 'commit_candidate_9_plus_space': <@as []>, 'toggle_emoji_prediction': <['Shift+Control+E']>, 'toggle_off_the_record': <@as []>}
offtherecord=true
pagesize=9
recordmode=3
shownumberofcandidates=true
wordpredictions=false

Anything else?

No response

mike-fabian · 2025-02-22T10:15:44Z

I cannot reproduce this ☹.

I think matches in the emoji data should not display dictionary flags at all:

I think the relevant settings were the same as yours when I created this screenshot.

https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/hunspell_table.py#L1130

    def _append_candidate_to_lookup_table(
            self, phrase: str = '',
            user_freq: int = 0,
            comment: str = '',
            from_user_db: bool = False,
            spell_checking: bool = False) -> None:
        '''append candidate to lookup_table'''
        if not phrase:
            return
        phrase = itb_util.normalize_nfc_and_composition_exclusions(phrase)
        dictionary_matches: List[str] = (
            self.database.hunspell_obj.spellcheck_match_list(phrase))
        [...]
        if dictionary_matches:
            [...]
        if self._flag_dictionary:
            [...]
            for dictionary in dictionary_matches:
                phrase += self._dictionary_flags.get(dictionary, '')

So if dictionary_matches is empty, no flags should be appended.

And for emoji, there are usually no matches in any dictionaries.

Is the following different on your system?:

(I am doing this in /usr/share/ibus-typing-booster/engine to be able to do import hunspell_suggest)

mfabian@f41:/usr/share/ibus-typing-booster/engine$ python
Python 3.13.2 (main, Feb  4 2025, 00:00:00) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hunspell_suggest
>>> hunspell_suggest.IMPORT_HUNSPELL_SUCCESSFUL
False
>>> hunspell_suggest.IMPORT_ENCHANT_SUCCESSFUL
True
>>> h = hunspell_suggest.Hunspell(['en_US', 'nl_NL'])
>>> h.spellcheck_match_list('💩')
[]

So 💩 is found neither in the en_US nor in the nl_NL dictionary.

I thought that maybe you are using python3-pyhunspell instead of python3-enchant. Typing Booster prefers python3-enchant but falls back to python3-pyhunspell if python3-enchant is not available. Near the beginning of /usr/share/ibus-typing-booster/engine/hunspell_suggest.py there is:

IMPORT_ENCHANT_SUCCESSFUL = False
IMPORT_HUNSPELL_SUCCESSFUL = False
try:
    import enchant # type: ignore
    IMPORT_ENCHANT_SUCCESSFUL = True
except (ImportError,):
    try:
        import hunspell # type: ignore
        IMPORT_HUNSPELL_SUCCESSFUL = True
    except (ImportError,):
        pass

Depending on which import succeeded there, the code that follows uses python3-enchant or python3-hunspell.

But I tried with both now and it makes no difference: in both cases I get no match for 💩 in either the en_US or the nl_NL dictionary.

mike-fabian · 2025-02-22T14:31:41Z

Really weird, I tried on Debian testing/unstable now and cannot reproduce it there either:

mfabian@debian-testing:~$ dconf dump /org/freedesktop/ibus/engine/typing-booster/
[/]
addspaceoncommit=true
dictionary='nl_NL,en_US'
emojipredictions=true
inputmethod='t-rfc1345'
shownumberofcandidates=true
wordpredictions=false
mfabian@debian-testing:~$

mike-fabian · 2025-02-22T14:33:39Z

I forgot the flagdictionary=true setting:

mfabian@debian-testing:~$ dconf dump /org/freedesktop/ibus/engine/typing-booster/
[/]
addspaceoncommit=true
dictionary='nl_NL,en_US'
emojipredictions=true
flagdictionary=true
inputmethod='t-rfc1345'
shownumberofcandidates=true
wordpredictions=false
mfabian@debian-testing:~$

Now I can reproduce it:

mike-fabian · 2025-02-22T14:34:55Z

fabian@debian-testing:/usr/share/ibus-typing-booster/engine$ python3
Python 3.13.2 (main, Feb  5 2025, 01:23:35) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hunspell_suggest
>>> hunspell_suggest.IMPORT_HUNSPELL_SUCCESSFUL
False
>>> hunspell_suggest.IMPORT_ENCHANT_SUCCESSFUL
True
>>> h = hunspell_suggest.Hunspell(['en_US', 'nl_NL'])
>>> h.spellcheck_match_list('💩')
['en_US']
>>>

mike-fabian · 2025-02-22T14:58:31Z

Debian testing:

mfabian@debian-testing:~$ python3
Python 3.13.2 (main, Feb  5 2025, 01:23:35) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict('en_US')
>>> d.check('💩')
True
>>>

Fedora 41:

mfabian@f41:~$ python3
Python 3.13.2 (main, Feb  4 2025, 00:00:00) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict('en_US')
>>> d.check('💩')
False
>>>

Hm, what does that mean?

mike-fabian · 2025-02-22T16:02:36Z

I think for emoji, it makes no sense to show the dictionary flag. The dictionary flags are mostly useful if you write in more than one language at the same time to see which matches are valid words in which language. Something like this:

Here I can see that “arrive” is a valid word in both French and English but “arriver” is only valid in French.

mike-fabian · 2025-02-22T16:05:29Z

For emoji, the flags seem to make no sense anyway: emoji are valid in any language.

So I should probably just omit the flags for candidates which are emoji.

Thinking about how I could do a check for "is this candidate an emoji?" fast without causing performance issues in filling the lookup table.

…ve comments

Resolves: #655

If a candidate in a lookup table has a comment, then it must be

- an emoji
- a single “unusual” character or symbol
- a related word found by itb_nltk.py

In all these cases, it is not interesting to apply labels to the lookup table for such candidates to show whether a spellcheck against some language specific dictionaries returns True or False.

On some systems, something like 💩 might pass a spellcheck, for example on Debian:

mfabian@debian-testing:~$ python3
Python 3.13.2 (main, Feb  5 2025, 01:23:35) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict('en_US')
>>> d.check('💩')
True
>>>

But that doesn’t mean it is useful to append a 🇺🇸 label to a 💩 to indicate that 💩 is a valid word in the en_US dictionary.
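
The rule described in that commit message can be sketched in isolation as follows. This is only an illustration, not the code from the patch: label_candidate and its parameters are hypothetical names, and the real change lives somewhere in the lookup-table code of hunspell_table.py.

```python
from typing import Dict, List


def label_candidate(
        phrase: str,
        comment: str,
        dictionary_matches: List[str],
        dictionary_flags: Dict[str, str]) -> str:
    '''Return phrase with dictionary flags appended where that is useful.

    Hypothetical helper for illustration only.
    '''
    if comment:
        # A candidate with a comment is an emoji, an unusual symbol or a
        # related word. A spellcheck result is not interesting for these,
        # so no flag is appended even if some dictionary "matched".
        return phrase
    for dictionary in dictionary_matches:
        phrase += dictionary_flags.get(dictionary, '')
    return phrase


flags = {'en_US': '🇺🇸', 'nl_NL': '🇳🇱'}
# The emoji keeps no flag even though en_US claimed a match:
print(label_candidate('💩', 'pile of poo', ['en_US'], flags))
# An ordinary word still gets the flags of the matching dictionaries:
print(label_candidate('water', '', ['en_US', 'nl_NL'], flags))
```

Checking for a non-empty comment avoids any per-candidate emoji detection, which keeps filling the lookup table cheap.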

mike-fabian · 2025-02-22T16:49:53Z

With the patch ae92568 applied it looks like this on Debian testing:

mike-fabian · 2025-02-22T16:51:14Z

In the last of the 3 screenshots in the previous comment one can see that the flags are now omitted for the 💩 emoji but not for the “normal” words which match in the nl_NL and/or en_US dictionaries.

dseomn · 2025-02-22T18:25:09Z

I think I mostly figured out the difference between distros. What does this show on Fedora? I'm guessing it's not Aspell?

In [41]: import enchant

In [42]: enchant.Dict('en_US').provider
Out[42]: <Enchant: Aspell Provider>

From http://aspell.net/0.50-doc/man-html/4_Customizing.html#SECTION00522000000000000000

ignore,-W
(integer) ignore words <= n chars

I found that by digging through the code first, and https://github.com/GNUAspell/aspell/blob/4295413512cb1ceeba741876d12612e74c77f14b/modules/speller/default/speller_impl.cpp#L141 is what stood out to me as possibly causing this difference. But that uses a char which I wouldn't expect to work with an emoji in utf-8. Maybe my system is using a different implementation though, not the one in modules/speller/default/speller_impl.cpp, and maybe that other implementation treats ignore as a number of code points instead of bytes. Just to try out my theory a bit more:

In [49]: d = enchant.Dict('en_US')

In [50]: d.check('z')
Out[50]: True

In [51]: d.check('ñ')
Out[51]: True

I think your patch to just not show flags for emoji makes sense though, I was just curious what was going on with the dictionaries.

dseomn · 2025-02-22T18:27:47Z

Oops, forgot to show why I think ignore is set to 1 on my system:

In [52]: d.check('ña')
Out[52]: False

In [53]: d.check('zz')
Out[53]: False
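
To make the theory from the two comments above concrete, here is a small model of an "ignore words <= n" rule, once counting code points and once counting UTF-8 bytes. It does not call Aspell or Enchant at all; passes_ignore is a made-up name and the whole thing only models the hypothesis.

```python
def passes_ignore(word: str, ignore: int, unit: str = 'codepoints') -> bool:
    '''Return True if a checker that skips words of at most `ignore`
    units would accept `word` without looking it up.

    Hypothetical model of the suspected behaviour, not Aspell code.
    '''
    if unit == 'bytes':
        length = len(word.encode('utf-8'))
    else:
        length = len(word)  # Python counts code points here
    return length <= ignore


# 'ñ' is written as the precomposed U+00F1 to avoid ambiguity.
for word in ('z', '\u00f1', '\U0001F4A9', '\u00f1a', 'zz'):
    print(
        word,
        'code points:', len(word),
        'bytes:', len(word.encode('utf-8')),
        'skipped (code points):', passes_ignore(word, 1),
        'skipped (bytes):', passes_ignore(word, 1, 'bytes'))
```

With ignore=1 counted in code points, 'z', 'ñ' and '💩' are skipped while 'ña' and 'zz' are not, which matches the results on my system. Counted in bytes, 'ñ' (2 bytes) and '💩' (4 bytes) would not be skipped.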

mike-fabian · 2025-02-22T18:37:28Z

On Fedora:

mfabian@f41:~$ python3
Python 3.13.2 (main, Feb  4 2025, 00:00:00) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> enchant.Dict('en_US').provider
<Enchant: Hunspell Provider>
>>> d = enchant.Dict('en_US')
>>> d.check('z')
True
>>> d.check('ñ')
False
>>> d.check('ña')
False
>>> d.check('zz')
False
>>>
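
For comparing systems side by side, the checks above could be bundled into a small helper like this one. probe_dictionary is a hypothetical name. The sketch assumes the object passed in behaves like an enchant.Dict, with a check() method and a provider attribute, and it falls back to str() if the provider has no name attribute.

```python
from typing import Any, Dict, Iterable


def probe_dictionary(dictionary: Any, words: Iterable[str]) -> Dict[str, Any]:
    '''Collect the provider and the check() results for some test words.

    Hypothetical diagnostic helper; `dictionary` is expected to look
    like an enchant.Dict.
    '''
    provider = getattr(dictionary, 'provider', None)
    return {
        'provider': getattr(provider, 'name', str(provider)),
        'checks': {word: bool(dictionary.check(word)) for word in words},
    }
```

With pyenchant installed, something like probe_dictionary(enchant.Dict('en_US'), ['z', 'ñ', 'ña', 'zz', '💩']) should give one dictionary per system that can be pasted into the issue.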

mike-fabian · 2025-02-22T18:43:41Z

The thing with using flags for languages has an obvious shortcoming though: there is no one-to-one mapping between languages and country flags.

India for example has a lot of languages but only one flag. To produce something unique if several languages share the same flag I have this helper function producing unique labels for the list of dictionaries used:

https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/itb_util.py#L2358C1-L2382C16

def get_flags(dictionaries: List[str]) -> Dict[str, str]:
    # pylint: disable=line-too-long
    '''
    Examples:

    >>> get_flags(['de_DE', 'fr_FR', 'eo'])
    {'de_DE': '🇩🇪', 'fr_FR': '🇫🇷', 'eo': '🌍'}
    >>> get_flags(['fr_FR', 'de_DE', 'fy_DE', 'eo', 'de', '150'])
    {'fr_FR': '🇫🇷fr_FR', 'de_DE': '🇩🇪de_DE', 'fy_DE': '🇩🇪fy_DE', 'eo': '🌍eo', 'de': '🌍de', '150': '🌍150'}
    '''
    # pylint: enable=line-too-long
    flags: Dict[str, str] = {}
    flags_seen: Set[str] = set()
    duplicate_flags = False
    for dictionary in dictionaries:
        new_flag = get_flag(dictionary)
        flags[dictionary] = new_flag
        if new_flag in flags_seen:
            duplicate_flags = True
        flags_seen.add(new_flag)
    if duplicate_flags:
        for key, flag in flags.items():
            if not flag.endswith(key):
                flags[key] += key
    return flags
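
The function above depends on get_flag(), which is not shown here. To try the duplicate handling on its own, a simplified stand-in can be used. The get_flag below is an assumption for illustration only: it builds a flag from a two-letter territory code and returns 🌍 otherwise, which is certainly less complete than the real itb_util.get_flag().

```python
from typing import Dict, List, Set

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def get_flag(dictionary: str) -> str:
    '''Simplified, hypothetical stand-in for itb_util.get_flag().'''
    _language, _sep, territory = dictionary.partition('_')
    if len(territory) == 2 and territory.isascii() and territory.isalpha():
        # Two regional indicator symbols render as a country flag.
        return ''.join(
            chr(REGIONAL_INDICATOR_A + ord(char) - ord('A'))
            for char in territory.upper())
    return '🌍'


def get_flags(dictionaries: List[str]) -> Dict[str, str]:
    '''Same logic as the function quoted above.'''
    flags: Dict[str, str] = {}
    flags_seen: Set[str] = set()
    duplicate_flags = False
    for dictionary in dictionaries:
        new_flag = get_flag(dictionary)
        flags[dictionary] = new_flag
        if new_flag in flags_seen:
            duplicate_flags = True
        flags_seen.add(new_flag)
    if duplicate_flags:
        # At least two dictionaries share a flag, so every label gets
        # the dictionary name appended to stay unique.
        for key, flag in flags.items():
            if not flag.endswith(key):
                flags[key] += key
    return flags


print(get_flags(['de_DE', 'fr_FR', 'eo']))
print(get_flags(['fr_FR', 'de_DE', 'fy_DE']))
```

In the second call de_DE and fy_DE share the German flag, so all three labels get their dictionary name appended.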

mike-fabian · 2025-02-22T18:51:37Z

And if the results of using

☑️ Use flags for dictionary suggestions

are still not satisfactory with the unique results produced by get_flags(), then one can use something like:

☑️ Use label for dictionary suggestions [ {'*': '📖', 'fy_??': '🛟', 'de_DE': '🇩🇪', 'en_GB': '💂🏻', 'fr_FR': '🗼'} ]
☐ Use flags for dictionary suggestions

The value for the label used for dictionary suggestions can be a simple string but it can also be a Python dictionary specifying exactly which symbols to use for which dictionary.

{'*': '📖', 'fy_??': '🛟', 'de_DE': '🇩🇪', 'en_GB': '💂🏻', 'fr_FR': '🗼'} would use '🇩🇪' for the de_DE dictionary and '🛟' for the fy_NL and fy_DE dictionaries. '*': '📖' is the fallback symbol if none of the more specific dictionary glob patterns matches.
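
How such a label dictionary could be resolved can be sketched with the standard fnmatch module. resolve_label is a hypothetical name, and the matching order used here (exact name first, then the other glob patterns in the order given, then '*') is an assumption; the actual implementation may order things differently.

```python
from fnmatch import fnmatchcase
from typing import Dict


def resolve_label(dictionary: str, labels: Dict[str, str]) -> str:
    '''Pick the label for a dictionary name from glob-pattern keys.

    Hypothetical sketch, not the Typing Booster implementation.
    '''
    if dictionary in labels:
        return labels[dictionary]
    for pattern, label in labels.items():
        # '*' is handled last, as the fallback.
        if pattern != '*' and fnmatchcase(dictionary, pattern):
            return label
    return labels.get('*', '')


labels = {'*': '📖', 'fy_??': '🛟', 'de_DE': '🇩🇪', 'en_GB': '💂🏻', 'fr_FR': '🗼'}
for name in ('de_DE', 'fy_NL', 'fy_DE', 'nl_NL'):
    print(name, resolve_label(name, labels))
```

Here fy_NL and fy_DE both match 'fy_??', and nl_NL matches nothing more specific, so it falls back to '*'.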

dseomn · 2025-02-22T19:23:55Z

The thing with using flags for languages has an obvious shortcoming though: there is no one-to-one mapping between languages and country flags.

Totally non-serious idea that you definitely should not implement: Just pretend the regional indicator symbols used for flags can actually be used for language codes instead. So Swiss German would be the completely sensible 🇬🇸🇼.

dseomn added bug triage labels Feb 22, 2025

dseomn assigned mike-fabian Feb 22, 2025

mike-fabian removed the triage label Feb 22, 2025

mike-fabian added this to Mike’s project Feb 22, 2025

mike-fabian moved this to In Progress in Mike’s project Feb 22, 2025
