[BUG] wrong flag for emoji #655
I cannot reproduce this ☹. I think matches in the emoji data should not display dictionary flags at all. I think the relevant settings were the same as yours when I created this screenshot.

https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/hunspell_table.py#L1130

def _append_candidate_to_lookup_table(
        self, phrase: str = '',
        user_freq: int = 0,
        comment: str = '',
        from_user_db: bool = False,
        spell_checking: bool = False) -> None:
    '''append candidate to lookup_table'''
    if not phrase:
        return
    phrase = itb_util.normalize_nfc_and_composition_exclusions(phrase)
    dictionary_matches: List[str] = (
        self.database.hunspell_obj.spellcheck_match_list(phrase))
    [...]
    if dictionary_matches:
        [...]
        if self._flag_dictionary:
            [...]
            for dictionary in dictionary_matches:
                phrase += self._dictionary_flags.get(dictionary, '')

So if dictionary_matches is empty, no flags should be appended. And for emoji, there are usually no matches in any dictionaries. Is the following different on your system? (I am doing this in /usr/share/ibus-typing-booster/engine.)

mfabian@f41:/usr/share/ibus-typing-booster/engine$ python
Python 3.13.2 (main, Feb 4 2025, 00:00:00) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hunspell_suggest
>>> hunspell_suggest.IMPORT_HUNSPELL_SUCCESSFUL
False
>>> hunspell_suggest.IMPORT_ENCHANT_SUCCESSFUL
True
>>> h = hunspell_suggest.Hunspell(['en_US', 'nl_NL'])
>>> h.spellcheck_match_list('💩')
[]

So 💩 is found in neither the en_US nor the nl_NL dictionary. I thought that maybe you are using a different backend; hunspell_suggest does this:

IMPORT_ENCHANT_SUCCESSFUL = False
IMPORT_HUNSPELL_SUCCESSFUL = False
try:
    import enchant # type: ignore
    IMPORT_ENCHANT_SUCCESSFUL = True
except (ImportError,):
    try:
        import hunspell # type: ignore
        IMPORT_HUNSPELL_SUCCESSFUL = True
    except (ImportError,):
        pass

Depending on what could be imported there, the following code uses python3-enchant or python3-hunspell. But I have now tried both and it makes no difference: in both cases 💩 matches neither the en_US nor the nl_NL dictionary.
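To make the point about empty match lists concrete, here is a simplified, hypothetical stand-in for the flag handling in `_append_candidate_to_lookup_table()`, with the elided `[...]` parts dropped. `append_dictionary_flags` and `FLAGS` are illustrative names, not part of ibus-typing-booster, and the spellcheck result is passed in directly instead of calling `spellcheck_match_list()`.

```python
from typing import Dict, List

# Hypothetical flag mapping, for illustration only.
FLAGS: Dict[str, str] = {
    'en_US': '\U0001F1FA\U0001F1F8',  # 🇺🇸
    'nl_NL': '\U0001F1F3\U0001F1F1',  # 🇳🇱
}


def append_dictionary_flags(
        phrase: str,
        dictionary_matches: List[str],
        dictionary_flags: Dict[str, str],
        flag_dictionary: bool = True) -> str:
    '''Append one flag per dictionary in which the phrase matched.'''
    if dictionary_matches:
        if flag_dictionary:
            for dictionary in dictionary_matches:
                # Dictionaries without a known flag add nothing.
                phrase += dictionary_flags.get(dictionary, '')
    return phrase


# An empty match list (what Fedora returns for 💩) leaves the phrase alone.
plain = append_dictionary_flags('\U0001F4A9', [], FLAGS)
# A backend that reports a match in en_US gets the US flag appended.
flagged = append_dictionary_flags('\U0001F4A9', ['en_US'], FLAGS)
```

So the only way a flag ends up on 💩 is a spellcheck backend that reports a match for it.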
Really weird, I tried on Debian testing/unstable now and cannot reproduce it there either:
I forgot the
Now I can reproduce it:
fabian@debian-testing:/usr/share/ibus-typing-booster/engine$ python3
Python 3.13.2 (main, Feb 5 2025, 01:23:35) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hunspell_suggest
>>> hunspell_suggest.IMPORT_HUNSPELL_SUCCESSFUL
False
>>> hunspell_suggest.IMPORT_ENCHANT_SUCCESSFUL
True
>>> h = hunspell_suggest.Hunspell(['en_US', 'nl_NL'])
>>> h.spellcheck_match_list('💩')
['en_US']
>>>

Debian testing:
mfabian@debian-testing:~$ python3
Python 3.13.2 (main, Feb 5 2025, 01:23:35) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict('en_US')
>>> d.check('💩')
True
>>>

Fedora 41:
mfabian@f41:~$ python3
Python 3.13.2 (main, Feb 4 2025, 00:00:00) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict('en_US')
>>> d.check('💩')
False
>>>

Hm, what does that mean?
I think for emoji, it makes no sense to show the dictionary flag. The dictionary flags are mostly useful if you write in more than one language at the same time, to see which matches are valid words in which language. Something like this:
Here I can see that “arrive” is a valid word in both French and English, but “arriver” is only valid in French.
For emoji, the flags seem to make no sense anyway; emoji are valid in any language. So I should probably just omit the flags for candidates which are emoji. I am thinking about how I could check "is this candidate an emoji?" quickly, without causing performance problems when filling the lookup table.
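One cheap way to answer "is this candidate an emoji?" is sketched below. This is a hypothetical heuristic, not what ibus-typing-booster does: it treats a candidate as emoji-like if every character is in the Unicode category `So` (Symbol, other) or is a zero width joiner, variation selector-16, combining keycap or skin tone modifier. It has false positives (the degree sign ° is also `So`) and misses keycap sequences such as 1️⃣ that start with a digit. `functools.lru_cache` keeps repeated checks cheap while the lookup table is filled.

```python
import unicodedata
from functools import lru_cache

# Hypothetical heuristic, NOT the check used by ibus-typing-booster.
# Code points that occur inside emoji sequences without being pictographs:
# ZERO WIDTH JOINER, VARIATION SELECTOR-16, COMBINING ENCLOSING KEYCAP.
_SEQUENCE_PARTS = frozenset({0x200D, 0xFE0F, 0x20E3})


@lru_cache(maxsize=4096)
def looks_like_emoji(candidate: str) -> bool:
    '''Guess whether a candidate consists only of emoji-like characters.'''
    has_symbol = False
    for char in candidate:
        code_point = ord(char)
        if code_point in _SEQUENCE_PARTS or 0x1F3FB <= code_point <= 0x1F3FF:
            # Sequence glue or skin tone modifier: allowed, but not enough alone.
            continue
        if unicodedata.category(char) == 'So':
            # "Symbol, other": most pictographs and regional indicators,
            # but also non-emoji symbols such as the degree sign.
            has_symbol = True
            continue
        # Letters, digits, punctuation, spaces: looks like a normal word.
        return False
    # An empty candidate never sets has_symbol, so it is not an emoji.
    return has_symbol
```

A real implementation would more likely consult proper emoji data (for example the Unicode emoji property tables); the category check only trades accuracy for speed.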
…ve comments

Resolves: #655

If a candidate in a lookup table has a comment, then it must be

- an emoji
- a single “unusual” character or symbol
- a related word found by itb_nltk.py

In all these cases, it is not interesting to apply labels to the lookup table for such candidates to show whether a spellcheck against some language-specific dictionaries returns True or False.

On some systems, something like 💩 might pass a spellcheck, for example on Debian:

mfabian@debian-testing:~$ python3
Python 3.13.2 (main, Feb 5 2025, 01:23:35) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> d = enchant.Dict('en_US')
>>> d.check('💩')
True
>>>

But that doesn’t mean it is useful to append a 🇺🇸 label to a 💩 to indicate that 💩 is a valid word in the en_US dictionary.
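The rule in this commit message can be sketched as follows. `decorate_candidate`, its parameters, the `FLAGS` mapping and the comment text are hypothetical; the point is only the order of the checks: a non-empty comment short-circuits before any dictionary match is considered, so it no longer matters whether a backend says 💩 passes the en_US spellcheck.

```python
from typing import Dict, List

# Hypothetical flag mapping, for illustration only.
FLAGS: Dict[str, str] = {
    'en_US': '\U0001F1FA\U0001F1F8',  # 🇺🇸
    'nl_NL': '\U0001F1F3\U0001F1F1',  # 🇳🇱
}


def decorate_candidate(
        phrase: str,
        comment: str,
        dictionary_matches: List[str],
        dictionary_flags: Dict[str, str],
        flag_dictionary: bool = True) -> str:
    '''Append dictionary flags, except for candidates that carry a comment.'''
    if comment or not flag_dictionary:
        # Emoji, unusual single characters and related words from
        # itb_nltk.py come with a comment: never label them.
        return phrase
    for dictionary in dictionary_matches:
        phrase += dictionary_flags.get(dictionary, '')
    return phrase


# Even if the backend claims 💩 is valid en_US, the comment suppresses the flag.
poo = decorate_candidate('\U0001F4A9', 'pile of poo', ['en_US'], FLAGS)
# Ordinary words still get one flag per matching dictionary.
arrive = decorate_candidate('arrive', '', ['en_US'], FLAGS)
```

Using the comment as the signal avoids a separate "is this an emoji?" test entirely, since the comment is already available when the candidate is appended.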
With the patch ae92568 applied, it looks like this on Debian testing:

In the last of the 3 screenshots in the previous comment, one can see that the flags are now omitted for the 💩 emoji but not for the “normal” words which match in the
I think I mostly figured out the difference between distros. What does this show on Fedora? I'm guessing it's not Aspell?
From http://aspell.net/0.50-doc/man-html/4_Customizing.html#SECTION00522000000000000000
I found that by digging through the code first, and https://github.com/GNUAspell/aspell/blob/4295413512cb1ceeba741876d12612e74c77f14b/modules/speller/default/speller_impl.cpp#L141 is what stood out to me as possibly causing this difference. But that uses a
I think your patch to just not show flags for emoji makes sense, though; I was just curious what was going on with the dictionaries.
Oops, forgot to show why I think
On Fedora:
mfabian@f41:~$ python3
Python 3.13.2 (main, Feb 4 2025, 00:00:00) [GCC 14.2.1 20250110 (Red Hat 14.2.1-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import enchant
>>> enchant.Dict('en_US').provider
<Enchant: Hunspell Provider>
>>> d = enchant.Dict('en_US')
>>> d.check('z')
True
>>> d.check('ñ')
False
>>> d.check('ña')
False
>>> d.check('zz')
False
>>>
Using flags for languages has an obvious shortcoming, though: there is no one-to-one mapping between languages and country flags. India, for example, has many languages but only one flag. To produce something unique when several languages share the same flag, I have this helper function, which produces unique labels for the list of dictionaries in use: https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/itb_util.py#L2358C1-L2382C16
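The linked helper is not reproduced here. As a hypothetical illustration of the general idea only (not the logic of the function in itb_util.py), one could derive a flag from the territory part of a dictionary name using regional indicator symbols and append the language code whenever two dictionaries end up with the same flag. `country_flag` and `unique_labels` are illustrative names.

```python
from collections import Counter
from typing import Dict, List


def country_flag(dictionary: str) -> str:
    '''Flag for the territory in a name like "hi_IN", or '' if there is none.'''
    parts = dictionary.split('_')
    if len(parts) < 2:
        return ''
    territory = parts[1].upper()
    if len(territory) != 2 or not (territory.isascii() and territory.isalpha()):
        return ''
    # A flag is two REGIONAL INDICATOR SYMBOL letters, starting at U+1F1E6 ("A").
    return ''.join(chr(0x1F1E6 + ord(letter) - ord('A')) for letter in territory)


def unique_labels(dictionaries: List[str]) -> Dict[str, str]:
    '''Flag alone where it is unique, flag plus language code where shared.'''
    flags = {dictionary: country_flag(dictionary) for dictionary in dictionaries}
    usage = Counter(flags.values())
    labels: Dict[str, str] = {}
    for dictionary, flag in flags.items():
        language = dictionary.split('_')[0]
        if not flag:
            labels[dictionary] = language  # no territory: fall back to the code
        elif usage[flag] > 1:
            labels[dictionary] = flag + language  # e.g. 🇮🇳hi vs. 🇮🇳ta
        else:
            labels[dictionary] = flag
    return labels


labels = unique_labels(['hi_IN', 'ta_IN', 'nl_NL'])
```

With this sketch, nl_NL keeps a plain 🇳🇱, while hi_IN and ta_IN become distinguishable as 🇮🇳hi and 🇮🇳ta.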
And if the results of using ☑️ Use flags for dictionary suggestions are still not satisfactory, even with the unique labels produced by that helper, you can use ☑️ Use label for dictionary suggestions with a value like:

{'*': '📖', 'fy_??': '🛟', 'de_DE': '🇩🇪', 'en_GB': '💂🏻', 'fr_FR': '🗼'}

The value for the label used for dictionary suggestions can be a simple string, but it can also be a Python dictionary specifying exactly which symbols to use for which dictionary.
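How such a dictionary value might be resolved is sketched below. The matching rules are an assumption, not taken from ibus-typing-booster: the `fy_??` key suggests shell-style wildcards, so this sketch tries an exact key first, then wildcard patterns via `fnmatch.fnmatchcase` (where `?` matches one character), and finally the `'*'` fallback. `label_for_dictionary` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Dict

# The example value from the settings above.
LABELS: Dict[str, str] = {
    '*': '📖', 'fy_??': '🛟', 'de_DE': '🇩🇪', 'en_GB': '💂🏻', 'fr_FR': '🗼'}


def label_for_dictionary(dictionary: str, labels: Dict[str, str]) -> str:
    '''Exact key first, then shell-style patterns, then the "*" fallback.'''
    if dictionary in labels:
        return labels[dictionary]
    for pattern, label in labels.items():
        # fnmatchcase: "?" matches exactly one character, case-sensitively.
        if pattern != '*' and fnmatchcase(dictionary, pattern):
            return label
    # No match and no "*" entry: no label at all.
    return labels.get('*', '')
```

Under these assumptions fy_NL gets 🛟, de_DE gets 🇩🇪, and any dictionary not listed, such as en_US, falls back to 📖.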
Totally non-serious idea that you definitely should not implement: just pretend the regional indicator symbols used for flags can actually be used for language codes instead. So Swiss German would be the completely sensible 🇬🇸🇼.
ibus-typing-booster version
2.27.27-1
Distribution and Version
Debian testing/unstable
Desktop Environment and Version
GNOME 47.3-1
Session Type
Application and Version
ptyxis 47.8-1
Summary of the bug
With dictionary='en_US,nl_NL', I get the US flag for Dutch emoji:
How to reproduce the bug?
Type grappig to search for emoji in Dutch.
Always reproducible?
Which Typing Booster options/settings do you use?
Anything else?
No response