Greek transliteration is non deterministic #47
Comments
Thanks for bringing this up. There have been numerous attempts and PRs to bring correctness to Greek transliteration. I'm all for correctness and thus willing to accept a valid PR. I think that, back in the day, I used this Wikipedia article as a valid and trustworthy source of information on the topic. Could you please double-check your findings against the mentioned Wikipedia article and let me know if current interpretation of Thank you!
I am unable to reproduce this on master (9333f24) with Python 3.9.2,
with foo.py containing
This isn't easy to reproduce right now, which isn't surprising: 3 years have passed since 2019. Judging from the report, I would say that we are no longer able to reproduce this because standard dictionary objects now preserve insertion order, starting with CPython 3.6 as an implementation detail and guaranteed by the language specification since Python 3.7. Given the following stanza in the
it makes sense that the dictionaries were initialized with different orders on subsequent executions in Python versions before 3.6. I'd say that this explains the inconsistent behavior. It also means that by now the problem has become extremely rare and will only show up when using older, unsupported Python versions. However, the transliteration in the example above is just wrong. I am not sure where the 2nd mapping comes from, but it should not be there. @barseghyanartur I'll submit a PR to remove the
[1] https://en.wikipedia.org/wiki/ISO_843
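To illustrate why fresh interpreter starts can disagree while a loop inside one process cannot, here is a minimal sketch. It does not use the library at all. It relies on two things: string hashes are randomized per process unless PYTHONHASHSEED is fixed, and hash-ordered containers iterate in an order that depends on those hashes. A set is used as a stand-in for the old, unordered dict, because sets still behave this way on current Python. The helper name and the sample keys are made up for the illustration.

```python
import os
import subprocess
import sys

# Code run in each fresh interpreter: print the iteration order of a small
# set of strings. The keys are arbitrary ASCII stand-ins, not library data.
SNIPPET = "print(list({'ou', 'oy', 'o', 'u', 'y', 'e', 'a'}))"


def order_in_fresh_interpreter(seed: str) -> str:
    """Start a new Python process with the given PYTHONHASHSEED and
    return the set iteration order it printed (hypothetical helper)."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed: the order is reproducible across interpreter starts.
    print(order_in_fresh_interpreter("1"))
    print(order_in_fresh_interpreter("1"))
    # Random seed (the default): the order may change from start to start.
    for _ in range(3):
        print(order_in_fresh_interpreter("random"))
```

Within a single process the hash seed never changes, which matches the observation that calling the function in a loop always gives the same answer.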
For "ου", both ISO 843 [1] (the international ratification of ELOT 743 v1, with a couple of minor differences) and ELOT 743 version 2 type 1 [2] (the Greek cross-ratification of ISO 843, adopting the above minor differences) specifically set an exception for the double vowel "ου", which needs to be transliterated as "ou" and vice versa. There is no mapping exception to or from "oy", so while "oy" would be transliterated, per the general rules, to "ου", the inverse would never be true in a transliteration context. It's important to note that neither the UN nor the ALA-LC (Library of Congress) treats "ου" differently than ISO 843/ELOT 743 v2 (which isn't the case for some other mappings).
This closes barseghyanartur#47
Signed-off-by: Alexandros Kosiaris <[email protected]>
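The rule described in the commit message can be sketched as a longest-match-first lookup: the digraph "ου" is tried before the single letters, so the result never depends on container ordering. This is only an illustration under stated assumptions. The table below is a small hand-picked subset covering the example sentence from the report, not the library's actual mapping, and the names RULES and to_latin are hypothetical.

```python
import unicodedata

# Illustrative subset only (assumption): just enough rules for the example
# sentence, with the "ου" -> "ou" exception listed alongside single letters.
RULES = {
    "ου": "ou",
    "α": "a", "ά": "a", "δ": "d", "ε": "e", "η": "i", "ί": "i",
    "λ": "l", "μ": "m", "ν": "n", "ξ": "x", "ο": "o", "σ": "s",
    "τ": "t", "υ": "y",
}

# Normalize the keys and fix the lookup order once: longest keys first, so
# the digraph always wins over "ο" followed by "υ".
_TABLE = {unicodedata.normalize("NFC", k): v for k, v in RULES.items()}
_KEYS = sorted(_TABLE, key=len, reverse=True)


def to_latin(text: str) -> str:
    """Greedy longest-match transliteration over the illustrative table."""
    text = unicodedata.normalize("NFC", text).lower()
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(_TABLE[key])
                i += len(key)
                break
        else:
            # No rule applies (space, punctuation, ...): copy unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_latin("Δεν του μίλησα ξανά."))  # den tou milisa xana.
```

Because the key order is computed explicitly with a stable sort, the output is the same on every run regardless of how the interpreter orders its dictionaries.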
Transliteration of Greek is non-deterministic!
Running translit('Δεν του μίλησα ξανά.', 'el', reversed=True) several times
gives "den toy milisa xana."
or "den tou milisa xana."
Maybe both are correct, but the tool should always output the same one!
Otherwise, results are not reproducible, e.g. when used in a machine translation system.
This happens if you start python3 several times, not when it is called in a loop.