Commit

Improve lang validity check (#275)
* Improve lang validity check

The list returned by getISOLanguages does not include deprecated
language codes that are still accepted to create locales.

* Use a one-liner
guillaumekln authored Jan 25, 2022
1 parent 56158de commit f25f034
Showing 2 changed files with 5 additions and 8 deletions.
4 changes: 4 additions & 0 deletions bindings/python/test/test.py
@@ -56,6 +56,10 @@ def test_invalid_lang():
         pyonmttok.Tokenizer("conservative", lang="xxx")


+def test_deprecated_lang():
+    pyonmttok.Tokenizer("conservative", lang="tl")
+
+
 def test_invalid_sentencepiece_model():
     with pytest.raises(ValueError):
         pyonmttok.Tokenizer("none", sp_model_path="xxx")
9 changes: 1 addition & 8 deletions src/unicode/Unicode.cc
@@ -235,14 +235,7 @@ namespace onmt

     bool is_valid_language(const char* language)
     {
-      for (const char* const* available_languages = icu::Locale::getISOLanguages();
-           *available_languages;
-           ++available_languages)
-      {
-        if (strcmp(*available_languages, language) == 0)
-          return true;
-      }
-      return false;
+      return icu::Locale(language).getISO3Language()[0] != '\0';
     }

     // The functions below are made backward compatible with the Kangxi and Kanbun script names
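
The behavioral difference between the removed loop and the new one-liner can be sketched without ICU. The block below is only a model under stated assumptions: `kListedCodes`, `kIso3Table`, `lookup_iso3`, `is_valid_by_list`, and `is_valid_by_iso3` are hypothetical names, and the two tables are hand-written stand-ins, not data read from ICU. The list deliberately omits "tl", mirroring what the commit message says about `getISOLanguages`, while the lookup table still resolves it.

```cpp
#include <cstring>

// Hypothetical stand-in for a language list that omits deprecated codes.
// This is NOT ICU data; it only models the situation the commit describes.
static const char* const kListedCodes[] = {"en", "fr", "fil"};

// Hypothetical stand-in for a two-letter to three-letter code lookup
// that still knows the deprecated code "tl".
struct Iso3Entry {
  const char* code;
  const char* iso3;
};
static const Iso3Entry kIso3Table[] = {
    {"en", "eng"}, {"fr", "fra"}, {"fil", "fil"}, {"tl", "tgl"},
};

// Old strategy: a code is valid only if it appears in the list.
bool is_valid_by_list(const char* language) {
  for (const char* listed : kListedCodes) {
    if (std::strcmp(listed, language) == 0)
      return true;
  }
  return false;
}

// Stand-in lookup: returns an empty string for unknown codes.
const char* lookup_iso3(const char* language) {
  for (const Iso3Entry& entry : kIso3Table) {
    if (std::strcmp(entry.code, language) == 0)
      return entry.iso3;
  }
  return "";
}

// New strategy: a code is valid if the three-letter lookup is non-empty.
bool is_valid_by_iso3(const char* language) {
  return lookup_iso3(language)[0] != '\0';
}
```

With these stand-in tables, "tl" is rejected by `is_valid_by_list` but accepted by `is_valid_by_iso3`, while "xxx" is rejected by both, which matches the two Python tests in this commit.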