what is the code of Persian language? #33

Open
hatefap opened this issue Mar 29, 2021 · 4 comments

@hatefap

hatefap commented Mar 29, 2021

Hey, does this support the Persian/Farsi language? What is the language code to pass into this function:

embeddings = laser.embed_sentences(
    ['let your neural network be polyglot',
     'use multilingual embeddings!'],
    lang='en')
@tamohannes

@hatefap should be fa, but LASER is not trained on a Persian/Farsi corpus, so it will automatically fall back on en.

@hoschwenk

LASER supports Persian/Farsi.
You may see a message "falling back to English", but this only comes from punctuation normalization, since there are no specific rules for Farsi/Persian.
You can simply ignore it.
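
To make the point above concrete, here is a minimal, self-contained sketch of how such a fallback can stay confined to the punctuation step. It is not LASER's actual code: the names `LANGS_WITH_PUNCT_RULES`, `punct_rules_lang` and `preprocess` are hypothetical, and the set of languages is made up for illustration.

```python
import warnings

# Hypothetical set of languages that have dedicated punctuation rules.
# Illustrative only; this is NOT the real list used by LASER.
LANGS_WITH_PUNCT_RULES = {"en", "fr", "de", "es"}


def punct_rules_lang(lang: str) -> str:
    """Return the language whose punctuation rules will be applied."""
    if lang in LANGS_WITH_PUNCT_RULES:
        return lang
    # No dedicated rules: warn and use the English rules instead.
    warnings.warn(f"no punctuation rules for '{lang}', falling back to English")
    return "en"


def preprocess(sentence: str, lang: str) -> dict:
    """Toy preprocessing step: only the punctuation rules are affected."""
    return {
        "text": sentence.strip(),
        "requested_lang": lang,                  # unchanged by the fallback
        "punct_rules": punct_rules_lang(lang),   # 'en' if no specific rules
    }
```

In this sketch, calling `preprocess(..., "fa")` emits the warning but keeps `fa` as the requested language. By the same reasoning, the call from the question would presumably just use `lang='fa'` and the message can be ignored.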

@tamohannes

@hoschwenk thanks. It's strange though, I couldn't find fa in Table 1 of the paper:

Artetxe, M. and Schwenk, H., 2019. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. Transactions of the Association for Computational Linguistics, 7, pp.597-610.

@hatefap
Author

hatefap commented May 16, 2021

@HovhannesTamoyan @hoschwenk thank you very much!
