Sparse Indexing of my Arabic Collection results in (Latin_1) encoding instead of (utf_8). #1316

Answered by lintool
MedSao asked this question in Q&A

What's your indexing command? Did you set --language ar?

We haven't had any issues indexing Arabic text in either Mr. TyDi or MIRACL:
https://github.com/castorini/pyserini/#two-click-reproductions

What issue are you seeing?
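
One thing that may be worth ruling out before indexing is the encoding of the collection file itself. The snippet below is only a sketch, not something taken from this thread: it writes a tiny JSON Lines file explicitly as UTF-8 and shows how the same bytes look when misread as Latin-1. The helper names, the sample document, and the `id`/`contents` field layout are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_utf8(docs, path):
    """Write documents as JSON Lines, forcing UTF-8 and keeping Arabic unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            # ensure_ascii=False keeps the Arabic characters as-is instead of \uXXXX escapes.
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample document; the "id"/"contents" field names are an assumption.
docs = [{"id": "doc1", "contents": "مرحبا بالعالم"}]
tmp_dir = tempfile.mkdtemp()
collection_path = Path(tmp_dir) / "docs.jsonl"
write_jsonl_utf8(docs, collection_path)

raw = collection_path.read_bytes()
as_utf8 = raw.decode("utf-8")      # the intended reading
as_latin1 = raw.decode("latin-1")  # never raises, but produces mojibake

print(is_valid_utf8(collection_path))
print(as_utf8.strip())
print(as_latin1.strip())
```

If the file passes this check but the text still looks garbled later, the problem is more likely in how the output is read or displayed (for example, a platform default encoding) than in the collection file.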

Answer selected by MedSao