Other Segmentation Tools #202
Answered by WissamAntoun · e-hossam96 asked this question in Q&A
Hello all, thanks for the good effort. Is it OK to use other disambiguation/segmentation tools, such as CAMeL Tools, to preprocess input text before passing it to the AraBERTv2 tokenizer? Thanks.
Answered by WissamAntoun, Sep 14, 2024
Hey Hossam, I haven't tried replacing the pre-processor yet. In any case, I'd recommend against it. Even if CAMeL Tools produces the same prefixes and suffixes, its word segmentation (even if it is, say, more accurate) will differ from what the model saw during pretraining, so performance might be slightly worse and not optimal.
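To make the mismatch concrete, here is a minimal toy sketch, not the actual AraBERT pipeline. It assumes a tiny hypothetical vocabulary (`toy_vocab`) with "+"-marked prefixes and suffixes and a hypothetical helper `unseen_rate`. The segmented strings are hand-written transliterated placeholders, not real output from any segmenter. A real subword tokenizer would usually split unfamiliar pieces further instead of dropping them, so treat this only as an illustration of why a different segmentation convention drifts away from what the model was pretrained on.

```python
def unseen_rate(segmented_text, vocab):
    """Return the fraction of whitespace-separated pieces missing from vocab."""
    pieces = segmented_text.split()
    if not pieces:
        return 0.0
    missing = sum(1 for piece in pieces if piece not in vocab)
    return missing / len(pieces)


# Hypothetical vocabulary learned from text segmented with "+" markers.
toy_vocab = {"w+", "al+", "kitab", "+ha", "qalam"}

# Same morphemes, two marker conventions (both strings are made up).
pretraining_style = "w+ al+ kitab +ha"  # convention assumed to match pretraining
other_tool_style = "w_ al_ kitab _ha"   # a different, hypothetical convention

rate_same = unseen_rate(pretraining_style, toy_vocab)
rate_other = unseen_rate(other_tool_style, toy_vocab)

print(f"pretraining-style unseen rate: {rate_same:.2f}")
print(f"other-tool-style unseen rate:  {rate_other:.2f}")
```

In this toy setup every piece of the pretraining-style string is in the vocabulary, while three of the four pieces from the other convention are not. The same kind of check could be run on real data by comparing the outputs of two preprocessors on a sample of your corpus before deciding to swap one for the other.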
Answer selected by e-hossam96