Is your feature request related to a problem? Please describe.
There are multiple fasttext models available, and in principle, one could train their own.
Besides the one indicated by the README.md (lid.176.bin), the official fastText page also lists lid.176.ftz.
On Hugging Face there is lid218e, and there is also a recent independent lib201 model.
Describe the solution you'd like
1. I would like the README.md to mention that other models are available.
2. I would like the code to provide a way to select a model through configuration.
3. I would like the README.md to document how 2. would be implemented.
Describe alternatives you've considered
I can download my own model and rename it to lid.176.bin, but this is confusing and unsatisfactory.
Additional context
There seem to be some unexposed options to achieve this.
It would be useful to make them modular and document them.
The code in model.rs also seems to default to Path::new("lid.176.bin"), but when that file is absent it tries to fall back to lid.208a.bin, and it is unclear where that file can be obtained. Clearly some work was done behind the scenes, so I am hesitant to implement a solution on my own.
> I would like the README.md to mention that other models are available.
I agree; we need to add some information about that, especially since there are many more fastText models now.
> I would like the code to provide a way to select a model through configuration.
Is using `ungoliant pipeline ... --lid-path <path to lid>` not sufficient? I have used other fastText models this way and it works.
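As a rough illustration of the precedence described in the reply (an explicit `--lid-path` overrides the default lid.176.bin), here is a minimal std-only Rust sketch. `lid_path_from_args` and `DEFAULT_LID_PATH` are hypothetical names chosen for illustration; ungoliant's actual argument handling is not shown in this thread and may work differently.

```rust
use std::path::PathBuf;

/// Default model file, matching the one the README points to.
const DEFAULT_LID_PATH: &str = "lid.176.bin";

/// Hypothetical helper (not ungoliant's API): return the path given with
/// `--lid-path <path>` or `--lid-path=<path>`, or the default model file
/// when the flag is absent. If the flag appears more than once, the first wins;
/// if it is the last argument with no value, the default is used.
fn lid_path_from_args<I, S>(args: I) -> PathBuf
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut iter = args.into_iter();
    while let Some(raw) = iter.next() {
        let arg: &str = raw.as_ref();
        if arg == "--lid-path" {
            // The path is the next argument, if there is one.
            if let Some(value) = iter.next() {
                let value: &str = value.as_ref();
                return PathBuf::from(value);
            }
        } else if let Some(value) = arg.strip_prefix("--lid-path=") {
            return PathBuf::from(value);
        }
    }
    // No usable flag given: use the documented default.
    PathBuf::from(DEFAULT_LID_PATH)
}
```

It could be called as `lid_path_from_args(std::env::args().skip(1))`. Accepting the `--lid-path=<path>` form in addition to `--lid-path <path>` is an extra assumption of this sketch.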
> I would like the README.md to document how 2. would be implemented.

This can also be added to the README.
> The code in model.rs also seems to default to Path::new("lid.176.bin"), but when that file is absent it tries to fall back to lid.208a.bin, and it is unclear where that file can be obtained. Clearly some work was done behind the scenes, so I am hesitant to implement a solution on my own.
This is weird behaviour and it should be fixed. I'll replace lid.218a.bin with lid.176.bin, since it's readily available.
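One possible shape for that fix, sketched under assumptions: the default stays lid.176.bin, a user-supplied path always wins, and a missing file produces an explicit error naming the path instead of a fallback to a different, hard-to-find model file. `resolve_lid_path` and `DEFAULT_LID_PATH` are hypothetical names; this is not the code in model.rs.

```rust
use std::path::{Path, PathBuf};

/// Default model file, readily downloadable from the fastText site.
const DEFAULT_LID_PATH: &str = "lid.176.bin";

/// Hypothetical helper: use the user-supplied path if there is one, otherwise
/// the default. If the chosen file does not exist (or is not a regular file),
/// return an error naming it rather than trying another model file.
fn resolve_lid_path(user_path: Option<&Path>) -> Result<PathBuf, String> {
    let path = user_path
        .map(Path::to_path_buf)
        .unwrap_or_else(|| PathBuf::from(DEFAULT_LID_PATH));
    if path.is_file() {
        Ok(path)
    } else {
        Err(format!(
            "language identification model not found at {} (download lid.176.bin or pass --lid-path)",
            path.display()
        ))
    }
}
```

Failing loudly keeps the choice of model visible to the user, which is the confusion the issue describes with the current fallback.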
[Feature request] Train a classifier to better classify languages #21 might be fixed with this change.