I think it would be good to move the version to v1.0.0 soon, so I am opening this issue for discussion. There are several things that should probably be done before that happens:
1. Figure out a more generic saving/loading scheme that can be extended to language models other than the provided LanguageModel class. See this issue.
2. Remove the explicit dependence on kenlm in the AbstractLanguageModel and Decoder classes. See this issue.
3. Make sure the documentation and notebooks are fully up to date.
4. (Maybe) Refactor so that the kenlm classes live in their own file instead of in the main language model and decoder files. This would break existing imports, since anything mentioning kenlm would now be in a different module.
5. (Maybe) Add an abstract decoder class so that alternate decoder classes can be added. The most basic API would require only decode() and decode_batch() methods, but decode_beams() and decode_beams_batch() might be useful for beam-search decoders.
6. (Maybe) There have been some requests to include per-word scores in the output. Settling on a way to do that might be another good feature to aim for.
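To make the abstract decoder idea more concrete, here is a minimal sketch of what such a base class could look like. All names (AbstractDecoder, AbstractBeamSearchDecoder, GreedyDecoder, the Logits alias) and the simplified signatures are assumptions made for illustration, not the library's current API. The toy greedy decoder is only there to exercise the interface.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

# Hypothetical alias: one utterance as frames x vocabulary scores.
Logits = Sequence[Sequence[float]]


class AbstractDecoder(ABC):
    """Hypothetical base class; subclasses only have to implement decode()."""

    @abstractmethod
    def decode(self, logits: Logits) -> str:
        """Return the best transcription for one utterance."""

    def decode_batch(self, logits_list: Sequence[Logits]) -> List[str]:
        """Default batch decoding: decode each utterance in turn."""
        return [self.decode(logits) for logits in logits_list]


class AbstractBeamSearchDecoder(AbstractDecoder):
    """Hypothetical extension for decoders that can return several beams."""

    @abstractmethod
    def decode_beams(self, logits: Logits) -> List[Tuple[str, float]]:
        """Return (text, score) pairs, best first."""

    def decode_beams_batch(
        self, logits_list: Sequence[Logits]
    ) -> List[List[Tuple[str, float]]]:
        return [self.decode_beams(logits) for logits in logits_list]

    def decode(self, logits: Logits) -> str:
        # The best transcription is the text of the top beam.
        return self.decode_beams(logits)[0][0]


class GreedyDecoder(AbstractDecoder):
    """Toy greedy CTC decoder, used only to exercise the interface."""

    def __init__(self, labels: Sequence[str], blank: int = 0) -> None:
        self.labels = labels
        self.blank = blank

    def decode(self, logits: Logits) -> str:
        out = []
        prev = None
        for frame in logits:
            idx = max(range(len(frame)), key=frame.__getitem__)
            # Collapse repeated symbols and drop blanks.
            if idx != prev and idx != self.blank:
                out.append(self.labels[idx])
            prev = idx
        return "".join(out)
```

With this layout a non-beam-search decoder implements a single method, while a beam-search decoder implements decode_beams() and gets decode() for free.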
On Point 6:
Just as timestamps are calculated for each word by keeping two variables, "frame_list" and "frames", we could keep two more variables, "word_confidence_list" and "word_confidence", and update them the same way the timestamps are updated. Unlike timestamps, however, this would require changes to the _merge_beams function so that the word confidence scores are merged as well, just as the logit scores are.
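A rough sketch of the merging step, under stated assumptions: the beam here is a simplified hypothetical tuple (text, word_confidence_list, logit_score) rather than the library's real beam structure, and merge_beams is a stand-in for _merge_beams, not its actual code. Logit scores are combined with log-sum-exp. For the word confidences, keeping those of the highest-scoring path is just one possible rule, chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical, simplified beam: (text, word_confidence_list, logit_score).
Beam = Tuple[str, List[float], float]


def log_sum_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    if hi == -math.inf:
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))


def merge_beams(beams: List[Beam]) -> List[Beam]:
    """Merge beams with identical text, summing their probabilities."""
    # text -> [word confidences, best single-path score, merged score]
    merged: Dict[str, list] = {}
    for text, confs, score in beams:
        if text not in merged:
            merged[text] = [list(confs), score, score]
            continue
        entry = merged[text]
        # Assumption: keep the confidences of the highest-scoring path.
        if score > entry[1]:
            entry[0], entry[1] = list(confs), score
        entry[2] = log_sum_exp(entry[2], score)
    return [(text, confs, total) for text, (confs, _best, total) in merged.items()]
```

A score-weighted average of the per-word confidences would be another option. Which rule is right depends on how the confidence is defined.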
Is that correct, @lopez86? I have never contributed to an open source project on GitHub, and it would be great if I could contribute this word confidence feature.