Add options for handling multilingual input #200

Draft
wants to merge 2 commits into
base: main

Conversation

jsichi
Contributor

@jsichi jsichi commented Apr 8, 2024

This is an incomplete PR intended as a first step toward addressing issues loosely related to #184.

The multilingual_input option controls whether multiple languages should be expected in the input stream. If False (the backwards compatible default), only one language is expected: either the one specified by the client, or the first one heard if the client specified none. If True, the language can change throughout the stream, and the transcription will contain text in multiple languages. Notifications will be sent to the client whenever a language change is detected. If the pauses between utterances in different languages are not long enough, the transcript boundaries may be incorrect, i.e. the first sentence in the new language may be incorrectly transcribed in the previous language. This currently seems unavoidable due to the way the last work-in-progress segment gets reprocessed.
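To make the intended semantics concrete, here is a minimal sketch of the language-tracking behavior described above. It is not the code in this PR: `track_languages` and its parameters are hypothetical names, the per-segment detections are assumed to be given as a list of language codes, and letting a client-specified language act as the starting language when multilingual_input is True is an assumption.

```python
from typing import Iterable, List, Optional, Tuple


def track_languages(
    detected: Iterable[str],
    multilingual_input: bool = False,
    client_language: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (language used per segment, notifications).

    `detected` holds the language code detected for each incoming segment.
    """
    current = client_language
    used: List[str] = []
    notifications: List[str] = []
    for lang in detected:
        if current is None:
            # No language from the client: adopt the first one heard.
            current = lang
        elif multilingual_input and lang != current:
            # Language changed mid-stream: switch and notify the client.
            current = lang
            notifications.append(lang)
        # With multilingual_input=False the language never changes.
        used.append(current)
    return used, notifications
```

With `multilingual_input=False`, a stream detected as `["en", "en", "fr"]` stays in the first language heard; with `True`, the third segment switches to `"fr"` and one notification is recorded.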

The lang_filter option allows the client to restrict the set of candidate languages to listen for. This may be useful regardless of the multilingual_input setting, e.g. at the beginning of the input, where the actual language may be detected incorrectly. If not set (the backwards compatible default), all known languages are listened for.
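As a rough illustration of how such a filter could be applied, the sketch below picks the most probable language from a set of detection scores, optionally restricted to the filter. `pick_language` is a hypothetical helper, and the assumption that detection yields a dict of per-language probabilities is mine, not something taken from the code in this PR.

```python
from typing import Dict, Iterable, Optional


def pick_language(
    probs: Dict[str, float],
    lang_filter: Optional[Iterable[str]] = None,
) -> str:
    """Hypothetical helper: choose the most probable allowed language.

    `probs` maps language codes to detection probabilities.
    `lang_filter=None` means all known languages are candidates.
    """
    if lang_filter is None:
        candidates = probs
    else:
        allowed = set(lang_filter)
        # Keep only the languages the client asked to listen for.
        candidates = {lang: p for lang, p in probs.items() if lang in allowed}
        if not candidates:
            raise ValueError("lang_filter excludes every detected language")
    return max(candidates, key=candidates.get)
```

For example, if the detector slightly prefers `"nl"` over `"en"` at the start of an English stream, a filter of `["en", "de"]` would make the sketch return `"en"` instead.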

If there's interest in adding these, I can propagate them to the TensorRT code as well. I'm not sure how to add tests since that would require using a large multilingual model (we would also need to add some multilingual samples, which might be useful anyway).

@AdolfVonKleist

Overall, how reliable is this, and how does it compare to, say, having no special filtering/processing in place? Do you have any objective benchmarks? I'm interested in making use of a similar approach locally.

@jsichi
Contributor Author

jsichi commented Sep 26, 2024

It's been a while since I worked on this, but it was a noticeable improvement. I don't have any benchmark for you.
