[Feature] Silero VAD support #889

Open
3manifold opened this issue Sep 26, 2024 · 0 comments · May be fixed by #888

3manifold commented Sep 26, 2024

The VAD model plays a crucial role in the WhisperX pipeline and can significantly affect both speech recognition performance and inference time. It is therefore worth extending WhisperX to accept alternative VAD methods, which need not come from the pyannote-audio toolkit (the source of the default VAD model). Silero VAD is an ideal candidate for an alternative VAD option: it achieves strong speech detection results while running only on CPU. In addition, Silero VAD support is considered a high-priority TODO item in the WhisperX repository.

This feature includes:

  • Implementation of Silero VAD as an alternative VAD option.
  • Extension of WhisperX to accept VAD alternatives that do not necessarily come from the pyannote-audio toolkit.
  • A fix to the imports in whisperx/__init__.py.
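To illustrate the second item, here is a minimal sketch, assuming a plain-Python design, of how a pipeline could accept interchangeable VAD backends through one common interface. SpeechSegment, VadBackend, EnergyVad, register_vad and load_vad are hypothetical names for illustration only, not WhisperX's actual API (the real implementation is in #888). EnergyVad is a toy energy-threshold detector standing in for a real model such as Silero VAD or the pyannote-audio default.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class SpeechSegment:
    """A detected speech region, in seconds (hypothetical type)."""
    start: float
    end: float


class VadBackend(ABC):
    """Hypothetical interface that every VAD option would implement."""

    @abstractmethod
    def detect(self, audio: Sequence[float], sample_rate: int) -> List[SpeechSegment]:
        """Return the speech segments found in mono audio samples."""


class EnergyVad(VadBackend):
    """Toy stand-in backend (neither Silero nor pyannote): a frame counts as
    speech when its mean absolute amplitude reaches `threshold`."""

    def __init__(self, threshold: float = 0.1, frame_ms: int = 30) -> None:
        self.threshold = threshold
        self.frame_ms = frame_ms

    def detect(self, audio: Sequence[float], sample_rate: int) -> List[SpeechSegment]:
        frame_len = max(1, sample_rate * self.frame_ms // 1000)
        segments: List[SpeechSegment] = []
        start: Optional[float] = None
        for i in range(0, len(audio), frame_len):
            frame = audio[i:i + frame_len]
            energy = sum(abs(x) for x in frame) / len(frame)
            t = i / sample_rate
            if energy >= self.threshold and start is None:
                start = t  # speech begins at this frame
            elif energy < self.threshold and start is not None:
                segments.append(SpeechSegment(start, t))  # speech ended
                start = None
        if start is not None:  # speech runs to the end of the audio
            segments.append(SpeechSegment(start, len(audio) / sample_rate))
        return segments


# Hypothetical registry: the pipeline looks a backend up by name instead of
# hard-coding a single toolkit.
VAD_BACKENDS: Dict[str, Callable[..., VadBackend]] = {"energy": EnergyVad}


def register_vad(name: str, factory: Callable[..., VadBackend]) -> None:
    """Make a new backend selectable by name."""
    VAD_BACKENDS[name] = factory


def load_vad(name: str, **kwargs) -> VadBackend:
    """Instantiate the backend registered under `name`."""
    try:
        factory = VAD_BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown VAD backend {name!r}; choose from {sorted(VAD_BACKENDS)}"
        ) from None
    return factory(**kwargs)


if __name__ == "__main__":
    sr = 1000  # samples per second
    audio = [0.0] * 500 + [0.5] * 300 + [0.0] * 200  # "speech" from 0.5 s to 0.8 s
    vad = load_vad("energy", threshold=0.1, frame_ms=100)
    print(vad.detect(audio, sr))  # prints [SpeechSegment(start=0.5, end=0.8)]
```

In this design, a Silero VAD backend would be one more class implementing detect() and registered under its own name, so the rest of the pipeline never has to import a specific toolkit.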

The implementation, a description of the tests, and future work can be found in pull request #888.

3manifold linked a pull request Sep 26, 2024 that will close this issue