Voice Activity Detection (VAD) is a critical component in many speech processing applications. It involves distinguishing between segments of audio that contain speech and those that contain non-speech (silence, background noise, etc.). This project aims to evaluate and compare the performance of several state-of-the-art VAD models on a diverse set of languages.
The models evaluated are:
- pyannote.audio
- SpeechBrain
- FunASR
- Silero
The repository includes the full inference implementation for each model.
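
As an illustration, the snippet below sketches what inference with one of these models (Silero VAD, loaded via `torch.hub`) might look like. The audio path and sampling rate are placeholders, and the actual implementations in this repository may differ.

```python
# Minimal sketch of VAD inference with Silero VAD (assumes torch is installed
# and a 16 kHz mono WAV file exists; "audio/sample.wav" is a placeholder path).
import torch

# Load the pretrained Silero VAD model and its helper utilities from torch.hub.
model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
get_speech_timestamps, _, read_audio, _, _ = utils

SAMPLING_RATE = 16000

# Read the audio and detect speech segments (timestamps are in samples).
wav = read_audio("audio/sample.wav", sampling_rate=SAMPLING_RATE)
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)

# Report each detected speech segment in seconds.
for segment in speech_timestamps:
    start_s = segment["start"] / SAMPLING_RATE
    end_s = segment["end"] / SAMPLING_RATE
    print(f"speech: {start_s:.2f}s - {end_s:.2f}s")
```

The other models (pyannote.audio, SpeechBrain, FunASR) follow the same overall pattern of loading a pretrained model, running it over an audio file, and collecting speech segments, though each library exposes its own API for doing so.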