Support for Silero VAD #273

Open
tjainsuki opened this issue Feb 13, 2025 · 5 comments
Labels
feature New feature or request question Further information is requested

Comments

@tjainsuki

Hi Developers,

Thank you for your amazing work on this project!

I was wondering if there's a way to use Silero VAD. I noticed that PyAnnote VAD is supported, but Silero VAD isn't. Have you tried integrating Silero VAD, and if so, how do its accuracy and latency compare?

I also tried adding Silero VAD with custom parameters, but unfortunately, I couldn’t get it to work. Any guidance or suggestions would be greatly appreciated!

@juanmc2005 juanmc2005 added feature New feature or request question Further information is requested labels Feb 13, 2025
@juanmc2005
Owner

Hi @tjainsuki! How were you thinking of integrating Silero? As an alternative VAD pipeline?
I haven't tried integrating it, but if you have an idea I would be glad to work on a PR with you to get it to work.

@csetanmayjain

csetanmayjain commented Feb 13, 2025

Hi @juanmc2005,

I'm suggesting Silero VAD as an alternative to pyannote VAD.

I've written a basic script to implement this functionality, but I'm not very familiar with the internal workings of the diart library. My script has a bug in how it processes the input and returns the parameters. Would you be able to help me fix this issue?

I've attached the Python file, saved with a .txt extension.

Thanks!

sd_silero.txt

@juanmc2005
Owner

The thing is, Silero only makes sense in a VoiceActivityDetection pipeline. If we wrapped it as a segmentation model, we couldn't use it in place of any other SegmentationModel.
In this case I think we could probably introduce a new type of model, e.g. VADModel. That way one of the VADModel implementations could leverage pyannote and the other could rely on silero. However, you wouldn't be able to use a VADModel in a SpeakerDiarization pipeline (as expected).

We should also think about how to design the interface of VADModel and how to change the inner workings of VoiceActivityDetection so that both types of model are compatible, because they work in very different ways.
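
To make the interface question concrete, here is a minimal sketch of what a shared VADModel abstraction might look like. Everything in it is an assumption for illustration: the method name `speech_probabilities`, the `detect_speech` helper, and the toy `EnergyVAD` backend are hypothetical and not part of diart, pyannote, or Silero. A real pyannote or Silero backend would implement the same method by calling into its own library.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class VADModel(ABC):
    """Hypothetical interface: maps audio samples to per-frame speech probabilities."""

    @abstractmethod
    def speech_probabilities(self, samples: Sequence[float], sample_rate: int) -> List[float]:
        ...


class EnergyVAD(VADModel):
    """Toy stand-in backend (NOT Silero or pyannote): thresholds mean frame energy."""

    def __init__(self, frame_size: int = 4, energy_threshold: float = 0.01):
        self.frame_size = frame_size
        self.energy_threshold = energy_threshold

    def speech_probabilities(self, samples, sample_rate):
        probs = []
        # Walk over non-overlapping frames; a trailing partial frame is dropped.
        for start in range(0, len(samples) - self.frame_size + 1, self.frame_size):
            frame = samples[start:start + self.frame_size]
            energy = sum(x * x for x in frame) / self.frame_size
            probs.append(1.0 if energy > self.energy_threshold else 0.0)
        return probs


def detect_speech(model: VADModel, samples, sample_rate, threshold: float = 0.5):
    """Backend-agnostic step: binarize whatever probabilities the model returns."""
    return [p >= threshold for p in model.speech_probabilities(samples, sample_rate)]


# Usage: one silent frame, one loud frame, one silent frame.
samples = [0.0] * 4 + [0.5] * 4 + [0.0] * 4
decisions = detect_speech(EnergyVAD(), samples, sample_rate=16000)
```

The point of the sketch is only that the pipeline would depend on the abstract method, so backends that work very differently internally could still be swapped behind it.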

@csetanmayjain

That makes sense. I'll try to modify the VAD to support Silero when I have the bandwidth.

I'll keep you updated!

@sprath9

sprath9 commented Feb 21, 2025

@juanmc2005 Could we please include this feature in the new PR?


4 participants