Can I simplify Inference in this case? #1527
I'd suggest you try and report back :-)
Closing, as it reads like the initial question has been answered.
Hi hbredin,
Thanks for your open-source work! I read the article "pyannote.audio speaker diarization pipeline at VoxSRC 2023". The full speaker diarization pipeline has three parts: local end-to-end neural speaker segmentation over 10-second windows with a 1-second stride, neural speaker embedding of each speaker in each window, and agglomerative hierarchical clustering.
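To make the window arithmetic concrete, here is a quick back-of-the-envelope check (my own numbers, not from the paper) of how many overlapping windows the local segmentation step would produce for a 2-minute clip:

```python
# Number of 10-second windows with a 1-second stride over a 120-second clip,
# as described in the pipeline above.
duration = 120  # seconds in a 2-minute clip
window = 10     # sliding-window size (seconds)
stride = 1      # hop between consecutive windows (seconds)

num_windows = (duration - window) // stride + 1
print(num_windows)  # → 111
```

So even a short clip yields over a hundred overlapping local predictions that the embedding and clustering stages must stitch together.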
If I have a 2-minute audio clip and I am sure there are exactly 3 speakers in the entire clip, can I drop the speaker embedding and clustering stages and instead run a single global end-to-end neural speaker segmentation pass over the whole recording? Would doing this cause speaker confusion, or have other adverse effects?
Thanks!
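For context, constraining the standard pipeline to a known speaker count is already possible without removing any stage. A minimal sketch, assuming the `pyannote/speaker-diarization-3.1` checkpoint name and a valid Hugging Face access token (both placeholders here), wrapped in a function so nothing runs on import:

```python
def diarize_with_known_speakers(audio_path: str, hf_token: str):
    """Run the full pyannote pipeline, fixing the speaker count to 3.

    Sketch only: requires pyannote.audio installed and a Hugging Face
    access token; the checkpoint name below is an assumption.
    """
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",  # assumed checkpoint name
        use_auth_token=hf_token,
    )
    # num_speakers pins clustering to exactly 3 speakers
    # instead of estimating the count from the data.
    diarization = pipeline(audio_path, num_speakers=3)
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```

Passing `num_speakers` keeps the embedding and clustering stages (and their robustness to long-range speaker confusion) while still exploiting the known speaker count.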