
Can I simplify Inference in this case? #1527

Closed
zuowanbushiwo opened this issue Nov 3, 2023 · 4 comments

Comments


zuowanbushiwo commented Nov 3, 2023

Hi hbredin,

Thanks for your open-source work! I read the article "pyannote.audio speaker diarization pipeline at VoxSRC 2023". The full speaker diarization pipeline contains 3 parts: local end-to-end neural speaker segmentation on 10-second windows with a 1-second stride, neural speaker embedding of each speaker in each window, and agglomerative hierarchical clustering.

If I have a 2-minute audio clip and I'm sure there are only 3 speakers in the entire clip, can I remove the speaker embedding and clustering steps and instead run a single global end-to-end neural speaker segmentation pass over the whole recording? Would this cause speaker confusion, or are there other adverse effects?

Thanks!


github-actions bot commented Nov 3, 2023

Thank you for your issue.
We found the following entries in the FAQ which you may find helpful:

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory


hbredin commented Nov 5, 2023

> If I have a 2-minute audio clip and I'm sure there are only 3 speakers in the entire clip, can I remove the speaker embedding and clustering steps and instead run a single global end-to-end neural speaker segmentation pass over the whole recording? Would this cause speaker confusion, or are there other adverse effects?

I'd suggest you try and report back :-)

@zuowanbushiwo changed the title from "Can I simplify inter in this case?" to "Can I simplify Inference in this case?" on Nov 6, 2023
zuowanbushiwo (Author) commented:

Hi hbredin,
My test results with this approach look OK. I use only the segmentation part, with the entire recording as input.
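For readers wondering what the "segmentation-only" simplification looks like in practice, here is a minimal toy sketch. It is an illustration of the idea, not pyannote.audio's actual API: the activation matrix, frame step, and threshold values are hypothetical stand-ins for what a global segmentation pass would produce. The point is that with a single pass over the whole file, the model's speaker axis is already globally consistent, so simple thresholding yields a diarization without embeddings or clustering.

```python
import numpy as np

# Hypothetical frame step in seconds (a stand-in for the model's real frame rate).
FRAME_STEP = 0.5

def activations_to_turns(activations, threshold=0.5, frame_step=FRAME_STEP):
    """Convert a (num_frames, num_speakers) activation matrix into
    (start_sec, end_sec, speaker_index) turns by per-speaker thresholding."""
    turns = []
    for spk in range(activations.shape[1]):
        active = activations[:, spk] >= threshold
        start = None
        for i, is_active in enumerate(active):
            if is_active and start is None:
                start = i                       # speaker turn begins
            elif not is_active and start is not None:
                turns.append((start * frame_step, i * frame_step, spk))
                start = None                    # speaker turn ends
        if start is not None:                   # turn still open at end of file
            turns.append((start * frame_step, len(active) * frame_step, spk))
    return sorted(turns)

# Toy activations: speaker 0 talks first, speaker 1 second, with brief overlap.
acts = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.6],
    [0.2, 0.9],
    [0.1, 0.8],
])
print(activations_to_turns(acts))
# [(0.0, 1.5, 0), (1.0, 2.5, 1)]
```

Note that this only works because a single global pass keeps one consistent speaker axis for the whole recording; with the sliding-window pipeline, speaker indices are only locally consistent per window, which is exactly why the embedding and clustering stages exist.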


hbredin commented Nov 16, 2023

Closing as it reads like the initial question has been answered.

@hbredin hbredin closed this as completed Nov 16, 2023