I’m currently using Buzz for transcribing interviews with multiple speakers. However, I’ve noticed that the transcription doesn’t differentiate between different voices or speakers in the audio. Is speaker diarization (speaker identification) available or on the roadmap as a feature?
Additionally, I noticed the "prompt" feature, but it doesn't seem to affect speaker recognition. Could you clarify its purpose and if it might relate to this?
Thanks in advance for your help!
Yes, this is one of the ideas that has been requested previously. It is on the list of things that would be nice to have in the future.
It also seems that https://github.com/MahmoudAshraf97/whisper-diarization works quite well, so it could be implemented in Buzz at some point in the future.
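For context, the general approach projects like that take is to run speaker diarization separately and then align the speaker turns with Whisper's timestamped segments. Below is a minimal sketch of that idea, not Buzz's code: it assumes `openai-whisper` and `pyannote.audio` are installed and that you have a Hugging Face access token; the exact diarization model name may differ.

```python
# Sketch: combine Whisper transcription with pyannote speaker diarization.
# Assumes openai-whisper and pyannote.audio are installed; the model name
# and token handling are placeholders, not what Buzz ships.
import whisper
from pyannote.audio import Pipeline

audio_path = "interview.wav"

# 1. Transcribe with Whisper to get timestamped segments.
model = whisper.load_model("base")
result = model.transcribe(audio_path)

# 2. Run diarization to get "who spoke when" turns.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # hypothetical placeholder token
)
diarization = pipeline(audio_path)

# 3. Label each transcript segment with the speaker whose
#    diarization turn overlaps it the most.
def speaker_for(start: float, end: float) -> str:
    best, best_overlap = "UNKNOWN", 0.0
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        overlap = min(end, turn.end) - max(start, turn.start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    return best

for seg in result["segments"]:
    print(f"[{speaker_for(seg['start'], seg['end'])}] {seg['text'].strip()}")
```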
The prompt feature of the Whisper models is described here: https://cookbook.openai.com/examples/whisper_prompting_guide
In my testing it has not shown especially meaningful results, but others may get better results. Feel free to share feedback on your prompting results, as it may be useful to others.
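For reference, in the plain `openai-whisper` API the prompt corresponds to the `initial_prompt` argument of `transcribe`. It biases vocabulary and style (e.g., the spelling of names and jargon) and does not separate speakers, which is likely why it has no effect on diarization. A minimal sketch:

```python
# Sketch: prompting Whisper directly via initial_prompt.
# The prompt nudges vocabulary/spelling; it does not identify speakers.
import whisper

model = whisper.load_model("base")
result = model.transcribe(
    "interview.wav",
    initial_prompt="Interview between Dr. Alice Nguyen and Bob Okafor.",  # hypothetical names
)
print(result["text"])
```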