
These voice can not split correctly #1524

Closed
lucasjinreal opened this issue Nov 2, 2023 · 4 comments

Comments

@lucasjinreal

asr_res_240-243_1_audio.zip

The output times and labels:

start=0.0s stop=2.4s speaker_SPEAKER_00
start=0.4s stop=1.4s speaker_SPEAKER_01

But there are clearly two different speakers, one after the other. How can I precisely get the split time between them?


github-actions bot commented Nov 2, 2023

Thank you for your issue. You might want to check the FAQ if you haven't done so already.

Feel free to close this issue if you found an answer in the FAQ.

If your issue is a feature request, please read this first and update your request accordingly, if needed.

If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:

  • installation
  • data preparation
  • model download
  • etc.

Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).

Companies relying on pyannote.audio in production may contact me via email regarding:

  • paid scientific consulting around speaker diarization and speech processing in general;
  • custom models and tailored features (via the local tech transfer office).

This is an automated reply, generated by FAQtory

@hbredin
Member

hbredin commented Nov 2, 2023

Without details about the code you tried, it is kind of difficult to tell.
Here are my 2 cents applying the pretrained pyannote/segmentation-3.0 model: it looks like it does manage to do the job...

AUDIO = "asr_res_240-243_1_audio.mp3"

from pyannote.audio import Audio, Inference
from matplotlib import pyplot as plt

# load the file as a mono 16kHz waveform
io = Audio(mono="downmix", sample_rate=16000)
waveform, sample_rate = io(AUDIO)
audio = {"waveform": waveform, "sample_rate": sample_rate}

# apply the pretrained segmentation model on the whole file at once
inference = Inference("pyannote/segmentation-3.0", window="whole")
prediction = inference(audio)

# plot per-speaker activations over time
plt.plot(prediction)
plt.legend(['speaker#1', 'speaker#2', 'speaker#3'])
plt.show()

(output: plot of per-speaker activations over time)

@lucasjinreal
Author

@hbredin hi, the audio actually only has 2 people: the first part is one person, and the rest is a man's voice.

The drop in the speaker#3 curve seems to mark where the later male voice starts, but since I just need to split the two speakers, how can I tell whether this drop is exactly the split point I want?
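One way to turn frame-level activations like the plot above into a single split time is to find the frame where the dominant (most active) speaker changes. The sketch below runs on synthetic data only; in practice `prediction` would be the array returned by the Inference call above, and the 0.02 s frame duration used here is an assumed placeholder, not pyannote's actual frame resolution:

```python
import numpy as np

def split_times(prediction, frame_duration):
    """Return times (in seconds) where the per-frame dominant speaker changes.

    prediction: array of shape (num_frames, num_speakers) with activation scores.
    frame_duration: seconds per frame (model-dependent; assumed known here).
    """
    dominant = prediction.argmax(axis=1)            # most active speaker per frame
    changes = np.nonzero(np.diff(dominant))[0] + 1  # frames where it switches
    return changes * frame_duration

# Synthetic example: speaker 0 active for 80 frames, then speaker 1 for 40 frames.
pred = np.zeros((120, 2))
pred[:80, 0] = 1.0
pred[80:, 1] = 1.0

print(split_times(pred, frame_duration=0.02))  # -> [1.6], the speaker-change time
```

A per-frame argmax is deliberately crude: it ignores overlapped speech and low-confidence frames, so for real audio you may want to threshold the activations first or simply rely on the timestamps produced by the full diarization pipeline.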


stale bot commented May 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label May 2, 2024
@stale stale bot closed this as completed Jun 1, 2024