outputs of separation module is clipping #1729

faroit · 2024-06-19T14:17:13Z

Tested versions

3.3

System information

macOS, m1

Issue description

Hi @hbredin, @joonaskalda thanks for this great release!

I tried some examples with the new PixIT pipeline and found that the outputs of the separation module clip heavily. Is this to be expected, given that the model was trained with scale-invariant losses?

The input was a mono WAV file, downsampled to 16 kHz, from the YouTube excerpt linked below.

Minimal reproduction example (MRE)

https://www.youtube.com/watch?v=CGUpPyA48jE&t=182s

# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
  "pyannote/speech-separation-ami-1.0",
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization, sources = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)

# dump sources to disk as SPEAKER_XX.wav files
import scipy.io.wavfile
for s, speaker in enumerate(diarization.labels()):
    scipy.io.wavfile.write(f'{speaker}.wav', 16000, sources.data[:,s])

joonaskalda · 2024-06-20T04:27:28Z

Hi @faroit, thank you for your interest in PixIT! I suspect the issue is that the current version is trained only on the AMI meeting dataset. On the AMI test set this hasn’t been an issue. Finetuning on domain-specific audio would likely improve the separation performance.

faroit · 2024-06-20T09:20:51Z

@joonaskalda thanks for your reply. I am not sure if fine-tuning would really be able to fix any of this.
I dug a bit deeper and saw that the maximum output value after separation is about 81.0 in that example. It is also interesting that the output drifts in terms of bias. Here is the peak-normalized output of speaker 1

Was the model trained on zero-mean, unit variance data?

joonaskalda · 2024-06-21T04:25:57Z

Thanks for investigating. I checked and the separated sources are (massively) scaled up for AMI data too. I never noticed because I’ve peak-normalized them before use. The scale-invariant loss is indeed the likely culprit.

The training data was not normalized to zero mean and unit variance.
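To illustrate why a scale-invariant loss leaves the output level unconstrained, here is a minimal sketch using the common textbook definition of SI-SDR. It is not taken from the pyannote training code, and the function name `si_sdr` and the synthetic signals are assumptions made for this example only. Multiplying the estimate by any positive gain scales the projected target and the residual by the same factor, so the score does not change.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (textbook definition, illustrative only)."""
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything in the estimate that is not explained by the reference.
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

# Synthetic stand-ins for a clean source and an imperfect estimate of it.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
estimate = reference + 0.1 * rng.standard_normal(16000)

quiet = si_sdr(estimate, reference)
loud = si_sdr(81.0 * estimate, reference)  # same estimate, 81 times louder
print(f"SI-SDR at unit gain: {quiet:.2f} dB, at gain 81: {loud:.2f} dB")
```

Both calls report the same score, so a model trained only on such a loss receives no penalty for producing samples far outside the [-1, 1] range.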

faroit · 2024-06-21T07:01:29Z

@joonaskalda thanks for the update. Maybe you can add a normalization to the pipeline so that users that aren't familiar with SI-SDR trained models aren't surprised
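Until the pipeline normalizes its output, a user-side workaround could look like the sketch below. It assumes, as in the MRE above, that `sources.data` is a float array of shape (num_samples, num_speakers). The helper `peak_normalize` and its `target_peak` parameter are hypothetical names, not part of pyannote.audio, and this is not necessarily what the linked pull request does.

```python
import numpy as np

def peak_normalize(data, target_peak=0.99, eps=1e-8):
    """Scale each column (one separated source) so its peak equals target_peak."""
    data = np.asarray(data, dtype=np.float32)
    # Largest absolute sample of every source, kept 2-D for broadcasting.
    peaks = np.max(np.abs(data), axis=0, keepdims=True)
    # Leave silent (all-zero) sources untouched instead of dividing by zero.
    gains = np.where(peaks > eps, target_peak / np.maximum(peaks, eps), 1.0)
    return (data * gains).astype(np.float32)

# Toy stand-in for sources.data: one very loud source and one silent source.
toy_sources = np.array([[81.0, 0.0], [-40.5, 0.0], [20.0, 0.0]])
normalized = peak_normalize(toy_sources)
```

In the MRE, the normalized array would replace `sources.data` in the `scipy.io.wavfile.write` call. Note that each source is scaled independently, so relative loudness between speakers is not preserved.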

gaspardpetit · 2024-07-11T02:51:08Z

> @joonaskalda thanks for the update. Maybe you can add a normalization to the pipeline so that users that aren't familiar with SI-SDR trained models aren't surprised

I came here because I was surprised ;-)

gaspardpetit · 2024-07-13T22:52:10Z

May I ask if the DC bias is also to be expected? I see it happening even in areas where there is no speech overlap. I am actually thinking about substituting the audio back from the original in the non-overlapping areas as the bias can cause severe artifacts even after normalizing.
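One possible way to reduce a slowly drifting bias before peak normalization is a zero-phase high-pass filter, sketched below with SciPy. The helper name `remove_dc_drift`, the 50 Hz cutoff, and the filter order are assumptions chosen for illustration. The thread does not establish how fast the bias drifts, so the cutoff may need tuning, and this does not replace the idea of substituting the original audio in non-overlapping regions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def remove_dc_drift(data, sample_rate=16000, cutoff_hz=50.0, order=4):
    """Zero-phase high-pass filter along the time axis (axis 0)."""
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    # Forward-backward filtering avoids introducing a phase shift.
    return sosfiltfilt(sos, np.asarray(data, dtype=np.float64), axis=0)

# Synthetic check: a 440 Hz tone riding on a constant offset plus a slow ramp.
t = np.arange(16000) / 16000
tone = np.sin(2 * np.pi * 440 * t)
biased = (tone + 5.0 + 2.0 * t)[:, None]  # shape (num_samples, 1)
cleaned = remove_dc_drift(biased)
```

Applying the filter before peak normalization matters: if the offset is removed afterwards, the peak of each source changes again and the normalization no longer holds.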

joonaskalda linked a pull request Jun 21, 2024 that will close this issue

fix: peak-normalize separated sources #1730

Open
