outputs of separation module is clipping #1729
Comments
Hi @faroit, thank you for your interest in PixIT! I suspect the issue is that the current version is trained only on the AMI meeting dataset. On the AMI test set this hasn’t been an issue. Fine-tuning on domain-specific audio would likely improve the separation performance.
@joonaskalda thanks for your reply. I am not sure if fine-tuning would really be able to fix any of this. Was the model trained on zero-mean, unit-variance data?
Thanks for investigating. I checked and the separated sources are (massively) scaled up for AMI data too. I never noticed because I’ve peak-normalized them before use. The scale-invariant loss is indeed the likely culprit. The training data was not normalized to zero mean and unit variance.
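To illustrate why a scale-invariant loss leaves the output level unconstrained, here is a minimal sketch of SI-SDR in NumPy. It uses the plain textbook definition without mean removal; the exact loss used to train PixIT may differ, and the function name `si_sdr` and the test signals are made up for illustration.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (plain definition, no mean removal)."""
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as noise.
    noise = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)                    # 1 s of synthetic "speech" at 16 kHz
estimate = reference + 0.1 * rng.standard_normal(16000)   # slightly noisy estimate

quiet = si_sdr(estimate, reference)
loud = si_sdr(50.0 * estimate, reference)  # same estimate, 50x louder

# Both values agree up to numerical precision: the metric cannot tell them apart.
print(quiet, loud)
```

Since multiplying the estimate by any non-zero gain scales the target and noise terms equally, a model trained only on this kind of objective gets no signal about absolute level, which is consistent with the scaled-up outputs described above.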
@joonaskalda thanks for the update. Maybe you can add a normalization step to the pipeline so that users who aren't familiar with SI-SDR-trained models aren't surprised.
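Until something like that exists in the pipeline, a user-side workaround could look like the sketch below. It is not part of pyannote.audio: `peak_normalize` is a hypothetical helper, and it assumes the separated sources are available as a float array of shape `(num_samples, num_sources)`, which may not match the actual output layout.

```python
import numpy as np

def peak_normalize(sources, peak=0.99, eps=1e-8):
    """Rescale each source so that its largest absolute sample equals `peak`.

    sources: float array, assumed shape (num_samples, num_sources).
    """
    sources = np.asarray(sources, dtype=np.float32)
    # Per-source peak along the time axis.
    max_abs = np.max(np.abs(sources), axis=0, keepdims=True)
    # Guard against division by zero for silent sources (they stay silent).
    return sources * (peak / np.maximum(max_abs, eps))

# Toy example: one heavily scaled-up source and one silent source.
separated = np.array([[10.0, 0.0], [-40.0, 0.0], [25.0, 0.0]])
normalized = peak_normalize(separated)
```

Normalizing each source independently discards the relative level between speakers; using a single gain for all sources would preserve it, at the cost of quieter speakers ending up further below full scale.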
I came here because I was surprised ;-)
May I ask if the DC bias is also to be expected? I see it happening even in areas where there is no speech overlap. I am actually thinking about substituting the audio back from the original in the non-overlapping areas as the bias can cause severe artifacts even after normalizing.
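For a constant offset, one simple thing to try before normalizing is subtracting the per-source mean. The sketch below assumes the same hypothetical `(num_samples, num_sources)` layout; `remove_dc` is an illustrative helper, not a pyannote.audio function, and it has not been checked against the actual PixIT outputs.

```python
import numpy as np

def remove_dc(signal):
    """Subtract the per-source mean along the time axis (axis 0)."""
    signal = np.asarray(signal, dtype=np.float64)
    return signal - signal.mean(axis=0, keepdims=True)

# Toy example: a sine wave (10 full periods) riding on a constant offset of 0.3.
t = np.arange(1000)
clean = np.sin(2 * np.pi * t / 100)
biased = (clean + 0.3)[:, None]   # shape (num_samples, 1)

centered = remove_dc(biased)
```

This only removes an offset that is constant over the whole file. If the bias drifts over time, a high-pass filter with a low cutoff would be the more appropriate tool, and centering should happen before peak normalization so the offset does not eat into the available headroom.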
Tested versions
System information
macOS, M1
Issue description
Hi @hbredin, @joonaskalda, thanks for this great release!
I tried some examples with the new PixIT pipeline and found that the outputs of the separation module show a very high level of clipping. Is this to be expected from the way it was trained with scale-invariant losses?
The input was a mono WAV file, downsampled to 16 kHz, from the YouTube excerpt linked below.
Minimal reproduction example (MRE)
https://www.youtube.com/watch?v=CGUpPyA48jE&t=182s