Audio buffer is not finite everywhere #24
Comments
That's strange; I've never seen that before and I'm not sure how to reproduce it without the data. Some googling suggests that Librosa can throw this kind of error for NaN inputs. Could you check whether y in run_SylNet (the read waveform) contains only finite values, i.e., no NaNs or Infs? Another thing to try, which should not affect the results, is to add a very small white-noise floor to y (i.e., a vector of very small zero-mean random numbers). This might help if the signal has exact zeros at its onset/offset, which sometimes throws feature extractors off.
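A minimal sketch of both suggestions, assuming y is the waveform as a 1-D NumPy array. The helper names report_non_finite and add_noise_floor, the noise scale of 1e-8, and the fixed seed are hypothetical choices for illustration and are not part of ALICE or SylNet.

```python
import numpy as np

def report_non_finite(y):
    """Count NaN and Inf samples in a waveform (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    return {
        "nan": int(np.isnan(y).sum()),
        "inf": int(np.isinf(y).sum()),
        "total": int(y.size),
    }

def add_noise_floor(y, scale=1e-8, seed=0):
    """Add zero-mean Gaussian noise of very small amplitude.

    The scale is an assumed value; it should stay far below the
    signal level so that the extracted features are not affected.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return y + rng.standard_normal(y.shape) * scale
```

Note that the noise floor only helps with exact zeros. A NaN or Inf sample stays non-finite after noise is added, so the count from report_non_finite is the thing to check first.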
The VTC (or rather pyannote-audio) also uses librosa, so I'm not sure why it only occurs when running ALICE. But indeed, there are some np.inf values in the audio files that raise the exception. I'll try adding some noise and see how it goes!
Linked this issue on the SylNet repo side as well.
Another possibility: VTC is used as a front-end for SylNet (and other feature extraction) to split the long-form input data into "utterances". So, if VTC produces a segment that is too short for the Librosa feature extractor, that might cause an error. If this is the case, I could add a minimum duration threshold to the data splitting stage.
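A minimum duration threshold of that kind could look roughly like the sketch below. It assumes segments are available as (onset, offset) pairs in seconds; the function name filter_short_segments and the 0.1 s default are hypothetical and do not reflect how ALICE actually represents or filters VTC output.

```python
def filter_short_segments(segments, min_duration=0.1):
    """Split (onset, offset) pairs, in seconds, into kept and dropped lists.

    min_duration is an assumed threshold; a suitable value depends on
    the frame length the feature extractor needs.
    """
    kept, dropped = [], []
    for onset, offset in segments:
        if offset - onset >= min_duration:
            kept.append((onset, offset))
        else:
            dropped.append((onset, offset))
    return kept, dropped
```

Returning the dropped segments as well makes it easy to log which utterances were skipped, which helps confirm whether short segments are really the cause of the error.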
I sometimes run into issues such as the following. I don't really understand why, since upon listening to the file everything sounds normal and the VTC & VCM work perfectly on it.