How to run the diarization pipeline on multiple file segments? #265
Comments
Hi @ywangwxd! What you're trying to do here is more complex than setting a timestamp shift. If you run a different audio source and pipeline for each chunk, diart will assume that each file part is a different file, and it will attempt to give you all the results it can. It would seem to me that you can frame the problem as a "conversion" between streams of audio chunks. In other words, you have a stream of non-overlapping 5s chunks and you want to feed that to a diart pipeline. What I would do here is to implement your own audio source. In this custom source, you would iterate over your file parts. Each part should be split into blocks of size |
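To illustrate the "conversion between streams" idea, here is a minimal sketch of re-chunking consecutive file parts into fixed-size blocks, carrying leftover samples across part boundaries so that padding only happens once, at the very end. It uses plain NumPy and does not touch diart's API: the function name `rechunk`, the `block_size` parameter (in samples), and the zero-padding of the final block are all assumptions made for illustration. Wiring this into an actual custom audio source is left out.

```python
from typing import Iterable, Iterator

import numpy as np


def rechunk(parts: Iterable[np.ndarray], block_size: int) -> Iterator[np.ndarray]:
    """Yield fixed-size blocks from consecutive 1-D audio parts.

    Hypothetical helper: samples left over at the end of one part are
    prepended to the next part instead of being padded immediately.
    """
    leftover = np.zeros(0, dtype=np.float32)
    for part in parts:
        # Join the tail of the previous part with the current part.
        buffer = np.concatenate([leftover, part])
        n_full = len(buffer) // block_size
        for i in range(n_full):
            yield buffer[i * block_size:(i + 1) * block_size]
        # Keep whatever did not fill a whole block for the next part.
        leftover = buffer[n_full * block_size:]
    if len(leftover) > 0:
        # Assumption: zero-pad only the final block of the whole stream.
        yield np.pad(leftover, (0, block_size - len(leftover)))


# Two "file parts" of 7 and 5 samples, re-chunked into blocks of 5 samples.
parts = [np.arange(0, 7), np.arange(7, 12)]
blocks = list(rechunk(parts, block_size=5))
```

Because the leftover is carried over, the boundary between parts never introduces blank audio in the middle of the stream, which is what keeps timestamps continuous across parts.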
Thanks for your suggestion. I got it working with a workaround. The problem turned out to be padding. More specifically, the last chunk of each file segment is padded to the full chunk duration (5s in my case). This effectively appends an artificial piece of blank audio (less than 5s) to the end of each segment. So the time shift for the next segment is not the actual duration of the previous segment, but
With this modification, the diarization and ASR alignment are correct for each segment. However, to remove the added artificial audio, I need to do another "subtraction", this time on the final transcription results. Overall, I have done something like this.
The use of |
I am trying to run the diarization pipeline on multiple file segments, which are consecutive parts of a long audio file.
Here is what I am doing. Put simply, and ideally, I just need to set timestamp_shift correctly for each segment. But the result is incorrect. Specifically, the result for each segment after the first one is always shifted forward by one chunk duration (5 seconds in my config). Here is an example diarization result to explain the problem. To check whether the timestamps are correct, I ran a dummy test: I used the same file segment in two consecutive runs of the pipeline. The duration of this file segment is 300 seconds. You can see that in the first segment the speech starts at 7.820 seconds, which is correct. In the second segment it should start at roughly 307.820 seconds, but the result starts at 312.820 seconds. The difference is exactly 5 seconds. I stepped through the code in a debugger. It looks like the last_end_time of audio_buffer at the end of each segment is always 5 seconds ahead. But I do not know how to fix it.
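For reference, the numbers in the dummy test can be checked with a few lines of arithmetic. This is only a sketch of the expected versus observed offsets reported above; the variable names are made up for illustration and nothing here calls into diart.

```python
# Values taken from the dummy test described above.
segment_duration = 300.0   # seconds, length of the (repeated) file segment
local_start = 7.820        # seconds, speech start within the segment
observed_start = 312.820   # seconds, reported start in the second run

# If timestamp_shift were simply the previous segment's duration,
# the second run should start here.
expected_start = segment_duration + local_start

# Extra shift actually observed, rounded to millisecond precision
# to avoid floating-point noise.
extra_shift = round(observed_start - expected_start, 3)
print(f"expected {expected_start:.3f}s, observed {observed_start:.3f}s, extra {extra_shift:.3f}s")
```

The extra shift matches the 5-second chunk duration in the config.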