Replies: 4 comments 4 replies
-
After setting word_timestamps=False, segment.start is correct.
-
Reopen it.
-
If I understand correctly, you want to transcribe each speech chunk separately? If that is the case, your best bet is to split the audio yourself by calling … I haven't tested the above approach; it's just my take from looking at the …
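A minimal sketch of the "transcribe each chunk separately" idea, since the function names in the reply above did not survive. Everything here is an assumption for illustration: `transcribe_chunks`, the `(start_sample, end_sample)` chunk format, and the injected `transcribe_fn` are hypothetical helpers, not part of any library. The point is only the bookkeeping: slice the audio per speech chunk, transcribe the slice, then shift the timestamps back onto the original timeline.

```python
from typing import Callable, List, Sequence, Tuple

# (start, end, text), with times in seconds relative to the audio passed in.
Segment = Tuple[float, float, str]


def transcribe_chunks(
    audio: Sequence[float],
    speech_chunks: List[Tuple[int, int]],
    transcribe_fn: Callable[[Sequence[float]], List[Segment]],
    sampling_rate: int = 16000,
) -> List[Segment]:
    """Transcribe each speech chunk on its own and shift the resulting
    timestamps back onto the timeline of the original audio.

    speech_chunks holds (start_sample, end_sample) pairs, e.g. from a VAD.
    transcribe_fn is a hypothetical callable wrapping whatever model is used.
    """
    results: List[Segment] = []
    for start_sample, end_sample in speech_chunks:
        offset = start_sample / sampling_rate  # chunk start in seconds
        chunk = audio[start_sample:end_sample]
        for seg_start, seg_end, text in transcribe_fn(chunk):
            # Times returned for a chunk are relative to that chunk.
            results.append((seg_start + offset, seg_end + offset, text))
    return results
```

Because every chunk is transcribed in isolation, text from one speech region cannot be merged into a segment from another; the cost is that the model loses context across chunks.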
-
You can use Batched Faster-Whisper, since it can transcribe an audio file with more accurate word timestamps than sequential Faster-Whisper. Then you can split a segment into multiple sentences based on the pause durations between words, such as 2 seconds or any N seconds you want.
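The pause-based split described above could look roughly like this. It is a sketch under stated assumptions: `Word` is a hypothetical stand-in for a word-level result with `start`, `end`, and `word` fields (times in seconds, words carrying a leading space), and `split_on_pauses` is an illustrative helper, not a library function. The per-word times in the example are made up around the timestamps from the question.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    # Hypothetical stand-in for a word-level result; times in seconds.
    start: float
    end: float
    word: str


def _to_segment(group: List[Word]) -> dict:
    # A segment spans from its first word's start to its last word's end.
    return {
        "start": group[0].start,
        "end": group[-1].end,
        "text": "".join(w.word for w in group).strip(),
    }


def split_on_pauses(words: Iterable[Word], max_gap: float = 2.0) -> List[dict]:
    """Group consecutive words, starting a new segment whenever the silence
    between one word's end and the next word's start exceeds max_gap."""
    segments: List[dict] = []
    current: List[Word] = []
    for w in words:
        if current and w.start - current[-1].end > max_gap:
            segments.append(_to_segment(current))
            current = []
        current.append(w)
    if current:
        segments.append(_to_segment(current))
    return segments


# Illustrative input only: word boundaries inside the second phrase are invented.
example_words = [
    Word(1.680, 2.448, " Hello,"),
    Word(6.256, 6.500, " can"),
    Word(6.500, 6.700, " you"),
    Word(6.700, 7.000, " hear"),
    Word(7.000, 7.360, " me?"),
]
example_segments = split_on_pauses(example_words, max_gap=2.0)
```

With a 2-second threshold, the roughly 3.8-second gap after "Hello," produces two segments, each keeping its own start time.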
-
Hi, I have an audio clip like the one below: there is a delay of about 5 s between "hello" and "can you hear me".
I expect "hello" and "can you hear me" to be put into two segments:
{ "start": "00:01.680", "end": "00:02.448", "txt": "Hello" }
{ "start": "00:06.256", "end": "00:07.360", "txt": "can you hear me" }
But they are put in one segment:
{ "start": "00:00:06.256", "end": "00:00:07.360", "txt": "Hello, can you hear me?" }
"Hello" is merged into the "can you hear me" segment, and segment.start (00:00:06.256) is not the start time of "Hello" (00:01.680).
My transcribe options are language=en, word_timestamps=True, and vad_filter=True; everything else is default.
According to the debug output from transcription, Silero VAD has split the audio into these two segments, but the result still puts them together.
[INFO]-(transcribe:240) Processing audio with duration 00:09.280
[INFO]-(transcribe:249) VAD filter removed 00:04.944 of audio
[DEBUG]-(transcribe:255) VAD filter kept the following audio segments: [00:01.136 -> 00:02.448], [00:06.256 -> 00:09.280]
Are there any options that would put them into two segments according to their actual timestamps?