How to break a sentence like "hello, can you hear me?" into 2 segments by actual timestamp? #209

neoyxm · 2023-05-08T07:49:08Z

Hi, I have an audio clip in which there is a pause of about 5 s between "hello" and "can you hear me".

I expect 'hello' and 'can you hear me' to be put into 2 separate segments:
{ "start": "00:01.680", "end": "00:02.448", "txt": "Hello" }
{ "start": "00:06.256", "end": "00:07.360", "txt": "can you hear me" }

Instead, they are put into one segment:
{ "start": "00:00:06.256", "end": "00:00:07.360", "txt": "Hello, can you hear me?" }

'Hello' is merged into the 'can you hear me' segment, and segment.start (00:00:06.256) is not the start time of 'Hello' (00:01.680).

My transcribe options are language=en, word_timestamps=True, and vad_filter=True; everything else is left at its default.

According to the debug output from transcribing, Silero VAD did split the audio into these two segments, but the result still puts them together.
[INFO]-(transcribe:240) Processing audio with duration 00:09.280
[INFO]-(transcribe:249) VAD filter removed 00:04.944 of audio
[DEBUG]-(transcribe:255) VAD filter kept the following audio segments: [00:01.136 -> 00:02.448], [00:06.256 -> 00:09.280]

Are there any options that would put them into 2 segments by actual timestamp?
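
As a quick sanity check on the log above, the kept spans and the removed duration should add up to the clip length. A minimal sketch, with the timestamps copied from the log and converted to milliseconds by hand; the helper name `summarize_vad` is made up for illustration and is not part of any library:

```python
from typing import Dict, List, Tuple


def summarize_vad(kept_ms: List[Tuple[int, int]], total_ms: int) -> Dict[str, object]:
    """Summarize VAD output: kept duration, removed duration, gaps between kept spans."""
    kept = sum(end - start for start, end in kept_ms)
    # Silence between consecutive kept spans (leading silence is not included).
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(kept_ms, kept_ms[1:])]
    return {"kept_ms": kept, "removed_ms": total_ms - kept, "gaps_ms": gaps}


# Values from the log: [00:01.136 -> 00:02.448], [00:06.256 -> 00:09.280],
# audio duration 00:09.280.
summary = summarize_vad([(1136, 2448), (6256, 9280)], total_ms=9280)
print(summary)
```

The removed duration comes out at 4944 ms, matching the "VAD filter removed 00:04.944" line, and the silence between the two kept spans is 3808 ms. That is consistent with the observation that the VAD stage itself separated the two utterances.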

neoyxm · 2023-05-08T10:31:11Z

Author

After setting word_timestamps=False, segment.start is correct.

0 replies

neoyxm · 2023-05-09T01:30:29Z

Author

Reopening this.

0 replies

palladium123 · 2023-05-12T10:16:39Z

If I understand correctly, you want to transcribe each speech chunk separately?

If that is the case, your best bet is to split the audio yourself by calling get_speech_timestamps on the audio. This should return a list of start and end timestamps, one pair per speech chunk, with the non-speech parts left out. You then call collect_chunks on each chunk returned by get_speech_timestamps. Both functions are in the vad.py module. Finally, call transcribe on each chunk with vad_filter=False (since Silero VAD has already been applied in the previous 2 steps).

Haven't tested the above approach - it's just my take looking at the transcribe method.
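
The steps above could be sketched roughly as follows. This is an untested-against-a-real-model sketch under several assumptions: the speech chunks are dicts with "start" and "end" given as sample indices (the shape get_speech_timestamps is assumed to return), the audio is a 16 kHz sample array, and `transcribe_fn` is a hypothetical callback standing in for a wrapper around the model's transcribe call with vad_filter=False. It slices the array directly rather than calling collect_chunks, so that each chunk keeps its own offset on the original timeline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

SAMPLING_RATE = 16000  # assumed sample rate of the decoded audio

# One transcribed piece: (start_s, end_s, text), timed relative to its own chunk.
Piece = Tuple[float, float, str]


def transcribe_chunks(
    audio: Sequence[float],
    speech_chunks: List[Dict[str, int]],
    transcribe_fn: Callable[[Sequence[float]], List[Piece]],
    sampling_rate: int = SAMPLING_RATE,
) -> List[Dict[str, object]]:
    """Transcribe each speech chunk on its own and shift times to the original timeline."""
    results = []
    for chunk in speech_chunks:
        # Chunk boundaries are assumed to be sample indices into `audio`.
        offset = chunk["start"] / sampling_rate
        clip = audio[chunk["start"]:chunk["end"]]
        for start, end, text in transcribe_fn(clip):
            results.append({
                "start": round(offset + start, 3),
                "end": round(offset + end, 3),
                "text": text.strip(),
            })
    return results


def fake_transcribe(clip: Sequence[float]) -> List[Piece]:
    # Hypothetical stand-in for a real model call: one piece spanning the whole clip.
    return [(0.0, len(clip) / SAMPLING_RATE, f" {len(clip)} samples ")]


audio = [0.0] * (10 * SAMPLING_RATE)  # 10 s of silence as placeholder audio
chunks = [{"start": 16000, "end": 32000}, {"start": 96000, "end": 128000}]
segments = transcribe_chunks(audio, chunks, fake_transcribe)
for seg in segments:
    print(seg)
```

With a real model, `transcribe_fn` would return one tuple per segment produced for the clip. Because every chunk is transcribed in isolation, each result can only start at or after its own chunk's start time.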

4 replies

neoyxm May 15, 2023
Author

Thanks, I will try that.

zacharynapier May 24, 2023

Did you ever find a solution?

neoyxm May 25, 2023
Author

Did you ever find a solution?

Not yet.

toanhn Sep 12, 2024

I tried this approach, but sometimes I get more incorrect words or sentences in the resulting transcript than without using get_speech_timestamps.

toanhuynhnguyen · 2024-09-14T13:34:17Z

You can use batched Faster-Whisper, since its word timestamps are more accurate than those of sequential Faster-Whisper. You can then split a sentence into multiple sentences based on the pause between words, for example 2 seconds or any N seconds you choose.
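
A minimal sketch of the pause-based split, assuming each word carries `start`, `end`, and `word` fields (as word-level results do when word_timestamps=True). The `Word` dataclass and `split_by_pause` helper are stand-ins written for this example, and the per-word times inside "can you hear me?" are invented; only the outer timestamps come from the original question.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Word:
    start: float  # seconds
    end: float    # seconds
    word: str     # token text, usually with a leading space


def _to_segment(words: List[Word]) -> Dict[str, object]:
    # A segment spans from its first word's start to its last word's end.
    return {
        "start": words[0].start,
        "end": words[-1].end,
        "text": "".join(w.word for w in words).strip(),
    }


def split_by_pause(words: Iterable[Word], max_gap: float = 2.0) -> List[Dict[str, object]]:
    """Start a new segment whenever the silence before a word exceeds max_gap seconds."""
    segments: List[Dict[str, object]] = []
    current: List[Word] = []
    for w in words:
        if current and w.start - current[-1].end > max_gap:
            segments.append(_to_segment(current))
            current = []
        current.append(w)
    if current:
        segments.append(_to_segment(current))
    return segments


words = [
    Word(1.680, 2.448, " Hello,"),
    Word(6.256, 6.500, " can"),
    Word(6.500, 6.720, " you"),
    Word(6.720, 7.020, " hear"),
    Word(7.020, 7.360, " me?"),
]
result = split_by_pause(words, max_gap=2.0)
for seg in result:
    print(seg)
```

The quality of the split depends entirely on the word timestamps: if the first word's start is reported late, the segment boundary moves with it.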

0 replies
