Does any method in whisper return audio file duration? #179

AnkS4 · 2022-09-29T07:37:05Z

Hi,

I am trying to find out whether any of Whisper's methods returns the duration of the audio file.

This seems close to the audio duration:
transcription["segments"][0]["end"]-transcription["segments"][0]["start"]

What would be the recommended way to do it?

seemantsingh-code · 2022-11-06T07:38:10Z

A small correction, if we are talking about the complete audio duration:
transcription["segments"][-1]["end"]-transcription["segments"][0]["start"]
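
A minimal sketch of this calculation, assuming `result` is the dictionary returned by Whisper's `transcribe()`, whose `"segments"` list holds dicts with `"start"` and `"end"` times in seconds. The helper name `segments_span` is hypothetical. Note that it measures the span of the transcribed speech, which (as the later replies point out) may not match the true file length.

```python
from typing import Any, Dict, List


def segments_span(result: Dict[str, Any]) -> float:
    """Seconds from the start of the first segment to the end of the last one.

    Assumes result["segments"] is a list of dicts with "start"/"end" floats,
    as produced by whisper's transcribe(). This is the span of the detected
    speech, not necessarily the duration of the audio file.
    """
    segments: List[Dict[str, Any]] = result.get("segments", [])
    if not segments:
        return 0.0  # nothing transcribed, so no span to measure
    return segments[-1]["end"] - segments[0]["start"]


# Usage with a hand-written result dict (illustrative values only).
example = {"segments": [{"start": 0.0, "end": 2.76}, {"start": 2.76, "end": 6.16}]}
print(segments_span(example))  # 6.16
```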

glangford · 2022-11-06T12:32:05Z

In practice, the Whisper segments do not seem to match the actual audio duration exactly. A different option would be to use ffmpeg directly for this purpose. For example:

How to extract duration time from ffmpeg output?
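
One way to do this from Python is a sketch like the one below. It swaps in `ffprobe` (the probing tool that ships with ffmpeg) instead of parsing `ffmpeg` log output, and asks only for the container's `format=duration` field. It assumes `ffprobe` is on `PATH`; the function names are hypothetical. Some streams report `N/A` for duration, in which case `float()` raises `ValueError`.

```python
import subprocess


def parse_ffprobe_duration(output: str) -> float:
    """Parse the bare number ffprobe prints with `nokey=1` output."""
    return float(output.strip())  # raises ValueError for "N/A"


def ffprobe_duration(path: str) -> float:
    """Return the container-reported duration of `path` in seconds.

    Assumes the ffprobe CLI (distributed with ffmpeg) is installed and on PATH.
    """
    cmd = [
        "ffprobe",
        "-v", "error",                               # suppress banner and info logs
        "-show_entries", "format=duration",          # print only the container duration
        "-of", "default=noprint_wrappers=1:nokey=1", # bare value, no key or [FORMAT] wrapper
        path,
    ]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_ffprobe_duration(completed.stdout)
```

Usage would be something like `ffprobe_duration("audio.mp3")`, where the path is a placeholder. This reads the duration from the file's metadata without decoding the whole file, which is why it is typically faster than decoding the waveform.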

jongwook · 2022-11-11T03:15:01Z
Maintainer

The whisper.audio.load_audio() function returns a mono audio waveform resampled to 16 kHz, so you can divide its length by 16000 to get the audio length in seconds.

whisper/whisper/audio.py, lines 22 to 49 in f680570:

    def load_audio(file: str, sr: int = SAMPLE_RATE):
        """
        Open an audio file and read as mono waveform, resampling as necessary

        Parameters
        ----------
        file: str
            The audio file to open

        sr: int
            The sample rate to resample the audio if necessary

        Returns
        -------
        A NumPy array containing the audio waveform, in float32 dtype.
        """
        try:
            # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
            # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
            out, _ = (
                ffmpeg.input(file, threads=0)
                .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
                .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
            )
        except ffmpeg.Error as e:
            raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

        return np.frombuffer(out, np.int16).flatten().astype(np.float32) / 32768.0

But this will, of course, be slower than a more direct method like the one @glangford suggested.
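
A minimal sketch of this approach, assuming `load_audio` and the 16 kHz `SAMPLE_RATE` from `whisper.audio` as in the excerpt above. The helper names `duration_from_waveform` and `whisper_file_duration` are hypothetical. The arithmetic is just samples divided by sample rate, e.g. 480,000 samples / 16,000 Hz = 30 s.

```python
from typing import Sized

# Assumed to match whisper.audio.SAMPLE_RATE; load_audio resamples to this rate by default.
SAMPLE_RATE = 16000


def duration_from_waveform(waveform: Sized, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds of a mono waveform: number of samples / sample rate."""
    return len(waveform) / sample_rate


def whisper_file_duration(path: str) -> float:
    """Decode `path` with whisper's load_audio (needs ffmpeg) and measure it."""
    # Imported lazily so duration_from_waveform works without whisper installed.
    from whisper.audio import load_audio

    audio = load_audio(path)  # float32 mono array resampled to 16 kHz
    return duration_from_waveform(audio)
```

Because `load_audio` decodes and resamples the entire file, this costs a full decode; it is mainly convenient when you are loading the audio for transcription anyway.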

wqw547243068 · 2022-11-17T08:48:38Z

In fact, Whisper produces correct timestamps in result['segments'], except for the last segment.

Whisper shows the following output:

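# sample: 13 seconds
[00:00:00,000 --> 00:00:07,000] The project's property management, Fuhuahang Property, manages the Chang'an Club
[00:00:07,000 --> 00:00:37,000] The Chang'an Club is the top of the ten major clubs; the property fee is 24 yuan per bottle
# sample: 10 seconds
[00:00.000 --> 00:02.760] The project comes with a 2000-ping clubhouse
[00:02.760 --> 00:06.160] Inside, swimming, fitness and recuperation all in one
[00:06.160 --> 00:34.160] And also, there is a reception room, ideas, a banquet hall, etc.
# sample: 21 seconds
[00:00.000 --> 00:04.100] Within 200 meters of the project is the Wangfujing three circles
[00:04.100 --> 00:05.900] Unique in Beijing
[00:05.900 --> 00:08.600] Wangfu Central APM
[00:08.600 --> 00:10.300] Dongying IM88
[00:10.300 --> 00:11.800] Oriental Plaza
[00:11.800 --> 00:14.000] Wangfujing Department Store
[00:14.000 --> 00:15.600] Jinbaohui
[00:15.600 --> 00:18.400] Hamleys toy store
[00:18.400 --> 00:20.800] Bringing together top-tier luxury brands
[00:20.800 --> 00:30.800] Enough to meet everyday shopping, leisure and other needs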

As you can see, the end time of the last segment is wrong in every sample (for example, 37 s in a 13-second clip). How does the VAD function work?
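
One possible workaround, which is not something Whisper does itself: measure the real duration separately (for example with `load_audio` or `ffprobe`, as suggested earlier in the thread) and clamp every segment's timestamps to it. The helper `clamp_segments` is hypothetical and assumes segment dicts with `"start"` and `"end"` in seconds. It only caps the end bound at the file length; it cannot tell you where the speech actually stopped.

```python
from typing import Any, Dict, List


def clamp_segments(segments: List[Dict[str, Any]], duration: float) -> List[Dict[str, Any]]:
    """Return copies of `segments` with start/end capped at `duration` seconds."""
    clamped = []
    for seg in segments:
        fixed = dict(seg)  # shallow copy so the original result is left untouched
        fixed["start"] = min(fixed["start"], duration)
        fixed["end"] = min(fixed["end"], duration)
        clamped.append(fixed)
    return clamped


# The 13-second sample above, whose last segment reportedly ends at 37 s.
segments = [{"start": 0.0, "end": 7.0}, {"start": 7.0, "end": 37.0}]
print(clamp_segments(segments, 13.0))
# [{'start': 0.0, 'end': 7.0}, {'start': 7.0, 'end': 13.0}]
```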
