Is there a way to create .srt with timings #98

veirant · 2022-09-24T12:28:14Z

veirant
Sep 24, 2022

In my personal opinion, 90% of all calls to the transcription tool will come from people doing subtitles - in theory, this can greatly facilitate the work, especially if an articulate fragment is taken for a sentence, which more or less resembles a complete thought. Actually, the question is: is it possible to somehow create subtitles using this projects? Are there forks capable of this? Are you going to add similar functionality? Thank you.

FurkanGozukara · 2022-09-24T12:37:57Z

FurkanGozukara
Sep 24, 2022

It generates timed vtt files automatically and they work perfect on youtube or media player classic.

0 replies

lectair · 2022-09-25T04:31:24Z

lectair
Sep 25, 2022

I've written a small script that converts the output to an SRT file. It is useful for getting subtitles in a universal format for any audio:

from datetime import timedelta
import os
import whisper

def transcribe_audio(path):
    model = whisper.load_model("base") # Change this to your desired model
    print("Whisper model loaded.")
    transcribe = model.transcribe(audio=path)
    segments = transcribe['segments']

    for segment in segments:
        startTime = str(0)+str(timedelta(seconds=int(segment['start'])))+',000'
        endTime = str(0)+str(timedelta(seconds=int(segment['end'])))+',000'
        text = segment['text']
        segmentId = segment['id']+1
        segment = f"{segmentId}\n{startTime} --> {endTime}\n{text[1:] if text[0] is ' ' else text}\n\n"

        srtFilename = os.path.join("SrtFiles", f"VIDEO_FILENAME.srt")
        with open(srtFilename, 'a', encoding='utf-8') as srtFile:
            srtFile.write(segment)

    return srtFilename

Sample SRT File:

1
00:00:00,000 --> 00:00:05,000
Open AI has recently decided to open source.

2
00:00:05,000 --> 00:00:09,000
Their translation and transcription AI whisper.

3
00:00:09,000 --> 00:00:18,000
So now it is under an MIT license and that includes both the code that's here as well as the model weights that were used to train the AI.

4
00:00:18,000 --> 00:00:26,000
So if you want it to go and try and make your own speech transcription AI with that data, you are free to do so.

15 replies

thomasm2film Aug 11, 2023

You can simply set response_format=srt or response_format=vtt to get final subtitle files back

Pareek-Yash Sep 17, 2023

I think response_format only works for the API version, not python package

linomp Nov 17, 2023

bro you are hero thanks!!!

ozamaahmed Jan 28, 2024

hello im an absolute beginner i have a question. does this code ensure that the subtitles are made one sentence to another, like the timestamps for each are from one full stop to another or this is just a coincidence

akbartaimurr Apr 19, 2024

I would like to know that too

edgartaor · 2022-09-25T06:25:30Z

edgartaor
Sep 25, 2022

I also wrote a function to save as srt:

from typing import Iterator, TextIO

def srt_format_timestamp(seconds: float):
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)

    hours = milliseconds // 3_600_000
    milliseconds -= hours * 3_600_000

    minutes = milliseconds // 60_000
    milliseconds -= minutes * 60_000

    seconds = milliseconds // 1_000
    milliseconds -= seconds * 1_000

    return (f"{hours}:") + f"{minutes:02d}:{seconds:02d},{milliseconds:03d}"

def write_srt(transcript: Iterator[dict], file: TextIO):
    count = 0
    for segment in transcript:
        count +=1
        print(
            f"{count}\n"
            f"{srt_format_timestamp(segment['start'])} --> {srt_format_timestamp(segment['end'])}\n"
            f"{segment['text'].replace('-->', '->').strip()}\n",
            file=file,
            flush=True,
        )

And use it this way:

result = model.transcribe(filename, verbose=True, language=language)
    # save TXT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".txt"), "w") as txt:
        print(result["text"], file=txt)

    # save VTT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".vtt"), "w") as vtt:
        write_vtt(result["segments"], file=vtt)
    
    # save SRT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + f".{language}.srt"), "w") as srt:
        write_srt(result["segments"], file=srt)

7 replies

tapearchives Oct 14, 2022

thanks. I had run earlier but didn't realize the output files were automatic -- but only created upon completion of the entire file.

I had run many aborted passes and was expecting some --output subtitles.srt or --output srt, txt sort of option.

Finally got a 1050 Ti (notebook) small model running -- much faster -- and loving the current setup.

(My first test was CPU and the input was a 90 minute mp3 and towards the end of the file the sync was out by maybe 30 seconds. But now everything with the current version, using GPU and a wav input is spot on -- all in sync. Great tool for converting the thousands of meetings and teaching tapes I have.

Still wishing I could get tapes with a translator -- i.e. English and Spanish -- to work with one pass but I recognize everything is still early so am very glad for the current capabilities.

When I ran the large model it even got "[applause]" which was amazing. All predicated on model data I'm sure. Wish more sample data had "[coughing]" etc.

Assemblyai has sentiment analysis in their AI analysis. I can see a point where emotion in the voice can be brought into a mood or emphasis color to be used in the subs. Would love to see an AI that automatically "storyboards" a talk with hallucinatory visuals and background colors with the text -- i.e. video generation for talks but this is, again, for the future.)

edgartaor Oct 14, 2022

I had run earlier but didn't realize the output files were automatic -- but only created upon completion of the entire file.

Yeah. That happened to me also. I made a some changes on the code on my end to write the srt while decoding.

Would love to see an AI that automatically "storyboards" a talk with hallucinatory visuals and background colors with the text -- i.e. video generation for talks but this is, again, for the future.)

That's an interesting idea. There's a lot going on in the Discussion tab if you want to check it out.

lennartq Dec 12, 2022

You can use datetime.timedelta to format milliseconds

from datetime import timedelta
seconds = 3.5845
timestamp = str(timedelta(milliseconds=round(seconds * 1000)))[:-3]

getData123 Dec 22, 2022

I also wrote a function to save as srt:

from typing import Iterator, TextIO

def srt_format_timestamp(seconds: float):
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)

    hours = milliseconds // 3_600_000
    milliseconds -= hours * 3_600_000

    minutes = milliseconds // 60_000
    milliseconds -= minutes * 60_000

    seconds = milliseconds // 1_000
    milliseconds -= seconds * 1_000

    return (f"{hours}:") + f"{minutes:02d}:{seconds:02d},{milliseconds:03d}"

def write_srt(transcript: Iterator[dict], file: TextIO):
    count = 0
    for segment in transcript:
        count +=1
        print(
            f"{count}\n"
            f"{srt_format_timestamp(segment['start'])} --> {srt_format_timestamp(segment['end'])}\n"
            f"{segment['text'].replace('-->', '->').strip()}\n",
            file=file,
            flush=True,
        )

And use it this way:

result = model.transcribe(filename, verbose=True, language=language)
    # save TXT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".txt"), "w") as txt:
        print(result["text"], file=txt)

    # save VTT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".vtt"), "w") as vtt:
        write_vtt(result["segments"], file=vtt)
    
    # save SRT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + f".{language}.srt"), "w") as srt:
        write_srt(result["segments"], file=srt)

I am getting write_vtt not defined, what I am doing wrong please

bruhcode234 Jan 3, 2023

I am getting write_vtt not defined, what I am doing wrong please

it's because the code doesn't create write_vtt method (it only create def write_srt), that's why you get that error. so to fix this you just have to delete this snippet code

# save VTT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".vtt"), "w") as vtt:
        write_vtt(result["segments"], file=vtt)

or you can add the write_vtt method on your own

fcakyon · 2022-09-25T12:19:57Z

fcakyon
Sep 25, 2022

@veirant I have published a Python package that adds srt export support to whisper: https://github.com/fcakyon/pywhisper

0 replies

waltervanheuven · 2022-09-25T14:19:07Z

waltervanheuven
Sep 25, 2022

There are also several Python libraries that can convert .vtt files to .srt, for example pysubs2.

0 replies

ndeville · 2022-09-26T15:46:27Z

ndeville
Sep 26, 2022

It seems there is a function for it: write_srt

See https://github.com/openai/whisper/blob/main/whisper/utils.py

1 reply

yasinuygun Dec 24, 2022

An example usage can be found here: https://gist.github.com/yasinuygun/a5aaa24af9f5531872e4c6c863fd793f

fcakyon · 2022-09-26T15:48:57Z

fcakyon
Sep 26, 2022

@veirant @ndeville now srt is exported by default:

whisper/whisper/transcribe.py

Line 310 in 2037b65

# save SRT

3 replies

ndeville Sep 26, 2022

Thanks!
How do I update?
pip3 install --upgrade https://github.com/openai/whisper.git does not work? 🤔

fcakyon Sep 26, 2022

it should be pip3 install --upgrade git+https://github.com/openai/whisper.git

jongwook Sep 26, 2022
Maintainer

@ndeville Please try pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git if the above doesn't work.

NielsMayer · 2022-10-14T01:01:59Z

NielsMayer
Oct 14, 2022

I also added CSV output with timestamps in milliseconds.... #228 because that's the format I want my transcriptions in for #233

0 replies

bmurphy96 · 2022-10-15T03:14:12Z

bmurphy96
Oct 15, 2022

so two days i did an experiment and generated some transcripts of my podcast using openai/whisper (and the pywhisper wrapper mentioned above by @fcakyon

I uploaded two episodes of my srt files and they didn't work..examining the files closely and the timestamps don't seem to have the proper number of digits. Here is an examaple:

799
1:08:45,800 --> 1:09:13,800
I will talk to you soon.

that is an hour and 8 minutes into the show (at the end). the hosting platform (captivate) is rejecting the files because it shoudl be:

799
01:08:45,800 --> 01:09:13,800
I will talk to you soon.

Hope that makes sense..it's not a 24 hour format.

Is there something wrong with my setup or is this a bug?

4 replies

edgartaor Oct 15, 2022

The problem in on the pywhisper wrapper side.
Use the original openai/whisper repository, days ago got an update that also generate the .srt file in the correct format.

Run this to update whisper: pip3 install --upgrade git+https://github.com/openai/whisper.git

fcakyon Oct 15, 2022

@bmurphy96 this is fixed in pywhisper==1.0.5. Upgrade your pywhisper version with pip install -U pywhisper 👍

bmurphy96 Oct 16, 2022

awesome!! thanks!

tapearchives Oct 16, 2022

awesome!! thanks!

If you’ve transcribed some large batches instead of re-running them with the new code you can open them in the windows-only free and open source app, Subtitle Edit and it will fix the formatting. It has a command line conversion option for batches too.

@edgartaor - to answer your earlier Q. I didn’t think this was the ideal place to respond but briefly I’m involved in setting up a not-for-profit which helps preserve audio and video tape recordings. Focused on Christian ministry classroom, preaching, teaching, meetings, etc. and yes, whisper is literally a Godsend. Great stuff. Thanks all!

ianberman · 2022-11-28T21:54:24Z

ianberman
Nov 28, 2022

I'm wondering if there's an easy way to split up the .srt output into smaller sections? Or is this a function of the 30 second windowing & N_FRAMES etc?

Would I run into issues trying to edit the hyperparameters in audio.py because I would change it away from how the dataset was trained?

Eg. Whisper has output a srt:

00:02:48,000 --> 00:02:59,000
Yeah, but we went against it because we dropped the dollarization. Imagine, we would have had at least, what would have been like a dollar per ticket?

and a human translator gave us smaller sections (which I prefer for this project):

25
00:02:48,265 --> 00:02:49,972
Yeah, but we lost money because of

26
00:02:50,012 --> 00:02:51,700
the whole “dollarization” 

27
00:02:51,740 --> 00:02:55,800
Imagine, we would have had at least

28
00:02:56,879 --> 00:02:59,150
What, like one dollar per ticket?

Any suggestions appreciated :)

1 reply

Baenwort Dec 2, 2022

This would be a good enhancement! I agree the issue could be resolved with a parameter that specifies how many seconds worth of audio text should be made into a single segment (26 above).

Just setting a max text length in a segment wouldn't work but might be a bandaid for now?

tapearchives · 2022-11-29T09:08:03Z

tapearchives
Nov 29, 2022

I believe I've run the same recording multiple times and on one instance seen the transcription broken into short sections, as mentioned is most desired, and another time in longer phrases, which I agree are not ideal. I haven't reported this as a bug because I haven't gone back to verify the issue, but that was my memory. I am hoping the "smaller sections" -- something short enough that will be legible on-screen for useful subtitles -- will be coming eventually.

…

On Mon, Nov 28, 2022 at 3:54 PM Ian Berman ***@***.***> wrote: I'm wondering if there's an easy way to split up the .srt output into smaller sections? Or is this a function of the 30 second windowing & N_FRAMES etc? Would I run into issues trying to edit the hyperparameters in audio.py because I would change it away from how the dataset was trained? Eg. Whisper has output a srt: 00:02:48,000 --> 00:02:59,000 Yeah, but we went against it because we dropped the dollarization. Imagine, we would have had at least, what would have been like a dollar per ticket? and a human translator gave us smaller sections (which I prefer for this project): 25 00:02:48,265 --> 00:02:49,972 Yeah, but we lost money because of 26 00:02:50,012 --> 00:02:51,700 the whole “dollarization” 27 00:02:51,740 --> 00:02:55,800 Imagine, we would have had at least 28 00:02:56,879 --> 00:02:59,150 What, like one dollar per ticket? Any suggestions appreciated :) — Reply to this email directly, view it on GitHub <#98 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A3TATOL4VHMOMED7VDFZQUDWKUSZ3ANCNFSM6AAAAAAQUUCPRM> . You are receiving this because you commented.Message ID: ***@***.***>

0 replies

boludoz · 2022-11-30T02:23:13Z

boludoz
Nov 30, 2022

I am using google colab to summarize my uni notes, it works great, but there is a problem, when I try to process the following file with python, the memory is not released and the following file fails, i think the same for my RTX 3080. The same is not the case with the command line.

0 replies

Baenwort · 2022-12-06T00:11:58Z

Baenwort
Dec 6, 2022

So if there is long pauses in the dialog the last spoken sentence will stay as the current for several minutes which doesn't look good when added back to the video.

Is there a way to have the srt stop to a blank segment if the dialog stops for more than 5 or 10 seconds (or something user configurable?)

0 replies

ksn-systems · 2022-12-06T00:45:39Z

ksn-systems
Dec 6, 2022

I am using this library as a wrap around and getting good results

https://github.com/jianfch/stable-ts

3 replies

ianberman Dec 6, 2022

Alas I have found the results for translation are not as good as the original whisper, see screenshot comparison attached (whisper left, stable-ts right)

EDIT: my error, see below

ksn-systems Dec 6, 2022

That's odd, stable-ts is only grouping the phrases on the output from Whisper. Whisper still does the translation and there is quite a different.

ianberman Dec 6, 2022

Completely my error, I forgot I switched the model size back away from large when I was trying to get it working. Cheers!

mygithubcomments · 2023-01-20T00:52:10Z

mygithubcomments
Jan 20, 2023

I wrote a script to translate the output of whisper to .vtt format (https://en.wikipedia.org/wiki/WebVTT) which is then loadable and viewable with VLC media player:

import re
import sys
import os

# Input line/entry example: `[00:16.320 --> 00:21.680]  Oh, I got Judith from the bank holding on line two right now, sir.`

# Output line/entry example: 
# ```
# 00:16.320 --> 00:21.680
# Oh, I got Judith from the bank holding on line two right now, sir.
# 
# ```

def convert_subtitles(input_file, output_file):
    with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
        outfile.write("WEBVTT\n\n")
        for line in infile:
            match = re.match(r'\[(.*) --> (.*)\]\s*(.*)', line)
            if(not match is None):  
              start_time, end_time, text = match.groups()
              outfile.write(f"{start_time} --> {end_time}\n{text}\n\n")

if __name__ == "__main__":
    if(len(sys.argv) <= 2):
      print('Usage: {} InputFileName1.txt OutputSubtitleFileName1.vtt'.format(os.path.basename(__file__)))
      exit(1)

    input_file = sys.argv[1]
    output_file = sys.argv[2]
    convert_subtitles(input_file, output_file)

The script is not thoroughly tested so it may have bugs (I ran it once or twice ...) ...

Usage:

;; generate subtitle
$ python -m whisper --model medium.en audio1.aac > subtitle1.txt
;; convert subtitle
$ convert_subs.py subtitle1.txt audio1.vtt

0 replies

silkogelman · 2023-03-04T08:16:23Z

silkogelman
Mar 4, 2023

With the current Whisper release the easiest way is probably to use get_writer to generate SRT, WEBVTT, JSON, TSV and TXT files.

import whisper
from whisper.utils import get_writer

filename = "example-file.mp4"
input_directory = "."
input_file = f"{input_directory}/{filename}"

model = whisper.load_model("large") # or whatever model you prefer
result = model.transcribe(input_file)
output_directory = "."

# Save as a TXT file
txt_writer = get_writer("txt", output_directory)
txt_writer(result, input_file)

# Save as an SRT file
srt_writer = get_writer("srt", output_directory)
srt_writer(result, input_file)

# Save as an VTT file
vtt_writer = get_writer("vtt", output_directory)
vtt_writer(result, input_file)

# Save as a TSV file
tsv_writer = get_writer("tsv", output_directory)
tsv_writer(result, input_file)

# Save as a JSON file
json_writer = get_writer("json", output_directory)
json_writer(result, input_file)

7 replies

4drawing95 Apr 12, 2023

that is not worked

moviePathList = os.listdir(dataPath)
mp3List = []
for x in moviePathList:
    if x.endswith(".mp3"):
        mp3List.append(x)

model = whisper.load_model("small") 
for x in mp3List:
    result = model.transcribe(dataPath + x)
    srt_writer = get_writer("srt", dataPath)
    srt_writer(result,x)

When written as above, the

ResultWriter.call() missing 1 required positional argument: 'options'
File "A:\AI\translate\WhisperAI.py", line 63, in
srt_writer(result,x)
TypeError: ResultWriter.call() missing 1 required positional argument: 'options'

I get this error

ottoluna Jul 16, 2023

Dear,
how you solved this?. This is my first day using whisper and I got this error, I tried to put an empty dictionary {} but got this error message:
File "C:\Users\ottol\anaconda3\envs\whisper\Lib\site-packages\whisper\utils.py", line 104, in iterate_result
raw_max_line_width: Optional[int] = options["max_line_width"]

AiGig Jul 16, 2023

The reason this is throwing an exception is because there's an additional parameter needed. A dictionary object for options. When this was changed, I'm not sure- but if you are using the latest build, try adding the following:

BEGIN CODE

word_options = {
"highlight_words": True,
"max_line_count": 50,
"max_line_width": 3
}

srt_writer = get_writer("srt", output_directory)
srt_writer(result, input_file, word_options)

END CODE

It's been working fine for me (including all the other formats (vtt, json, text etc.)

I'm not sure what all the available options are, because I need to do a deeper dive. I came across it from this post:
https://stackoverflow.com/questions/76608484/whisper-based-speech-to-subtitles-python-script-word-timestamps-issue

ottoluna Jul 16, 2023

Thanks! , it worked.

Pimool Aug 25, 2023

HI, how can I pass max_line_count and max_line_width parameters using openai library.
My code is like below.

import openai
transcript = openai.Audio.transcribe(model = 'whisper-1', file = 'audio.mp3', response_format = 'srt', language = 'ko', max_line_count = 10, max_line_width = 5)
with open('transcript.srt'), 'w', encoding = 'utf-8') as f:
    f.write(transcript)

However, when I looked at the output transcript, nothing is different with the original transcript(without max_line_count, max_line_width).

As @AiGig said that it is working fine, I think it works with openai-whisper, but I don't know how to pass the params in this openai library.

(When using openai-whisper, it was too slow to see the output).

Can anyone help?

I pass test="It's a test" in the openai.Audio.transcribe, but it didn't occur any error.

I guess even though the input parameter is not defined, it doesn't make any error.

So, I don't even know whether I put the parameter correctly or not

fadygit · 2023-03-16T09:13:45Z

fadygit
Mar 16, 2023

I found an app of whisper on website replicate which you can upload the audio file and choose whatever the type of the output transcript (plain text, srt, vtt) ..........> check it : https://replicate.com/openai/whisper

2 replies

mcsmark Apr 7, 2023

it works amazingly well. but the timestamps do not look correct. and it takes only files up to 25MB.
it works better than a transcription service i used before. that was a waste of money,.

mcsmark Apr 8, 2023

as i just noticed it is limited. you need to offer credit card too.

4drawing95 · 2023-04-12T08:04:50Z

4drawing95
Apr 12, 2023

Is there any way to make Whisper break text based on . , ? ! I'm a non-English speaker and I'm trying to use the translation API to create subtitles afterwards, but one sentence in English is so long that I want to break it in the middle, but if I do this in post-processing, the subtitles are out of sync or the timetable is out of sync. Essentially, it would be ideal to have Whisper generate the text with that criterion, but is there any way to fine-tune it like that?

3 replies

edgartaor Apr 13, 2023

Check this fork: https://github.com/linto-ai/whisper-timestamped or this one: https://github.com/jianfch/stable-ts
Those repos try to timestamp each word. If it works, is not going to solve your use case out of the box but at least now you know the time stamp of the beginning and end of each phrase. And then, after translation, you can create a script to match the translated phrase with the corresponding timestamp.

sm-main Apr 18, 2023

Is it possible to get variable srt grouping like 30 seconds in the stable-ts repo?

4drawing95 Apr 18, 2023

Check this fork: https://github.com/linto-ai/whisper-timestamped or this one: https://github.com/jianfch/stable-ts Those repos try to timestamp each word. If it works, is not going to solve your use case out of the box but at least now you know the time stamp of the beginning and end of each phrase. And then, after translation, you can create a script to match the translated phrase with the corresponding timestamp.

Thank you so much I was able to solve this problem thanks to you.

eatcosmos · 2023-06-04T19:30:43Z

eatcosmos
Jun 4, 2023

https://wandb.ai/wandb_fc/gentle-intros/reports/How-to-Transcribe-Your-Audio-to-Text-for-Free-with-SRTs-VTTs---VmlldzozNDczNTI0#saving-a-whisper-transcription-as-an-srt/vtt-file

0 replies

dgoryeo · 2023-10-16T15:21:38Z

dgoryeo
Oct 16, 2023

Hi, I came across this thread. Is there a function to post process a json output into an SRT format? Thanks.

0 replies

tapearchives · 2023-10-16T18:34:08Z

tapearchives
Oct 16, 2023

Whisper creates .srt natively. But if you need to convert json to srt look at OpenSubtitles as I suspect it probably does. General Google search might reveal something also.

…

On Mon, Oct 16, 2023 at 10:21 AM dgoryeo ***@***.***> wrote: Hi, I came across this thread. Is there a function to post process a json output into an SRT format? Thanks. — Reply to this email directly, view it on GitHub <#98 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A3TATOM3YHXKRTC3BKUGXI3X7VGJDAVCNFSM6AAAAAAQUUCPROVHI2DSMVQWIX3LMV43SRDJONRXK43TNFXW4Q3PNVWWK3TUHM3TEOJUGAYDM> . You are receiving this because you commented.Message ID: ***@***.***>

0 replies

comicfans · 2023-10-19T12:53:27Z

comicfans
Oct 19, 2023

kdenlive can use whisper to convert audio to subtitles ,
code at ~~https://github.com/KDE/kdenlive/blob/a89e93b672bf51f918945e1791bb8f24652bf871/data/scripts/whispertotext.py~~

https://github.com/KDE/kdenlive/blob/master/data/scripts/whispertosrt.py

you can use it out of box

2 replies

dgoryeo Apr 12, 2024

Hi @comicfans , just came across your post. Looking at https://github.com/KDE/kdenlive/blob/master/data/scripts/whispertosrt.py it uses seamless-m4t-v2-large for translation. Do I need to import seamless or will it just work?

comicfans Apr 12, 2024

IIRC, I haven't imported seamless, it just work

kareva88 · 2024-04-12T08:30:41Z

kareva88
Apr 12, 2024

%%time
!whisper "/content/drive/MyDrive/Colab Notebooks/test.mp4" --model large --language Russian --output_format srt

0 replies

akbartaimurr · 2024-04-19T11:55:49Z

akbartaimurr
Apr 19, 2024

I would like to know If whisper can make srt's ike this into one word per each :

25
00:02:48,265 --> 00:02:49,972
Yeah, but we lost money because of

26
00:02:50,012 --> 00:02:51,700
the whole “dollarization” 

27
00:02:51,740 --> 00:02:55,800
Imagine, we would have had at least

28
00:02:56,879 --> 00:02:59,150
What, like one dollar per ticket?

I'm currently using AssemblyAI to transcribe and make the srt file but it doesn't do a good job in giving one word per each SRT's and I thought maybe I could switch to whisper if it had some workaround into doing this.
Can anyone let me know?

0 replies

Chaosbreed1 · 2024-10-01T19:45:52Z

Chaosbreed1
Oct 1, 2024

For Jyupiter notebook, ffmeg must of course be installed.
Output: txt, srt and vtt files.
Audio file in the same folder.

import os
import whisper

def srt_format_timestamp(seconds: float):
    assert seconds >= 0, "non-negative timestamp expected"
    milliseconds = round(seconds * 1000.0)

    hours = milliseconds // 3_600_000
    milliseconds -= hours * 3_600_000

    minutes = milliseconds // 60_000
    milliseconds -= minutes * 60_000

    seconds = milliseconds // 1_000
    milliseconds -= seconds * 1_000

    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"


def write_srt(transcript, file):
    count = 0
    for segment in transcript:
        count += 1
        print(
            f"{count}\n"
            f"{srt_format_timestamp(segment['start'])} --> {srt_format_timestamp(segment['end'])}\n"
            f"{segment['text'].replace('-->', '->').strip()}\n",
            file=file,
            flush=True,
        )


def write_vtt(transcript, file):
    file.write("WEBVTT\n\n")
    for segment in transcript:
        start = format_timestamp_vtt(segment['start'])
        end = format_timestamp_vtt(segment['end'])
        text = segment["text"].strip()
        file.write(f"{start} --> {end}\n{text}\n\n")

def format_timestamp_vtt(seconds):
    milliseconds = round(seconds * 1000.0)
    hours = milliseconds // 3_600_000
    milliseconds -= hours * 3_600_000
    minutes = milliseconds // 60_000
    milliseconds -= minutes * 60_000
    seconds = milliseconds // 1_000
    milliseconds -= seconds * 1_000
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{milliseconds:03d}"

model = whisper.load_model("medium")
filename = "Audio.opus" #File in the same folder
language = "de"
output_dir = "output"

os.makedirs(output_dir, exist_ok=True)

result = model.transcribe(filename, verbose=True, language=language)

with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".txt"), "w") as txt:
    print(result["text"], file=txt)

with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".vtt"), "w") as vtt:
    write_vtt(result["segments"], file=vtt)

with open(os.path.join(output_dir, os.path.splitext(filename)[0] + f".{language}.srt"), "w") as srt:
    write_srt(result["segments"], file=srt)
with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".txt"), "w") as txt:
    print(result["text"], file=txt)

with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".vtt"), "w") as vtt:
    write_vtt(result["segments"], file=vtt)

with open(os.path.join(output_dir, os.path.splitext(filename)[0] + f".{language}.srt"), "w") as srt:
    write_srt(result["segments"], file=srt)

0 replies

efeakaroz13 · 2025-01-12T11:13:12Z

efeakaroz13
Jan 12, 2025

Hey, I did it this way
python3 -m whisper b.mov --model small --language English

0 replies

EdwardsLe202 · 2025-02-12T17:28:07Z

EdwardsLe202
Feb 12, 2025

Is there any solutions for coding in NodeJS? I want the subtitles to display a full sentence. Thanks!!!

0 replies

Is there a way to create .srt with timings #98

Replies: 27 comments · 48 replies

jongwook Sep 26, 2022 Maintainer

BEGIN CODE

END CODE

Replies: 27 comments 48 replies

jongwook Sep 26, 2022
Maintainer