-
It generates timed VTT files automatically and they work perfectly on YouTube or Media Player Classic.
-
I've written a small script that converts the output to an SRT file. It is useful for getting subtitles in a universal format for any audio. Sample SRT file:
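The attached sample didn't survive the export, but for reference, an SRT file is a sequence of numbered cues with comma-separated milliseconds (the timestamps and text here are illustrative):

    1
    00:00:00,000 --> 00:00:02,500
    Hello and welcome to the show.

    2
    00:00:02,500 --> 00:00:05,000
    Today we are talking about Whisper.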
-
I also wrote a function to save as SRT:

    from typing import Iterator, TextIO

    def srt_format_timestamp(seconds: float):
        assert seconds >= 0, "non-negative timestamp expected"
        milliseconds = round(seconds * 1000.0)
        hours = milliseconds // 3_600_000
        milliseconds -= hours * 3_600_000
        minutes = milliseconds // 60_000
        milliseconds -= minutes * 60_000
        seconds = milliseconds // 1_000
        milliseconds -= seconds * 1_000
        # hours are zero-padded to two digits; some SRT parsers reject "1:08:24,000"
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"

    def write_srt(transcript: Iterator[dict], file: TextIO):
        count = 0
        for segment in transcript:
            count += 1
            print(
                f"{count}\n"
                f"{srt_format_timestamp(segment['start'])} --> {srt_format_timestamp(segment['end'])}\n"
                f"{segment['text'].replace('-->', '->').strip()}\n",
                file=file,
                flush=True,
            )

And use it this way (this assumes import os, and write_vtt as the analogous VTT helper):

    result = model.transcribe(filename, verbose=True, language=language)

    # save TXT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".txt"), "w") as txt:
        print(result["text"], file=txt)

    # save VTT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + ".vtt"), "w") as vtt:
        write_vtt(result["segments"], file=vtt)

    # save SRT
    with open(os.path.join(output_dir, os.path.splitext(filename)[0] + f".{language}.srt"), "w") as srt:
        write_srt(result["segments"], file=srt)
-
@veirant I have published a Python package that adds SRT export support to whisper: https://github.com/fcakyon/pywhisper
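If I remember its README correctly, pywhisper mirrors the whisper API and handles the subtitle export for you; a rough sketch (the file name is hypothetical, and the export behavior is the package's, not verified here):

    import pywhisper

    model = pywhisper.load_model("base")
    result = model.transcribe("audio.mp3")
    print(result["text"])  # .srt/.vtt export is handled by pywhisper's own utilities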
-
There are also several Python libraries that can convert between subtitle formats.
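For example, pysubs2 (my suggestion, not necessarily the library the comment had in mind) converts based on the file extension:

    import pysubs2

    # hypothetical file names; format is inferred from the extension
    subs = pysubs2.load("transcript.vtt")
    subs.save("transcript.srt")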
-
It seems there is a function for it: see https://github.com/openai/whisper/blob/main/whisper/utils.py
-
@veirant @ndeville Now SRT is exported by default: Line 310 in 2037b65
-
I also added CSV output with timestamps in milliseconds (#228), because that's the format I want my transcriptions in for #233.
-
Two days ago I did an experiment and generated some transcripts of my podcast using openai/whisper (and the pywhisper wrapper mentioned above by @fcakyon). I uploaded two episodes' SRT files and they didn't work. Examining the files closely, the timestamps don't seem to have the proper number of digits. Here is an example, cue 799 (an hour and 8 minutes into the show, at the end): the hour field is written with a single digit, and the hosting platform (Captivate) is rejecting the files because the hour should be zero-padded to two digits. Hope that makes sense; the output isn't in the two-digit, 24-hour-style format. Is there something wrong with my setup or is this a bug?
-
I'm wondering if there's an easy way to split up the .srt output into smaller sections? Or is this a function of the 30-second windowing & N_FRAMES etc.? Would I run into issues trying to edit the hyperparameters in audio.py, since that would change it away from how the dataset was trained? E.g. Whisper has output this SRT cue:

    00:02:48,000 --> 00:02:59,000
    Yeah, but we went against it because we dropped the dollarization. Imagine, we would have had at least, what would have been like a dollar per ticket?

and a human translator gave us smaller sections (which I prefer for this project):

    25
    00:02:48,265 --> 00:02:49,972
    Yeah, but we lost money because of

    26
    00:02:50,012 --> 00:02:51,700
    the whole “dollarization”

    27
    00:02:51,740 --> 00:02:55,800
    Imagine, we would have had at least

    28
    00:02:56,879 --> 00:02:59,150
    What, like one dollar per ticket?

Any suggestions appreciated :)
-
I believe I've run the same recording multiple times and on one occasion seen the transcription broken into short sections, as mentioned is most desired, and another time in longer phrases, which I agree are not ideal. I haven't reported this as a bug because I haven't gone back to verify the issue, but that was my memory. I am hoping the "smaller sections" -- something short enough to be legible on-screen for useful subtitles -- will be coming eventually.
-
I am using Google Colab to summarize my uni notes and it works great, but there is a problem: when I process one file after another from Python, the memory is not released and the next file fails; I think the same happens on my RTX 3080. This doesn't happen with the command line.
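A common workaround (my assumption, not something confirmed in this thread) is to explicitly free the model and clear the CUDA cache once you're done with it:

    import gc

    import torch
    import whisper

    model = whisper.load_model("medium")
    for path in ["notes1.mp3", "notes2.mp3"]:  # hypothetical files
        result = model.transcribe(path)
        # ... save the transcript ...

    # release GPU memory held by the model and cached allocations
    del model
    gc.collect()
    torch.cuda.empty_cache()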
-
So if there are long pauses in the dialog, the last spoken sentence will stay on screen as the current subtitle for several minutes, which doesn't look good when added back to the video. Is there a way to have the SRT cut to a blank segment if the dialog stops for more than 5 or 10 seconds (or something user-configurable)?
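I'm not aware of a built-in option, but a post-processing sketch along these lines should do it: clamp each segment's end time so no cue lingers longer than a configurable maximum, and the player shows a blank until the next cue starts (MAX_CUE_SECONDS is a hypothetical user setting):

    MAX_CUE_SECONDS = 7.0

    def clamp_segments(segments, max_len=MAX_CUE_SECONDS):
        # shorten any cue that would stay on screen too long;
        # the gap until the next cue then renders as blank
        for seg in segments:
            yield {**seg, "end": min(seg["end"], seg["start"] + max_len)}

Feed the result to an SRT writer like the one earlier in the thread.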
-
I am using this library as a wrapper and getting good results.
-
I wrote a script to translate the output of whisper to .vtt format (https://en.wikipedia.org/wiki/WebVTT), which is then loadable and viewable with VLC media player. The script is not thoroughly tested, so it may have bugs (I only ran it once or twice).
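The script itself didn't survive the export; a minimal sketch of the same idea (WEBVTT needs a header line and uses '.' rather than ',' before the milliseconds):

    from typing import Iterator, TextIO

    def vtt_format_timestamp(seconds: float) -> str:
        ms = round(seconds * 1000)
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1_000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

    def write_vtt(transcript: Iterator[dict], file: TextIO) -> None:
        print("WEBVTT\n", file=file)  # mandatory header
        for segment in transcript:
            print(
                f"{vtt_format_timestamp(segment['start'])} --> {vtt_format_timestamp(segment['end'])}\n"
                f"{segment['text'].strip().replace('-->', '->')}\n",
                file=file,
            )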
-
With the current Whisper release, the easiest way is probably to use get_writer to generate SRT, WEBVTT, JSON, TSV and TXT files.
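A sketch from memory of recent releases (check whisper/utils.py for the exact signature in your version; the file name is hypothetical):

    import whisper
    from whisper.utils import get_writer

    model = whisper.load_model("base")
    result = model.transcribe("audio.mp3")

    # format can be "srt", "vtt", "json", "tsv", "txt", or "all"
    writer = get_writer("srt", ".")
    writer(result, "audio.mp3")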
-
I found a hosted Whisper app on the Replicate website where you can upload an audio file and choose the type of the output transcript (plain text, SRT, VTT). Check it out: https://replicate.com/openai/whisper
-
Is there any way to make Whisper break text on . , ? ! I'm a non-English speaker trying to use the translation API to create subtitles afterwards, but a single English sentence can be so long that I want to break it in the middle; if I do that in post-processing, the subtitles or the timing go out of sync. Ideally Whisper would generate the text with that criterion in the first place, but is there any way to fine-tune it like that?
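One possible approach (this assumes a Whisper version that supports word_timestamps=True; the regrouping itself is my sketch, not a built-in feature): collect word-level timings and close a cue whenever a word ends in sentence punctuation:

    import whisper

    model = whisper.load_model("base")
    result = model.transcribe("audio.mp3", word_timestamps=True)  # hypothetical file

    cues, current = [], []
    for segment in result["segments"]:
        for word in segment["words"]:
            current.append(word)
            # close the cue at sentence-ending punctuation
            if word["word"].strip().endswith((".", "?", "!")):
                cues.append({
                    "start": current[0]["start"],
                    "end": current[-1]["end"],
                    "text": "".join(w["word"] for w in current).strip(),
                })
                current = []
    if current:  # flush any trailing words
        cues.append({
            "start": current[0]["start"],
            "end": current[-1]["end"],
            "text": "".join(w["word"] for w in current).strip(),
        })

The resulting cues keep their original word timings, so nothing goes out of sync, and they can be passed to an SRT writer like the one earlier in the thread.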
-
Hi, I came across this thread. Is there a function to post-process a JSON output into an SRT format? Thanks.
-
Whisper creates .srt natively. But if you need to convert JSON to SRT, look at OpenSubtitles, as I suspect it can probably do that. A general Google search might reveal something as well.
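If the JSON is Whisper's own output (which stores the segment list under "segments"; that's an assumption about your file), a small converter is enough:

    import json

    def _srt_timestamp(seconds: float) -> str:
        ms = round(seconds * 1000)
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1_000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

    def json_to_srt(json_path: str, srt_path: str) -> None:
        with open(json_path, encoding="utf-8") as f:
            result = json.load(f)
        with open(srt_path, "w", encoding="utf-8") as out:
            for i, seg in enumerate(result["segments"], start=1):
                out.write(f"{i}\n")
                out.write(f"{_srt_timestamp(seg['start'])} --> {_srt_timestamp(seg['end'])}\n")
                out.write(seg["text"].strip() + "\n\n")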
-
kdenlive can use whisper to convert audio to subtitles: https://github.com/KDE/kdenlive/blob/master/data/scripts/whispertosrt.py. You can use it out of the box.
-
I would like to know if Whisper can make SRTs with one word per cue. I'm currently using AssemblyAI to transcribe and make the SRT file, but it doesn't do a good job of giving one word per cue, and I thought I could switch to Whisper if it had some workaround for doing this.
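With a Whisper version that supports word_timestamps=True, this is doable in a few lines (a sketch; file names are hypothetical, and format_timestamp is the helper in whisper.utils):

    import whisper
    from whisper.utils import format_timestamp

    model = whisper.load_model("base")
    result = model.transcribe("audio.mp3", word_timestamps=True)

    def ts(t: float) -> str:
        # SRT wants HH:MM:SS,mmm
        return format_timestamp(t, always_include_hours=True, decimal_marker=",")

    with open("audio.srt", "w", encoding="utf-8") as out:
        n = 0
        for seg in result["segments"]:
            for w in seg["words"]:
                n += 1
                out.write(f"{n}\n{ts(w['start'])} --> {ts(w['end'])}\n{w['word'].strip()}\n\n")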
-
For Jupyter notebooks, ffmpeg must of course be installed.
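In Colab or another Debian-based notebook environment (an assumption about the setup), both can be installed from a cell:

    # notebook cell; '!' runs shell commands
    !apt-get -y -qq install ffmpeg
    !pip -q install -U openai-whisper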
-
In my personal opinion, 90% of all calls to the transcription tool will come from people doing subtitles; in theory this can greatly facilitate the work, especially if the fragment taken for a sentence is articulate and more or less resembles a complete thought. So the question is: is it possible to somehow create subtitles using this project? Are there forks capable of this? Are you going to add similar functionality? Thank you.