v1.1.0 #408
Replies: 9 comments, 13 replies
-
Thanks for your great work!
-
This is awesome! In my testing so far, this new version has also not had any of the issues the previous version had with getting stuck in a loop of one line or one word. One thing I've noticed is that there are dashes at the beginning of some of the output lines. I'm not sure if this is a change in the new version or just something odd with the model (I'm using tiny.en).
-
@ggerganov Great work! Could you add uploaded artifacts for platforms other than Windows? I can't seem to find pre-built binaries for non-Windows platforms under Actions.
-
Thank you Georgi! I have a small request to make the versioning correct for golang. Can you tag with a "v" at the front when making semantic tags/releases (for example, "v1.0.1" rather than "1.0.1")? I don't really know why, but golang expects this prefix, and it will help with the go documentation and module support when making updates. The source I found about this is here: But it's unclear to me exactly why golang needs the "v".
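Go modules expect version tags to start with `v` followed by a semantic version (`vMAJOR.MINOR.PATCH`), so an unprefixed tag such as `1.0.1` is not picked up as a module version. A release script could catch unprefixed tags with a check like the one below. This is a simplified Python sketch: `is_go_semver_tag` and its regex are hypothetical, and they do not enforce every SemVer rule.

```python
import re

# Hypothetical, simplified pattern for a Go-style version tag: "v" + MAJOR.MINOR.PATCH,
# optionally followed by a pre-release ("-...") and/or build metadata ("+...") suffix.
# Not a full SemVer validator (e.g. leading zeros are accepted).
GO_SEMVER_TAG = re.compile(r"v\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?")


def is_go_semver_tag(tag: str) -> bool:
    """Return True if `tag` has the vMAJOR.MINOR.PATCH shape Go modules expect."""
    return GO_SEMVER_TAG.fullmatch(tag) is not None


for tag in ["v1.0.1", "1.0.1", "v1.1.0", "v1.1"]:
    print(f"{tag:7} -> {'ok' if is_go_semver_tag(tag) else 'rejected'}")
```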
-
I'm running 1.1.0+ (as in the head of the repository as of today) and re-ran transcription on two videos that previously got stuck in a loop of repeating words, followed by no output for the majority of the transcript. The original Python implementation of Whisper did not have these problems. As of the 1.1.0 version, these videos now transcribe successfully. Original sources:
I'm also noticing some interesting improvements over the original Whisper, e.g. tagging sounds (orig source: https://rumble.com/v25zs54-why-can-we-still-not-talk-about-natural-immunity-060-stay-free-with-russell.html ... output format from my -ocsv extension on 'main'):
(orig source https://www.youtube.com/watch?v=ZMUKa2kWtTk )
(orig source: https://rumble.com/v25eg37-title-more-on-brazils-radical-censorship-escalation-after-show-q-and-a-on-l.html )
-
wrt @NielsMayer
-
I just want to say: I've processed over 100 podcasts so far using the 1.1.0 beta and NOT ONCE have I had the "repeating line" issue. (btw: it's using ggml-medium.en.bin) Thanks so much, as that issue was the reason I had to revert to using the Whisper Python implementation! Thanks again, @ggerganov!
-
I have been using version 1.1.1 since yesterday (previously the 1.1.0 beta) and unfortunately there are many repetitions again. I have now read that in this version the temperature fallback is set to -1.
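Temperature fallback, as used in OpenAI's Python Whisper, works like this. A segment is decoded at T = 0 first. If the output fails quality checks, such as a low average log-probability or a repetition test, it is re-decoded at progressively higher temperatures. The Python sketch below illustrates that idea. `DecodeResult`, `decode_with_fallback`, the thresholds, and the convention that a non-positive increment disables fallback are all hypothetical. They mirror the setting mentioned in the comment above and are not whisper.cpp's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DecodeResult:
    """Hypothetical summary of one decoding attempt."""
    avg_logprob: float  # mean log-probability of the emitted tokens
    repetitive: bool    # stand-in for a repetition check (OpenAI's Whisper uses a gzip compression ratio)


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperature_inc: float = 0.2,
    logprob_thold: float = -1.0,
) -> Tuple[DecodeResult, List[float]]:
    """Try T = 0 first; on failure retry at temperature_inc, 2*temperature_inc, ... up to 1.0.

    Hypothetical convention: a non-positive temperature_inc disables the fallback.
    Returns the last result and the list of temperatures that were tried.
    """
    tried: List[float] = []
    i = 0
    while True:
        # Multiply rather than accumulate, to limit floating-point drift.
        t = i * temperature_inc if temperature_inc > 0 else 0.0
        result = decode(t)
        tried.append(t)
        accepted = result.avg_logprob >= logprob_thold and not result.repetitive
        # Stop when accepted, when fallback is disabled, or when the next step would pass T = 1.0.
        if accepted or temperature_inc <= 0 or (i + 1) * temperature_inc > 1.0 + 1e-9:
            return result, tried
        i += 1


def toy_decode(t: float) -> DecodeResult:
    """Toy decoder (hypothetical): loops at low temperatures, succeeds from T = 0.3 upward."""
    return DecodeResult(avg_logprob=-0.5, repetitive=t < 0.3)


result, tried = decode_with_fallback(toy_decode, temperature_inc=0.2)
print("fallback on: ", tried, result.repetitive)
result, tried = decode_with_fallback(toy_decode, temperature_inc=-1)
print("fallback off:", tried, result.repetitive)
```

With fallback disabled, the toy decoder keeps its repetitive T = 0 output, because no retry is ever attempted.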
-
Overview
The major change in this pre-release is the improved decoding implementation in whisper.cpp, covering:
- sampling at temperature T > 0
- a best_of parameter for T > 0
- beam search decoding (beam_size)
More information about the decoding changes can be found in #291.
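In Whisper-style decoding, the parameters fit together as follows:
- T = 0 means greedy decoding.
- T > 0 means sampling from softmax(logits / T).
- best_of sets how many samples are drawn at T > 0; the one with the highest log-probability is kept.
- beam_size sets the width of beam search.

The Python sketch below illustrates the first three on a toy model with fixed logits. `LOGITS`, `decode` and `decode_best_of` are hypothetical and are not whisper.cpp's API, and beam search is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

# Toy "model" (hypothetical): fixed logits per decoding step over a 4-token vocabulary.
# Real decoders condition each step on the previous tokens; this toy does not.
LOGITS: List[List[float]] = [
    [2.0, 1.0, 0.5, 0.1],
    [0.3, 2.5, 0.2, 0.1],
    [1.0, 1.1, 0.9, 0.2],
]


def softmax(logits: Sequence[float], t: float = 1.0) -> List[float]:
    """Softmax of logits / t, with max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp((x - m) / t) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(t: float, rng: random.Random) -> Tuple[List[int], float]:
    """T == 0: greedy argmax. T > 0: sample from softmax(logits / T).

    Returns the tokens and their summed log-probability under the T = 1 distribution.
    """
    tokens: List[int] = []
    sum_logprob = 0.0
    for step in LOGITS:
        probs = softmax(step)
        if t <= 0:
            tok = max(range(len(probs)), key=probs.__getitem__)
        else:
            tok = rng.choices(range(len(step)), weights=softmax(step, t))[0]
        tokens.append(tok)
        sum_logprob += math.log(probs[tok])
    return tokens, sum_logprob


def decode_best_of(t: float, best_of: int, rng: random.Random) -> Tuple[List[int], float]:
    """Draw `best_of` samples at temperature t > 0 and keep the highest-scoring one."""
    return max((decode(t, rng) for _ in range(best_of)), key=lambda c: c[1])


rng = random.Random(0)
print("greedy:          ", decode(0.0, rng))
print("best_of=5, T=0.8:", decode_best_of(0.8, best_of=5, rng=rng))
```

Because the toy's steps are independent, greedy decoding is optimal here and best_of can only match it. With a real model, sampling several candidates can find sequences that greedy decoding misses.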
Additionally, there are a few performance improvements for Apple Silicon, WASM and non-F16C CPUs.
Support for POWER9 architectures has been added.
The reason that this is a pre-release and not an official release is that the new implementation has not been sufficiently tested yet and the existing bindings for other languages have not been updated to support the API changes. The official release v1.1.x will be created when there is enough feedback about the new decoding implementation and when the bindings have been updated. So make sure to send your feedback in the discussion created for this pre-release. For now, the v1.0.4 release should be considered more stable.
What's Changed
Core
ggml / whisper
- ggml: POWER9 support by @fitzsim in ggml : add f16 acceleration for POWER9 ppc64le #320, ggml : improve f16 acceleration for POWER9 ppc64le #349, Reorganize POWER9 SIMD code #369
- ggml: simplify the SIMD code by @ggerganov in Simplify the SIMD code #324
- ggml: add SSE3 and fp16 conversion lookup table by @abitofevrything in Add SSE3 and fp16 conversion lookup table #368
- ggml: utilise Accelerate's vDSP for some computations d51fc3e
- ggml: speed-up softmax compute via Accelerate and loop unrolling d61d55c
- ggml: do not start extra threads when using BLAS d347a59
- whisper: do sample_to_timestamp calculation with 64 bit precision to avoid overflow by @boolemancer in Do sample_to_timestamp calculation with 64 bit precision to avoid overflow #388
- whisper: various code clean-up and improvements by @asmaloney in ggml: Make consts static #317, whisper: Fix mem leak on failure to load model #318, whisper: Use emplace_back in place of push_back #319, examples: small code cleanups #322, etc.
- whisper: improve decoding by @ggerganov in Improve decoding #291
- whisper: account for speed_up flag for short audio (Short voice be skipped in speed_up mode #405)
C-style API
- whisper_token_data::plog
- whisper_init_from_file()
- whisper_init_from_buffer()
- whisper_init()
- whisper_sample_best()
- whisper_sample_timestamp()
- whisper_n_audio_ctx()
- whisper_get_logits()
- whisper_get_probs()
- struct whisper_full_params
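A little arithmetic shows why the 64-bit sample_to_timestamp change in #388 matters. Assume 16 kHz audio and timestamps in 10 ms units, i.e. `100 * i_sample / 16000`; this is an assumption about the formula, not a quote of whisper.cpp's code. Under that assumption, the intermediate product no longer fits in a signed 32-bit integer once `i_sample` exceeds 21,474,836, which is about 22 minutes of audio. The Python sketch below emulates 32-bit wrap-around to make the effect visible. All names are hypothetical. In C, signed overflow is undefined behaviour, so wrap-around is only the typical outcome.

```python
SAMPLE_RATE = 16_000  # whisper.cpp works with 16 kHz audio
INT32_MAX = 2**31 - 1


def wrap_int32(x: int) -> int:
    """Emulate two's-complement 32-bit wrap-around (the common, but undefined, C outcome)."""
    return (x + 2**31) % 2**32 - 2**31


def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, like C's `/` on integers."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def sample_to_timestamp_32(i_sample: int) -> int:
    """Hypothetical: 10 ms units computed with a 32-bit intermediate product (overflow-prone)."""
    return c_div(wrap_int32(100 * i_sample), SAMPLE_RATE)


def sample_to_timestamp_64(i_sample: int) -> int:
    """Hypothetical: same formula with a 64-bit intermediate; no overflow for realistic lengths."""
    return c_div(100 * i_sample, SAMPLE_RATE)


# First sample index where 100 * i_sample exceeds INT32_MAX (~1342 s, ~22.4 min at 16 kHz).
first_bad = INT32_MAX // 100 + 1
print(first_bad, round(first_bad / SAMPLE_RATE, 1))

for minutes in (10, 30):
    n = minutes * 60 * SAMPLE_RATE
    print(f"{minutes} min: 32-bit -> {sample_to_timestamp_32(n)}, 64-bit -> {sample_to_timestamp_64(n)}")
```

Under these assumptions, a sample 30 minutes into the audio maps to a negative timestamp in the 32-bit version. The 64-bit version gives 180000 (1800.00 s).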
Bindings
Examples
- whisper.android: remove android ABI constraint by @Digipom in Remove android abi constraint #301
- whisper.swiftui: SwiftUI example by @Digipom in Whisper.swiftui #308
- main: add -ocsv, aka --output-csv, for writing a CSV file containing millisecond timestamps by @NielsMayer in Similar to Whisper PR#228, this adds -ocsv, aka --output-csv, writing CSV file containing millisecond timestamps #340
- command: refactor to split command list & general transcription modes by @asmaloney in command: Refactor to split command list & general transcription modes #331
- command: always-prompt mode by @dnhkng in Command: always test the prompt #383
- stream: fix data race on bool + avoid division-by-zero a466c34
- stream: fix a bug that inserted a lot of empty audio at the start a6dbd91
- bench.wasm: print system info fafd789
New Contributors
go mod tidy before building examples #296
Full Changelog: v1.0.4...v1.1.0
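The -ocsv option added to main in #340 writes segment timestamps in milliseconds. Below is a minimal Python sketch of that kind of output. It uses the standard `csv` module so that commas and quotes inside the transcript text are escaped correctly. It assumes segment timestamps arrive in 10 ms units, as whisper.cpp segment timestamps are. `segments_to_csv` and the `start,end,text` header are hypothetical, not the exact format main produces.

```python
import csv
import io
from typing import Iterable, Tuple


def segments_to_csv(segments: Iterable[Tuple[int, int, str]]) -> str:
    """Write (t0, t1, text) segments as CSV with millisecond timestamps.

    Assumption: t0/t1 are in 10 ms units, so milliseconds = t * 10.
    The header names are illustrative only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["start", "end", "text"])
    for t0, t1, text in segments:
        # csv.writer quotes fields containing commas/quotes and doubles embedded quotes.
        writer.writerow([t0 * 10, t1 * 10, text.strip()])
    return buf.getvalue()


print(segments_to_csv([(0, 250, " Hello, world."), (250, 512, ' She said "hi".')]))
```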
Highlights
This discussion was created from the release v1.1.0.