Releases: ggerganov/whisper.cpp
v1.7.1
Overview
- Fix Vulkan crashes
- Performance stats for Vulkan on RTX 2060 (in the tables below: Th = threads, FA = flash attention enabled, timings in ms)
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_1 | 1 | 0 | 141.03 | 4.63 | 4.18 | 0.20 | 9f346d0 |
RTX 2060 | VULKAN | medium | 1 | 0 | 472.66 | 7.55 | 11.35 | 0.56 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_0 | 1 | 0 | 395.55 | 9.81 | 10.64 | 0.49 | 9f346d0 |
RTX 2060 | VULKAN | medium-q5_1 | 1 | 0 | 398.85 | 10.16 | 10.15 | 0.50 | 9f346d0 |
RTX 2060 | VULKAN | medium-dis | 1 | 0 | 427.26 | 1.26 | 1.20 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | large-v2 | 1 | 0 | 924.60 | 12.36 | 18.56 | 1.01 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_0 | 1 | 0 | 774.21 | 17.25 | 17.17 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-q5_1 | 1 | 0 | 779.75 | 17.44 | 16.27 | 0.85 | 9f346d0 |
RTX 2060 | VULKAN | large-v2-dis | 1 | 0 | 833.35 | 1.38 | 1.56 | 0.10 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo | 1 | 0 | 839.90 | 2.11 | 2.70 | 0.16 | 9f346d0 |
RTX 2060 | VULKAN | large-v3-turbo-q5_0 | 1 | 0 | 705.49 | 3.22 | 2.53 | 0.14 | 9f346d0 |
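As a quick sanity check on the table above, the encoder speedup from q5_0 quantization can be derived directly from the reported numbers (a minimal sketch; values copied from the Vulkan table, RTX 2060, commit 9f346d0):

```python
# Encoder times (ms) from the Vulkan table above.
enc_ms = {
    "tiny": 30.38, "tiny-q5_0": 20.98,
    "medium": 472.66, "medium-q5_0": 395.55,
    "large-v2": 924.60, "large-v2-q5_0": 774.21,
}

def speedup(base: str, quant: str) -> float:
    """Encoder speedup of the quantized model over the F16 baseline."""
    return enc_ms[base] / enc_ms[quant]

for base in ("tiny", "medium", "large-v2"):
    print(f"{base}: q5_0 encodes {speedup(base, base + '-q5_0'):.2f}x faster")
```

Note that on this backend the quantized encoder gain is largest for `tiny` (~1.45x) and settles around 1.19x for the bigger models.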
What's Changed
- Retry allocation with fallback flags by @SRHMorris in #2451
New Contributors
- @SRHMorris made their first contribution in #2451
Full Changelog: v1.7.0...v1.7.1
Binaries
https://github.com/ggerganov/whisper.cpp/actions/runs/11213279590
v1.7.0
Overview
- Fix crashes with high number of beams
- Reduce overall VRAM usage
- Optimize Encoder performance
Some performance numbers for this release:
M2 Ultra
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 1 | 8.37 | 1.44 | 0.48 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 1 | 9.81 | 1.46 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 1 | 8.80 | 1.47 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 1 | 16.11 | 1.96 | 0.74 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 1 | 16.38 | 1.99 | 0.78 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 1 | 16.72 | 2.00 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 1 | 41.26 | 3.88 | 1.66 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 1 | 46.91 | 4.02 | 1.76 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 1 | 47.05 | 4.00 | 1.73 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 1 | 111.29 | 7.79 | 3.63 | 0.11 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 1 | 129.78 | 7.71 | 3.85 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 1 | 129.29 | 7.71 | 3.87 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 1 | 99.27 | 1.09 | 0.43 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 1 | 198.81 | 11.54 | 5.59 | 0.20 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 1 | 236.18 | 11.12 | 6.11 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 1 | 235.88 | 11.14 | 6.01 | 0.24 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 1 | 177.41 | 1.21 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 1 | 178.92 | 1.89 | 0.83 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 1 | 211.44 | 1.73 | 0.90 | 0.04 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 Ultra | METAL | tiny | 1 | 0 | 10.04 | 1.37 | 0.50 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_0 | 1 | 0 | 10.02 | 1.36 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | tiny-q5_1 | 1 | 0 | 11.08 | 1.37 | 0.53 | 0.01 | 6a94163 |
M2 Ultra | METAL | base | 1 | 0 | 17.84 | 1.93 | 0.77 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_0 | 1 | 0 | 18.57 | 1.92 | 0.81 | 0.02 | 6a94163 |
M2 Ultra | METAL | base-q5_1 | 1 | 0 | 18.66 | 1.93 | 0.82 | 0.02 | 6a94163 |
M2 Ultra | METAL | small | 1 | 0 | 48.26 | 3.95 | 1.73 | 0.05 | 6a94163 |
M2 Ultra | METAL | small-q5_0 | 1 | 0 | 53.68 | 3.99 | 1.85 | 0.06 | 6a94163 |
M2 Ultra | METAL | small-q5_1 | 1 | 0 | 53.86 | 4.00 | 1.82 | 0.06 | 6a94163 |
M2 Ultra | METAL | medium | 1 | 0 | 130.09 | 8.01 | 3.82 | 0.13 | 6a94163 |
M2 Ultra | METAL | medium-q5_0 | 1 | 0 | 148.18 | 7.92 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-q5_1 | 1 | 0 | 147.95 | 7.94 | 4.11 | 0.14 | 6a94163 |
M2 Ultra | METAL | medium-dis | 1 | 0 | 116.97 | 1.11 | 0.42 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v2 | 1 | 0 | 232.43 | 12.34 | 5.87 | 0.22 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_0 | 1 | 0 | 269.72 | 11.68 | 6.44 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-q5_1 | 1 | 0 | 269.71 | 11.82 | 6.36 | 0.26 | 6a94163 |
M2 Ultra | METAL | large-v2-dis | 1 | 0 | 209.25 | 1.25 | 0.48 | 0.02 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo | 1 | 0 | 211.09 | 1.98 | 0.84 | 0.03 | 6a94163 |
M2 Ultra | METAL | large-v3-turbo-q5_0 | 1 | 0 | 244.23 | 1.81 | 0.92 | 0.04 | 6a94163 |
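The effect of flash attention on the M2 Ultra can be read off by dividing the two tables above (a small sketch; encoder times in ms copied from the FA OFF and FA ON tables, commit 6a94163):

```python
# Encoder times (ms) for M2 Ultra (METAL), flash attention off vs on.
enc_off = {"tiny": 10.04, "medium": 130.09, "large-v2": 232.43}
enc_on  = {"tiny":  8.37, "medium": 111.29, "large-v2": 198.81}

# Speedup factor from enabling flash attention, per model.
fa_speedup = {m: enc_off[m] / enc_on[m] for m in enc_off}
for m, s in fa_speedup.items():
    print(f"{m}: flash attention speeds up encoding by {s:.2f}x")
```

The gain is fairly uniform, roughly 1.17-1.20x on the encoder across model sizes.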
Ryzen 9 5950X + RTX 2060
Flash Attention ON:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 1 | 7.35 | 0.78 | 0.24 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 1 | 6.45 | 0.67 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 1 | 6.39 | 0.66 | 0.14 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 1 | 10.20 | 0.88 | 0.30 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 1 | 11.38 | 0.92 | 0.21 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 1 | 11.76 | 0.91 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 1 | 33.06 | 2.00 | 0.56 | 0.03 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 1 | 35.84 | 1.84 | 0.43 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 1 | 36.89 | 1.82 | 0.42 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 1 | 90.65 | 4.54 | 1.13 | 0.08 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 1 | 104.01 | 3.80 | 0.91 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 1 | 107.98 | 3.72 | 0.87 | 0.10 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 1 | 79.08 | 0.68 | 0.17 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 1 | 162.00 | 7.52 | 1.92 | 0.14 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 1 | 184.59 | 5.64 | 1.50 | 0.16 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 1 | 193.85 | 5.55 | 1.44 | 0.17 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 1 | 140.75 | 0.84 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 1 | 143.38 | 1.29 | 0.36 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 1 | 163.30 | 0.93 | 0.28 | 0.03 | 6a94163 |
Flash Attention OFF:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 1 | 0 | 12.49 | 0.87 | 0.23 | 0.01 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 1 | 0 | 10.65 | 0.78 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 1 | 0 | 10.82 | 0.77 | 0.19 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base | 1 | 0 | 18.97 | 1.04 | 0.34 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 1 | 0 | 20.22 | 1.09 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 1 | 0 | 20.48 | 1.07 | 0.27 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | small | 1 | 0 | 59.52 | 2.37 | 0.70 | 0.05 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 1 | 0 | 62.98 | 2.23 | 0.60 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 1 | 0 | 63.64 | 2.21 | 0.59 | 0.06 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium | 1 | 0 | 161.53 | 5.36 | 1.53 | 0.13 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 1 | 0 | 174.96 | 4.64 | 1.32 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 1 | 0 | 178.42 | 4.57 | 1.29 | 0.15 | 6a94163 |
RTX 2060 | AVX2 CUDA | medium-dis | 1 | 0 | 149.65 | 0.75 | 0.20 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2 | 1 | 0 | 280.55 | 8.74 | 2.51 | 0.23 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 1 | 0 | 306.87 | 6.92 | 2.08 | 0.25 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 1 | 0 | 314.25 | 6.82 | 2.02 | 0.26 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 1 | 0 | 259.39 | 0.91 | 0.37 | 0.02 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo | 1 | 0 | 261.83 | 1.44 | 0.41 | 0.04 | 6a94163 |
RTX 2060 | AVX2 CUDA | large-v3-turbo-q5_0 | 1 | 0 | 282.99 | 1.09 | 0.33 | 0.04 | 6a94163 |
Vulkan:
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | VULKAN | tiny | 1 | 0 | 30.38 | 1.37 | 1.04 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_0 | 1 | 0 | 20.98 | 1.38 | 0.99 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | tiny-q5_1 | 1 | 0 | 20.74 | 1.30 | 0.96 | 0.05 | 9f346d0 |
RTX 2060 | VULKAN | base | 1 | 0 | 44.69 | 1.59 | 1.78 | 0.09 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_0 | 1 | 0 | 39.72 | 2.11 | 1.72 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | base-q5_1 | 1 | 0 | 39.45 | 2.01 | 1.63 | 0.08 | 9f346d0 |
RTX 2060 | VULKAN | small | 1 | 0 | 160.02 | 3.53 | 4.64 | 0.23 | 9f346d0 |
RTX 2060 | VULKAN | small-q5_0 | 1 | 0 | 141.52 | 4.54 | 4.44 | 0.20 | 9f346d0 |
RTX 2060 | VULKA... |
v1.6.2
Overview
Bug fix when using multiple `whisper_state` instances in parallel: #2182
What's Changed
- Update ruby bindings by @taf2 in #2154
- Update server.cpp by @dvaldivia in #2181
- Revert "whisper : remove extra backend instance (huh?)" by @ggerganov in #2182
New Contributors
- @dvaldivia made their first contribution in #2181
Full Changelog: v1.6.1...v1.6.2
v1.6.1
Minor release adding initial ffmpeg support in the examples #2133 (thx @WilliamTambellini)
What's Changed
- ci: Update build.yml to suppress warnings about node.js versions by @tamo in #2166
- node : add flash_attn param by @pprobst in #2170
- Add support for decoding input with ffmpeg (Linux) by @WilliamTambellini in #2133
New Contributors
- @WilliamTambellini made their first contribution in #2133
Full Changelog: v1.6.0...v1.6.1
v1.6.0
Overview
- Can optionally enable Flash Attention for faster processing on CUDA and Metal devices (#2152)
- Faster ppc64 performance (40aeeee) (not tested)
- Fix `main` slowdown bug (#2070)
Shoutout to @JohannesGaessler for contributing efficient FA CUDA kernels
Some performance numbers for this release:
M1 Pro
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M1 Pro | METAL | tiny | 1 | 0 | 39.21 | 1.74 | 0.61 | 0.04 | 22c96b4 |
M1 Pro | METAL | base | 1 | 0 | 70.76 | 2.60 | 0.93 | 0.06 | 22c96b4 |
M1 Pro | METAL | small | 1 | 0 | 217.28 | 6.42 | 2.14 | 0.17 | 22c96b4 |
M1 Pro | METAL | medium | 1 | 0 | 596.74 | 14.43 | 4.75 | 0.45 | 22c96b4 |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M1 Pro | METAL | tiny | 1 | 1 | 30.77 | 1.59 | 0.54 | 0.03 | 22c96b4 |
M1 Pro | METAL | base | 1 | 1 | 60.42 | 2.29 | 0.81 | 0.05 | 22c96b4 |
M1 Pro | METAL | small | 1 | 1 | 183.82 | 5.12 | 1.81 | 0.14 | 22c96b4 |
M1 Pro | METAL | medium | 1 | 1 | 517.92 | 11.60 | 4.01 | 0.38 | 22c96b4 |
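Assuming the Dec. column is the per-token decode time in ms (an assumption; the tables do not state units), the two M1 Pro tables above translate into decode throughput with and without flash attention:

```python
# Dec. column (assumed ms per decoded token) from the two M1 Pro tables
# above (commit 22c96b4): FA off vs FA on.
dec_ms_off = {"tiny": 1.74, "base": 2.60, "small": 6.42, "medium": 14.43}
dec_ms_on  = {"tiny": 1.59, "base": 2.29, "small": 5.12, "medium": 11.60}

def tokens_per_sec(ms_per_token: float) -> float:
    """Convert a per-token latency in ms to tokens/second."""
    return 1000.0 / ms_per_token

for m in dec_ms_off:
    print(f"{m}: {tokens_per_sec(dec_ms_off[m]):.0f} -> "
          f"{tokens_per_sec(dec_ms_on[m]):.0f} tok/s with flash attention")
```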
M2 Ultra
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 0 | 12.32 | 1.35 | 0.49 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 0 | 11.65 | 1.30 | 0.51 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 0 | 12.08 | 1.30 | 0.51 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | base | 1 | 0 | 17.58 | 1.90 | 0.76 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_0 | 1 | 0 | 18.89 | 1.86 | 0.79 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_1 | 1 | 0 | 20.69 | 1.88 | 0.79 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | small | 1 | 0 | 49.32 | 3.85 | 1.71 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | small-q5_0 | 1 | 0 | 54.91 | 3.81 | 1.82 | 0.06 | 22c96b4 |
M2 ULTRA | METAL | small-q5_1 | 1 | 0 | 54.92 | 3.81 | 1.79 | 0.06 | 22c96b4 |
M2 ULTRA | METAL | medium | 1 | 0 | 134.34 | 8.04 | 3.82 | 0.13 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 0 | 151.68 | 7.59 | 4.07 | 0.14 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 0 | 151.58 | 7.67 | 4.07 | 0.14 | 22c96b4 |
M2 ULTRA | METAL | medium-dis | 1 | 0 | 120.82 | 1.07 | 0.41 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | large-v2 | 1 | 0 | 235.63 | 12.27 | 5.85 | 0.22 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 0 | 273.38 | 11.17 | 6.40 | 0.26 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 0 | 272.44 | 11.32 | 6.29 | 0.26 | 22c96b4 |
M2 ULTRA | METAL | large-v2-dis | 1 | 0 | 212.51 | 1.20 | 0.47 | 0.02 | 22c96b4 |
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
M2 ULTRA | METAL | tiny | 1 | 1 | 9.07 | 1.33 | 0.45 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_0 | 1 | 1 | 9.74 | 1.33 | 0.47 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | tiny-q5_1 | 1 | 1 | 8.93 | 1.31 | 0.46 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | base | 1 | 1 | 15.75 | 1.87 | 0.71 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_0 | 1 | 1 | 17.04 | 1.83 | 0.74 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | base-q5_1 | 1 | 1 | 17.17 | 1.83 | 0.74 | 0.02 | 22c96b4 |
M2 ULTRA | METAL | small | 1 | 1 | 42.33 | 3.64 | 1.60 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | small-q5_0 | 1 | 1 | 47.61 | 3.63 | 1.70 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | small-q5_1 | 1 | 1 | 47.70 | 3.66 | 1.68 | 0.05 | 22c96b4 |
M2 ULTRA | METAL | medium | 1 | 1 | 114.42 | 7.53 | 3.55 | 0.11 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_0 | 1 | 1 | 132.63 | 7.02 | 3.77 | 0.13 | 22c96b4 |
M2 ULTRA | METAL | medium-q5_1 | 1 | 1 | 132.28 | 7.10 | 3.76 | 0.13 | 22c96b4 |
M2 ULTRA | METAL | medium-dis | 1 | 1 | 102.34 | 1.01 | 0.42 | 0.01 | 22c96b4 |
M2 ULTRA | METAL | large-v2 | 1 | 1 | 203.01 | 11.03 | 5.45 | 0.20 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_0 | 1 | 1 | 240.05 | 10.18 | 5.98 | 0.23 | 22c96b4 |
M2 ULTRA | METAL | large-v2-q5_1 | 1 | 1 | 239.22 | 10.23 | 5.87 | 0.23 | 22c96b4 |
M2 ULTRA | METAL | large-v2-dis | 1 | 1 | 181.14 | 1.14 | 0.48 | 0.02 | 22c96b4 |
Ryzen 9 5950X + RTX 2060
CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
Ryzen 9 5950X | AVX2 | tiny | 8 | 0 | 195.29 | 1.57 | 0.51 | 0.26 | 22c96b4 |
Ryzen 9 5950X | AVX2 | tiny-q5_0 | 8 | 0 | 213.33 | 1.10 | 0.50 | 0.30 | 22c96b4 |
Ryzen 9 5950X | AVX2 | tiny-q5_1 | 8 | 0 | 219.38 | 1.18 | 0.53 | 0.32 | 22c96b4 |
Ryzen 9 5950X | AVX2 | base | 8 | 0 | 424.85 | 3.71 | 1.03 | 0.46 | 22c96b4 |
Ryzen 9 5950X | AVX2 | base-q5_0 | 8 | 0 | 473.61 | 1.81 | 0.82 | 0.52 | 22c96b4 |
Ryzen 9 5950X | AVX2 | base-q5_1 | 8 | 0 | 484.14 | 1.92 | 0.85 | 0.56 | 22c96b4 |
Ryzen 9 5950X | AVX2 | small | 8 | 0 | 1458.32 | 12.66 | 3.09 | 1.26 | 22c96b4 |
Ryzen 9 5950X | AVX2 | small-q5_0 | 8 | 0 | 1673.22 | 6.42 | 2.18 | 1.45 | 22c96b4 |
Ryzen 9 5950X | AVX2 | small-q5_1 | 8 | 0 | 1724.78 | 6.72 | 2.32 | 1.52 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium | 8 | 0 | 4333.87 | 36.80 | 8.56 | 3.37 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium-q5_0 | 8 | 0 | 5194.09 | 19.21 | 5.71 | 3.97 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium-q5_1 | 8 | 0 | 5450.39 | 20.01 | 5.99 | 4.17 | 22c96b4 |
Ryzen 9 5950X | AVX2 | medium-dis | 8 | 0 | 3995.19 | 5.08 | 1.21 | 0.55 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2 | 8 | 0 | 8056.16 | 69.74 | 16.11 | 6.13 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2-q5_0 | 8 | 0 | 9799.58 | 35.16 | 10.49 | 7.28 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2-q5_1 | 8 | 0 | — | 36.74 | 11.02 | 7.65 | 22c96b4 |
Ryzen 9 5950X | AVX2 | large-v2-dis | 8 | 0 | 7490.03 | 7.40 | 1.70 | 0.72 | 22c96b4 |
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 8 | 0 | 12.54 | 0.93 | 0.29 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 8 | 0 | 12.73 | 0.98 | 0.24 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 8 | 0 | 12.72 | 0.99 | 0.24 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | base | 8 | 0 | 24.14 | 1.28 | 0.41 | 0.03 | 22c96b4 |
RTX 2060 | AVX2 CUDA | base-q5_0 | 8 | 0 | 24.58 | 1.38 | 0.35 | 0.03 | 22c96b4 |
RTX 2060 | AVX2 CUDA | base-q5_1 | 8 | 0 | 24.58 | 1.37 | 0.35 | 0.03 | 22c96b4 |
RTX 2060 | AVX2 CUDA | small | 8 | 0 | 74.70 | 2.91 | 0.84 | 0.07 | 22c96b4 |
RTX 2060 | AVX2 CUDA | small-q5_0 | 8 | 0 | 76.12 | 2.84 | 0.77 | 0.08 | 22c96b4 |
RTX 2060 | AVX2 CUDA | small-q5_1 | 8 | 0 | 76.14 | 2.84 | 0.76 | 0.08 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium | 8 | 0 | 200.69 | 6.46 | 1.83 | 0.17 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium-q5_0 | 8 | 0 | 204.80 | 5.90 | 1.65 | 0.19 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium-q5_1 | 8 | 0 | 205.61 | 5.85 | 1.61 | 0.19 | 22c96b4 |
RTX 2060 | AVX2 CUDA | medium-dis | 8 | 0 | 186.17 | 0.86 | 0.24 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2 | 8 | 0 | 347.22 | 10.36 | 2.82 | 0.29 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2-q5_0 | 8 | 0 | 357.06 | 8.81 | 2.58 | 0.34 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2-q5_1 | 8 | 0 | 356.97 | 8.62 | 2.49 | 0.33 | 22c96b4 |
RTX 2060 | AVX2 CUDA | large-v2-dis | 8 | 0 | 318.05 | 1.03 | 0.34 | 0.04 | 22c96b4 |
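Since the CPU-only (AVX2) and CUDA runs above were taken on the same Ryzen 9 5950X box, the two tables give a direct measure of the GPU offload gain (a sketch; encoder times in ms copied from the tables, commit 22c96b4):

```python
# Encoder times (ms): CPU-only AVX2 run vs CUDA run on the RTX 2060,
# from the two tables above.
cpu_ms  = {"tiny": 195.29, "medium": 4333.87, "large-v2": 8056.16}
cuda_ms = {"tiny":  12.54, "medium":  200.69, "large-v2":  347.22}

# CUDA speedup over the 8-thread AVX2 CPU baseline, per model.
speedup = {m: cpu_ms[m] / cuda_ms[m] for m in cpu_ms}
for m, s in speedup.items():
    print(f"{m}: CUDA encodes {s:.1f}x faster than AVX2")
```

The gap widens with model size, from roughly 16x for `tiny` to over 23x for `large-v2`.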
GPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
---|---|---|---|---|---|---|---|---|---|
RTX 2060 | AVX2 CUDA | tiny | 8 | 1 | 7.21 | 0.76 | 0.29 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_0 | 8 | 1 | 7.42 | 0.82 | 0.18 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | tiny-q5_1 | 8 | 1 | 7.38 | 0.82 | 0.18 | 0.02 | 22c96b4 |
RTX 2060 | AVX2 CUDA | ... |
v1.5.5
Overview
Many small incremental updates + Token level timestamps with DTW by @denersc in #1485
Feedback is welcome!
Full Changelog: v1.5.4...v1.5.5
What's Changed
- server : fix server temperature + add temperature_inc by @ggerganov in #1729
- main : add cli option to disable system prints by @ggerganov in #1740
- server: add request path by @eschmidbauer in #1741
- Optional Piper TTS support for talk-llama example. by @RhinoDevel in #1749
- fix/1748 by @nank1ro in #1750
- Don't compute timestamps when not printing them. by @ghindle in #1755
- Add more parameters to server api by @ghindle in #1754
- Add SetInitialPrompt method to go bindings by @blib in #1753
- ggml : fix 32-bit ARM compat for IQ2_XS by @ggerganov in #1758
- refactor: get all scripts to be POSIX Compliant by @sonphantrung in #1725
- whisper : load the model into multiple buffers of max size 1GB by @ggerganov in #1763
- rebase against your -np changes (thx) and add better python file to be used on the command line or as library by @contractorwolf in #1744
- examples/talk-llama: Add optional commandline parameter to set the bot name. by @RhinoDevel in #1764
- server : fix building and simplify lib deps on Windows by @przemoc in #1772
- talk-llama: optional wake-up command and audio confirmation by @Rakksor in #1765
- examples/server: implement "verbose_json" format with token details by @rmmh in #1781
- whisper.android: Return output from benchmarks by @luciferous in #1785
- libwhisper.so should be position independent by @trixirt in #1792
- Docs: try to make model options / model install methods clearer by @mrienstra in #1806
- common : fix input buffer check by @ggerganov in #1812
- Update Makefile by @jwijffels in #1813
- Add fields to `verbose_json` response and show examples on the home page by @JacobLinCool in #1802
- common: fix wav buffer detection by @JacobLinCool in #1819
- Add macOS deployment target option to Makefile by @didzis in #1839
- Expose CUDA device setting in public API by @didzis in #1840
- whisper.android: How to build with CLBlast by @luciferous in #1809
- server: Allow CORS request with authorization headers by @valenting in #1850
- Embed Metal library source into compiled binary by @didzis in #1842
- added audio_ctx argument to main and server examples by @dscripka in #1857
- whisper : fix external encoder by @ggerganov in #1860
- swift : package no longer use ggml dependency by @ggerganov in #1861
- fix openvino setup docs by @jumpers775 in #1874
- clean up common code in examples by @felrock in #1871
- main : check if input files exist before proceeding by @Theldus in #1872
- Linking issue fix via Makefile when CUBLAS enabled in the WSL #1876 by @lbluep in #1878
- main : fix file existence check in main.cpp by @Theldus in #1889
- openvino : fix convert-whisper-to-openvino.py for v2023.0.0 (#1870) by @st-gr in #1890
- ggml : 32-bit arm compat by @ggerganov in #1891
- Add SYCL logic in whisper by @abhilash1910 in #1863
- talk and talk-llama: Pass text_to_speak as a file by @tamo in #1865
- Stream.wasm: Fix invalid memory access when no segments are returned by @Andrews54757 in #1902
- Update README to Recommend MacOS Sonoma for Core ML to avoid hallucination by @gavin1818 in #1917
- Add library versioning by @kenneth-ge in #1352
- Fix SF(segment fault) issue in Android JNI by @zhouwg in #1929
- Fix typo in source file whisper.cpp by @zhouwg in #1925
- bench:fix typo by @zhouwg in #1933
- Auto lowercase language parameter by @F1L1Pv2 in #1928
- ggml : try fix 32-bit arm compat by @ggerganov in #1938
- whisper : make beam candidate sort more stable by @josharian in #1943
- bindings/go : add linker flags to make metal work by @josharian in #1944
- whisper : improve beam search candidate diversity by @josharian in #1947
- whisper : document whisper_batch.n_seq_id by @josharian in #1942
- Rename --audio-context to --audio-ctx, as per help text by @joliss in #1953
- [DRAFT] Token level timestamps with DTW (#375) by @denersc in #1485
- Fedora dependencies needed (SDL2) by @Man2Dev in #1970
- libcuda.so.1 in PATH in Docker Container by @tiagofassoni in #1966
- ruby : fix build by @ggerganov in #1980
- Improve support for distil-large-v3 by @sanchit-gandhi in #1982
- whisper : improve handling of prompts by @ggerganov in #1981
- sync : ggml by @ggerganov in #2001
- Implemented command-style grammar in the main example. by @ulatekh in #1998
- Use pkg-config for OpenBLAS by @przemoc in #1778
- ci : add building in MSYS2 environments (Windows) by @przemoc in #1994
- Support CUDA versions < 11.1 by @primenko-v in #2020
- Create solution folders in the CMake build by @ulatekh in #2004
- Allow a regular expression to describe tokens to suppress by @ulatekh in #1997
- "main" example now allows a response-file as the sole parameter by @ulatekh in #2019
- Support for CPU BLAS build via Intel MKL by @slashlib in #2024
- Set stdin to binary mode on Windows. Fixes #2023 by @rotemdan in #2025
- Fix file-handle leak in read_wav() by @ulatekh in #2026
- Fix DTW memory access by @bradmurray-dt in #2012
- whisper: update grammar-parser.cpp by @eltociear in #2058
- fix missing reference to "model" variable in actual shell command run in whisper.nvim by @sixcircuit in #2049
- build : detect AVX512 in Makefile, add AVX512 option in CMake by @didzis in #2043
- feature/no timestamps node by @pprobst in #2048
- Update embedded Metal library generation process to include dependency by @didzis in #2045
- server.cpp: add dtw by @eschmidbauer in #2044
New Contributors
- @eschmidbauer made their first contribution in #1741
- @RhinoDevel made their first contribution in #1749
- @nank1ro made their first contribution in #1750
- @ghindle made their first contribution in #1755
- @blib made their first contribution in #1753
- @sonphantrung made their first contribution in #1725
- @contractorwolf made their first contribution in #1744
- @Rakksor made their first contribution in #1765
- @rmmh made their f...
v1.5.4
v1.5.3
Overview
Minor maintenance release:
- Fix CUDA issues where the transcription produces garbage
- Fix quantized models to work with the CUDA backend
- Allow using `whisper.cpp` and `llama.cpp` together in SwiftUI projects
What's Changed
- Update bench.py by @ForkedInTime in #1655
- cmake : Resolve quantized model issue when CUBLAS enabled by @bobqianic in #1667
- examples : Revert CMakeLists.txt for talk-llama by @bobqianic in #1669
- CI : Add coverage for talk-llama when WHISPER_CUBLAS=1 by @bobqianic in #1672
- ci: build and push docker image by @OpenWaygate in #1674
- sync : ggml (ggml_scale, ggml_row_size, etc.) by @ggerganov in #1677
- Replace `WHISPER_PRINT_DEBUG` with `WHISPER_LOG_DEBUG` by @bobqianic in #1681
- download: Fix large q5 model name by @dimopep in #1695
- sync : ggml (VMM, sync-ggml-am.sh, dotprod ARM fixes) by @ggerganov in #1691
- whisper : replace `tensor->n_dims` with `ggml_n_dims(tensor)` by @bobqianic in #1694
- Build with CLBlast by @tamo in #1576
- docker : Fix the Publishing of the CUDA Docker Image by @bobqianic in #1704
- emscripten: fix "Stack Overflow!" by @Huguet57 in #1713
- sync : ggml by @ggerganov in #1717
- Add error handling to graph_compute by @finnvoor in #1714
- Updates Package.swift to use ggml as package dependency by @1-ashraful-islam in #1701
New Contributors
- @ForkedInTime made their first contribution in #1655
- @OpenWaygate made their first contribution in #1674
- @dimopep made their first contribution in #1695
- @Huguet57 made their first contribution in #1713
- @1-ashraful-islam made their first contribution in #1701
Full Changelog: v1.5.2...v1.5.3
v1.5.2
Overview
Minor maintenance release:
- Re-enable CPU BLAS processing after fixing a regression (#1583)
- Add new example: wchess (demo video: wchess-0.mp4)
Shoutout to @fraxy-v (implementation) and @ejones (grammar) for making it work!
What's Changed
- automatically convert audio on the server by @sapoepsilon in #1539
- CI : Rectify the Clang-Related workflow issues by @bobqianic in #1551
- CI : Add CUDA 11.8.0 support by @bobqianic in #1554
- Update main program help info by @bebound in #1560
- Set default CORS headers to allow all by @kasumi-1 in #1567
- cmake : install required ggml.h header by @gjasny in #1568
- Backport .srt output format to examples/server by @osdrv in #1565
- Added support for .vtt format to Whisper server by @aleksanderandrzejewski in #1578
- ggml : re-enable blas for src0 != F32 by @ggerganov in #1583
- Fix 32-bit compiler warning by @Digipom in #1575
- Remove #if arch(arm) check in Swift Package Manager by @finnvoor in #1561
- Pass max-len argument to server wparams by @osdrv in #1574
- sync : ggml (new ops, new backend, etc) by @ggerganov in #1602
- Fix `ggml_metal_log` on Intel macs by @finnvoor in #1606
- Update CMakeLists.txt by @Kreijstal in #1615
- target windows 8 or above for prefetchVirtualMemory in llama-talk by @Kreijstal in #1617
- sync : ggml (Metal fixes, new ops, tests) by @ggerganov in #1633
- wchess: whisper assisted chess by @fraxy-v in #1595
New Contributors
- @sapoepsilon made their first contribution in #1539
- @bebound made their first contribution in #1560
- @kasumi-1 made their first contribution in #1567
- @gjasny made their first contribution in #1568
- @osdrv made their first contribution in #1565
- @aleksanderandrzejewski made their first contribution in #1578
- @Kreijstal made their first contribution in #1615
- @fraxy-v made their first contribution in #1595
Full Changelog: v1.5.1...v1.5.2
v1.5.1
Overview
Minor update:
- With Metal, automatically fall back to CPU if the device does not support the Apple7 GPU family
- Add server example
What's Changed
- ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks by @spullara in #1364
- sync : ggml (ggml-alloc + linker + gguf fixes) by @ggerganov in #1501
- Fixed with_state methods, to use the correct state by @sandrohanea in #1519
- #1517 Redistribute CUDA DLLs by @tamo in #1522
- whisper : reuse whisper_decode_with_state by @ggerganov in #1521
- sdl : fix audio callback by @ggerganov in #1523
- update deprecated example by @MightyStud in #1529
- Super Simple Whisper Server by @felrock in #1380
- Close file after writing in server application by @felrock in #1533
- bench : multi-thread memcpy by @ggerganov in #1534
- Change temp file name for server application by @felrock in #1535
- Fixed Makefile for MacOS ARM 64 Go bindings by @gleicon in #1530
- Fixed metal build on macos-latest by @sandrohanea in #1544
- fix(server): typo in temperature parameter by @Okabintaro in #1545
- Request to add a new function to get the full language name by @bradmit in #1546
- server : add --print-realtime param by @ecneladis in #1541
- cuda : sync some minor stuff from llama.cpp by @ggerganov in #1548
- metal : add backend function to check device family support by @ggerganov in #1547
New Contributors
- @spullara made their first contribution in #1364
- @MightyStud made their first contribution in #1529
- @felrock made their first contribution in #1380
- @gleicon made their first contribution in #1530
- @Okabintaro made their first contribution in #1545
- @bradmit made their first contribution in #1546
- @ecneladis made their first contribution in #1541
Full Changelog: v1.5.0...v1.5.1