Skip to content

v24.10

Latest
Compare
Choose a tag to compare
@ddennedy ddennedy released this 30 Oct 00:49
· 12 commits to master since this release

Speech to Text

Shotcut gets its first AI based on OpenAI's Whisper, courtesy of the whisper.cpp project.
This is available through Subtitles > Speech to Text menu item or button: Speech to Text icon.

  • Our builds include a basic model that has decent speed and accuracy but not a big size. (You can think of the model as the brain.)
  • You can download a bigger and better better brain (model) in ggml format and configure it in the Speech to Text dialog, but it will be slower.
  • The dialog creates two jobs that appear in the Jobs panel: one to export audio and another to convert to text.
  • The results are added to the Subtitles panel as a new top-level Subtitle Track.
  • Currently, the only GPU our build supports is Apple Silicon. Otherwise, it is heavily multi-threaded on the CPU.
  • Known quirk: subtitle items sometimes start earlier than expected. Timing is provided by the model and tool, and we lack the skills and resources to improve this.
  • Expect there to be occasional errors. Like humans and non-ideal conditions, it is not perfect. We will not take action on bug reports about some piece of audio not converting to the expected text.
  • OpenAI has made some warnings about the usage of their Whisper models:

In particular, we caution against using Whisper models to transcribe recordings of individuals taken without their consent.... We recommend against use in high-risk domains like decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes.

Transition Improvements

  • Ripple Delete a transition restores the entirety of the clips included in the transition.
  • Lift (non-ripple delete) a transition no longer leaves a gap; the gap is filled with the adjacent clips.
  • Moving an adjacent clip away increases the transition duration instead of detaching and leaving a gap.

Other Improvements

  • Removed the Export > Video > Resample button. Now, there are simply ignorable inline warnings when making certain changes.
  • Added File > Show Project in Folder to menu.
  • Added a decimals <number> option to numeric keywords in the GPS Text video filter.
  • Changed Recent Projects to Projects: items in this view no longer disappear as Recent reaches its maximum length and old items are removed.
  • Added a Remove action to the context menu in Projects.
  • Hide the Reframe video filter and button if GPU Effects is on.
  • Upgraded FFmpeg to version 7.1.

Fixes

  • Fixed a crash doing when doing more than one Playlist > menu > Add Selected to Slideshow. In theory, this could fix other random crashes in Timeline.
  • Fixed a crash opening a project containing a subtitle track with no items.
  • Fixed odd value for computed width in Reframe output video filter causes export to fail.
  • Fixed Reframe visual control can create odd-valued dimensions.
  • Fixed AVCHD video frame rate is double (could fix other formats).
  • Fixed making a proxy video for a iPhone 16 Pro video containing spatial audio.
  • Fixed GPU filters paste below non-GPU filters.
  • Fixed Slideshow Generator dialog is too tall with vertical video mode.
  • Fixed GPS Offset would reset in GPS Text video filter.
  • Fixed the maximum allowed Time in the Time Remap filter to prevent white frames.