Add cloud-based Whisper support for transcription #111

Open: psmoros wants to merge 6 commits into main

@psmoros commented Mar 9, 2025

This PR adds support for OpenAI's cloud-based Whisper API as an alternative to the local Whisper model. This enhancement provides several benefits:

Key Features:

  • Cloud-based Whisper integration via OpenAI's API
  • Accurate word-level timestamp generation for better animation synchronization
  • Compatible with ARM64 architectures (e.g., Apple Silicon)
  • No need to install large local Whisper models
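
As an illustration of the cloud request, here is a minimal sketch that assumes the official openai Python SDK (v1 or later). Word-level timestamps are returned only when response_format="verbose_json" is combined with timestamp_granularities=["word"]. The helper name transcribe_words and its return shape are hypothetical and not taken from the PR.

```python
from typing import Any, BinaryIO


def transcribe_words(client: Any, audio_file: BinaryIO, model: str = "whisper-1") -> list[dict]:
    """Transcribe audio with OpenAI's hosted Whisper and return word timings.

    Hypothetical helper: `client` is expected to be an `openai.OpenAI` instance
    (any object exposing the same `audio.transcriptions.create` method works).
    """
    result = client.audio.transcriptions.create(
        model=model,
        file=audio_file,
        # verbose_json is required for timestamp_granularities to take effect
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )
    # `words` is None when word-level granularity was not returned
    return [
        {"word": w.word, "start": w.start, "end": w.end}
        for w in (result.words or [])
    ]


# Usage (needs network access and OPENAI_API_KEY; "narration.mp3" is a placeholder):
#   from openai import OpenAI
#   with open("narration.mp3", "rb") as f:
#       words = transcribe_words(OpenAI(), f)
```

Taking the client as a parameter keeps the helper testable without network access and lets callers configure retries or a custom base URL on the client themselves.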

Implementation Details:

  • Added use_cloud_whisper flag to enable cloud-based transcription
  • Created custom entry point script (manim_cloud_whisper.py) for easy usage
  • Updated SpeechService class to handle cloud-based transcription
  • Added environment variable support for configuration
  • Improved word boundary processing for accurate timing
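
One way to turn word timings into boundary records for animation sync is sketched below. The record layout is an assumption modeled on Azure TTS word-boundary events (an audio offset in 100-nanosecond ticks plus a character offset into the transcript). The function name, keys, and tick resolution are hypothetical and should be checked against the PR's actual processing.

```python
import re

TICKS_PER_SECOND = 10_000_000  # assumption: 100 ns ticks, as in Azure TTS word-boundary events


def words_to_boundaries(words: list[dict], transcript: str) -> list[dict]:
    """Convert [{"word", "start", "end"}] timings into word-boundary records.

    Hypothetical helper: each word is located in `transcript` by searching
    forward from the end of the previous match (case-insensitively), so
    repeated words and punctuation between words map to the right offsets.
    """
    boundaries = []
    cursor = 0
    for w in words:
        text = w["word"].strip()
        match = re.compile(re.escape(text), re.IGNORECASE).search(transcript, cursor)
        if match:
            offset = match.start()
            cursor = match.end()
        else:
            # Not found verbatim (e.g. Whisper normalized the spelling):
            # fall back to the current cursor instead of failing.
            offset = cursor
        boundaries.append(
            {
                "audio_offset": round(w["start"] * TICKS_PER_SECOND),
                "duration": round((w["end"] - w["start"]) * TICKS_PER_SECOND),
                "text": text,
                "text_offset": offset,
                "word_length": len(text),
                "boundary_type": "Word",
            }
        )
    return boundaries
```

Searching forward from the previous match, rather than summing word lengths, keeps offsets correct when the transcript contains punctuation and spacing that the API's word list omits.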

Usage:
Users can enable cloud-based Whisper in three ways:

  1. Using the custom entry point: python manim_cloud_whisper.py
  2. Setting the environment variable: MANIM_VOICEOVER_USE_CLOUD_WHISPER=1
  3. Programmatically: passing use_cloud_whisper=True to OpenAIService
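
The following sketch shows how the programmatic flag and the environment variable could be combined. The PR documents only the value 1. Accepting "true", "yes", and "on", and letting an explicit argument override the environment, are assumptions. The function name is hypothetical.

```python
import os
from typing import Optional

ENV_VAR = "MANIM_VOICEOVER_USE_CLOUD_WHISPER"
_TRUTHY = {"1", "true", "yes", "on"}  # assumption: PR only documents "1"


def resolve_use_cloud_whisper(use_cloud_whisper: Optional[bool] = None) -> bool:
    """Decide whether to transcribe with the cloud API.

    Hypothetical precedence: an explicit argument wins; otherwise the
    environment variable is consulted; otherwise local Whisper is used.
    """
    if use_cloud_whisper is not None:
        return use_cloud_whisper
    return os.environ.get(ENV_VAR, "").strip().lower() in _TRUTHY
```

Using None as the default distinguishes "not specified" from an explicit False, so a script can force local transcription even when the variable is set in the shell.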

Example:
Added a comprehensive linear regression demo showcasing the cloud-based Whisper functionality with synchronized voiceovers and animations.

Note: Requires an OpenAI API key to be set as an environment variable.
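
A small fail-fast check can surface a missing key before rendering starts. This sketch assumes the key is read from OPENAI_API_KEY, which is the variable the official openai SDK uses by default. The PR does not name the variable, and the helper is hypothetical.

```python
import os


def require_openai_api_key() -> str:
    """Return the OpenAI API key or fail early with an actionable message.

    Assumption: the key lives in OPENAI_API_KEY (the openai SDK's default).
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "Cloud Whisper is enabled but OPENAI_API_KEY is not set. "
            "Export it, or disable MANIM_VOICEOVER_USE_CLOUD_WHISPER."
        )
    return key
```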

@psmoros requested a review from osolmaz as a code owner March 9, 2025 18:27
@psmoros closed this Mar 9, 2025
@psmoros reopened this Mar 10, 2025