Add RecorderService (#23)

* Added RecorderService, PyAudio, Whisper
* wip
* Added silence trimming
* Added word boundary detection with whisper STT
* Changed voiceover dir structure
* wip
* Cosmetics
* Updated docs
* Minor
ManimCommunity · Nov 27, 2022 · a45d079
1 parent c3e0839
Showing 28 changed files with 1,614 additions and 839 deletions.
diff --git a/README.md b/README.md
@@ -11,17 +11,19 @@
 
 Manim Voiceover is a [Manim](https://manim.community) plugin for all things voiceover:
 
-- Add voiceovers to Manim videos _directly in Python_ without having to use a video editor.
-- Develop an animation with an auto-generated AI voice without having to re-record and re-sync the audio.
-- Record a voiceover and have it stitched back onto the video instantly. (Note that this is not the same as AI voice cloning)
+- Add voiceovers to Manim videos *directly in Python* without having to use a video editor.
+- Record voiceovers with your microphone during rendering with a simple command line interface.
+- Develop animations with auto-generated AI voices from various free and proprietary services.
+- Per-word timing of animations, i.e. trigger animations at specific words in the voiceover, even for the recordings. This works thanks to [OpenAI Whisper](https://github.com/openai/whisper).
 
 Here is a demo:
 
 https://user-images.githubusercontent.com/2453968/198145393-6a1bd709-4441-4821-8541-45d5f5e25be7.mp4
 
-Currently supported TTS services:
+Currently supported TTS services (aside from the CLI that allows you to record your own voice):
 
-- [Azure Text to Speech](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/) (Recommended)
+- [Azure Text to Speech](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/) (Recommended for AI voices)
+- [Coqui TTS](https://github.com/coqui-ai/TTS/)
 - [gTTS](https://github.com/pndurette/gTTS/)
 - [pyttsx3](https://github.com/nateshmbhat/pyttsx3)
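
Note on the per-word timing mentioned in the README changes above: the idea is that each bookmark in the voiceover text resolves to the start time of the word that follows it. The following is only a minimal sketch of that idea, not the plugin's implementation. The `<bookmark mark='A'/>` tag syntax, the `(word, start_time)` boundary format, and the `bookmark_times` helper are all assumptions made for illustration; in practice the boundaries would come from the TTS service or from a Whisper transcription.

```python
import re

# Assumed bookmark syntax; single-quoted mark names only, for simplicity.
BOOKMARK_RE = re.compile(r"<bookmark\s+mark\s*=\s*'(\w+)'\s*/>")


def bookmark_times(text, boundaries):
    """Map each bookmark to the start time of the word that follows it.

    `boundaries` is a hypothetical list of (word, start_time_in_seconds)
    tuples, one entry per whitespace-separated word of the spoken text.
    """
    times = {}
    word_index = 0  # number of spoken words seen before the current bookmark
    pos = 0  # position in `text` just after the previous bookmark
    for match in BOOKMARK_RE.finditer(text):
        # Count the words between the previous bookmark and this one.
        word_index += len(text[pos:match.start()].split())
        pos = match.end()
        # A bookmark after the last word has no following word and is skipped.
        if word_index < len(boundaries):
            times[match.group(1)] = boundaries[word_index][1]
    return times


text = "This is <bookmark mark='A'/>a circle and <bookmark mark='B'/>a square."
boundaries = [
    ("This", 0.0), ("is", 0.3), ("a", 0.5), ("circle", 0.6),
    ("and", 1.0), ("a", 1.2), ("square.", 1.3),
]
offsets = bookmark_times(text, boundaries)
```

With offsets like these, an animation can wait until the bookmark's time has elapsed before it starts playing.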
 

diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -20,6 +20,10 @@ Speech services
    :members:
    :show-inheritance:
 
+.. automodule:: manim_voiceover.services.recorder
+   :members:
+   :show-inheritance:
+
 .. automodule:: manim_voiceover.services.azure
    :members:
    :show-inheritance:

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -9,8 +9,9 @@ Manim Voiceover
 `Manim Voiceover <https://voiceover.manim.community>`__ is a `Manim <https://manim.community>`__ plugin for all things voiceover:
 
 - Add voiceovers to Manim videos *directly in Python* without having to use a video editor.
-- Develop an animation with an auto-generated AI voice without having to re-record and re-sync the audio.
-- Record a voiceover and have it stitched back onto the video instantly. (Note that this is not the same as AI voice cloning)
+- Record voiceovers with your microphone during rendering with a simple command line interface (see :py:class:`~manim_voiceover.services.recorder.RecorderService`).
+- Develop animations with auto-generated AI voices from various free and proprietary services.
+- Per-word timing of animations, i.e. trigger animations at specific words in the voiceover, even for the recordings. This works thanks to `OpenAI Whisper <https://github.com/openai/whisper>`__.
 
 A demo:
 
@@ -24,5 +25,4 @@ A demo:
    quickstart
    services
    examples
-   .. changelog
    api
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -24,10 +24,40 @@ Translate API and therefore needs an internet connection to work. If it
 throws an error, there might be a problem with your internet connection
 or the Google Translate API.
 
+Installing PortAudio
+~~~~~~~~~~~~~~~~~~~~
+
+Manim Voiceover lets you record voiceovers during rendering using `PyAudio <https://people.csail.mit.edu/hubert/pyaudio/>`__.
+PyAudio depends on `PortAudio <http://www.portaudio.com/>`__ which needs to be installed separately.
+
+On Debian based distros:
+
+.. code:: sh
+
+   sudo apt install portaudio19-dev
+   sudo pip install pyaudio
+   # Or install from apt globally:
+   sudo apt install python3-pyaudio
+
+On macOS, you can install it using `Homebrew <https://brew.sh/>`__:
+
+.. code:: sh
+
+   brew install portaudio
+   pip install pyaudio
+
+On Windows, PortAudio should come prepackaged with the binaries, so just install PyAudio with pip:
+
+.. code:: sh
+
+   python -m pip install pyaudio
+
+For more information, see the `PyAudio documentation <https://people.csail.mit.edu/hubert/pyaudio/#downloads>`__.
+
 Installing SoX
 ~~~~~~~~~~~~~~
 
-``manim-voiceover`` can make the output from speech synthesizers faster
+Manim Voiceover can make the output from speech synthesizers faster
 or slower using `SoX <http://sox.sourceforge.net/>`__ (version 14.4.2 or
 higher is required).
 

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -99,4 +99,46 @@ Bookmarks allow you to trigger an animation at a specific word in the voiceover.
 
 With bookmarks, you can time your animations much more precisely. See the `bookmark example <https://github.com/ManimCommunity/manim-voiceover/blob/main/examples/bookmark-example.py>`__ and `Approximating Tau <https://github.com/ManimCommunity/manim-voiceover/blob/main/examples/approximating-tau.py>`__ for more examples.
 
+Record your own voiceover
+*************************
 
+Manim Voiceover can record your voiceover directly from the command line. We recommend the following workflow:
+
+1. Develop your animation with one of the text-to-speech engines, e.g. :py:class:`services.gtts.GTTSService`:
+
+.. code:: py
+
+   from manim_voiceover import VoiceoverScene
+   from manim_voiceover.services.gtts import GTTSService
+
+   class MyAwesomeScene(VoiceoverScene):
+       def construct(self):
+           self.set_speech_service(GTTSService())
+
+           with self.voiceover(text="This circle is drawn as I speak.") as tracker:
+               self.play(Create(circle))
+
+
+2. When you're happy with the animation, switch the service with :py:class:`services.recorder.RecorderService` to record your own voiceover:
+
+.. code:: py
+
+   from manim_voiceover import VoiceoverScene
+   # from manim_voiceover.services.gtts import GTTSService
+   from manim_voiceover.services.recorder import RecorderService
+
+   class MyAwesomeScene(VoiceoverScene):
+       def construct(self):
+           # self.set_speech_service(GTTSService())
+           self.set_speech_service(RecorderService())
+
+           with self.voiceover(text="This circle is drawn as I speak.") as tracker:
+               self.play(Create(circle))
+
+3. Render the scene the same way you would normally do:
+
+.. code:: sh
+
+   manim -pql my_awesome_scene.py --disable_caching
+
+This will instruct you in the terminal step by step what to do to record your voiceover.
diff --git a/docs/source/services.rst b/docs/source/services.rst
@@ -22,6 +22,11 @@ Manim Voiceover defines the :py:class:`~~base.SpeechService` class for adding ne
      - Can run offline?
      - Paid / requires an account?
      - Notes
+   * - :py:class:`~recorder.RecorderService`
+     - N/A
+     - N/A
+     - N/A
+     - This is a utility class to record your own voiceovers with a microphone.
    * - :py:class:`~azure.AzureService`
      - Very good, human-like
      - No
@@ -45,6 +50,20 @@ Manim Voiceover defines the :py:class:`~~base.SpeechService` class for adding ne
 
 It is on our roadmap to provide a high quality TTS engine that runs locally for free. If you have any suggestions, please let us know in the `Discord server <https://manim.community/discord>`__.
 
+:py:class:`~recorder.RecorderService`
+*************************************
+
+This is not a speech synthesizer but a utility class to record your own voiceovers with a microphone. It provides a command line interface to record voiceovers during rendering.
+
+Install Manim Voiceover with the ``recorder`` extra in order to use :py:class:`~recorder.RecorderService`:
+
+.. code:: sh
+
+   pip install "manim-voiceover[recorder]"
+
+Refer to the `example usage <https://github.com/ManimCommunity/manim-voiceover/blob/main/examples/recorder-example.py>`__ to get started.
+
+
 :py:class:`~azure.AzureService`
 *******************************
 

diff --git a/examples/bookmark-example.py b/examples/bookmark-example.py
@@ -1,19 +1,20 @@
 from manim import *
 from manim_voiceover import VoiceoverScene
 
-from manim_voiceover.services.coqui import CoquiService
-# from manim_voiceover.services.azure import AzureService
+# from manim_voiceover.services.coqui import CoquiService
+
+from manim_voiceover.services.azure import AzureService
 
 
 class BookmarkExample(VoiceoverScene):
     def construct(self):
-        self.set_speech_service(CoquiService())
-        # self.set_speech_service(
-        #     AzureService(
-        #         voice="en-US-AriaNeural",
-        #         style="newscast-casual",
-        #     )
-        # )
+        # self.set_speech_service(CoquiService())
+        self.set_speech_service(
+            AzureService(
+                voice="en-US-AriaNeural",
+                style="newscast-casual",
+            )
+        )
 
         blist = BulletedList(
             "Trigger animations", "At any word", "Bookmarks", font_size=64

diff --git a/examples/coqui-example.py b/examples/coqui-example.py
@@ -19,7 +19,9 @@ def construct(self):
         with self.voiceover(text="Now, let's transform it into a square.") as tracker:
             self.play(Transform(circle, square), run_time=tracker.duration)
 
-        with self.voiceover(text="This is a very very very very very very very very very very very very very very very very very long sentence."):
+        with self.voiceover(
+            text="This is a very very very very very very very very very very very very very very very very very long sentence."
+        ):
             pass
 
         with self.voiceover(text="Thank you for watching."):

diff --git a/examples/recorder-example.py b/examples/recorder-example.py
@@ -0,0 +1,25 @@
+from manim import *
+from manim_voiceover import VoiceoverScene
+from manim_voiceover.services.recorder import RecorderService
+
+
+class RecorderExample(VoiceoverScene):
+    def construct(self):
+        self.set_speech_service(RecorderService(silence_threshold=-40.0))
+
+        circle = Circle()
+        square = Square().shift(2 * RIGHT)
+
+        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
+            self.play(Create(circle), run_time=tracker.duration)
+
+        with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:
+            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)
+
+        with self.voiceover(text="Now, let's transform it into a square.") as tracker:
+            self.play(Transform(circle, square), run_time=tracker.duration)
+
+        with self.voiceover(text="Thank you for watching."):
+            self.play(Uncreate(circle))
+
+        self.wait()
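
Note on `silence_threshold=-40.0` in the example above: the commit message mentions silence trimming, but the trimming code itself is not part of this section. The sketch below shows one way such a threshold could be applied, assuming the value is a level in dBFS and the recording is a list of 16-bit PCM samples. The `dbfs` and `trim_silence` helpers are hypothetical, and a real implementation would more likely measure the level over short windows rather than per sample.

```python
import math


def dbfs(sample, full_scale=32768):
    """Level of one 16-bit PCM sample in dB relative to full scale."""
    if sample == 0:
        return float("-inf")
    return 20 * math.log10(abs(sample) / full_scale)


def trim_silence(samples, silence_threshold=-40.0):
    """Drop leading and trailing samples quieter than the threshold.

    Quiet samples between the first and last loud sample are kept, so
    pauses inside the recording survive.
    """
    loud = [i for i, s in enumerate(samples) if dbfs(s) > silence_threshold]
    if not loud:
        return []
    return samples[loud[0] : loud[-1] + 1]


# -40 dBFS corresponds to an amplitude of about 328 for 16-bit audio.
recording = [0, 10, 300, 5000, -8000, 200, 6000, 100, 0]
trimmed = trim_silence(recording)
```

Under this assumption, raising the threshold toward 0 trims more aggressively, while lowering it keeps quieter passages at the start and end.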
diff --git a/examples/voiceover-demo.py b/examples/voiceover-demo.py
@@ -4,12 +4,6 @@
 
 from manim_voiceover.services.azure import AzureService
 
-# from manim_voiceover.interfaces.pyttsx3 import PyTTSX3Service
-# from manim_voiceover.interfaces.gtts import GTTSService
-from manim_voiceover.services.stitcher import StitcherService
-
-# import pyttsx3
-
 code_style = code_styles.get_style_by_name("one-dark")
 
 
@@ -22,14 +16,6 @@ def construct(self):
                 style="newscast-casual",  # global_speed=1.15
             )
         )
-        dirname = os.path.dirname(os.path.abspath(__file__))
-        # self.set_speech_service(
-        #     StitcherService(dirname + "/voiceover_demo_recording.mp3")
-        # )
-
-        # self.set_speech_service(PyTTSX3Service(pyttsx3.init(), global_speed=1.15))
-        # self.set_speech_service(GTTSService())
-
         banner = ManimBanner().scale(0.5)
 
         with self.voiceover(text="Hey Manim Community!"):
@@ -294,7 +280,9 @@ def construct(self):
             text="Visit the GitHub repo to start using it in your project."
         ):
             self.play(
-                FadeIn(Tex(r"\texttt{https://github.com/ManimCommunity/manim-voiceover}"))
+                FadeIn(
+                    Tex(r"\texttt{https://github.com/ManimCommunity/manim-voiceover}")
+                )
             )
 
         self.wait(5)
diff --git a/manim_voiceover/defaults.py b/manim_voiceover/defaults.py
@@ -0,0 +1,4 @@
+from pathlib import Path
+
+DEFAULT_VOICEOVER_CACHE_DIR = "voiceovers"
+DEFAULT_VOICEOVER_CACHE_JSON_FILENAME = "cache.json"
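
Note on the new defaults: the two constants above are presumably joined into the location of the cache index, but the code that does so is not shown in this section. A minimal sketch, assuming the cache directory sits under some base output directory; the `cache_json_path` helper and its `base_dir` parameter are hypothetical.

```python
from pathlib import Path

# Values copied from manim_voiceover/defaults.py as added in this commit.
DEFAULT_VOICEOVER_CACHE_DIR = "voiceovers"
DEFAULT_VOICEOVER_CACHE_JSON_FILENAME = "cache.json"


def cache_json_path(base_dir="."):
    """Build the path of the cache index file under `base_dir` (hypothetical helper)."""
    return (
        Path(base_dir)
        / DEFAULT_VOICEOVER_CACHE_DIR
        / DEFAULT_VOICEOVER_CACHE_JSON_FILENAME
    )


path = cache_json_path("media")
```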