Add RecorderService (#23)
* Added RecorderService, PyAudio, Whisper

* wip

* Added silence trimming

* Added word boundary detection with whisper STT

* Changed voiceover dir structure

* wip

* Cosmetics

* Updated docs

* Minor
osolmaz authored Nov 27, 2022
1 parent c3e0839 commit a45d079
Showing 28 changed files with 1,614 additions and 839 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -11,17 +11,19 @@

Manim Voiceover is a [Manim](https://manim.community) plugin for all things voiceover:

- Add voiceovers to Manim videos _directly in Python_ without having to use a video editor.
- Develop an animation with an auto-generated AI voice without having to re-record and re-sync the audio.
- Record a voiceover and have it stitched back onto the video instantly. (Note that this is not the same as AI voice cloning)
- Add voiceovers to Manim videos *directly in Python* without having to use a video editor.
- Record voiceovers with your microphone during rendering with a simple command line interface.
- Develop animations with auto-generated AI voices from various free and proprietary services.
- Per-word timing of animations, i.e. trigger animations at specific words in the voiceover, even for the recordings. This works thanks to [OpenAI Whisper](https://github.com/openai/whisper).

Here is a demo:

https://user-images.githubusercontent.com/2453968/198145393-6a1bd709-4441-4821-8541-45d5f5e25be7.mp4

Currently supported TTS services:
Currently supported TTS services (aside from the CLI that allows you to record your own voice):

- [Azure Text to Speech](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/) (Recommended)
- [Azure Text to Speech](https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/) (Recommended for AI voices)
- [Coqui TTS](https://github.com/coqui-ai/TTS/)
- [gTTS](https://github.com/pndurette/gTTS/)
- [pyttsx3](https://github.com/nateshmbhat/pyttsx3)

4 changes: 4 additions & 0 deletions docs/source/api.rst
@@ -20,6 +20,10 @@ Speech services
   :members:
   :show-inheritance:

.. automodule:: manim_voiceover.services.recorder
   :members:
   :show-inheritance:

.. automodule:: manim_voiceover.services.azure
   :members:
   :show-inheritance:
6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -9,8 +9,9 @@ Manim Voiceover
`Manim Voiceover <https://voiceover.manim.community>`__ is a `Manim <https://manim.community>`__ plugin for all things voiceover:

- Add voiceovers to Manim videos *directly in Python* without having to use a video editor.
- Develop an animation with an auto-generated AI voice without having to re-record and re-sync the audio.
- Record a voiceover and have it stitched back onto the video instantly. (Note that this is not the same as AI voice cloning)
- Record voiceovers with your microphone during rendering with a simple command line interface (see :py:class:`~manim_voiceover.services.recorder.RecorderService`).
- Develop animations with auto-generated AI voices from various free and proprietary services.
- Per-word timing of animations, i.e. trigger animations at specific words in the voiceover, even for the recordings. This works thanks to `OpenAI Whisper <https://github.com/openai/whisper>`__.

A demo:

@@ -24,5 +25,4 @@ A demo:
quickstart
services
examples
.. changelog
api
32 changes: 31 additions & 1 deletion docs/source/installation.rst
@@ -24,10 +24,40 @@ Translate API and therefore needs an internet connection to work. If it
throws an error, there might be a problem with your internet connection
or the Google Translate API.

Installing PortAudio
~~~~~~~~~~~~~~~~~~~~

Manim Voiceover lets you record voiceovers during rendering using `PyAudio <https://people.csail.mit.edu/hubert/pyaudio/>`__.
PyAudio depends on `PortAudio <http://www.portaudio.com/>`__, which needs to be installed separately.

On Debian-based distros:

.. code:: sh

   sudo apt install portaudio19-dev
   sudo pip install pyaudio

   # Or install from apt globally:
   sudo apt install python3-pyaudio

On macOS, you can install it using `Homebrew <https://brew.sh/>`__:

.. code:: sh

   brew install portaudio
   pip install pyaudio

On Windows, PortAudio should come prepackaged with the binaries, so just install PyAudio with pip:

.. code:: sh

   python -m pip install pyaudio

For more information, see the `PyAudio documentation <https://people.csail.mit.edu/hubert/pyaudio/#downloads>`__.
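
As a quick sanity check after installation, the sketch below tests whether the ``pyaudio`` module is discoverable in the current environment. The helper name ``module_available`` is made up for illustration and is not part of Manim Voiceover; it only confirms that the package can be found, not that a microphone works.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("pyaudio"):
        print("PyAudio is installed.")
    else:
        print("PyAudio is missing; install PortAudio first, then PyAudio.")
```

Using ``find_spec`` avoids actually importing PyAudio, so the check does not touch any audio device.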

Installing SoX
~~~~~~~~~~~~~~

``manim-voiceover`` can make the output from speech synthesizers faster
Manim Voiceover can make the output from speech synthesizers faster
or slower using `SoX <http://sox.sourceforge.net/>`__ (version 14.4.2 or
higher is required).
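
A minimal sketch of how the version requirement could be checked from Python. It assumes that ``sox --version`` prints a line containing a dotted version such as ``SoX v14.4.2``; the exact output format is an assumption, and the function names are hypothetical helpers, not part of Manim Voiceover.

```python
import re
import shutil
import subprocess

MIN_SOX = (14, 4, 2)  # minimum version stated in the docs


def parse_sox_version(output):
    """Extract the first MAJOR.MINOR.PATCH triple from text, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def sox_is_recent_enough(min_version=MIN_SOX):
    """Return True if a `sox` executable is on PATH and reports a new enough version."""
    exe = shutil.which("sox")
    if exe is None:
        return False
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Assumption: the version string appears on stdout or stderr.
    version = parse_sox_version(result.stdout + result.stderr)
    return version is not None and version >= min_version
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexicographically.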

42 changes: 42 additions & 0 deletions docs/source/quickstart.rst
@@ -99,4 +99,46 @@ Bookmarks allow you to trigger an animation at a specific word in the voiceover.

With bookmarks, you can time your animations much more precisely. See the `bookmark example <https://github.com/ManimCommunity/manim-voiceover/blob/main/examples/bookmark-example.py>`__ and `Approximating Tau <https://github.com/ManimCommunity/manim-voiceover/blob/main/examples/approximating-tau.py>`__ for more examples.
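
To illustrate the idea behind per-word timing, here is a self-contained sketch that maps ``<bookmark mark='A'/>`` tags to timestamps. It is not the plugin's implementation: the helper names and the ``word_boundaries`` format (a list of ``(character offset, start time in seconds)`` pairs, as a TTS engine or speech-to-text model might report) are assumptions made for this example.

```python
import re

BOOKMARK_RE = re.compile(r"<bookmark\s+mark\s*=\s*['\"](\w+)['\"]\s*/>")


def split_bookmarks(text):
    """Strip bookmark tags; return (plain_text, {mark: offset in plain_text})."""
    pieces, marks = [], {}
    pos = length = 0
    for match in BOOKMARK_RE.finditer(text):
        chunk = text[pos:match.start()]
        pieces.append(chunk)
        length += len(chunk)
        marks[match.group(1)] = length  # offset where the next word begins
        pos = match.end()
    pieces.append(text[pos:])
    return "".join(pieces), marks


def bookmark_times(marks, word_boundaries):
    """Map each mark to the start time of the first word at or after its offset."""
    times = {}
    for name, offset in marks.items():
        for word_offset, start in word_boundaries:
            if word_offset >= offset:
                times[name] = start
                break
    return times


plain, marks = split_bookmarks("This is a very <bookmark mark='A'/>long sentence.")
# Hypothetical word timings for the plain text above.
boundaries = [(0, 0.0), (5, 0.3), (8, 0.5), (10, 0.6), (15, 1.0), (20, 1.4)]
times = bookmark_times(marks, boundaries)
```

With these made-up timings, bookmark ``A`` resolves to the start of the word "long", which is the moment an animation waiting on that bookmark would begin.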

Record your own voiceover
*************************

Manim Voiceover can record your voiceover directly from the command line. We recommend the following workflow:

1. Develop your animation with one of the text-to-speech engines, e.g. :py:class:`services.gtts.GTTSService`:

.. code:: py

   from manim import *
   from manim_voiceover import VoiceoverScene
   from manim_voiceover.services.gtts import GTTSService

   class MyAwesomeScene(VoiceoverScene):
       def construct(self):
           self.set_speech_service(GTTSService())
           circle = Circle()  # the object animated during the voiceover
           with self.voiceover(text="This circle is drawn as I speak.") as tracker:
               self.play(Create(circle))

2. When you're happy with the animation, swap the service for :py:class:`services.recorder.RecorderService` to record your own voiceover:

.. code:: py

   from manim import *
   from manim_voiceover import VoiceoverScene
   # from manim_voiceover.services.gtts import GTTSService
   from manim_voiceover.services.recorder import RecorderService

   class MyAwesomeScene(VoiceoverScene):
       def construct(self):
           # self.set_speech_service(GTTSService())
           self.set_speech_service(RecorderService())
           circle = Circle()  # the object animated during the voiceover
           with self.voiceover(text="This circle is drawn as I speak.") as tracker:
               self.play(Create(circle))

3. Render the scene as you normally would:

.. code:: sh

   manim -pql my_awesome_scene.py --disable_caching

Manim Voiceover then guides you through the recording step by step in the terminal.
19 changes: 19 additions & 0 deletions docs/source/services.rst
@@ -22,6 +22,11 @@ Manim Voiceover defines the :py:class:`~~base.SpeechService` class for adding ne
- Can run offline?
- Paid / requires an account?
- Notes
* - :py:class:`~recorder.RecorderService`
- N/A
- N/A
- N/A
- This is a utility class to record your own voiceovers with a microphone.
* - :py:class:`~azure.AzureService`
- Very good, human-like
- No
@@ -45,6 +50,20 @@ Manim Voiceover defines the :py:class:`~~base.SpeechService` class for adding ne

It is on our roadmap to provide a high quality TTS engine that runs locally for free. If you have any suggestions, please let us know in the `Discord server <https://manim.community/discord>`__.

:py:class:`~recorder.RecorderService`
*************************************

This is not a speech synthesizer but a utility class to record your own voiceovers with a microphone. It provides a command line interface to record voiceovers during rendering.

Install Manim Voiceover with the ``recorder`` extra in order to use :py:class:`~recorder.RecorderService`:

.. code:: sh

   pip install "manim-voiceover[recorder]"

Refer to the `example usage <https://github.com/ManimCommunity/manim-voiceover/blob/main/examples/recorder-example.py>`__ to get started.


:py:class:`~azure.AzureService`
*******************************

19 changes: 10 additions & 9 deletions examples/bookmark-example.py
@@ -1,19 +1,20 @@
from manim import *
from manim_voiceover import VoiceoverScene

from manim_voiceover.services.coqui import CoquiService
# from manim_voiceover.services.azure import AzureService
# from manim_voiceover.services.coqui import CoquiService

from manim_voiceover.services.azure import AzureService


class BookmarkExample(VoiceoverScene):
    def construct(self):
        self.set_speech_service(CoquiService())
        # self.set_speech_service(
        #     AzureService(
        #         voice="en-US-AriaNeural",
        #         style="newscast-casual",
        #     )
        # )
        # self.set_speech_service(CoquiService())
        self.set_speech_service(
            AzureService(
                voice="en-US-AriaNeural",
                style="newscast-casual",
            )
        )

        blist = BulletedList(
            "Trigger animations", "At any word", "Bookmarks", font_size=64
4 changes: 3 additions & 1 deletion examples/coqui-example.py
@@ -19,7 +19,9 @@ def construct(self):
        with self.voiceover(text="Now, let's transform it into a square.") as tracker:
            self.play(Transform(circle, square), run_time=tracker.duration)

        with self.voiceover(text="This is a very very very very very very very very very very very very very very very very very long sentence."):
        with self.voiceover(
            text="This is a very very very very very very very very very very very very very very very very very long sentence."
        ):
            pass

        with self.voiceover(text="Thank you for watching."):
25 changes: 25 additions & 0 deletions examples/recorder-example.py
@@ -0,0 +1,25 @@
from manim import *
from manim_voiceover import VoiceoverScene
from manim_voiceover.services.recorder import RecorderService


class RecorderExample(VoiceoverScene):
    def construct(self):
        self.set_speech_service(RecorderService(silence_threshold=-40.0))

        circle = Circle()
        square = Square().shift(2 * RIGHT)

        with self.voiceover(text="This circle is drawn as I speak.") as tracker:
            self.play(Create(circle), run_time=tracker.duration)

        with self.voiceover(text="Let's shift it to the left 2 units.") as tracker:
            self.play(circle.animate.shift(2 * LEFT), run_time=tracker.duration)

        with self.voiceover(text="Now, let's transform it into a square.") as tracker:
            self.play(Transform(circle, square), run_time=tracker.duration)

        with self.voiceover(text="Thank you for watching."):
            self.play(Uncreate(circle))

        self.wait()
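
The example above passes ``silence_threshold=-40.0``, and the commit message mentions silence trimming. The sketch below shows one common way threshold-based trimming can work; it is not the code in ``RecorderService``. It assumes the threshold is a level in dBFS and that audio is a list of float samples in ``[-1, 1]``; the function names and the tiny frame size are chosen only for illustration.

```python
import math


def dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def trim_silence(samples, threshold_db=-40.0, frame=4):
    """Drop leading and trailing frames whose level is at or below threshold_db."""
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    loud = [i for i, chunk in enumerate(frames) if dbfs(chunk) > threshold_db]
    if not loud:
        return []  # the whole recording is below the threshold
    start = loud[0] * frame
    end = min(len(samples), (loud[-1] + 1) * frame)
    return samples[start:end]


# Two silent frames, one loud frame, then one very quiet frame (about -60 dBFS).
recording = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] + [0.001] * 4
trimmed = trim_silence(recording)
```

Only the leading and trailing quiet frames are removed; pauses in the middle of the recording are kept, which preserves the natural rhythm of speech.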
18 changes: 3 additions & 15 deletions examples/voiceover-demo.py
@@ -4,12 +4,6 @@

from manim_voiceover.services.azure import AzureService

# from manim_voiceover.interfaces.pyttsx3 import PyTTSX3Service
# from manim_voiceover.interfaces.gtts import GTTSService
from manim_voiceover.services.stitcher import StitcherService

# import pyttsx3

code_style = code_styles.get_style_by_name("one-dark")


@@ -22,14 +16,6 @@ def construct(self):
                style="newscast-casual",  # global_speed=1.15
            )
        )
        dirname = os.path.dirname(os.path.abspath(__file__))
        # self.set_speech_service(
        #     StitcherService(dirname + "/voiceover_demo_recording.mp3")
        # )

        # self.set_speech_service(PyTTSX3Service(pyttsx3.init(), global_speed=1.15))
        # self.set_speech_service(GTTSService())

        banner = ManimBanner().scale(0.5)

        with self.voiceover(text="Hey Manim Community!"):
@@ -294,7 +280,9 @@ def construct(self):
            text="Visit the GitHub repo to start using it in your project."
        ):
            self.play(
                FadeIn(Tex(r"\texttt{https://github.com/ManimCommunity/manim-voiceover}"))
                FadeIn(
                    Tex(r"\texttt{https://github.com/ManimCommunity/manim-voiceover}")
                )
            )

        self.wait(5)
4 changes: 4 additions & 0 deletions manim_voiceover/defaults.py
@@ -0,0 +1,4 @@
from pathlib import Path

DEFAULT_VOICEOVER_CACHE_DIR = "voiceovers"
DEFAULT_VOICEOVER_CACHE_JSON_FILENAME = "cache.json"
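
The diff does not show how these constants are consumed. As a hedged illustration, they could be combined with ``pathlib`` to locate the cache file; ``cache_json_path`` and its ``base_dir`` parameter are hypothetical and do not appear in the commit.

```python
from pathlib import Path

# Values copied from manim_voiceover/defaults.py as shown in the diff.
DEFAULT_VOICEOVER_CACHE_DIR = "voiceovers"
DEFAULT_VOICEOVER_CACHE_JSON_FILENAME = "cache.json"


def cache_json_path(base_dir=".", cache_dir=DEFAULT_VOICEOVER_CACHE_DIR):
    """Build the path of the cache index file under base_dir (hypothetical helper)."""
    return Path(base_dir) / cache_dir / DEFAULT_VOICEOVER_CACHE_JSON_FILENAME
```

Keeping the directory and file name as separate constants lets callers override the directory while reusing the same index file name.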