Matamata (an acronym for "Matamata attempts to animate mouths, at times accurately") is a tool to automatically create lip-synced animations.
Matamata currently supports two methods of phoneme alignment, Allosaurus and Gentle. Allosaurus is easier to set up and performs better in loud environments, but its alignment is often less accurate. Gentle requires Docker Desktop, which can be harder to install, but it does not require Python and generally provides better alignment. Please install one (or both) of these options; Gentle is currently the recommended option. You can select either option by specifying --aligning_algorithm allosaurus | gentle
when running the program.
Currently Allosaurus is in a development state and is not necessarily usable for large projects.
Gentle
Gentle will not work on Macs with Apple silicon.
Install Docker Desktop for your operating system
Run docker pull lowerquality/gentle
in your command prompt/terminal
Allosaurus
All Allosaurus requires is Python.
On Windows:
Download and install Python 3.9, making sure to select the option to add Python to PATH during installation. You can test whether this worked by running python
in your terminal.
On Mac:
Install using Homebrew
brew install python3
On Ubuntu:
sudo apt install python3 python-is-python3
Also install the required pip packages:
pip3 install allosaurus
as well as PyTorch.
- Install NodeJS 16+
  - Make sure to include the optional add-ons
- Install yarn and typescript
npm install --global yarn typescript
- Download the code using git or the button in the top right
git clone https://github.com/Matamata-Animator/Matamata-Core.git
- Open the folder in the command prompt and install the dependencies
yarn
- Install the Vosk model through the Vosk website or using the automatic tool. This is a 1.6 GB file, so the download will take some time; please be patient.
yarn downloadModel
- Clone the repo
git clone https://github.com/AI-Spawn/Auto-Lip-Sync
cd Auto-Lip-Sync
- Install required packages
sudo apt install docker.io nodejs
- Install yarn and typescript
sudo npm install --global yarn typescript
- Open the folder in the terminal and install the dependencies
yarn
- Install the Vosk model through the Vosk website or using the automatic tool. This is a 1.6 GB file, so the download will take some time; please be patient.
yarn downloadModel
- Install the Node Version Manager (nvm)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
- Install Node 16+
nvm install 16
- Install Yarn and TypeScript
npm install -g yarn typescript
- Download the code using git or the button in the top right
git clone https://github.com/Matamata-Animator/Matamata-Core.git
- Open the folder in the terminal and install the dependencies
yarn
- Install the Vosk model through the Vosk website or using the automatic tool. This is a 1.6 GB file, so the download will take some time; please be patient. There is currently no progress bar implemented.
yarn downloadModel
Below is a barebones character file:
{
"mouthsPath": "defaults/mouths/",
"poses": {
"imagesFolder": "defaults/SampleCharacter/faces/",
"default": {
"image": "purple.png",
"x": 640,
"y": 400
},
"green": {
"image": "green.png",
"x": 640,
"y": 400
}
}
}
mouthsPath specifies the path to a folder containing the mouth images.
poses contains two main elements. imagesFolder specifies the path to the folder which contains the pose images. default is a pose object. image refers to the name of the image inside the imagesFolder. x and y are the coordinates where the mouth should be placed. More poses can be created by adding more pose objects, as shown with the green pose.
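As a rough illustration of how these fields fit together, the sketch below loads a trimmed copy of the character file and resolves a pose name to an image path and mouth position. The helper name resolve_pose is hypothetical and is not part of Matamata; the sketch also assumes that image is joined directly onto imagesFolder.

```python
import json
import posixpath

# A trimmed copy of the barebones character file, inlined so the sketch runs on its own.
CHARACTER_JSON = """
{
  "mouthsPath": "defaults/mouths/",
  "poses": {
    "imagesFolder": "defaults/SampleCharacter/faces/",
    "default": {"image": "purple.png", "x": 640, "y": 400}
  }
}
"""


def resolve_pose(character, pose_name):
    """Return (image_path, (x, y)) for a named pose (hypothetical helper)."""
    poses = character["poses"]
    pose = poses.get(pose_name)
    # Only dict entries are pose objects; "imagesFolder" is a plain string.
    if not isinstance(pose, dict):
        raise KeyError(f"unknown pose: {pose_name}")
    # Assumption: the image name is joined directly onto imagesFolder.
    image_path = posixpath.join(poses["imagesFolder"], pose["image"])
    return image_path, (pose["x"], pose["y"])


character = json.loads(CHARACTER_JSON)
print(resolve_pose(character, "default"))
```

With the file above, the default pose resolves to defaults/SampleCharacter/faces/purple.png with the mouth placed at (640, 400).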
A more fleshed out character file could look like this:
{
"mouthsPath": "defaults/mouths/",
"poses": {
"imagesFolder": "defaults/SampleCharacter/faces/",
"default_scale": 2,
"default": {
"image": "purple.png",
"x": 640,
"y": 400,
"facingLeft": false,
"scale": 2
},
"green": {
"image": "green.png",
"x": 640,
"y": 400,
"facingLeft": false,
"scale": 1
}
},
"eyes": {
"imagesFolder": "defaults/SampleCharacter/eyes/",
"scale": 0.8,
"x": 640,
"y": 300,
"images": {
"angry": "angry.png",
"normal": "normal.png",
"sad": "sad.png"
}
}
}
default_scale says how much the mouth should be scaled up or down. scale is the same thing for a specific pose. In this case, the mouths for the default pose will be 4x the image size (2 × 2), while the mouths for the green pose will only be 2x the size (2 × 1).
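The arithmetic in this example is consistent with the two scale values being multiplied together. The following is a minimal sketch of that calculation; the function name and the fallback of 1 for a missing value are assumptions made for illustration, not confirmed Matamata behavior.

```python
def effective_mouth_scale(poses, pose_name):
    """Multiply the shared default_scale by the pose's own scale."""
    default_scale = poses.get("default_scale", 1)  # assumed fallback of 1
    pose_scale = poses[pose_name].get("scale", 1)  # assumed fallback of 1
    return default_scale * pose_scale


# Scale values taken from the fleshed-out character file.
poses = {
    "default_scale": 2,
    "default": {"scale": 2},
    "green": {"scale": 1},
}

print(effective_mouth_scale(poses, "default"))  # 4
print(effective_mouth_scale(poses, "green"))    # 2
```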
eyes specifies a "placeable part". The sample character pose images don't have eyes, as these are supplied by placeable parts. Although this example has placeable eyes, you can also have placeable pins, objects in the background, or even hats. imagesFolder specifies the path to the folder that contains the images for the placeable part. scale specifies how much the placeable part image should be scaled up or down. x and y specify the location on the pose where the part should be placed. images contains key-value pairs where the key is the name of the part and the value is the image name. This example defines angry, normal, and sad eye selections.
The timestamps file is a list of pose changes, each paired with how many milliseconds into the animation the change should happen. For instance, to swap to the happy pose after 3.5 seconds, the timestamps file would look like:
3500 happy
Additionally, you can change a placeable part by adding the part type after the name:
0 angry eyes
You can also remove a placeable part by using the name None:
5000 None eyes
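To make the line format concrete, here is a small parser sketch for the three kinds of lines shown above. It is not Matamata's own parser: the Change type, the "pose" label for lines without a part type, and the sorting by time are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Change(NamedTuple):
    time_ms: int
    name: Optional[str]  # None means "remove this part"
    part: str            # "pose" (label assumed here) or a placeable part type


def parse_timestamps(text):
    """Parse '<milliseconds> <name> [part type]' lines into Change records."""
    changes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        time_ms = int(fields[0])
        name = None if fields[1] == "None" else fields[1]
        part = fields[2] if len(fields) > 2 else "pose"
        changes.append(Change(time_ms, name, part))
    # Assumption: changes are applied in time order regardless of file order.
    return sorted(changes, key=lambda change: change.time_ms)


example = """\
3500 happy
0 angry eyes
5000 None eyes
"""

for change in parse_timestamps(example):
    print(change)
```

Here the eyes are set to angry at 0 ms, the pose switches to happy at 3500 ms, and the eyes are removed at 5000 ms.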
This covers the most important flags and arguments. For the complete list, go to Default Arguments.
| Shortcut | Command | Required | Default | Type | Description |
|---|---|---|---|---|---|
| --a | --audio | * | | str | The path to the audio file being animated |
| | --aligning_algorithm | | gentle | gentle \| allosaurus | The aligning algorithm to be used. |
| --t | --timestamps | | | str | The path to the file containing pose timestamps. |
| --o | --output | | "defaults/output.mp4" | str | The output of the program |
| --c | --character | | "defaults/characters.json" | str | The list of character poses |
| --m | --mouths | | "defaults/phonemes.json" | str | The mouth pack and phonemes list |
| --V | --verbose | | 1 | int | Dump process outputs to the shell |
You can set custom default arguments by creating a file named config.json in the main folder. In this file, each key is a command name and each value is the new default for it. For instance, to always run in verbose mode 3, your file would be:
{
"verbose": 3
}
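One way to picture how such overrides could be layered is sketched below. The built-in values are copied from the table of flags; the function name and the precedence order (built-in defaults, then config.json, then command-line arguments) are assumptions, not a description of Matamata's actual implementation.

```python
import json

# Defaults copied from the flags table.
BUILTIN_DEFAULTS = {
    "aligning_algorithm": "gentle",
    "output": "defaults/output.mp4",
    "character": "defaults/characters.json",
    "mouths": "defaults/phonemes.json",
    "verbose": 1,
}


def merge_defaults(builtin, config_text, cli_args):
    """Layer config.json over the built-ins, then CLI arguments over both."""
    merged = dict(builtin)                  # copy so the built-ins stay untouched
    merged.update(json.loads(config_text))  # config.json overrides built-ins
    merged.update(cli_args)                 # assumed: CLI arguments win last
    return merged


config_text = '{"verbose": 3}'
settings = merge_defaults(BUILTIN_DEFAULTS, config_text, {"audio": "audio.wav"})
print(settings["verbose"])  # 3
```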
The command to create an animation is the same for all supported platforms:
yarn animate --a audio.wav [optional arguments]
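If you drive Matamata from a script, the command can be assembled programmatically. The sketch below only builds and prints the argument list, using flags from the table of arguments; the helper name and the file name timestamps.txt are hypothetical.

```python
import shlex


def build_animate_command(audio, timestamps=None, output=None, aligner=None):
    """Build the argument list for `yarn animate` (hypothetical helper)."""
    argv = ["yarn", "animate", "--a", audio]
    if timestamps is not None:
        argv += ["--t", timestamps]
    if output is not None:
        argv += ["--o", output]
    if aligner is not None:
        if aligner not in ("gentle", "allosaurus"):
            raise ValueError(f"unsupported aligning algorithm: {aligner}")
        argv += ["--aligning_algorithm", aligner]
    return argv


argv = build_animate_command(
    "audio.wav", timestamps="timestamps.txt", aligner="gentle"
)
print(shlex.join(argv))
```

To actually run the command, the list could be passed to subprocess.run(argv, check=True) from inside the project folder.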
Do you use this project and want to see a new feature added? Open an issue with the tag feature request and say what you want.
Want to try your hand at writing code? Create a fork, upload your code, and make a pull request. Anything from fixing formatting/typos to entirely new features is welcome!
Don't know what to work on? Take a look at the issues page to see what improvements people want. Anything marked good first issue should be great for newcomers!