# Evaluation of Text-to-Gesture Generation Model Using Convolutional Neural Network

This repository contains the code for the text-to-gesture generation model using a CNN.

The demonstration video of generated gestures is available at <https://youtu.be/JX4Gqy-Rmso>.

## Requirements
We used [PyTorch](https://pytorch.org/) version 1.7.1 for the neural network implementation. We tested the code in the following environment:

- Ubuntu 16.04 LTS
- GPU: NVIDIA GeForce GTX 1080Ti
- Python environment: anaconda3-2020.07
- [fasttext](https://fasttext.cc/)
- cv2 (4.4.0)
- ffmpeg

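As a quick sanity check after installing the dependencies, the minimal Python sketch below (not part of the repository's scripts) prints the detected versions of the main libraries:

```python
# Minimal environment check (not part of the repository's scripts):
# prints the versions of the main dependencies used by this project.
import torch
import cv2

print("PyTorch version:", torch.__version__)      # tested with 1.7.1
print("OpenCV (cv2) version:", cv2.__version__)   # tested with 4.4.0
print("CUDA available:", torch.cuda.is_available())
```
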
## Preparation
1. Our code uses the speech and gesture dataset provided by Ginosar et al. Download the Speech2Gesture dataset by following the instructions under "Download specific speaker data" in <https://github.com/amirbar/speech2gesture/blob/master/data/dataset.md>.

   ```
   Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, and Jitendra Malik, "Learning Individual Styles of Conversational Gesture," CVPR 2019.
   ```

   After downloading the Speech2Gesture dataset, your dataset folder should look like:
   ```
   Gestures
   ├── frames_df_10_19_19.csv
   ├── almaram
       ├── frames
       ├── keypoints_all
       ├── keypoints_simple
       └── videos
   ...
   └── shelly
       ├── frames
       ├── keypoints_all
       ├── keypoints_simple
       └── videos
   ```
2. Download the text dataset from [HERE](https://drive.google.com/file/d/1OjSJ-F9hoLOfecF5FwdCGG2Mp8fBPgGb/view?usp=sharing) and unarchive the zip file.
3. Move the `words` directory under each speaker's directory in the unarchived data to the corresponding speaker's directory in your dataset folder.

   After this step, your dataset folder should look like:
   ```
   Gestures
   ├── frames_df_10_19_19.csv
   ├── almaram
       ├── frames
       ├── keypoints_all
       ├── keypoints_simple
       ├── videos
       └── words
   ...
   └── shelly
       ├── frames
       ├── keypoints_all
       ├── keypoints_simple
       ├── videos
       └── words
   ```
   Note that there is very little word data for the speaker Jon, so it should not be used for model training.

4. Set up fasttext by following the instructions [HERE](https://fasttext.cc/docs/en/support.html), and download the pre-trained fasttext model file (wiki-news-300d-1M-subword.bin) from [HERE](https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip). A quick load check is shown in the sketch below.

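The following minimal sketch (not part of the repository's scripts) loads the downloaded fasttext model with the official `fasttext` Python bindings and queries a 300-dimensional word vector; the model path is a placeholder to adapt to your setup:

```python
# Optional sanity check for the fasttext setup (minimal sketch, not part of this repository).
import fasttext

# Path to the unzipped pre-trained model; adjust to your own location.
model = fasttext.load_model("/path_to_your_fasttext_dir/wiki-news-300d-1M-subword.bin")

vec = model.get_word_vector("gesture")
print(vec.shape)  # expected: (300,) for the 300-dimensional wiki-news model
```
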
## Create training and test data
* Run the script as
```shell
python dataset.py --base_path <BASE_PATH> --speaker <SPEAKER_NAME> --wordvec_file <W2V_FILE> --dataset_type <DATASET_TYPE> --frames <FRAMES>
```

* Options
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<SPEAKER_NAME>`: Speaker name (directory name of the speaker) (e.g., `almaram`, `oliver`)
  * `<W2V_FILE>`: Path to the pre-trained `fasttext` model (e.g., `/path_to_your_fasttext_dir/wiki-news-300d-1M-subword.bin`)
  * `<DATASET_TYPE>`: Dataset type (`train` or `test`)
  * `<FRAMES>`: Number of frames per sample (we used 64 for training data and 192 for test data)

* Example (Creating Oliver's data)
```shell
# training data
python dataset.py --base_path <BASE_PATH> --speaker oliver --wordvec_file <W2V_FILE> --dataset_type train --frames 64

# test data
python dataset.py --base_path <BASE_PATH> --speaker oliver --wordvec_file <W2V_FILE> --dataset_type test --frames 192
```

After running the script, the directories containing the training or test data are created in your dataset folder. After this step, your dataset folder should look like:
```
Gestures
├── frames_df_10_19_19.csv
├── almaram
    ├── frames
    ├── keypoints_all
    ├── keypoints_simple
    ├── test-192
    ├── train-64
    ├── videos
    └── words
...
└── shelly
    ├── frames
    ├── keypoints_all
    ├── keypoints_simple
    ├── test-192
    ├── train-64
    ├── videos
    └── words
```

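If you need the data for several speakers, a small driver script such as the hypothetical sketch below can call `dataset.py` repeatedly with the options documented above; the speaker list and paths are placeholders to adapt to your setup:

```python
# Hypothetical convenience script (not included in the repository):
# builds training and test data for several speakers by invoking dataset.py
# with the options documented above.
import subprocess

BASE_PATH = "/path_to_your_dataset/Gestures/"                           # placeholder
W2V_FILE = "/path_to_your_fasttext_dir/wiki-news-300d-1M-subword.bin"   # placeholder
SPEAKERS = ["oliver", "almaram", "shelly"]                              # adapt to the speakers you downloaded

for speaker in SPEAKERS:
    for dataset_type, frames in [("train", "64"), ("test", "192")]:
        subprocess.run(
            ["python", "dataset.py",
             "--base_path", BASE_PATH,
             "--speaker", speaker,
             "--wordvec_file", W2V_FILE,
             "--dataset_type", dataset_type,
             "--frames", frames],
            check=True,
        )
```
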
## Model training
* Run the script as
```shell
python train.py --outdir_path <OUT_DIR> --speaker <SPEAKER_NAME> --gpu_num <GPU> --base_path <BASE_PATH> --train_dir <TRAIN_DIR>
```

* Options
  * `<OUT_DIR>`: Directory for saving the training results (e.g., `./out_training/`)
  * `<SPEAKER_NAME>`: Speaker name (directory name of the speaker) (e.g., `almaram`, `oliver`)
  * `<GPU>`: GPU ID
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<TRAIN_DIR>`: Directory name containing the training data (e.g., `train-64`)

The experimental settings (e.g., number of epochs, loss function) can be changed via command-line arguments; see `train.py` for the details.

* Example (Training using Oliver's data)
```shell
python train.py --outdir_path ./out_training/ --speaker oliver --gpu_num 0 --base_path <BASE_PATH> --train_dir train-64
```
The resulting files will be created in `./out_training/oliver_YYYYMMDD-AAAAAA/`.

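Because the output directory name includes a timestamp (`oliver_YYYYMMDD-AAAAAA`), the hypothetical helper below (not part of the repository) shows one way to pick the most recently created training directory for a speaker, which can then be passed as `<MODEL_DIR>` in the evaluation step:

```python
# Hypothetical helper (not part of the repository): find the newest training
# output directory for a given speaker under ./out_training/.
from pathlib import Path

def latest_model_dir(out_training="./out_training/", speaker="oliver"):
    candidates = sorted(
        Path(out_training).glob(f"{speaker}_*"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1].name if candidates else None

print(latest_model_dir())  # e.g., "oliver_YYYYMMDD-AAAAAA"
```
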
## Evaluation
* Predict the gesture motion for test data using a trained model
* Run the script as
```shell
python test.py --base_path <BASE_PATH> --test_speaker <TEST_SPEAKER> --test_dir <TEST_DIR> --model_dir <MODEL_DIR> --model_path <MODEL_PATH> --outdir_path <OUT_DIR>
```

* Options
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<TEST_SPEAKER>`: Speaker name for testing (directory name of the test speaker) (e.g., `almaram`, `oliver`)
  * `<TEST_DIR>`: Directory name containing the test data (e.g., `test-192`)
  * `<MODEL_DIR>`: Directory name of the trained model (e.g., `oliver_YYYYMMDD-AAAAAA`)
  * `<MODEL_PATH>`: Path to the training results (e.g., `./out_training/`)
  * `<OUT_DIR>`: Directory for saving the test results (e.g., `./out_test/`)

* Example (Predicting Oliver's test data using Oliver's trained model)
```shell
python test.py --base_path <BASE_PATH> --test_speaker oliver --test_dir test-192 --model_dir oliver_YYYYMMDD-AAAAAA --model_path ./out_training/ --outdir_path ./out_test/
```
The resulting files (`.npy` files of predicted motion) are created in `./out_test/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/`.

* Example (Predicting Rock's test data using Oliver's trained model)
```shell
python test.py --base_path <BASE_PATH> --test_speaker rock --test_dir test-192 --model_dir oliver_YYYYMMDD-AAAAAA --model_path ./out_training/ --outdir_path ./out_test/
```
The resulting files (`.npy` files of predicted motion) will be created in `./out_test/rock_by_oliver_YYYYMMDD-AAAAAA_test-192/`.

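To take a quick look at the predictions, the `.npy` outputs can be loaded with NumPy as in the minimal sketch below (not part of the repository); the directory name is the example from above, and the exact array shape depends on the keypoint format used by the scripts:

```python
# Minimal sketch (not part of the repository): inspect the predicted-motion .npy files.
from pathlib import Path
import numpy as np

out_dir = Path("./out_test/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/")  # example output directory
for npy_file in sorted(out_dir.glob("*.npy")):
    motion = np.load(npy_file)
    # The exact shape depends on the keypoint format used by test.py.
    print(npy_file.name, motion.shape)
```
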
## Visualization
* Create gesture movie files
* Run the script as
```shell
python make_gesture_video.py --base_path <BASE_PATH> --test_out_path <TEST_OUT_PATH> --test_out_dir <TEST_OUT_DIR> --video_out_path <VIDEO_OUT_PATH>
```

* Options
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<TEST_OUT_PATH>`: Directory path of the test output (e.g., `./out_test/`)
  * `<TEST_OUT_DIR>`: Directory name of the output gestures (e.g., `oliver_by_oliver_YYYYMMDD-AAAAAA_test-192`)
  * `<VIDEO_OUT_PATH>`: Directory path for the output videos (e.g., `./out_video/`)

* Example (When using `XXXXX`)
```shell
python make_gesture_video.py --base_path <BASE_PATH> --test_out_path ./out_test/ --test_out_dir oliver_by_oliver_YYYYMMDD-AAAAAA_test-192 --video_out_path ./out_video/
```

The gesture videos (side-by-side videos of the ground truth and the text-to-gesture output) will be created in `./out_test/text2gesture/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/`. The left-side gesture is the ground truth, and the right-side gesture is the one generated by the text-to-gesture generation model. The original videos of the test intervals will also be created in `./out_test/original/oliver/`.

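As an optional, hypothetical check (not part of the repository), the snippet below uses OpenCV to confirm that the side-by-side videos were written and reports their frame counts; the directory path is the example one from above, and the `.mp4` extension is an assumption:

```python
# Hypothetical check (not part of the repository): verify the generated videos with OpenCV.
from pathlib import Path
import cv2

video_dir = Path("./out_test/text2gesture/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/")  # example path from above
for video_file in sorted(video_dir.glob("*.mp4")):  # assuming .mp4 output; adjust the extension if needed
    cap = cv2.VideoCapture(str(video_file))
    frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(f"{video_file.name}: {frames} frames at {fps:.1f} fps")
    cap.release()
```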