# Evaluation of Text-to-Gesture Generation Model Using Convolutional Neural Network

This repository contains the code for a text-to-gesture generation model using a CNN.

A demonstration video of generated gestures is available at <https://youtu.be/JX4Gqy-Rmso>.

## Requirements
We used [PyTorch](https://pytorch.org/) version 1.7.1 for the neural network implementation. We tested the code in the following environment (a quick import check is sketched after the list):

- Ubuntu 16.04 LTS
- GPU: NVIDIA GeForce GTX 1080Ti
- Python environment: anaconda3-2020.07
- [fasttext](https://fasttext.cc/)
- cv2 (4.4.0)
- ffmpeg
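
A quick sanity check of the environment is sketched below. It only assumes that the dependencies listed above are importable under their usual Python module names (`torch`, `fasttext`, `cv2`); adjust if your installation differs.

```python
# Hedged environment check: confirm the main dependencies import and
# print the versions we tested with (PyTorch 1.7.1, OpenCV 4.4.0).
import torch
import fasttext  # Python bindings installed per the fasttext documentation
import cv2

print("PyTorch:", torch.__version__)            # tested with 1.7.1
print("CUDA available:", torch.cuda.is_available())
print("OpenCV:", cv2.__version__)               # tested with 4.4.0
```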

## Preparation
1. Our code uses the speech and gesture dataset provided by Ginosar et al. Download the Speech2Gesture dataset by following the instructions under "Download specific speaker data" at <https://github.com/amirbar/speech2gesture/blob/master/data/dataset.md>.

```
Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, and Jitendra Malik, "Learning Individual Styles of Conversational Gesture," CVPR 2019.
```

After downloading the Speech2Gesture dataset, your dataset folder should look like this:
```
Gestures
├── frames_df_10_19_19.csv
├── almaram
│   ├── frames
│   ├── keypoints_all
│   ├── keypoints_simple
│   └── videos
...
└── shelly
    ├── frames
    ├── keypoints_all
    ├── keypoints_simple
    └── videos
```
2. Download the text dataset from [HERE](https://drive.google.com/file/d/1OjSJ-F9hoLOfecF5FwdCGG2Mp8fBPgGb/view?usp=sharing) and unarchive the zip file.
3. Move the `words` directory from each speaker's directory in the extracted archive to the corresponding speaker's directory in your dataset folder.

After this step, your dataset folder should look like this:
```
Gestures
├── frames_df_10_19_19.csv
├── almaram
│   ├── frames
│   ├── keypoints_all
│   ├── keypoints_simple
│   ├── videos
│   └── words
...
└── shelly
    ├── frames
    ├── keypoints_all
    ├── keypoints_simple
    ├── videos
    └── words
```
Note that there is very little word data for speaker Jon, so it should not be used for model training.

4. Set up fasttext by following the instructions [HERE](https://fasttext.cc/docs/en/support.html). Download the pre-trained fasttext model file (`wiki-news-300d-1M-subword.bin`) from [HERE](https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.bin.zip). A minimal loading check is sketched below.
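
To confirm the downloaded model works, here is a minimal loading sketch using the official fasttext Python bindings (the path is a placeholder; point it at the unzipped `.bin` file):

```python
import fasttext

# Placeholder path: replace with the location of the unzipped model file.
MODEL_PATH = "/path_to_your_fasttext_dir/wiki-news-300d-1M-subword.bin"

model = fasttext.load_model(MODEL_PATH)
print("embedding dimension:", model.get_dimension())  # 300 for this model

# Subword information lets fasttext embed words it has not seen verbatim.
vec = model.get_word_vector("gesture")
print(vec.shape)  # (300,)
```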

## Create training and test data
* Run the script as
```shell
python dataset.py --base_path <BASE_PATH> --speaker <SPEAKER_NAME> --wordvec_file <W2V_FILE> --dataset_type <DATASET_TYPE> --frames <FRAMES>
```

* Options
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<SPEAKER_NAME>`: Speaker name (directory name of the speaker) (e.g., `almaram`, `oliver`)
  * `<W2V_FILE>`: Path to the pre-trained `fasttext` model (e.g., `/path_to_your_fasttext_dir/wiki-news-300d-1M-subword.bin`)
  * `<DATASET_TYPE>`: Dataset type (`train` or `test`)
  * `<FRAMES>`: Number of frames per sample (we used 64 for training data and 192 for test data)

* Example (Create Oliver's data)
```shell
# training data
python dataset.py --base_path <BASE_PATH> --speaker oliver --wordvec_file <W2V_FILE> --dataset_type train --frames 64

# test data
python dataset.py --base_path <BASE_PATH> --speaker oliver --wordvec_file <W2V_FILE> --dataset_type test --frames 192
```
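
If you need data for several speakers, the commands above can be wrapped in a small batch script. The sketch below only calls the documented `dataset.py` interface; the paths and speaker list are illustrative and should be replaced with your own.

```python
import subprocess

# Illustrative settings: replace with your own paths and speaker names.
BASE_PATH = "/path_to_your_dataset/Gestures/"
W2V_FILE = "/path_to_your_fasttext_dir/wiki-news-300d-1M-subword.bin"
SPEAKERS = ["oliver", "almaram", "shelly"]

for speaker in SPEAKERS:
    # 64 frames for training data, 192 frames for test data (as above).
    for dataset_type, frames in [("train", 64), ("test", 192)]:
        subprocess.run(
            ["python", "dataset.py",
             "--base_path", BASE_PATH,
             "--speaker", speaker,
             "--wordvec_file", W2V_FILE,
             "--dataset_type", dataset_type,
             "--frames", str(frames)],
            check=True,  # stop at the first failing run
        )
```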

After running the script, directories containing the training or test data are created in your dataset folder. After this step, your dataset folder should look like this:
```
Gestures
├── frames_df_10_19_19.csv
├── almaram
│   ├── frames
│   ├── keypoints_all
│   ├── keypoints_simple
│   ├── test-192
│   ├── train-64
│   ├── videos
│   └── words
...
└── shelly
    ├── frames
    ├── keypoints_all
    ├── keypoints_simple
    ├── test-192
    ├── train-64
    ├── videos
    └── words
```

## Model training
* Run the script as
```shell
python train.py --outdir_path <OUT_DIR> --speaker <SPEAKER_NAME> --gpu_num <GPU> --base_path <BASE_PATH> --train_dir <TRAIN_DIR>
```

* Options
  * `<OUT_DIR>`: Directory for saving training results (e.g., `./out_training/`)
  * `<SPEAKER_NAME>`: Speaker name (directory name of the speaker) (e.g., `almaram`, `oliver`)
  * `<GPU>`: GPU ID
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<TRAIN_DIR>`: Directory name containing the training data (e.g., `train-64`)

The experimental settings (e.g., number of epochs, loss function) can be changed by specifying the corresponding arguments. Please see `train.py` for details.

* Example (Training using Oliver's data)
```shell
python train.py --outdir_path ./out_training/ --speaker oliver --gpu_num 0 --base_path <BASE_PATH> --train_dir train-64
```
The resulting files will be created in `./out_training/oliver_YYYYMMDD-AAAAAA/`.

## Evaluation
* Predict the gesture motion for test data using a trained model
* Run the script as
```shell
python test.py --base_path <BASE_PATH> --test_speaker <TEST_SPEAKER> --test_dir <TEST_DIR> --model_dir <MODEL_DIR> --model_path <MODEL_PATH> --outdir_path <OUT_DIR>
```

* Options
  * `<BASE_PATH>`: Path to the dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<TEST_SPEAKER>`: Speaker name for testing (directory name of the test speaker) (e.g., `almaram`, `oliver`)
  * `<TEST_DIR>`: Directory name containing the test data (e.g., `test-192`)
  * `<MODEL_DIR>`: Directory name of the trained model (e.g., `oliver_YYYYMMDD-AAAAAA`)
  * `<MODEL_PATH>`: Path to the training results (e.g., `./out_training/`)
  * `<OUT_DIR>`: Directory for saving test results (e.g., `./out_test/`)

* Example (Predict Oliver's test data using Oliver's trained model)
```shell
python test.py --base_path <BASE_PATH> --test_speaker oliver --test_dir test-192 --model_dir oliver_YYYYMMDD-AAAAAA --model_path ./out_training/ --outdir_path ./out_test/
```
The resulting files (`.npy` files for predicted motion) will be created in `./out_test/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/`.

* Example (Predict Rock's test data using Oliver's trained model)
```shell
python test.py --base_path <BASE_PATH> --test_speaker rock --test_dir test-192 --model_dir oliver_YYYYMMDD-AAAAAA --model_path ./out_training/ --outdir_path ./out_test/
```
The resulting files (`.npy` files for predicted motion) will be created in `./out_test/rock_by_oliver_YYYYMMDD-AAAAAA_test-192/`.
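
The predicted motions are saved as `.npy` files, so they can be inspected directly with NumPy. A minimal sketch follows; the output directory is a placeholder, and the exact array shape depends on how the keypoints are stored, so treat the printed shapes as something to verify rather than a specification.

```python
import glob
import numpy as np

# Placeholder: point this at one of your actual test output directories.
OUT_DIR = "./out_test/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/"

for path in sorted(glob.glob(OUT_DIR + "*.npy")):
    motion = np.load(path)
    # Shape and dtype depend on the keypoint representation used by the model.
    print(path, motion.shape, motion.dtype)
```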

## Visualization
* Create gesture movie files
* Run the script as
```shell
python make_gesture_video.py --base_path <BASE_PATH> --test_out_path <TEST_OUT_PATH> --test_out_dir <TEST_OUT_DIR> --video_out_path <VIDEO_OUT_PATH>
```

* Options
  * `<BASE_PATH>`: Path to dataset folder (e.g., `/path_to_your_dataset/Gestures/`)
  * `<TEST_OUT_PATH>`: Directory path of test output (e.g., `./out_test/`)
  * `<TEST_OUT_DIR>`: Directory name of output gestures (e.g., `oliver_by_oliver_YYYYMMDD-AAAAAA_test-192`)
  * `<VIDEO_OUT_PATH>`: Directory path of output videos (e.g., `./out_video/`)

* Example (When using `oliver_by_oliver_YYYYMMDD-AAAAAA_test-192`)
```shell
python make_gesture_video.py --base_path <BASE_PATH> --test_out_path ./out_test/ --test_out_dir oliver_by_oliver_YYYYMMDD-AAAAAA_test-192 --video_out_path ./out_video/
```

The gesture videos (side-by-side videos of the ground truth and the text-to-gesture output) will be created in `./out_test/text2gesture/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/`. The left-side gesture is the ground truth, and the right-side gesture is the one generated by the text-to-gesture generation model. The original videos of the test intervals will also be created in `./out_test/original/oliver/`.
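
To quickly verify a generated movie without opening a video player, OpenCV can read back its basic properties. This is only a convenience sketch; the file name below is a placeholder for one of the movies produced by `make_gesture_video.py`.

```python
import cv2

# Placeholder: replace with an actual file from the video output directory.
VIDEO_PATH = "./out_test/text2gesture/oliver_by_oliver_YYYYMMDD-AAAAAA_test-192/example.mp4"

cap = cv2.VideoCapture(VIDEO_PATH)
if not cap.isOpened():
    raise RuntimeError(f"Could not open {VIDEO_PATH}")

fps = cap.get(cv2.CAP_PROP_FPS)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

print(f"{width}x{height}, {fps:.2f} fps, {n_frames} frames")
cap.release()
```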