This tool is not meant for real-time face and landmark detection. Other tools such as OpenPose, OpenCV, and optimized Dlib already cover that use case.
This tool was built to meet needs I had, at the time of writing it, for other, more complex projects. FaceTool is meant to be used exclusively with single-person videos. It isn't perfect and will be updated as needed.
FaceTool is a Python tool for face-cropped videos. It lets you infer the face region, landmarks, a segmentation mask, and ink-like contours from single-person videos. FaceTool was implemented with batch inference in mind, allowing videos to be processed faster. It can be used, for example, to generate datasets for training deep learning models.
The installation is straightforward. All you need is a Python 3 environment with pip installed. Install by running the following command (may require sudo):
$ (sudo) python3 setup.py install
After installation, FaceTool can perform four actions: create an annotation CSV file for a given video, visualize the annotations for a given video and its annotation file, produce a human segmentation mask as an MP4 clip, and produce ink-like contours as an MP4 clip.
Processing speed benefits greatly from an Nvidia GPU with CUDA and cuDNN support installed. Make sure the batch sizes fit into your device's memory.
The annotation command is in the following format:
$ python3 -m facetool annotate --help
usage: __main__.py annotate [-h] --video VIDEO --annotations ANNOTATIONS
--dbatch_size DBATCH_SIZE --lbatch_size LBATCH_SIZE --n_process N_PROCESS
[--size SIZE SIZE] [-d DEVICE]
optional arguments:
-h, --help show this help message and exit
--video VIDEO video path to be annotated (mp4 by preference `video.mp4`)
--annotations ANNOTATIONS annotations saving path (save to csv `annotations.csv`)
--dbatch_size DBATCH_SIZE batch_size for the detector inference
--lbatch_size LBATCH_SIZE batch_size for the landmark video loader
--n_process N_PROCESS number of threads used for landmarking
--size SIZE SIZE resize the video for the detector
-d DEVICE, --device DEVICE device to run detector on, `cpu` or `cuda`
The annotation file is in CSV format and is meant to be loaded with the Pandas library. The resulting file follows this spec:
frame_idx | box_x | box_y | box_w | box_h | landmark_1_x | landmark_1_y | ... | landmark_68_x | landmark_68_y
---|---|---|---|---|---|---|---|---|---
The naming convention is explicit. `frame_idx` corresponds to the frame index in the video. `box_x`, `box_y`, `box_w`, and `box_h` correspond to the face detection 2D box coordinates and size. `landmark_i_x` and `landmark_i_y` correspond to the coordinates of the i-th regressed 2D facial landmark (68 in total).
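For instance, the annotation file can be read back with pandas and indexed by frame. The two-row sample below is hypothetical, shortened to a single landmark, and only follows the column spec above:

```python
import io

import pandas as pd

# Hypothetical two-frame sample following the annotation spec above.
# Real files produced by `facetool annotate` contain one row per video
# frame and the full 68 landmark coordinate pairs.
csv = io.StringIO(
    "frame_idx,box_x,box_y,box_w,box_h,landmark_1_x,landmark_1_y\n"
    "0,12,34,100,100,50,60\n"
    "1,13,35,101,99,51,61\n"
)
df = pd.read_csv(csv, index_col="frame_idx")

# Access the detection box of the first frame.
x, y, w, h = df.loc[0, ["box_x", "box_y", "box_w", "box_h"]]
```

Indexing on `frame_idx` makes it easy to look up the box and landmarks for any frame while iterating through the video.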
The visualization command is in the following format:
$ python3 -m facetool visualize --help
usage: __main__.py visualize [-h] --video VIDEO --annotations ANNOTATIONS
[--save SAVE] [--size SIZE SIZE]
optional arguments:
-h, --help show this help message and exit
--video VIDEO video path (mp4 by preference `video.mp4`)
--annotations ANNOTATIONS annotations path (csv `annotations.csv`)
--save SAVE visualization saving path (gif `visualization.gif`)
--size SIZE SIZE resize the video to save gif
The visualization displays the processed frames, as shown in the example at the top of this page. If you decide to save the result as a GIF, it will be produced at 15 FPS. Be careful to use short videos here, as the visualization does not load the video frames in batches, unlike the rest of the library. This is a deliberate choice, since GIFs need to stay small.
The segmentation command is in the following format:
$ python3 -m facetool mask --help
usage: __main__.py mask [-h] --video VIDEO --mask MASK --batch_size BATCH_SIZE
[-d DEVICE]
optional arguments:
-h, --help show this help message and exit
--video VIDEO video path to be masked (mp4 by preference `video.mp4`)
--mask MASK mask video output path (mp4 by preference `mask.mp4`)
--batch_size BATCH_SIZE batch_size for the segmentation model
-d DEVICE, --device DEVICE device to run the segmentation model on, `cpu` or `cuda`
The segmentation outputs a black-and-white MP4 clip of the resulting masks: white represents the person and black the background. The resulting clip can later be used to crop the entire person out of the video. The masks are not perfect but can be useful for certain use cases. A Gaussian blur is applied to soften the edges.
The produced mask video does not include the landmark and box annotations shown in the examples above; the corresponding mask video is black and white only.
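One way to use the mask clip is to zero out background pixels frame by frame. The sketch below uses hypothetical toy arrays standing in for one decoded video frame and its matching mask frame; any video decoder that yields NumPy arrays would work the same way:

```python
import numpy as np

# Hypothetical 4x4 RGB frame and the matching grayscale mask frame
# (white = person, black = background), as produced by `facetool mask`.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255  # person region

# Keep only the person: zero out pixels where the mask is black.
# Thresholding at 127 absorbs the Gaussian-blurred mask edges.
person_only = frame * (mask > 127)[..., None]
```

The boolean mask broadcasts over the color channels, so the same line works for grayscale or RGB frames.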
The contour command is in the following format:
$ python3 -m facetool xdog --help
usage: __main__.py xdog [-h] --video VIDEO --contour CONTOUR --batch_size BATCH_SIZE
[-d DEVICE] [--sigma1 SIGMA1] [--sigma2 SIGMA2] [--sharpen SHARPEN] [--phi PHI] [--eps EPS]
optional arguments:
-h, --help show this help message and exit
--video VIDEO video path to be contoured (mp4 by preference `video.mp4`)
--contour CONTOUR contour video output path (mp4 by preference `contour.mp4`)
--batch_size BATCH_SIZE batch_size for the segmentation model
-d DEVICE, --device DEVICE device to run the segmentation model on, `cpu` or `cuda`
--sigma1 SIGMA1 sigma of the first gaussian blur filter
--sigma2 SIGMA2 sigma of the second gaussian blur filter
--sharpen SHARPEN sharpens the gaussians before computing difference
--phi PHI phi parameter for soft thresholding
--eps EPS epsilon parameter for soft thresholding
The contour command outputs a black-and-white MP4 clip of the resulting contours. Ink lines are represented in black. The result is obtained using xDoG (eXtended Difference of Gaussians) with the default parameters provided in the command; the parameters can be adjusted to obtain different results.
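As a rough illustration, here is a minimal sketch of the xDoG operator on a single grayscale image, assuming the CLI parameters map onto the standard formulation (a sharpened difference of Gaussians followed by tanh soft thresholding). FaceTool's actual implementation may differ in details:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def xdog(image, sigma1=0.5, sigma2=1.0, sharpen=20.0, phi=10.0, eps=0.1):
    """eXtended Difference of Gaussians: a sharpened DoG response
    soft-thresholded into ink-like dark strokes on a light background."""
    g1 = gaussian_filter(image, sigma1)
    g2 = gaussian_filter(image, sigma2)
    # Sharpened difference of the two Gaussian-blurred images.
    dog = (1.0 + sharpen) * g1 - sharpen * g2
    # Soft threshold: white where the response exceeds eps,
    # smooth tanh falloff (ink) below it.
    out = np.where(dog >= eps, 1.0, 1.0 + np.tanh(phi * (dog - eps)))
    return np.clip(out, 0.0, 1.0)

# Toy example: a vertical edge in an 8x8 image.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
contour = xdog(img)
```

Larger `sharpen` and `phi` values give harder, more graphic strokes, while the two sigmas control how thick the detected edges are.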
MTCNN - Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu Qiao, "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks", IEEE Signal Processing Letters 2016 - Code
Dlib - Davis E. King, "Dlib-ml: A Machine Learning Toolkit", Journal of Machine Learning Research 2009 - Code
UNet - Olaf Ronneberger, Philipp Fischer, Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation", MICCAI 2015 - Code
xDoG - Holger Winnemöller, Jan Eric Kyprianidis, Sven C. Olsen, "XDoG: An eXtended Difference-of-Gaussians Compendium Including Advanced Image Stylization", Computers & Graphics, Vol. 36, Issue 6, 2012, pp. 720–753 - Code