fdet is a ready-to-use implementation of deep learning face detectors with landmarks.
You can use it directly in your code, as a Python library:
>>> from fdet import io, RetinaFace
>>> detector = RetinaFace(backbone='RESNET50')
>>> image = io.read_as_rgb('path_to_image.jpg')
>>> detector.detect(image)
[{'box': [511, 47, 35, 45], 'confidence': 0.9999996423721313,
'keypoints': {'left_eye': [517, 70], 'right_eye': [530, 65], 'nose': [520, 77],
'mouth_left': [522, 87], 'mouth_right': [531, 83]}}]
Or through the command-line application:
fdet retinaface -b RESNET50 -i path_to_image.jpg -o detections.json --gpu 1
Currently, there are two different detectors available on FDet:
- MTCNN - Joint face detection and alignment using multitask cascaded convolutional networks [zhang:2016]
- RetinaFace - Single-stage dense face localisation in the wild [deng:2019]. You can use it with two different backbones:
  - MobileNet: a fast, lightweight model (achieves high FPS).
  - ResNet50: a medium-size model that gives better results but is slower.
Despite the availability of different implementations of these algorithms, we found some disadvantages when using them. So we created this project to offer the following features in one package:
- ⭐ Real-time face detection;
- ⭐ Support for batch detection (useful for fast detection in multiple images and videos);
- ⭐ Ease of use through python library or command-line tool;
- ⭐ Unified interface to assign 'CPU' or 'GPU' devices;
- ⭐ Multi-GPU support;
- ⭐ Automatic on-demand model weights download;
- ⭐ Compatible with Windows, Linux, and macOS systems.
- You need to install PyTorch first (if you have a GPU, install PyTorch with CUDA support).
- Then fdet can be installed through pip:
pip install fdet
Simple and fast usage through the command-line tool.
The fdet command-line tool has two sub-commands, one for each available detector: fdet mtcnn and fdet retinaface.
For a detailed list of available options, type fdet mtcnn --help or fdet retinaface --help, according to the desired detector.
These options are mutually exclusive:
- -i, --image FILE: Image to detect. You can specify multiple images (-i img1.jpg -i img2.jpg).
- -v, --video FILE: Video file to detect. Only one video can be specified at a time.
- -l, --list FILE: Text file containing a list of images (absolute paths) to detect.
- -d, --dir DIRECTORY: The path of a directory containing images to detect. Ignores files that are not images.
- -o, --output FILE: Path to the output JSON file containing the detections.
- -s, --save-frames DIRECTORY (Optional): If specified, folder to save the output images with the detected faces drawn. Be careful when using this option with video input, as it will save all frames of the video.
- -p, --print (Optional): If specified, print the detections to the console screen.
- -q, --quiet (Optional): Do not display the progress bar or any results.
- --no-cuda: Disables CUDA. When CUDA is not supported, it is disabled automatically.
- -g, --gpu INT (Optional): When CUDA is supported, specifies which GPU to use. If not set, all available GPUs will be used.
- -bs, --batch-size INT (Optional): The size of the detection batch (providing considerable speed-up) [default: 1]. This option only works for multiple images when they are the same size.
Defining the batch size is a complex task because it depends on the available memory in the system. We recommend performing small preliminary tests to find a suitable value.
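When calling the detectors from Python, the preliminary tests suggested above could be automated with a small timing loop. The sketch below is only an illustration: benchmark_batch_sizes and detect_batch are hypothetical names (not part of fdet), and it assumes that an out-of-memory condition surfaces as a RuntimeError, which is how PyTorch typically reports it.

```python
import time


def benchmark_batch_sizes(detect_batch, images, batch_sizes):
    """Time `detect_batch` for each batch size and return seconds per image.

    `detect_batch` is any callable that takes a list of images (a hypothetical
    stand-in for a detector's batch method). A batch size that raises
    RuntimeError is recorded as None instead of a timing.
    """
    results = {}
    for size in batch_sizes:
        start = time.perf_counter()
        try:
            # Process the whole image list in consecutive chunks of `size`.
            for i in range(0, len(images), size):
                detect_batch(images[i:i + size])
        except RuntimeError:
            results[size] = None  # this batch size did not fit
            continue
        elapsed = time.perf_counter() - start
        results[size] = elapsed / max(len(images), 1)
    return results
```

A reasonable choice is then the batch size with the lowest time per image among those that did not fail.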
If you want to use fdet from Python, just import it,
from fdet import MTCNN, RetinaFace
and instantiate your desired detector, with its respective parameters:
- MTCNN(thresholds, nms_thresholds, min_face_size, cuda_enable, cuda_devices)
  - thresholds (tuple, optional): The thresholds for each MTCNN step [default: (0.6, 0.7, 0.8)].
  - nms_thresholds (tuple, optional): The NMS thresholds for each MTCNN step [default: (0.7, 0.7, 0.7)].
  - min_face_size (float, optional): The minimum size of the face to detect, in pixels [default: 20.0].
  - cuda_enable (bool, optional): Indicates whether CUDA, if available, should be used or not. If False, uses only CPU processing [default: True].
  - cuda_devices (list, optional): List of CUDA GPUs to be used. If None, uses all available GPUs [default: None]. If cuda_enable is False, this parameter is ignored.
- RetinaFace(backbone, threshold, nms_threshold, max_face_size, cuda_enable, cuda_devices)
  - backbone (str): The backbone model ['RESNET50' or 'MOBILENET'].
  - threshold (float, optional): The detection threshold [default: 0.8].
  - nms_threshold (float, optional): The NMS threshold [default: 0.4].
  - max_face_size (int, optional): The maximum size of the face to detect, in pixels [default: 1000].
  - cuda_enable (bool, optional): Indicates whether CUDA, if available, should be used or not. If False, uses only CPU processing [default: True].
  - cuda_devices (list, optional): List of CUDA GPUs to be used. If None, uses all available GPUs [default: None]. If cuda_enable is False, this parameter is ignored.
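To give an intuition for what a detection threshold and an NMS threshold control, here is a minimal, generic sketch of greedy non-maximum suppression over boxes in [x, y, width, height] form. It is not fdet's implementation; the function names iou and nms are illustrative, and the default values simply mirror the RetinaFace defaults listed above.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, nms_threshold=0.4, threshold=0.8):
    """Return indices of kept boxes, highest score first.

    Boxes scoring below `threshold` are discarded; among the rest, a box is
    dropped when it overlaps an already kept box by more than `nms_threshold`.
    """
    candidates = [i for i, score in enumerate(scores) if score >= threshold]
    order = sorted(candidates, key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep
```

In this sketch, a lower NMS threshold merges overlapping candidates more aggressively, while a higher detection threshold trades recall for fewer false positives.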
To perform detection, you can simply use the following methods provided by the classes:
- detect(image: np.ndarray): Single-image detection (see the single-image detection example).
- batch_detect(image: np.ndarray): Performs face detection on image batches, typically providing considerable speed-up (see the batch detection example).
For each processed image, the detector returns a list of dict objects, which in turn represent the detected faces. The dict contains three main keys, described below.
[
{'box': [511, 47, 35, 45], 'confidence': 0.9999996423721313,
'keypoints': {'left_eye': [517, 70], 'right_eye': [530, 65], 'nose': [520, 77],
'mouth_left': [522, 87], 'mouth_right': [531, 83]}}
]
- 'box': The bounding box formatted as a list [x, y, width, height];
- 'confidence': The probability for a bounding box to be matching a face;
- 'keypoints': The five landmarks formatted into a dict with the keys 'left_eye', 'right_eye', 'nose', 'mouth_left', 'mouth_right'. Each keypoint is identified by a pixel position [x, y].
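As a small illustration of working with this structure, the sketch below converts a box to corner coordinates, filters detections by confidence, and measures the distance between the eyes. The helper names (box_to_corners, filter_by_confidence, eye_distance) are hypothetical and not part of fdet; the sample detection is the one shown above.

```python
import math


def box_to_corners(box):
    """Convert [x, y, width, height] to (x1, y1, x2, y2)."""
    x, y, width, height = box
    return x, y, x + width, y + height


def filter_by_confidence(detections, min_confidence):
    """Keep only the detections whose confidence is at least `min_confidence`."""
    return [d for d in detections if d['confidence'] >= min_confidence]


def eye_distance(detection):
    """Euclidean distance, in pixels, between the two eye keypoints."""
    left_x, left_y = detection['keypoints']['left_eye']
    right_x, right_y = detection['keypoints']['right_eye']
    return math.hypot(right_x - left_x, right_y - left_y)


detections = [
    {'box': [511, 47, 35, 45], 'confidence': 0.9999996423721313,
     'keypoints': {'left_eye': [517, 70], 'right_eye': [530, 65], 'nose': [520, 77],
                   'mouth_left': [522, 87], 'mouth_right': [531, 83]}}
]

for detection in filter_by_confidence(detections, 0.9):
    print(box_to_corners(detection['box']))  # (511, 47, 546, 92)
```

Corner coordinates are often convenient for cropping, for example image[y1:y2, x1:x2] on a numpy array.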
The batch_detect() method will return a list of lists containing the results of all batch images.
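Because the outer list follows the order of the input batch, the results can be paired back with the inputs by position. The following sketch assumes that ordering; faces_per_image is a hypothetical helper, and the sample data stands in for a real batch_detect() result.

```python
def faces_per_image(names, batch_detections):
    """Map each image name to the number of faces detected in it.

    `batch_detections` is a list of lists: one inner list of detection
    dicts per image, in the same order as `names`.
    """
    if len(names) != len(batch_detections):
        raise ValueError('expected one detection list per image')
    return {name: len(faces) for name, faces in zip(names, batch_detections)}


# Stand-in for the output of a batch of two images: one face, then none.
sample_batch = [
    [{'box': [511, 47, 35, 45], 'confidence': 0.9999996423721313,
      'keypoints': {'left_eye': [517, 70], 'right_eye': [530, 65], 'nose': [520, 77],
                    'mouth_left': [522, 87], 'mouth_right': [531, 83]}}],
    [],
]
print(faces_per_image(['first.jpg', 'second.jpg'], sample_batch))
```

An image without faces simply contributes an empty inner list, so the outer list keeps the same length as the batch.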
This example shows how to detect faces, using a single image, and draw the detections in an output image.
>>> from fdet import io, MTCNN
>>>
>>> detector = MTCNN()
>>>
>>> image = io.read_as_rgb('example.jpg')
>>> detections = detector.detect(image)
>>>
>>> output_image = io.draw_detections(image, detections, color='white', thickness=5)
>>> io.save('output.jpg', output_image)
io.read_as_rgb() is a wrapper for OpenCV's cv2.imread() that ensures an RGB image. It can be replaced by:
>>> image = cv2.imread('example.jpg', cv2.IMREAD_COLOR)
>>> image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
A batch should be structured as a list of images (numpy arrays) of equal dimensions. The returned detections list will have an additional first dimension corresponding to the batch size. Each image in the batch may have one or more faces detected.
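When a collection mixes image sizes, one way to satisfy the equal-dimension requirement is to group the images by shape before batching. This is a minimal sketch, assuming each image exposes a .shape attribute as numpy arrays do; group_by_shape is a hypothetical helper, not part of fdet.

```python
from collections import defaultdict


def group_by_shape(images):
    """Group image indices by shape so that each group can form valid batches.

    Returns a dict mapping each shape tuple to the list of positions of the
    images with that shape, preserving the original order within a group.
    """
    groups = defaultdict(list)
    for index, image in enumerate(images):
        groups[tuple(image.shape)].append(index)
    return dict(groups)
```

Each group can then be passed to the detector on its own, and the stored indices allow the results to be put back in the original order.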
In the following example, we detect faces in every frame of a video using batches of 10 images.
>>> import cv2
>>> from fdet import io, RetinaFace
>>>
>>> BATCH_SIZE = 10
>>>
>>> detector = RetinaFace(backbone='MOBILENET', cuda_devices=[0,1])
>>> vid_cap = cv2.VideoCapture('path_to_video.mp4')
>>>
>>> video_face_detections = [] # list to store all video face detections
>>> image_buffer = [] # buffer to store the batch
>>>
>>> while True:
>>>
>>>     success, frame = vid_cap.read() # read the frame from video capture
>>>     if not success:
>>>         break # end of video
>>>
>>>     frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) # convert to RGB
>>>     image_buffer.append(frame) # add frame to buffer
>>>
>>>     if len(image_buffer) == BATCH_SIZE: # if buffer is full, detect the batch
>>>         batch_detections = detector.batch_detect(image_buffer)
>>>         video_face_detections.extend(batch_detections)
>>>         image_buffer.clear() # clear the buffer
>>>
>>> if image_buffer: # check if images remain in the buffer and detect them
>>>     batch_detections = detector.batch_detect(image_buffer)
>>>     video_face_detections.extend(batch_detections)
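The buffering logic in the example above (fill, flush, then flush the remainder) can also be factored into a small generator so the detection loop stays short. This is one possible sketch; batched is a hypothetical helper name, not part of fdet.

```python
def batched(frames, batch_size):
    """Yield lists of up to `batch_size` items, including a final partial batch."""
    buffer = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []  # start a fresh list so yielded batches are not mutated
    if buffer:
        yield buffer  # leftover frames that did not fill a whole batch
```

With such a helper, the loop would reduce to iterating over batched(frames, BATCH_SIZE) and extending the results list with the detections of each batch, where frames is any iterable of RGB frames.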
FDet was heavily inspired by other available implementations (see credits).
The current MTCNN version was implemented with the help of Davi Beltrão.
- [zhang:2016]: Zhang, K., Zhang, Z., Li, Z. and Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10), 1499-1503.
- [deng:2019]: Deng, J., Guo, J., Zhou, Y., Yu, J., Kotsia, I. and Zafeiriou, S. (2019). Retinaface: Single-stage dense face localisation in the wild. arXiv preprint arXiv:1905.00641.