Tutorial ‐ EAR Score Extraction
This first tutorial deals with extracting the EAR score based only on a video recording of a person. It is the baseline for the later extraction of the blinks. The EAR score describes the ratio of eye height to eye width. Hence, a fully closed eye is around 0.0, and a normally opened eye is between 0.25 and 0.40 (each individual is different, and the camera angle has a slight impact).
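If you are curious how this ratio is computed, the commonly used formulation takes the six landmarks p1 to p6 around one eye and divides the two vertical distances by twice the horizontal distance. Below is a minimal sketch of that formula in Python; the landmark ordering and the toy coordinates are illustrative assumptions and do not reflect JeFaPaTo's internal code.

```python
# Minimal sketch of the standard EAR formula; `eye` is assumed to be a
# (6, 2) array with landmarks ordered p1..p6 around one eye (p1/p4 are the
# horizontal corners, p2/p6 and p3/p5 the vertical pairs).
import numpy as np

def ear_score(eye: np.ndarray) -> float:
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return float(vertical / (2.0 * horizontal))

# Toy geometry: an open eye is relatively tall, a closed eye has almost no height.
open_eye = np.array([[0, 0], [1, 0.6], [3, 0.6], [4, 0], [3, -0.6], [1, -0.6]])
closed_eye = np.array([[0, 0], [1, 0.02], [3, 0.02], [4, 0], [3, -0.02], [1, -0.02]])
print(ear_score(open_eye))    # 0.3  -> within the typical "open eye" range
print(ear_score(closed_eye))  # 0.01 -> close to 0.0, i.e. a closed eye
```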
Please note that we use the `test_long.mp4` video from the examples folder.
In the end, we will have an EAR score time series for the left (blue) and right (red) eye of the person (see image below).
JeFaPaTo's GUI uses a vertical tab system to distinguish between the tasks (facial feature extraction and blinking analysis).
To extract facial features like the EAR score, please select `Facial Feature Extraction` in the tab menu in the upper left corner of the GUI (the red arrow).
Please note that this is the default start view of JeFaPaTo after startup.
Before we start with the extraction of the EAR score, we give you an overview of the general structure of the GUI and its components.
The components are grouped by utility and should help you find what you need.
In the image below, each area is highlighted by a red
box with the appropriate number.
- Video Preview
  - In this area, your video is previewed during the analysis. This includes the entire frame, the bounding box for selecting the facial area (if you want to use this feature), and a preview of the face on the right side. The `EAR` lines are also drawn on the face of the person.
  - You can also drag and drop your video into this area to load it automatically without using the loading button in the upper right corner of the GUI.
  - The video preview frame is updated every 5th frame (default value). You can change this value via `Update Delay` in box 5. Please note that a low value might freeze the GUI interactions during the extraction.
- Feature Range Preview
  - This area shows you the extracted feature values during the video analysis. The x-axis is negative to represent the past time.
  - The `EAR` scores are shown in red for the right eye and in blue for the left eye.
  - During the extraction, a green line is plotted. It indicates whether the extraction was valid [0] or invalid [1]. This is useful for checking in later processing steps whether or not to use the data.
  - All other features are shown as black graphs.
  - In the lower-left corner of the graph, a small `A` button appears while the facial features are extracted. This is a default feature of the plotting library we use (pyqtgraph). The button auto-scales the x and y axes to cover their full range.
- Analysis Interaction
  - With these three buttons, you can interact with the analysis.
  - The `Analyze` button starts the analysis with the currently selected features. Once started, the settings cannot be changed until the analysis is completed.
  - The `Pause` button is only clickable once the analysis has started. It stops at the current frame. While the analysis is paused, the button switches to `Continue`, which resumes the analysis.
  - The `Stop` button aborts the analysis but still triggers the saving mechanism. If you have activated the auto-save feature (see box 5), the results are saved in the same folder as the video. Otherwise, a save dialog window opens.
- Settings
  - This area contains all the settings for the video analysis.
  - You can rotate the frames of the video if the recording is in horizontal mode.
  - You can select specific facial features such as `EAR2D6` and `EAR3D6`, and different landmarking methods.
  - You can select specific blend shape features (based on Google's mediapipe library) to extract facial movements. Even though they also belong to the facial features, we group them separately.
- Additional Settings
  - With `Update Delay`, you can change how often the preview frame is updated during the analysis. This is only a quality-of-life feature. Please note that a low value might freeze the GUI if your computer cannot handle very frequent updates.
  - If the `Auto-Save` checkbox is activated, the extraction results are automatically saved after processing. If not, `JeFaPaTo` opens a save-file dialog for you to select the appropriate folder.
  - The `Use Bounding Box` checkbox tells `JeFaPaTo` whether to use the bounding box in the preview area (see red box 1). This feature is helpful if the face covers only a small part of the recording area.
  - The `Auto find face` checkbox is a quality-of-life feature that automatically detects a single face in the video and aligns the bounding box in the preview area accordingly.
Now that you have a basic overview of the GUI, let's start with extracting the EAR score.
We use the `test_long.mp4` file from the examples folder for this tutorial, so please download the video and follow along 😄.
Once you have downloaded the video (or your own), you can open it either by dragging and dropping it into the big area (see the image below) or by using the `Open Video` button in the upper right corner of the GUI. In the dialog window that opens, navigate to your video file.
After this, we are ready to continue.
Please note that we currently support the formats listed below (capitalization of the file suffix does not matter). If your video is in a different format, try converting it with ffmpeg or create a new issue so we can help you.
- .mp4
- .flv
- .ts
- .mts
- .avi
- .mov
- .wmv
For more information about the video formats, camera setup, and general recommendations, please check the main wiki page.
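If your recording is in an unsupported container, a simple re-encode to `.mp4` is usually enough. The snippet below is a minimal sketch that calls ffmpeg from Python; it assumes ffmpeg is installed and on your PATH, and the file names are placeholders. You can of course run the equivalent ffmpeg command directly in a terminal instead.

```python
# Convert an unsupported container (here: a hypothetical input.webm) into an
# .mp4 file that JeFaPaTo can read. Requires a local ffmpeg installation.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "input.webm",      # placeholder source file
        "-c:v", "libx264",       # re-encode the video stream to H.264
        "-c:a", "aac",           # re-encode the audio stream (if present)
        "input_converted.mp4",   # output with a supported suffix
    ],
    check=True,  # raise an error if ffmpeg fails
)
```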
After you have opened the video in either way, it should be visible (see the image below).
In the full preview (red box 1), you should see the entire frame.
In the face preview (red box 2), you should only see the face area of the person you want to analyze.
If the `Auto find face` checkbox (see red box 3) is activated, the bounding box should be aligned with the face. If the selection fails (for whatever reason), you can manually adjust the box's location with your mouse.
If the `Use Bounding Box` checkbox is selected, facial features are extracted only from the face preview area (see red box 2). We recommend leaving this on, as most algorithms only work on the explicit face area.
Suppose your video was recorded vertically but is still saved in a horizontal format. In that case, you can use the `Rotation` menu to update the frames accordingly (see the highlighted area in the image below).
The frame and face preview should update accordingly.
In our case, this is not necessary, so select the `None` option in the `Rotation` menu.
As mentioned before, the bounding box helps constrain the area of interest you want to analyze. By default, we try to find the best facial region automatically. If you are unhappy with this selection, you can update the bounding box by moving and resizing it. For resizing, please use the markers around the bounding box edges and corners (see the image below). We keep the default location for this example video.
This is the last step before the extraction can start. Now, we have to select which kind of facial features we want to extract.
- The `Rotation` menu is included here once again to ensure your video is aligned.
- Facial Features - here, you can select which features you want to extract. We only check the `EAR2D6` feature.
  - `EAR2D6` describes the `EAR` score based on the six 2D landmarks around the eye.
  - `EAR3D6` describes the `EAR` score based on the six 3D landmarks around the eye (please note that the z-coordinate is estimated by mediapipe).
  - `Landmarks478` returns all of the facial landmarks of mediapipe.
  - `Landmarks68` returns the commonly used subset of 68 landmarks based on mediapipe.
- Blend shapes - in this menu, you can check which of the 51 facial blend shapes you want to extract. We will not use any of these for this tutorial.
For our targeted extraction, please note that we only use the `EAR2D6` facial feature.
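If you want to see where the landmark-based features come from, the sketch below uses MediaPipe's public FaceMesh solution to extract the 478 facial landmarks from a single image. This is only an illustration of the underlying data, assuming the `mediapipe` and `opencv-python` packages are installed and `my_frame.png` is a placeholder image of your own; it is not how JeFaPaTo is implemented internally.

```python
# Extract MediaPipe's refined face mesh (478 landmarks) from one frame.
# This mirrors the kind of data behind the Landmarks478 feature,
# not JeFaPaTo's internal pipeline.
import cv2
import mediapipe as mp

image = cv2.imread("my_frame.png")             # placeholder frame
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB input

with mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True,
    refine_landmarks=True,   # refined mesh -> 478 landmarks instead of 468
    max_num_faces=1,
) as face_mesh:
    result = face_mesh.process(rgb)

if result.multi_face_landmarks:
    landmarks = result.multi_face_landmarks[0].landmark
    print(len(landmarks))    # 478 normalized (x, y, z) coordinates
```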
Now, we have selected the facial bounding box and chosen which facial features we want to extract (still only the `EAR2D6` feature).
Once we start the analysis using the `Analyze` button, there is no going back.
Please confirm that your GUI looks the same as the one below.
Once you press the button, the settings cannot be changed anymore until the processing is complete.
While the processing runs, there is nothing more you can do but wait. However, the GUI gives you general information about what happens in the background. We highlighted the corresponding areas in the image below.
- Feature Preview
  - In the graph, the currently selected features (in our case, `EAR2D6`) are shown as the red lines (right eye) and the blue lines (left eye).
  - The graph automatically scrolls with the previewed frames. You can interact with the graph by zooming, scrolling, and panning. However, only data from the last 30 seconds of the video are displayed.
  - The graph can be reset to the original view by pressing the `Reset Graph Y-Range` button.
- Video Interaction
  - While the processing is running, the `Analyze` button is disabled. You can `Pause` or `Stop` the processing.
- Throughput
  - In the lower-left corner of the GUI, you can see how the processing is going on in the background.
  - `Input` describes how many frames per second we can load from the video. This value is impacted by where the video is loaded from and by your CPU.
  - `Processed` describes how many frames per second we can analyze by extracting the facial features. This value is impacted by your CPU and GPU.
- Processing Percentage
  - We also display how much of the video is already done, so you can easily estimate how much time roughly remains.
- Blink
  - This is not part of the actual GUI, but we want to highlight how an actual blink looks in the `EAR score` time series. We will extract these in the next tutorial, so please keep reading :)
For now, we have to wait until the processing is finished. This solely depends on the length of your video and the strength of your CPU/GPU. Make yourself a cup of tea and read a nice research paper.
Once the processing of the analysis is completed, `JeFaPaTo` should send you a notification via your operating system.
The GUI does not change much; it simply means that the settings can be changed again.
If you have the `Auto-Save` feature activated, the features are automatically saved in a `.csv` file in the same folder as the video.
The file name is the original video name with the current timestamp attached (see the images below).
The `.csv` file contains a row for each frame of the video and a column for each selected feature.
In our case, the feature file will contain `EAR2D6_l` and `EAR2D6_r`, which are the EAR scores for the left and right eye, as well as `EAR_valid`, which indicates whether the extraction was correct.
These three columns will be used during the blinking extraction.
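If you want to inspect the result outside of JeFaPaTo, the `.csv` file can be loaded with any spreadsheet or data-analysis tool. The sketch below is a minimal example using pandas and matplotlib; the file name is a placeholder for the CSV that was saved next to your video.

```python
# Load the extraction result and plot both EAR time series.
# Replace the file name with the CSV JeFaPaTo saved next to your video.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("your_extraction_result.csv")

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(df["EAR2D6_l"], color="blue", label="left eye")
ax.plot(df["EAR2D6_r"], color="red", label="right eye")
ax.set_xlabel("frame")
ax.set_ylabel("EAR score")
ax.legend()
plt.show()

# The EAR_valid column can additionally be used to mask frames where the
# extraction failed before any further analysis.
```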
If you want to test other settings, you can change them and run the analysis again by pressing the `Analyze Video` button.
Your results should not be overwritten, as the timestamp should be different.
Now, we have successfully extracted the `EAR score` of a video, and in the next tutorial, we will extract the actual blink information from it.
This tutorial covered how to load and prepare the video, and you should now have a broad overview of which features you can extract.
That is all for now regarding facial feature extraction. Let's continue with the blink extraction.