
Tutorial ‐ EAR Score Extraction


This first tutorial deals with extracting the EAR (Eye Aspect Ratio) score based solely on a video recording of a person. It is the basis for the later extraction of the blinks. The EAR score describes the ratio of eye height to width. Hence, a fully closed eye yields a value around 0.0, and a normally open eye lies between 0.25 and 0.40 (each individual is different, and the camera angle has a slight impact).
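To make these numbers concrete, here is a minimal sketch of how an EAR score can be computed from the six landmarks around one eye (p1 to p6: the two eye corners, two points on the upper lid, two on the lower lid). This is an illustration only, not JeFaPaTo's internal implementation; the landmark ordering used by the tool may differ.

```python
# Sketch only: EAR from six 2D eye landmarks (p1..p6).
# Ordering assumption: p1/p4 are the eye corners, p2/p3 the upper lid, p5/p6 the lower lid.
import numpy as np

def ear_score(p1, p2, p3, p4, p5, p6) -> float:
    vertical_1 = np.linalg.norm(np.subtract(p2, p6))   # eye height (front)
    vertical_2 = np.linalg.norm(np.subtract(p3, p5))   # eye height (back)
    horizontal = np.linalg.norm(np.subtract(p1, p4))   # eye width
    return (vertical_1 + vertical_2) / (2.0 * horizontal)

# A normally open eye: height is roughly a third of the width.
print(ear_score((0, 0), (1, 0.45), (2, 0.45), (3, 0), (2, -0.45), (1, -0.45)))  # ~0.30
# An almost closed eye: the lids nearly touch.
print(ear_score((0, 0), (1, 0.02), (2, 0.02), (3, 0), (2, -0.02), (1, -0.02)))  # ~0.01
```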

Please note that we use the test_long.mp4 video from the examples folder.

In the end, we will have an EAR score time series for the left (blue) and right (red) eye of the person (see image below).

target

EAR Score GUI - Selection

JeFaPaTo's GUI uses a vertical tab system to distinguish between the tasks (facial feature extraction and blinking analysis). To extract facial features like the EAR score, please select Facial Feature Extraction in the tab menu in the upper left corner of the GUI (see the red arrow). Please note that this is the default view of JeFaPaTo after startup.

00_Select_anno

0. GUI Overview

Before we start with the extraction of the EAR score, here is a short overview of the general structure of the GUI and its components. The components are grouped by function, which should help you find what you need. In the image below, each area is highlighted by a red box with the corresponding number.

  1. Video Preview

    • In this area, your video will be previewed during the analysis. This includes the entire frame, the bounding box for selecting the facial area (if you want to use this feature), and a preview of the face on the right side. The EAR lines will also be drawn on the face of the person.
    • You can also drag and drop your video into this area to load it automatically without using the loading button in the upper right corner of the GUI.
    • The video preview is updated every 5th frame (default value). You can change this via the Update Delay setting in box 5. Please be aware that very frequent updates might freeze the GUI interactions during the extraction.
  2. Feature Range Preview

    • This area shows the extracted feature values during the video analysis. The x-axis is negative, representing time in the past.
    • The EAR scores are shown in red for the right eye and blue for the left eye.
    • During the extraction, a green line is plotted. This line indicates whether the extraction was valid [0] or invalid [1], which is useful in later processing steps to decide whether to use the data.
    • All other features are shown as black graphs.
    • In the lower-left corner of the graph, a small A button will appear when extracting the facial features. This is a default feature of the plotting library we use (pyqtgraph). This button auto-scales the x and y axes to cover their full range.
  3. Analysis Interaction

    • With these three buttons, you can interact with the analysis.
    • The Analyze button starts the analysis with the currently selected features. Once started, the settings cannot be changed until the analysis is completed.
    • The Pause button becomes clickable once the analysis has started; it halts the processing at the current frame. While paused, the button switches to Continue so you can resume the analysis.
    • The Stop button aborts the analysis but still triggers the saving mechanism. If you have activated the auto-save feature (see box 5), the results are saved in the same folder as the video. If this setting is not activated, a save dialog window opens instead.
  4. Settings

    • This area contains all the settings for the video analysis.
    • You can rotate the video frames if the recording was saved in the wrong orientation (e.g., recorded vertically but stored horizontally).
    • You can select specific Facial Features such as EAR2D6, EAR3D6, and different landmarking methods.
    • You can select specific blend shape features (based on Google's mediapipe library) to extract facial movements. Even though they also belong to the facial features, we group them separately.
  5. Additional Settings

    • With Update Delay, you can change how often the preview frame updates during the analysis. This is only a quality-of-life feature. Please note that a low value might freeze the GUI if your computer cannot handle very frequent updates.
    • If the Auto-Save checkbox is activated, the extraction results are automatically saved after processing. If not, JeFaPaTo opens a SaveFile Dialog for you to select the appropriate folder.
    • The Use Bounding Box checkbox tells JeFaPaTo whether to use the bounding box in the preview area (see red box 1). This feature is helpful if the face covers only a small part of the recorded frame.
    • The Auto find face checkbox is a quality-of-life feature that automatically detects a single face in the video and aligns the bounding box in the preview area accordingly.

00_Overview_anno

1. Loading/Selection of a video

Now that you have a basic overview of the GUI, let's start with extracting the EAR score. We use the test_long.mp4 file in the examples folder for this tutorial, so please download the video and follow along 😄.

Once you have downloaded the video (or chosen your own), you can open it either by dragging and dropping it into the large preview area (see the image below) or by using the Open Video button in the upper right corner of the GUI. In the file dialog that opens, navigate to your video file. After this, we are good to go.

Please note that we currently support the formats listed below (capitalization of the file suffix does not matter). If your video is in a different format, try converting it with ffmpeg (see the sketch after the list) or create a new issue so we can help you.

  • .mp4
  • .flv
  • .ts
  • .mts
  • .avi
  • .mov
  • .wmv
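If you need to convert a recording, a plain ffmpeg call is usually enough. The following is a minimal sketch that drives ffmpeg from Python; the file names are placeholders, and it assumes ffmpeg is installed and available on your PATH.

```python
# Sketch: convert an unsupported recording (e.g. .mkv) to .mp4 via ffmpeg.
# Assumes ffmpeg is installed and on the PATH; file names are placeholders.
import subprocess

def convert_to_mp4(src: str, dst: str) -> None:
    # Re-encode video to H.264 and audio to AAC inside an .mp4 container.
    subprocess.run(["ffmpeg", "-i", src, "-c:v", "libx264", "-c:a", "aac", dst], check=True)

convert_to_mp4("recording.mkv", "recording.mp4")
```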

For more information about the video formats, camera setup, and general recommendations, please check the main wiki page.

01_FileOpening_anno

2. The video is loaded

After you follow one of the opening procedures, your video should be visible (see the image below). In the full preview (red box 1), you should see the entire frame. In the face preview (red box 2), you should only see the face area of the person you want to analyze. If the Auto find face checkbox (see red box 3) is activated, the bounding box should be aligned with the face. If the selection fails (for whatever reason), you can manually adjust the box's location using your mouse. If the Use Bounding Box checkbox is selected, facial features are extracted only from the face preview area (see red box 2). We recommend leaving this on, as most algorithms only work on the explicit face area.

02_VideoOpened_anno

If your video was recorded vertically but saved in a horizontal format, you can use the Rotation menu to update the frames accordingly (see the highlighted area in the image below). The frame and face previews update to reflect the selected rotation. In our case, this is not necessary, so select the None option in the Rotation menu.

02_VideoOpened_Rotated_anno

3. Bounding Box Update

As mentioned before, the bounding box helps constrain the area of interest you want to analyze. By default, JeFaPaTo tries to select the best facial region. If you are unhappy with this selection, you can update the bounding box by moving it around and resizing it. For resizing, please use the markers along the bounding box edges and corners (see the image below). We keep the default location for this example video.

03_BoundingBox_anno

4. Extraction Settings

This is the last step before the extraction can start. Now, we have to select which facial features we want to extract.

  1. The Rotation menu is included here once again to ensure your video is aligned.
  2. Facial Features - here, you can select which features you want to extract. We only check the EAR2D6 feature (a short sketch of how the underlying mediapipe landmarks are obtained follows below).
    • EAR2D6 describes the EAR score based on the six 2D landmarks around the eye
    • EAR3D6 describes the EAR score based on the six 3D landmarks around the eye (please note that the z-coordinate is estimated by mediapipe)
    • Landmarks478 returns all 478 facial landmarks detected by mediapipe
    • Landmarks68 returns the commonly used subset of 68 landmarks, derived from the mediapipe landmarks
  3. Blend shapes - in this menu, you can check which of the 51 facial blend shapes you want to extract. We will not use any of these for this tutorial.

For our targeted extraction, please note that we only use the EAR2D6 facial feature.
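All of these features are derived from mediapipe's face mesh landmarks. As a rough illustration (this is not JeFaPaTo's internal code, and JeFaPaTo runs this step for you), the legacy mediapipe FaceMesh solution can be queried for a single frame like this; with refine_landmarks=True it returns the 478 landmarks mentioned above, each with x, y, and an estimated z coordinate:

```python
# Illustration only: obtaining mediapipe face landmarks for a single frame.
# JeFaPaTo performs this step internally; the exact API it uses may differ.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,
    refine_landmarks=True,  # 468 mesh landmarks + 10 iris landmarks = 478
)

frame = cv2.imread("frame.png")  # hypothetical single frame cropped to the face area
results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark
    print(len(landmarks))                                   # 478
    print(landmarks[0].x, landmarks[0].y, landmarks[0].z)   # z is estimated by mediapipe
```

Note that this legacy FaceMesh API does not output blend shape scores; those come from mediapipe's newer FaceLandmarker task (how JeFaPaTo obtains them internally is not covered here).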

04_Settings_anno

5. Start the Analysis

Now, we have selected the facial bounding box and chosen which facial features we want to extract (still only the EAR2D6 feature). Once we start the analysis using the Analyze button, there is no going back. Please confirm that the GUI looks the same as the one below.

Once you press the button, the settings cannot be changed anymore until the processing is complete.

05_AnalyzeStart_anno

6. During the Analysis

While the processing runs, there is nothing more you can do but wait. However, the GUI gives you general information about what is happening in the background. We highlighted the corresponding areas in the image below.

  1. Feature Preview
    • The graph shows the currently selected features; in our case, EAR2D6 is drawn as a red line (right eye) and a blue line (left eye).
    • The graph automatically scrolls with the previewed frames. You can interact with the graph by zooming, scrolling, and panning. However, only data from the last 30 seconds of the video are displayed.
    • The graph can be reset to the original view by pressing the Reset Graph Y-Range button.
  2. Video interaction
    • While the processing is running, the Analyze button is disabled. You can Pause or Stop the processing.
  3. Throughput
    • In the lower-left corner of the GUI, you can see how the processing is progressing in the background.
    • Input describes how many frames per second can be loaded from the video. This value is affected by the storage location of the video and your CPU.
    • Processed describes how many frames per second can be analyzed by extracting the facial features. This value is affected by your CPU and GPU.
  4. Processing Percentage
    • We also display how much of the video has already been processed, so you can roughly estimate the remaining time.
  5. Blink
    • This is not part of the GUI itself, but we want to highlight how an actual blink looks in the EAR score time series. We will extract these in the next tutorial, so please keep reading :)

For now, we have to wait until the processing is finished. The duration depends solely on the length of your video and the power of your CPU/GPU. Make yourself a cup of tea and read a nice research paper.

06_AnalyzeBlink_anno

7. Completion of the Analysis

Once the analysis is complete, JeFaPaTo should send you a notification via your operating system. The GUI does not change much; the settings simply become editable again. If you have the Auto-Save feature activated, the features are automatically saved to a .csv file in the same folder as the video. The file is named after the original video with the current timestamp appended (see the images below).

The .csv file contains a row for each frame of the video and a column for each selected feature. In our case, the feature file will contain EAR2D6_l and EAR2D6_r, which are the EAR scores for the left and right eye, as well as EAR_valid to indicate whether the extraction was correct. These three columns will be used during the blinking extraction.
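If you want to inspect the file yourself before the next tutorial, a few lines of pandas are enough. This is a minimal sketch; the file name is a placeholder for the timestamped .csv that JeFaPaTo wrote next to your video, and the column names follow the description above.

```python
# Sketch: inspect the exported feature file (file name is a placeholder).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("test_long_<timestamp>.csv")  # adjust to the file JeFaPaTo created

print(df[["EAR2D6_l", "EAR2D6_r", "EAR_valid"]].describe())

# One row per frame; plot both eyes over the frame index.
plt.plot(df.index, df["EAR2D6_l"], color="blue", label="left eye")
plt.plot(df.index, df["EAR2D6_r"], color="red", label="right eye")
plt.xlabel("frame")
plt.ylabel("EAR score")
plt.legend()
plt.show()
```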

If you want to test other settings, you can change them and run the analysis again by pressing the Analyze Video button. Your results should not be overwritten, as the timestamp should be different.

07_Resultsfinished_anno 08_AutoSave_anno

8. End of Extraction

Now, we have successfully extracted the EAR score from a video, and in the next tutorial, we will extract the actual blink information from it. This tutorial covered how to load and prepare the video, and you should now have a broad overview of which features you can extract. That is all for now regarding facial feature extraction. Let's continue with the blink extraction.