Haoliang Zhou, Shucheng Huang, Yuqiao Xu
Micro-Expressions (MEs) are instantaneous and subtle facial movements that convey crucial emotional information. However, traditional neural networks have difficulty accurately capturing the delicate features of MEs due to the limited amount of available data. To address this issue, a dual-branch attention network, called IncepTR, is proposed for ME recognition, which can capture attention-aware local and global representations. The network takes optical flow features as input and performs feature extraction with a dual-branch design. First, an Inception model equipped with the Convolutional Block Attention Module (CBAM) attention mechanism is adopted for multi-scale local feature extraction. Second, the Vision Transformer (ViT) is employed to capture subtle motion features and robustly model global relationships among multiple local patches. Additionally, to enrich the relationships between different local patches in ViT, Multi-head Self-Attention Dropping (MSAD) is introduced to randomly drop an attention map, effectively preventing overfitting to specific regions. Finally, the two types of features are combined to learn ME representations effectively through similarity comparison and feature fusion. With this combination, the model is forced to capture the most discriminative multi-scale local and global features while reducing the influence of affect-irrelevant features. Extensive experiments show that the proposed IncepTR achieves UF1 and UAR of 0.753 and 0.746 on the composite dataset MEGC2019-CD, demonstrating better or competitive performance compared to existing state-of-the-art methods for ME recognition.
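To illustrate the MSAD idea described above, here is a minimal NumPy sketch (not the authors' implementation; the function name, shapes, and drop probability are assumptions): during training, the attention map of one randomly chosen head is zeroed, so no single head can overfit to specific facial regions.

```python
import numpy as np

def msad_attention(q, k, v, num_heads, drop_prob=0.5, training=True, rng=None):
    """Multi-head self-attention with attention-map dropping (MSAD sketch).

    q, k, v: (seq_len, dim) arrays; dim must be divisible by num_heads.
    With probability drop_prob (training only), one randomly chosen head's
    attention map is zeroed before being applied to the values.
    """
    rng = rng if rng is not None else np.random.default_rng()
    seq_len, dim = q.shape
    head_dim = dim // num_heads
    out = np.empty_like(q)
    # Pick a head to drop (or -1 for "drop nothing").
    drop_head = int(rng.integers(num_heads)) if (training and rng.random() < drop_prob) else -1
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        # Scaled dot-product attention for this head.
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(head_dim)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        if h == drop_head:
            attn = np.zeros_like(attn)  # drop this head's attention map
        out[:, sl] = attn @ v[:, sl]
    return out
```

In the full model, the dropped head's output would then pass through the usual output projection; the key point is only that dropping happens at the attention-map level, not on individual weights.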
Following Dual-ATME and RCN, the data lists are organized as follows:
data/
├─ MEGC2019/
│ ├─ v_cde_flow/
│ │ ├─ 006_test.txt
│ │ ├─ 006_train.txt
│ │ ├─ 007_test.txt
│ │ ├─ ...
│ │ ├─ sub26_train.txt
│ │ ├─ subName.txt
- There are 3 columns in each txt file:
/home/user/data/samm/flow/006_006_1_2_006_05588-006_05562_flow.png 0 1
In this example, the first column is the path of the optical flow image for a particular ME sample, the second column is the label (0-2 for three emotions), and the third column is the database type (1-3 for three databases).
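A minimal stdlib-only parser for this three-column format might look like the following (the function name is an assumption, not part of this repo):

```python
from pathlib import Path

def parse_list_file(txt_path):
    """Parse one data-list txt file where each line is
    `<flow_image_path> <emotion_label> <database_id>`.
    Returns a list of (path, label, db_id) tuples."""
    samples = []
    for line in Path(txt_path).read_text().splitlines():
        if not line.strip():
            continue
        # Split off the last two whitespace-separated fields (label, db id).
        path, label, db = line.rsplit(maxsplit=2)
        samples.append((path, int(label), int(db)))
    return samples
```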
- There are 68 rows in subName.txt, which lists the subject names, e.g.:
006
...
037
s01
...
s20
sub01
...
sub26
These are the subjects of the ME samples partitioned by MEGC2019, as described here and here.
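Since each subject name in subName.txt pairs with a `<name>_train.txt` / `<name>_test.txt` file (per the listing above), the leave-one-subject-out splits can be enumerated with a small helper like this (a sketch; `loso_splits` and the `data_root` layout pointing at `data/MEGC2019/v_cde_flow` are assumptions):

```python
from pathlib import Path

def loso_splits(data_root):
    """Enumerate leave-one-subject-out (LOSO) splits.

    Reads subject names from subName.txt under data_root and returns
    (subject, train_list_path, test_list_path) tuples for each subject.
    """
    root = Path(data_root)
    subjects = [s.strip() for s in (root / "subName.txt").read_text().splitlines() if s.strip()]
    return [(s, root / f"{s}_train.txt", root / f"{s}_test.txt") for s in subjects]
```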
If you find this repo useful for your research, please consider citing the paper:
@article{zhou2023inceptr,
title={{IncepTR}: micro-expression recognition integrating {Inception-CBAM} and vision transformer},
author={Zhou, Haoliang and Huang, Shucheng and Xu, Yuqiao},
journal={Multimedia Systems},
volume={29},
number={6},
pages={3863--3876},
year={2023},
publisher={Springer}
}