DebFace

Description

An implementation of the paper "Jointly De-biasing Face Recognition and Demographic Attribute Estimation" by Sixue Gong et al., 2020 [1]. The project aims at de-biasing and privacy preservation in biometric methods that use face verification.

Environment Setup

Dependencies

  1. Python 3
  2. PyTorch
  3. torchsummary
  4. argparse
  5. configparser
  6. torchviz
  7. OpenCV
  8. glob
  9. shutil
  10. Pandas
  11. NumPy
  12. Pillow

NOTE: argparse, configparser, glob and shutil ship with the Python standard library, so they need no separate pip installation.

Using host OS environment:

  1. Check to see if your Python installation has pip. Enter the following in your terminal:

     pip3 -h
    

    If you see the help text for pip, you have pip installed; otherwise, download and install pip

  2. Clone the repo from GitHub and then install the various dependencies using pip

    Mac OS / Linux

     git clone https://github.com/hrishi508/DebFace.git
     cd DebFace/
    

Using a virtual environment:

  1. Check to see if your Python installation has pip. Enter the following in your terminal:

     pip3 -h
    

    If you see the help text for pip, you have pip installed; otherwise, download and install pip

  2. Install the virtualenv package

     pip3 install virtualenv
    
  3. Create the virtual environment

     virtualenv debface_env
    
  4. Activate the virtual environment

    Mac OS / Linux

     source debface_env/bin/activate
    
  5. Clone the repo from GitHub and then install the various dependencies using pip

    Mac OS / Linux

     git clone https://github.com/hrishi508/DebFace.git
     cd DebFace/
    

Directory Structure

.
├── backbones
│   ├── am_softmax.py
│   ├── classifier.py
│   ├── debface.py
│   ├── encoder.py
│   ├── __init__.py
│   └── iresnet.py
├── config.ini
├── dataset_cleaner.py
├── dataset_filter.py
├── dataset_info.py
├── dataset_organizer.py
├── dataset_splitter.py
├── DebFace Computation Graph
│   ├── DebFace_Final
│   ├── DebFace_Final.png
│   ├── DebFace_Final_without_race
│   └── DebFace_Final_without_race.png
├── full_training_strategy.txt
├── LICENSE
├── model_summary.txt
├── README.md
├── train.py
├── train_without_race.py
└── utils
    └── utils_config.py

3 directories, 23 files

Config File

To make controlling all hyperparameters and paths across this project simple and seamless, I have created a global configuration file - config.ini. All the scripts in this project read their argument values from this config file. If you intend to train the DebFace model on your device, you will have to set the arguments in the config file accordingly. I have included detailed comments for each argument to make it simple to use.

The config.ini file is integrated with all the other scripts in this project via the utils_config.py script. I have provided a 'ConfigParams' class that extracts all the information from the config file.
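The exact attributes exposed by 'ConfigParams' are defined in utils_config.py; the sketch below is only an illustration, with hypothetical section and key names, of how such a class can wrap Python's built-in configparser:

```python
import configparser

class ConfigParams:
    """Illustrative wrapper that loads values from a config.ini-style file.
    The real utils_config.py may expose different attribute names."""
    def __init__(self, path):
        parser = configparser.ConfigParser()
        parser.read(path)
        # Hypothetical sections/keys, for demonstration only.
        self.dataset_root = parser.get("paths", "dataset_root")
        self.batch_size = parser.getint("training", "batch_size")

# Demo: write a tiny sample config, then load it.
sample = (
    "[paths]\n"
    "dataset_root = datasets/IMFDB_final\n"
    "[training]\n"
    "batch_size = 64\n"
)
with open("sample_config.ini", "w") as f:
    f.write(sample)

cfg = ConfigParams("sample_config.ini")
print(cfg.dataset_root, cfg.batch_size)
```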

Datasets

For training the DebFace model, we require a dataset that satisfies the following constraints:

  1. Frontal face images organized subject-wise, with at least 100 images per subject
  2. Gender, Age and Race labels associated with the images, i.e. every image has a label of the form [ID, Gender, Age, Race]
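As a concrete illustration of constraint 2, each image's label can be viewed as a small record (the field values below are made up for demonstration):

```python
from collections import namedtuple

# Hypothetical record for one image's label: [ID, Gender, Age, Race]
FaceLabel = namedtuple("FaceLabel", ["subject_id", "gender", "age", "race"])

label = FaceLabel(subject_id="subject_042", gender="FEMALE", age="YOUNG", race=None)

# A dataset like IMFDB lacks race annotations, so that field stays None.
print(label)
```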

I spent a lot of time looking for such a dataset, but the only one I could find that satisfied all the above constraints was the VGG-Face2 Mivia Age Estimation (VMAGE) Dataset. The only issue with this dataset is that it is based on the VGG-Face2 dataset, which has currently been removed from the Oxford server due to legal issues. If you find any publicly available dataset that satisfies all the above constraints, please do tell us.

So, for training this model, I went with a dataset that partially satisfies the constraints: the IMFDB [2] dataset. The only drawback of this dataset is that, since it contains images of only Indian movie actors, it is devoid of Race labels. You can find the details of this dataset here.

The original dataset has many incomplete labels and various mismatches between the labels and the corresponding images. It also has many extra labels that are irrelevant for training the model (the only ones we require are ID, Gender, Age and Race).

So, I first partly cleaned the dataset manually (available here), then designed a few custom scripts to automate the cleaning, filtering and transformation of the IMFDB dataset so that it can be used directly for training the DebFaceWithoutRace model. I have uploaded the final cleaned version of the dataset here.

Now, you can either download the final dataset and directly train the DebFace model on it, or use my custom-made scripts on the manually cleaned version of the dataset to generate your own training data.

NOTE: If you download the dataset, please create a 'datasets/' folder in the root directory of the repository that you cloned earlier in the environment setup section, and move the downloaded dataset folder 'IMFDB_final/' into the 'datasets/' folder for the training scripts to work smoothly.

WARNING: DO NOT RUN THESE SCRIPTS ON THE ORIGINAL IMFDB DATASET!

They will end up throwing a lot of errors due to the mislabelled samples and mismatches. Only run these scripts on the manually cleaned version that I have provided here.

Following are the steps to generate your own data from the manually cleaned IMFDB using the custom-made scripts:

  1. After cloning this repository, navigate to the root (i.e. DebFace/) and create a 'datasets/' folder there.

  2. Download the manually cleaned IMFDB dataset from here and move it into the newly created 'datasets/' folder.

  3. Set the arguments in the config.ini appropriately

  4. Navigate to the root directory of the repository, and run the custom scripts in succession using the commands given below:

     python3 dataset_organizer.py FULL-PATH-TO-THE-CONFIG-FILE
     python3 dataset_cleaner.py FULL-PATH-TO-THE-CONFIG-FILE
     python3 dataset_filter.py FULL-PATH-TO-THE-CONFIG-FILE
    

NOTE: Replace the FULL-PATH-TO-THE-CONFIG-FILE with the path of the config.ini file enclosed in "". For example, for my device, I set the FULL-PATH-TO-THE-CONFIG-FILE to "/home/hrishi/Repos/DebFace/config.ini".

Following the above steps will create several directories in the 'datasets/' folder. The final dataset directory is 'IMFDB_final/', which will be used for training. You can ignore the intermediate directories, i.e. 'IMFDB_simplified/' and 'IMFDB_cleaned/'.

To extract metadata about the newly generated IMFDB_final, I have provided a custom script, dataset_info.py. Running this script logs all the important details extracted from the newly generated dataset to an 'IMFDB_final_info.txt' file in the 'datasets/' folder.

python3 dataset_info.py FULL-PATH-TO-THE-CONFIG-FILE
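The actual details logged by dataset_info.py are defined in that script; the stdlib-only sketch below only illustrates the kind of metadata one can collect from a subject-per-folder layout (counting images per subject):

```python
import os
import tempfile

def dataset_summary(root):
    """Count images per subject, where each top-level folder is one subject."""
    summary = {}
    for subject in sorted(os.listdir(root)):
        subject_dir = os.path.join(root, subject)
        if os.path.isdir(subject_dir):
            summary[subject] = len(os.listdir(subject_dir))
    return summary

# Demo on a throwaway directory tree with made-up subject names.
root = tempfile.mkdtemp()
for name, n in [("actor_a", 3), ("actor_b", 2)]:
    os.makedirs(os.path.join(root, name))
    for i in range(n):
        open(os.path.join(root, name, f"img_{i}.jpg"), "w").close()

print(dataset_summary(root))
```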

Model Definition Scripts

NOTE: I have also provided a 'DebFaceWithoutRace' model (in addition to the 'DebFace' model). This is the same model as 'DebFace' but excluding the 'race' classifier and all its connections. This is the model used in this project, for reasons already covered in the datasets section. The following scripts have been taken directly from the implementation provided by InsightFace [3]:

  1. __init__.py
  2. iresnet.py

The script am_softmax.py has been taken directly from the implementation provided by DebFace. I tried using the am_softmax from this script as described in the paper, but their implementation is buggy: instead of returning values in the range (0, 1), it returns arbitrary positive and negative real values. So, wherever am_softmax was supposed to be used, I have replaced it with ordinary softmax.
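To see why ordinary softmax is a safe replacement: it maps arbitrary real-valued scores into the range (0, 1), with the outputs summing to 1. A quick stdlib-only check:

```python
import math

def softmax(scores):
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    m = max(scores)                              # subtract max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, -1.0, 0.5])
print(probs, sum(probs))
```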

The following scripts have been created by me from scratch to provide a seamless implementation of the model given in the DebFace paper:

  1. encoder.py - This script defines the encoder class that initializes the ArcFace50 encoder provided by InsightFace and appends a ReLU layer to it.

  2. classifier.py - This script contains the classifier class, whose inputs and outputs can be modified to create the various age, gender, race and ID classifiers given in the model architecture of the DebFace paper. Since the exact details of the classifiers were missing in the paper, I have used a single-layer neural network in the script.

  3. debface.py - This script contains the 'DebFace' class, which initializes the encoder and all the demographic and ID classifiers and integrates them seamlessly. NOTE: I have also provided a 'DebFaceWithoutRace' class; this is the same model as above but excluding the 'race' classifier and all its connections. This is the class used in this project, for reasons covered in the datasets section above.
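The real model uses the ArcFace50 encoder and the classifier shapes implied by the paper; the toy sketch below (all layer sizes are illustrative, not the actual ones) only shows the wiring idea behind 'DebFaceWithoutRace': one shared encoder feeding separate gender, age and ID heads:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the ArcFace50 encoder + ReLU; sizes are illustrative."""
    def __init__(self, in_dim=512, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

class DebFaceWithoutRaceSketch(nn.Module):
    """One shared encoder whose features feed separate demographic/ID heads."""
    def __init__(self, feat_dim=16, n_genders=2, n_ages=4, n_ids=100):
        super().__init__()
        self.encoder = TinyEncoder(feat_dim=feat_dim)
        self.gender_head = nn.Linear(feat_dim, n_genders)
        self.age_head = nn.Linear(feat_dim, n_ages)
        self.id_head = nn.Linear(feat_dim, n_ids)

    def forward(self, x):
        f = self.encoder(x)
        return self.gender_head(f), self.age_head(f), self.id_head(f)

model = DebFaceWithoutRaceSketch()
g, a, i = model(torch.randn(8, 512))   # batch of 8 fake inputs
print(g.shape, a.shape, i.shape)
```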

Training the model

I have provided two custom scripts, train.py and train_without_race.py, to make the training process seamless and iteration over the various hyperparameters easier.

NOTE: As already mentioned in the datasets section, this project uses the IMFDB dataset for model training, which does not contain race labels. Thus, this project uses train_without_race.py to train the model, which makes use of the 'DebFaceWithoutRace' class in debface.py.

NOTE: If you want to train the model on a dataset which has all the labels (satisfies all the constraints mentioned in the datasets section), please use train.py for the same.

Following are the steps to train the DebFace model on the final dataset that you either already downloaded or generated by following the steps in the datasets section:

  1. Navigate to the root directory of the repository, and run the dataset_splitter.py that I have provided to split the final dataset into Train and Test using the command given below:

     python3 dataset_splitter.py FULL-PATH-TO-THE-CONFIG-FILE
    

Running the above will generate 'Train/' and 'Test/' directories in your 'datasets/' directory that you had created while following the steps in the dataset section.
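The actual split performed by dataset_splitter.py is controlled by config.ini; the sketch below assumes an image-wise split within each subject (the 80/20 ratio and fixed seed are illustrative), so that every subject appears in both Train and Test:

```python
import random

def split_images(images_by_subject, train_frac=0.8, seed=0):
    """For each subject, shuffle that subject's images and split them,
    so every subject appears in both the Train and Test portions."""
    rng = random.Random(seed)
    train, test = {}, {}
    for subject, images in images_by_subject.items():
        imgs = list(images)
        rng.shuffle(imgs)
        cut = int(len(imgs) * train_frac)
        train[subject], test[subject] = imgs[:cut], imgs[cut:]
    return train, test

# Demo with made-up subject names and file names.
data = {"actor_a": [f"a_{i}.jpg" for i in range(10)],
        "actor_b": [f"b_{i}.jpg" for i in range(5)]}
tr, te = split_images(data)
print(len(tr["actor_a"]), len(te["actor_a"]))
```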

  2. Set the arguments in the config.ini appropriately

  3. You're set to start training the model now! Run the command below to begin the training:

     python3 train_without_race.py FULL-PATH-TO-THE-CONFIG-FILE
    

NOTE: Replace the FULL-PATH-TO-THE-CONFIG-FILE with the path of the config.ini file enclosed in "". For example, for my device, I set the FULL-PATH-TO-THE-CONFIG-FILE to "/home/hrishi/Repos/DebFace/config.ini".

Contributing to the project

Where do I start?

  • Ask us by reaching out to any of the contributors through the Contact Us section. Someone there could need help with something.
  • You can also take the initiative and fix a bug you found, create an issue for discussion or implement a feature that we never thought of, but always wanted.

Ok, I found something. What now?

  • Tell us, if you haven't already. Chances are that we have additional information and directions.
  • Read the code and get familiar with the engine component you want to work with.
  • Do not hesitate to ask us for help if you do not understand something.

How do I contribute my features/changes?

  • You can upload work in progress (WIP) revisions or drafts of your contribution to get feedback or support.
  • Tell us (again) when you want us to review your work.

Contact us

References

[1] Gong, Sixue, Liu, Xiaoming, and Jain, A. (2020). Jointly De-biasing Face Recognition and Demographic Attribute Estimation. ECCV.

[2] Setty, Shankar, Husain, Moula, Beham, Parisa, Gudavalli, Jyothi, Kandasamy, Menaka, Vaddi, Radhesyam, Hemadri, Vidyagouri, Karure, J. C., Raju, Raja, Rajan, Kumar, Vijay, and Jawahar, C. V. (2013). Indian Movie Face Database: A Benchmark for Face Recognition Under Wide Variations. NCVPRIPG.

[3] Deng, Jiankang, Guo, Jia, Xue, Niannan, and Zafeiriou, Stefanos (2019). ArcFace: Additive Angular Margin Loss for Deep Face Recognition. CVPR.