This repository contains the codebase for the paper: Exploring the Influence of Missing Data Imputation in Group Fairness Metrics
- Authors: Arthur Dantas Mangussi, Ricardo Cardoso Pereira, Miriam Seoane Santos, Ana Carolina Lorena, Mykola Pechenizkiy, and Pedro Henriques Abreu
- Abstract: Missing data is a common problem in real-world datasets and can be characterized as the lack of information on one or multiple variables in a dataset. The most frequent technique for handling this issue is imputation, which consists of replacing the missing values according to a predefined criterion. Since missing values are often imputed based on the known values in the dataset, existing data issues can be propagated during the imputation process. One such issue is fairness, a concept integral to responsible Artificial Intelligence practices. This work investigates the impact of the imputation process on system fairness by examining how imputation affects the fairness of predictions in Machine Learning models. It provides a comprehensive analysis covering thirteen unfair benchmark datasets with six state-of-the-art imputation strategies under synthetic Missing Not At Random and Missing At Random mechanisms in a multivariate scenario with missing rates of 10%, 20%, 40%, and 60%. Fairness was measured by the following metrics: Statistical Parity, Equalized Odds, Equality of Opportunity, Predictive Equality, and Equality of Positive and Negative Predicted Values. The results demonstrate that the missing mechanism, the classifier choice, and the imputation strategy decisively influence the fairness of the predictions obtained by the Machine Learning models.
- Keywords: Missing Data, Fairness, Responsible Artificial Intelligence
- Year: 2024
- Contact: [email protected]
```bash
git clone https://github.com/ArthurMangussi/Fairness.git
cd Fairness
pip install -r requirements.txt
```
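To illustrate the kind of workflow the paper studies, the sketch below injects MAR-style missing values into a synthetic dataset, imputes them with a multivariate (MICE-style) strategy, trains a classifier, and reports the Statistical Parity difference between groups. This is a minimal sketch built only on scikit-learn; the dataset, column names, missingness rule, and model are illustrative placeholders, not the pipeline used in this repository's experiments.

```python
# Minimal illustrative sketch (not the repository's actual pipeline):
# inject MAR missingness, impute, train a classifier, measure Statistical Parity.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Hypothetical tabular dataset with a binary protected attribute "group"
X = pd.DataFrame({
    "feature_1": rng.normal(size=n),
    "feature_2": rng.normal(size=n),
    "group": rng.integers(0, 2, size=n),  # protected attribute (placeholder)
})
y = (X["feature_1"] + 0.5 * X["group"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Inject ~20% missing values in feature_2 under a MAR-like rule:
# missingness depends on the observed feature_1, not on feature_2 itself.
mar_mask = rng.random(n) < 0.4 * (X["feature_1"] > 0).to_numpy()
X.loc[mar_mask, "feature_2"] = np.nan

# Impute with a multivariate (MICE-style) strategy.
X_imputed = pd.DataFrame(
    IterativeImputer(random_state=0).fit_transform(X), columns=X.columns
)

X_train, X_test, y_train, y_test = train_test_split(
    X_imputed, y, test_size=0.3, random_state=0
)
clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Statistical Parity difference: P(Y_hat=1 | group=0) - P(Y_hat=1 | group=1)
group = X_test["group"].to_numpy()
sp_diff = y_pred[group == 0].mean() - y_pred[group == 1].mean()
print(f"Statistical Parity difference: {sp_diff:.3f}")
```

The same pattern extends to the other metrics listed in the abstract by conditioning the rates on the true label, e.g., Equalized Odds compares true-positive and false-positive rates across groups instead of raw positive-prediction rates.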
The authors gratefully acknowledge the Brazilian funding agency FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) for grants 2021/06870-3, 2022/10553-6, and 2023/13688-2. Moreover, this research was supported in part by the Portuguese Recovery and Resilience Plan (PRR) through project C645008882-00000055 (Center for Responsible AI).