Code and results for the ABCFair benchmark. We evaluate 10 methods on 6 datasets (plus 1 derived from the unbiased labels in SchoolPerformance), 7 fairness notions, 2 output formats, and 3 sensitive feature formats.
Putting all results here would lead to an obscenely large README, so we provide two scripts to read out the benchmark results.
The results are presented as LaTeX table code, with one row for each combination of sensitive feature format and maximal fairness violation value k.
The table is generated with a dedicated script; run it with python and the --help flag to see its command line options. These include the dataset, the fairness notion with respect to which violation is measured, and the output format.
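The script's internals are not described here, so the following is only a minimal sketch of how one such row could be assembled. It assumes, purely for illustration, that each cell holds a method's best accuracy among the fairness strengths whose violation is at most k; the function names best_accuracy_under and latex_row and the (accuracy, violation) input layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: one (accuracy, fairness violation) pair per fairness strength.
Point = Tuple[float, float]


def best_accuracy_under(points: Sequence[Point], k: float) -> Optional[float]:
    """Highest accuracy among points whose violation is at most k, or None if none qualify.

    Assumption: this "best accuracy subject to violation <= k" rule is illustrative only.
    """
    feasible = [acc for acc, viol in points if viol <= k]
    return max(feasible) if feasible else None


def latex_row(
    sens_format: str,
    k: float,
    results: Dict[str, List[Point]],
    methods: Sequence[str],
) -> str:
    """Build one LaTeX row: sensitive feature format, k, then one cell per method."""
    cells = [sens_format, f"{k:g}"]
    for method in methods:
        best = best_accuracy_under(results[method], k)
        # "--" marks methods that never reach a violation of at most k.
        cells.append("--" if best is None else f"{best:.3f}")
    return " & ".join(cells) + r" \\"
```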
Results are presented in an accuracy-fairness trade-off plot, for a range of fairness strengths. Each scatter point is the mean test performance and fairness violation, with a confidence ellipse (using the standard error) around it.
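A minimal sketch of one way to compute such an ellipse, assuming it is the contour of the covariance of the mean (the sample covariance divided by the number of runs), i.e. a two-dimensional analogue of plus or minus one standard error. The function name standard_error_ellipse is hypothetical, and the plotting script may construct its ellipses differently.

```python
import numpy as np


def standard_error_ellipse(acc, viol, n_std=1.0):
    """Return (center, semi_axes, angle_deg) of an ellipse around the mean (accuracy, violation).

    Assumption: the ellipse is the n_std contour of cov(points) / n, so along each
    principal axis its half-length equals n_std standard errors. Needs at least 2 runs.
    """
    pts = np.column_stack([acc, viol]).astype(float)
    n = pts.shape[0]
    center = pts.mean(axis=0)
    # Sample covariance (ddof=1) divided by n gives the covariance of the mean.
    cov_of_mean = np.cov(pts, rowvar=False) / n
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov_of_mean)
    semi_axes = n_std * np.sqrt(eigvals[::-1])  # [major, minor]
    major_dir = eigvecs[:, -1]
    angle = np.degrees(np.arctan2(major_dir[1], major_dir[0]))
    return center, semi_axes, angle
```

The result can be drawn with matplotlib.patches.Ellipse(center, 2 * semi_axes[0], 2 * semi_axes[1], angle=angle); the factor 2 converts semi-axes into the full width and height that Ellipse expects.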
The plot is generated with a dedicated script; run it with python and the --help flag to see its command line options. These include the dataset, the fairness notion with respect to which violation is measured, the output format, and the sensitive feature format.
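To illustrate how these four options fit together, here is a hypothetical argparse interface. The flag names below are assumptions chosen for readability and are not the scripts' actual options; run the real scripts with --help for those.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface: flag names are illustrative, not the repository's actual CLI.
    parser = argparse.ArgumentParser(description="Accuracy-fairness trade-off plot (sketch).")
    parser.add_argument("--dataset", required=True, help="dataset to plot")
    parser.add_argument(
        "--fairness-notion",
        required=True,
        help="fairness notion with respect to which violation is measured",
    )
    parser.add_argument("--output-format", required=True, help="model output format")
    parser.add_argument("--sensitive-format", required=True, help="sensitive feature format")
    return parser
```

Usage: build_parser().parse_args() reads these options from the command line; argparse exposes --fairness-notion as the attribute fairness_notion.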
To generate new results, run the pipeline script, which expects a config .yaml file as input. All logging is done with wandb (Weights and Biases), so you will need to log in with your own account. Other required packages are listed in requirements.txt.
Larger benchmark experiments were run using the config/sweep_config .yaml files, following the standard wandb sweep workflow.
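For intuition about what a grid sweep launches, the sketch below writes a sweep definition in the dictionary format that wandb.sweep accepts ("method" plus "parameters" with "values" lists) and expands it into individual runs with the standard library. The parameter names and values are hypothetical placeholders, not the contents of the repository's sweep configs, and expand_grid only mimics what the wandb "grid" method enumerates.

```python
import itertools
from typing import Any, Dict, Iterator

# Hypothetical sweep definition: parameter names and values are placeholders.
sweep_config: Dict[str, Any] = {
    "method": "grid",
    "parameters": {
        "dataset": {"values": ["dataset_a", "dataset_b"]},
        "fair_method": {"values": ["method_x", "method_y"]},
        "fairness_strength": {"values": [0.1, 1.0, 10.0]},
    },
}


def expand_grid(config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one run configuration per point of the Cartesian product of all 'values' lists."""
    params = config["parameters"]
    names = list(params)
    for combo in itertools.product(*(params[name]["values"] for name in names)):
        yield dict(zip(names, combo))
```

With wandb installed and logged in, such a dictionary would instead be registered with wandb.sweep(sweep_config, project=...) and executed by one or more wandb.agent processes, which distribute the same grid of runs.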