PreSeCoLM (Predicting Sensitive Concepts in Language Models)

This repository contains the implementation of experiments on predicting sensitive concepts (protected attributes such as ethnicity or gender) in language models, with the aim of improving the models' interpretability. It includes the code to reproduce the paper:

Sarah Schröder, Alexander Schulz and Barbara Hammer. "Evaluating Concept Discovery Methods for Sensitive Attributes in Language Models". Accepted at ESANN 2025.

Installation

Create and activate conda environment:

conda env create -f env.yml
conda activate presecolm

Install our Wrapper for Huggingface Embeddings:

git clone https://github.com/UBI-AGML-NLP/Embeddings.git
cd Embeddings/
pip install .

Experiment Details

Currently Used Datasets

BIOS
TwitterAAE
Jigsaw Unintended Bias
CrowSPairs

Currently Supported Language Models

Huggingface Models (using our Wrapper for Huggingface Embeddings, see Installation)
OpenAI Embedding Models

Concept Prediction Methods

Concept Activation Vectors (CAV)
Concept Bottleneck Models (CBM)
Bias Subspaces (referring to semantic bias scores [1][2]; our implementation is based on [1])
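
As a rough illustration of the first method: a Concept Activation Vector is commonly obtained as the normal vector of a linear classifier that separates embeddings with a concept from embeddings without it. The sketch below is not this repository's implementation. It uses plain NumPy, synthetic stand-in embeddings, and hypothetical function names (fit_cav, concept_score) chosen only for this example.

```python
import numpy as np


def fit_cav(embeddings, concept_labels, lr=0.1, epochs=500):
    """Fit a logistic-regression probe with gradient descent.

    Returns the unit-norm weight vector (the CAV) and the bias,
    rescaled by the same factor so the decision boundary is unchanged.
    """
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(concept_labels, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient of the mean log loss
        b -= lr * np.mean(p - y)
    norm = np.linalg.norm(w)
    return w / norm, b / norm


def concept_score(embeddings, cav, bias=0.0):
    """Signed distance to the probe's hyperplane; positive means 'concept present'."""
    return np.asarray(embeddings, dtype=float) @ cav + bias


# Synthetic data (an assumption for this example): the concept shifts
# embeddings along the first axis of an 8-dimensional space.
rng = np.random.default_rng(0)
direction = np.zeros(8)
direction[0] = 1.0
X_pos = rng.normal(size=(100, 8)) + 2.0 * direction
X_neg = rng.normal(size=(100, 8)) - 2.0 * direction
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), np.zeros(100)])

cav, bias = fit_cav(X, y)
accuracy = np.mean((concept_score(X, cav, bias) > 0) == (y == 1))
print("cosine between CAV and true direction:", float(cav @ direction))
print("training accuracy:", float(accuracy))
```

With real data, the synthetic arrays would be replaced by language model embeddings and their protected group labels.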

Experiment Setup

To run the experiments, a config file (JSON), such as experiments/config/esann25/experiment_config.json, must be passed. It specifies the models, the locations of the other relevant configs, and where to save embeddings, CAVs, CBM checkpoints, plots and results.
The setup (i.e. which datasets, protected groups and defining terms are used) is specified in several .yaml files referenced in the config. The config must refer to four .yaml files:

cav_train_config: specifies the training for CAV (training one model per protected attribute, dataset does not need class labels)
cbm_train_config: specifies the training for CBM (training a CBM on all protected groups of one dataset; dataset requires class labels in addition to group labels)
eval_config: specifies the datasets and protected groups for evaluation of CAV and CBM (sorted by protected attribute; can include datasets without training split and/or class labels)
bias_space_eval_config: specifies the evaluation setup for bias subspaces (sorted by protected attribute; includes both the defining terms for bias subspaces and a list of datasets/protected groups for evaluation)
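
Before launching a run, it can help to check that a config refers to all four .yaml files. The following is a minimal sketch, not part of this repository. It assumes that the four names above appear as top-level keys of the JSON config and that each holds a file path. The helper names and the example paths are hypothetical.

```python
import json
from pathlib import Path

# The four entries listed above. That they are top-level JSON keys
# holding file paths is an assumption made for this sketch.
REQUIRED_YAML_KEYS = (
    "cav_train_config",
    "cbm_train_config",
    "eval_config",
    "bias_space_eval_config",
)


def load_experiment_config(path):
    """Read a JSON experiment config from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def check_experiment_config(config):
    """Return a list of problems; an empty list means all four entries look fine."""
    problems = []
    for key in REQUIRED_YAML_KEYS:
        value = config.get(key)
        if value is None:
            problems.append(f"missing key: {key}")
        elif Path(str(value)).suffix not in (".yaml", ".yml"):
            problems.append(f"{key} does not point to a .yaml file: {value}")
    return problems


# Usage with an in-memory config; the paths are placeholders.
example = {key: f"configs/{key}.yaml" for key in REQUIRED_YAML_KEYS}
print(check_experiment_config(example))
```

For a config on disk, the dictionary returned by load_experiment_config can be passed to check_experiment_config in the same way.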

ESANN 2025 Experiments

See the esann25 branch.

Cite this

TODO

References

[1] "The SAME score: Improved cosine based bias score for word embeddings" (arXiv paper, IEEE IJCNN paper)
[2] "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" (arXiv paper, NIPS paper)

