
Natural Language Processing Seminar Final Project: Replicating Discrimination Assessment in LMs with Focus on Jewish People Using Various Model Sizes and Quantizations


Replicating Discrimination Assessment in LMs with Focus on Jewish People and Israel-Associated Individuals. This project adapts the methodology of the referenced paper [1] to investigate how LMs handle decisions involving Jewish people and Israel-associated individuals. It involves generating decision-making scenarios relevant to these groups, systematically varying demographic information to include Jewish and Israel-associated identifiers, and analyzing the responses for patterns of discrimination. The project also explores prompt-based interventions to mitigate any discovered biases, contributing to the broader understanding of how LMs handle specific ethnic and national identities. For more details, see the original paper [1].

Experiments (Workflow)

  • EDA
  • Dataset fixes, if needed
  • Inference: generate answers with the Gemma models
  • Analyze results

We used the ollama framework to run the models, which are described later in this README.
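The sketch below illustrates the inference step, assuming a local Ollama server on its default port (11434) and calling its /api/generate endpoint; the model tag is one of the four listed in the next section, and the explicit.jsonl / answers.jsonl file names are illustrative rather than the exact paths used in this repository.

```python
# Illustrative inference loop against a local Ollama server.
# Assumes `ollama serve` is running and the model has been pulled.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "gemma:2b-instruct-v1.1-q4_K_M"  # any of the four model tags

def ask(prompt: str) -> str:
    """Send one decision prompt to Ollama and return the raw model answer."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

with open("explicit.jsonl") as src, open("answers.jsonl", "w") as out:
    for line in src:
        example = json.loads(line)
        example["answer"] = ask(example["filled_template"])
        out.write(json.dumps(example) + "\n")
```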

Datasets and Models

Every dataset was evaluated with each of the four Gemma model variants:

  • Models: gemma:2b-instruct-v1.1-fp16, gemma:2b-instruct-v1.1-q4_K_M, gemma:7b-instruct-v1.1-fp16, gemma:7b-instruct-v1.1-q4_K_M
  • Datasets: explicit-combined-jews, explicit-all-jew, implicit-fix-combined-jews

Models Description

All four models are instruction-tuned Gemma v1.1 checkpoints (2B and 7B parameters) run through Ollama in two precisions: fp16 keeps the weights in 16-bit floating point (effectively unquantized), while q4_K_M is llama.cpp's 4-bit k-quant (medium) scheme, which shrinks the model to roughly a quarter of its fp16 size at a small quality cost. Comparing the two precisions at both sizes lets us check whether quantization affects the models' decision patterns.

Datasets Description

As stated in the Dataset Card on Hugging Face: "The data contains a diverse set of prompts covering 70 hypothetical decision scenarios, ranging from approving a loan to providing press credentials. Each prompt instructs the model to make a binary decision (yes/no) about a particular person described in the prompt. Each person is described in terms of three demographic attributes: age (ranging from 20 to 100 in increments of 10), gender (male, female, non-binary), and race (white, Black, Asian, Hispanic, Native American), for a total of 135 examples per decision scenario. The prompts are designed so a 'yes' decision is always advantageous to the person (e.g. deciding to grant the loan)." This gives a total of 3x5x9x70 = 9,450 unique examples.

Data Format

The data come in two JSONL files, explicit.jsonl and implicit.jsonl. Each JSON line has the following keys:

  • filled_template: The decision question prompt
  • decision_question_id: An ID corresponding to one of 70 decision scenarios.
  • age: Age of person who is the subject of the decision (ranging from 20 to 100 in increments of 10).
  • gender: Gender of person who is the subject of the decision (male, female, non-binary).
  • race: Race of person who is the subject of the decision (white, Black, Asian, Hispanic, Native American).

The implicit.jsonl file does not have an explicit mention of race or gender, but rather relies on an implicit version of these attributes based on a name and pronouns.
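A minimal sketch, assuming the files sit in the working directory, for loading a JSONL file and sanity-checking the structure described above (the 9,450-example total and the 135 demographic combinations per scenario):

```python
# Load a discrim-eval style JSONL file and verify the documented structure.
import json
from collections import Counter

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

explicit = load_jsonl("explicit.jsonl")
print(len(explicit))               # expected: 9,450 examples
print(sorted(explicit[0].keys()))  # age, decision_question_id, filled_template, gender, race

# 9 ages x 3 genders x 5 races = 135 examples per decision scenario
per_scenario = Counter(ex["decision_question_id"] for ex in explicit)
assert all(n == 135 for n in per_scenario.values())
```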

The following sections describe how we constructed the new datasets based on the original one; for more insights about the data, see the EDA in this repository.

explicit-combined-jews: This dataset is built on top of the original explicit dataset with the addition of Jewish as a race option. It is larger than the original explicit dataset because we added one more race option for each question ID, giving 3x6x9x70 = 11,340 examples. Some examples also needed fixing according to the findings in the EDA.

explicit-all-jew: This dataset is a variation of the original explicit dataset in which we appended the ethnicity "Jew" after the race in each filled_template. This yields combinations of racial and ethnic background, for example Asian Jew, white Jew, etc.

The number of unique examples stays the same as in the original dataset: 3x5x9x70 = 9,450 examples. For further details on how we built the new dataset, see the EDA.

Note: using "Jew" solely as a descriptor after a race could be perceived as reductionist or stereotypical. It is essential to be sensitive to cultural and religious identities when using language in this manner.
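A hypothetical sketch of how such a variant can be derived from the original explicit dataset by inserting "Jew" after the race mention in each filled_template; the exact insertion logic and file names used in this repository may differ.

```python
# Illustrative construction of the explicit-all-jew variant:
# append the ethnicity "Jew" directly after the race term in each prompt.
import json

def add_jewish_ethnicity(example: dict) -> dict:
    race = example["race"]
    # e.g. "... a 40-year-old Asian female ..." -> "... a 40-year-old Asian Jew female ..."
    filled = example["filled_template"].replace(race, f"{race} Jew", 1)
    return {**example, "filled_template": filled}

with open("explicit.jsonl") as src, open("explicit-all-jew.jsonl", "w") as dst:
    for line in src:
        dst.write(json.dumps(add_jewish_ethnicity(json.loads(line))) + "\n")
```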

implicit-fix-combined-jews: This dataset is a variation of the original implicit dataset in which each prompt specifies an age along with a name associated with a particular race and gender. Whereas the explicit approach lets us assess discrimination based on explicitly mentioned demographic information, this approach lets us assess discrimination based on more subtle information correlated with race and gender.

Here, too, we treated Jewish as a race, represented by Jewish/Israeli names. This dataset is larger than the original implicit dataset because we added one more race option for each question ID, giving 3x6x9x70 = 11,340 examples.

Also, the original implicit dataset contained many outliers and errors in the decision questions: some were incomplete and others were too long, so we fixed them and came up with the new dataset. More can be found in the EDA and in the final project paper.
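To give a concrete picture of the analysis step, here is a minimal sketch that compares the rate of favorable ("yes") decisions across races. It assumes an answers.jsonl produced by the inference sketch above, and the yes/no parsing rule is deliberately simplistic; the actual analysis is described in the final project paper.

```python
# Compare the share of "yes" (favorable) answers per race group.
import json
from collections import defaultdict

yes = defaultdict(int)
total = defaultdict(int)

with open("answers.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        total[ex["race"]] += 1
        if ex["answer"].strip().lower().startswith("yes"):
            yes[ex["race"]] += 1

for race in sorted(total):
    print(f"{race:>16}: yes-rate {yes[race] / total[race]:.3f} over {total[race]} prompts")
```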

References

[1] Tamkin, A., Askell, A., Lovitt, L., Durmus, E., Joseph, N., Kravec, S., Nguyen, K., Kaplan, J. and Ganguli, D., 2023. Evaluating and mitigating discrimination in language model decisions. arXiv preprint arXiv:2312.03689.

[2] Anthropic/discrim-eval dataset card on Hugging Face.

[3] ollama GitHub repository.

[4] gemma GitHub repository.
