Dataset 1: RealToxicityPrompts
[Dataset Link] | [Paper]
Notes:
- The entire codebase for ROME can be found in the directory `rome-main`. It was cloned from the code release of Locating and Editing Factual Associations in GPT (NeurIPS'22).
- `rome-main/trace_main.py` is the main script for running a vanilla example of causal tracing on one of the datasets from the original paper.
- The directory `rome-main/dsets` contains the datasets used in the paper. This is where the RealToxicityPrompts dataset needs to be added and loaded from for inference; its loader is in `rome-main/dsets/realtoxicityprompts.py`.
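A minimal sketch of what the loader in `rome-main/dsets/realtoxicityprompts.py` could look like. It assumes the data is the public JSON Lines release (`prompts.jsonl`), where each line is an object with a boolean `challenging` field and a `prompt` dict holding the prompt string under `text` and a `toxicity` score. The names `RealToxicityPromptsDataset`, `load_jsonl_records`, and `from_jsonl` are hypothetical. The class only mirrors the `__len__`/`__getitem__` interface of a PyTorch `Dataset` and uses the standard library, so it is not a copy of ROME's own dataset classes.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Union


def load_jsonl_records(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON Lines text into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


class RealToxicityPromptsDataset:
    """Hypothetical loader for the RealToxicityPrompts JSONL release.

    Assumes each record has a boolean "challenging" field and a
    "prompt" dict containing "text" (and optionally "toxicity").
    """

    def __init__(self, records: List[Dict[str, Any]]):
        self.records = records

    @classmethod
    def from_jsonl(cls, data_path: Union[str, Path]) -> "RealToxicityPromptsDataset":
        # Read the whole file; one JSON object per line.
        with open(data_path, "r", encoding="utf-8") as f:
            return cls(load_jsonl_records(f))

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        record = self.records[idx]
        prompt = record["prompt"]
        # Expose only the fields causal tracing is likely to need.
        return {
            "prompt": prompt["text"],
            "challenging": bool(record.get("challenging", False)),
            "prompt_toxicity": prompt.get("toxicity"),
        }
```

Keeping file reading in `from_jsonl` separate from the constructor makes it easy to build the dataset from an in-memory list of records, for example when filtering or testing.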
For William:
- Refer to the RealToxicityPrompts paper to determine which model they used, and pick one of the GPT-2 variants to run ROME with the prompts from the dataset.
- Use the "challenging" subset of prompts, i.e., those where `dataset["challenging"]` is `True`.
- For our use case, only the causal tracing part is needed right now; we don't need to worry about the editing part yet.
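The selection steps above can be sketched as follows, assuming each record is a dict parsed from the JSONL release with a boolean `challenging` field and the prompt string under `prompt["text"]`. The names `select_challenging_prompts`, `build_tracing_jobs`, and `GPT2_VARIANTS` are hypothetical. The model strings are the standard Hugging Face Hub identifiers for the four GPT-2 sizes; which size matches the RealToxicityPrompts setup still has to be checked against the paper, so the default `gpt2-xl` (the model the ROME paper traces) is only a placeholder.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hugging Face Hub identifiers for the released GPT-2 sizes.
GPT2_VARIANTS = {
    "small": "gpt2",          # ~124M parameters
    "medium": "gpt2-medium",  # ~355M
    "large": "gpt2-large",    # ~774M
    "xl": "gpt2-xl",          # ~1.5B
}


def select_challenging_prompts(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Return prompt texts for records whose "challenging" flag is True.

    `is True` excludes records where the flag is missing or null.
    """
    return [r["prompt"]["text"] for r in records if r.get("challenging") is True]


def build_tracing_jobs(
    records: Iterable[Dict[str, Any]], variant: str = "xl"
) -> List[Tuple[str, str]]:
    """Pair each challenging prompt with the chosen GPT-2 model name.

    The default variant is a placeholder, not a confirmed choice.
    """
    model_name = GPT2_VARIANTS[variant]
    return [(model_name, text) for text in select_challenging_prompts(records)]
```

How these `(model_name, prompt)` pairs are fed into causal tracing depends on the inputs `trace_main.py` expects, which this sketch does not assume.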