Welcome to the Parametric RAG Toolkit, developed as part of our SIGIR 2025 Tutorial: Dynamic and Parametric Retrieval-Augmented Generation.
This repository provides a comprehensive and easy-to-use toolkit designed to help researchers and practitioners quickly reproduce, compare, and extend Parametric Retrieval-Augmented Generation (Parametric RAG) methods, specifically PRAG and DyPRAG.
⭐️ Star this repository to support our work and stay updated!
The Parametric RAG Toolkit simplifies experimenting with Parametric RAG, a powerful approach to retrieval-augmented generation that encodes external knowledge directly into model parameters using LoRA (Low-Rank Adaptation). This toolkit enables users to:
- Reproduce the PRAG and DyPRAG methods end to end.
- Easily switch base LLM models and extend to new datasets.
- Understand how to generate and utilize LoRA adapters during offline training and inference stages.
Currently supported:
- ✅ PRAG (SIGIR 2025 paper)
- ✅ DyPRAG (arXiv paper, GitHub)
More Parametric RAG variants will be supported soon!
Follow these steps to quickly run Parametric RAG experiments.
Before you start using this toolkit, make sure you've completed the following preparations:
- Change the path in `src/root_dir_path.py` to the directory where you placed this toolkit. For example, if you placed it in `/home/user/sigir25-tutorial-parametric`, set the content of `src/root_dir_path.py` to `ROOT_DIR = "/home/user/sigir25-tutorial-parametric"`.
- If you've downloaded the LLM models manually, you can modify the paths in `src/utils.py` and `src/retrieve/retriever.py` to point to your local model directories. Alternatively, you can use our default settings, which automatically download the models from Hugging Face if they are not already cached locally.
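For example, the model-path mapping in `src/utils.py` could be pointed at local directories like this (a hypothetical sketch; the actual function in the repository may be organized differently):

```python
# Hypothetical sketch of the model-path mapping in src/utils.py.
def get_model_path(model_name: str) -> str:
    paths = {
        # Default: a Hugging Face repo ID, downloaded automatically if not cached
        "llama3.2-1b-instruct": "meta-llama/Llama-3.2-1B-Instruct",
        # Manually downloaded: point to your local model directory instead
        "qwen2.5-1.5b-instruct": "/data/models/Qwen2.5-1.5B-Instruct",
    }
    return paths[model_name]
```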
First, clone this repository:

```bash
git clone https://github.com/oneal2000/sigir25-tutorial-parametric.git
cd sigir25-tutorial-parametric
```
Then, install the required dependencies:

```bash
conda create -n prag python=3.10.4
conda activate prag
pip install -r requirements.txt
```
- Download the Wikipedia dump from the DPR repository with the script below:

  ```bash
  bash scripts/download_dpr.sh
  ```
- Use Elasticsearch to index the Wikipedia dump:

  ```bash
  bash scripts/prep_elastic.sh
  ```
- NOTE: Due to environment differences, you may run into issues during the Elasticsearch setup, so we strongly recommend using an LLM (ChatGPT, Gemini, etc.) to help resolve any errors you encounter. Also, read the comments in this bash script carefully: some parts are needed ONLY on first use and should be commented out afterwards. For example, the part that downloads Elasticsearch is only needed the first time you run the script.
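As a quick sanity check (not part of the provided scripts), you can confirm Elasticsearch is up before indexing:

```bash
# Expect a JSON response with the cluster name and version info; if the
# connection is refused, Elasticsearch is not running yet.
curl -X GET "http://localhost:9200/"
```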
We provide download instructions for 4 datasets: 2WikiMultihopQA, HotpotQA, PopQA, and ComplexWebQuestions. To reproduce the results in this toolkit, you only need the PopQA and ComplexWebQuestions datasets. You can download them by running the corresponding commands below.
For 2WikiMultihopQA:
Download the 2WikiMultihopQA dataset from its repository https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1. Unzip it and move the folder to `data/2wikimultihopqa`.
For HotpotQA:

```bash
bash scripts/download_hotpotqa.sh
```
For PopQA:
Download the PopQA dataset from its repository https://github.com/AlexTMallen/adaptive-retrieval/blob/main/data/popQA.tsv, and put the file `popQA.tsv` into the folder `data/popqa`.
For ComplexWebQuestions:
Download the ComplexWebQuestions dataset from its repository https://www.dropbox.com/scl/fo/nqujvpg2gc4y0ozkw3wgr/AOzjVEsdUhv2Fx2pamfJlSw?rlkey=746t7xehfqxf1zr867nxiq8aq&e=1, and put the file `ComplexWebQuestions_dev.json` into the folder `data/complexwebquestions`.
Data augmentation integrates multiple rewrites and the corresponding QA pairs of a given document into a more comprehensive document covering diverse linguistic variations.
For PRAG, run a command like this:
```bash
python src/augment.py \
    --model_name llama3.2-1b-instruct \
    --dataset popqa \
    --data_path data/popqa/ \
    --sample 300 \
    --topk 3
```
The results of data augmentation for PRAG will be stored in the file `data_aug/{dataset}/{data_type}.json`. They will be used to parameterize documents in PRAG encoding and inference.
To reproduce the results shown in this toolkit, you can directly run the script:

```bash
bash configs/PRAG/augment/augment_prag.sh
```
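The exact schema of the augmented files is determined by `src/augment.py`; conceptually, each entry couples a question with its retrieved passages, their rewrites, and generated QA pairs, along the lines of this illustrative (not authoritative) sketch:

```json
[
  {
    "question": "Who is the author of Dune?",
    "answer": "Frank Herbert",
    "augment": [
      {
        "passage": "...one of the top-k retrieved passages...",
        "rewrite": "...an LLM rewrite of that passage...",
        "qa_pairs": [{"question": "...", "answer": "..."}]
      }
    ]
  }
]
```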
For training the DyPRAG parameter translator, you need to set `output_dir` to `data_aug_projector` and set the `--projector` flag:
```bash
python src/augment.py \
    --model_name llama3.2-1b-instruct \
    --dataset popqa \
    --data_path data/popqa/ \
    --sample 200 \
    --topk 3 \
    --output_dir data_aug_projector \
    --projector
```
The results of data augmentation will be stored in the file `data_aug_projector/{dataset}/{data_type}.json`. This augmented dataset will be used to train the parameter translator in DyPRAG.
According to DyPRAG, you should collect 200 additional questions besides the original 300 questions collected in `data_aug`, and use 3 different models to augment the data. This yields 4800 samples for parameter translator training.
For convenience, we provide pre-augmented data files covering all 4 datasets, each augmented by 3 models, and we recommend using them directly:

```bash
tar -xzvf data_aug.tar.gz
```
Next, generate the parametric knowledge (LoRA adapters) from the augmented documents with `src/encode.py`:

```bash
python src/encode.py \
    --model_name=llama3.2-1b-instruct \
    --dataset=popqa \
    --sample=300 \
    --per_device_train_batch_size=1 \
    --num_train_epochs=1 \
    --learning_rate=0.0003 \
    --lora_rank=2 \
    --lora_alpha=32 \
    --with_cot \
    --projector
```
For DyPRAG training, set `--projector`; for PRAG, unset it.
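To make these hyperparameters concrete: the encode step trains one small low-rank adapter per augmented document. With the `peft` library, a rank-2, alpha-32 LoRA like the one above would be configured roughly as follows (a sketch of the idea, not the exact code in `src/encode.py`):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Mirrors --lora_rank=2 and --lora_alpha=32 from the command above; PRAG
# injects parametric knowledge into the FFN, hence the MLP target modules.
config = LoraConfig(
    r=2,
    lora_alpha=32,
    target_modules=["gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the tiny LoRA matrices are trainable
```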
All running parameters used for encoding in PRAG can be found in `configs/PRAG/encode`. To reproduce the results shown in this toolkit, you can directly run this script:

```bash
bash configs/PRAG/encode/encode_prag.sh
```

All running parameters used to produce samples for DyPRAG training can be found in `configs/DyPRAG/encode`. If you want to train the parameter translator yourself, you need to run the 12 scripts in `configs/DyPRAG/encode`, which will generate the 4800 samples for parameter translator training.
Then, train the DyPRAG parameter translator:

```bash
python3 -u src/train_dyprag.py \
    --model_name=llama3.2-1b-instruct \
    --datasets="2wikimultihopqa,popqa,hotpotqa,complexwebquestions" \
    --learning_rate=0.0003 \
    --lora_rank=2 \
    --lora_alpha=32 \
    --max_new_tokens=128 \
    --sample_rate=1 \
    --dyprag_learning_rate=1e-5 \
    --dyprag_train_epochs=1
```
The trained parameter translator will be saved to the folder `projector/{model_name}_hidden{projector_p}_sample{sample_rate}_lr{dyprag_learning_rate}`.
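Conceptually, the parameter translator is a small hypernetwork that maps a document representation directly to LoRA weights, so no per-document training is needed at inference time. A minimal PyTorch sketch of that idea (dimensions and layout are illustrative, not the actual `src/projector.py` implementation):

```python
import torch
import torch.nn as nn

class ParameterTranslatorSketch(nn.Module):
    """Illustrative: map a document embedding to a flat vector of LoRA weights."""

    def __init__(self, doc_dim: int, hidden_dim: int, lora_numel: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(doc_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, lora_numel),
        )

    def forward(self, doc_emb: torch.Tensor) -> torch.Tensor:
        # The caller reshapes this flat vector into per-layer LoRA A/B matrices.
        return self.mlp(doc_emb)
```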
For convenience, you can directly use the pre-trained parameter translators provided in the official GitHub repository of DyPRAG. To reproduce the results shown in this toolkit, put the downloaded llama-1b translator file into the folder `projector/llama3.2-1b-instruct_hidden32_sample1.0_lr1e-05` and rename it to `epoch_0.pt`, and put the downloaded qwen-1.5b translator file into the folder `projector/qwen2.5-1.5b-instruct_hidden32_sample1.0_lr1e-05` and rename it to `epoch_0.pt`.
For PRAG, you can run inference with this command:

```bash
python3 src/inference.py \
    --model_name=llama3.2-1b-instruct \
    --dataset=popqa \
    --sample=300 \
    --num_train_epochs=2 \
    --learning_rate=0.0003 \
    --lora_rank=2 \
    --lora_alpha=32 \
    --max_new_tokens=20 \
    --inference_method=combine
```
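At inference time, PRAG loads the offline-trained adapters for the top-k retrieved documents and combines them with the base model. With `peft`, merging several LoRA adapters can be sketched like this (adapter paths and names are hypothetical; `src/inference.py` may combine them differently):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

# Load one adapter per retrieved document (hypothetical paths).
model = PeftModel.from_pretrained(base, "lora/doc_0", adapter_name="doc_0")
model.load_adapter("lora/doc_1", adapter_name="doc_1")
model.load_adapter("lora/doc_2", adapter_name="doc_2")

# Average the adapters into a single merged adapter and activate it.
model.add_weighted_adapter(
    adapters=["doc_0", "doc_1", "doc_2"],
    weights=[1 / 3, 1 / 3, 1 / 3],
    adapter_name="merged",
    combination_type="linear",
)
model.set_adapter("merged")
```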
For DyPRAG, you can run inference with this command:

```bash
python3 src/inference_dyprag.py \
    --model_name=llama3.2-1b-instruct \
    --dataset=popqa \
    --sample=-1 \
    --num_train_epochs=1 \
    --learning_rate=0.0003 \
    --lora_rank=2 \
    --lora_alpha=32 \
    --max_new_tokens=128 \
    --inference_method=dyprag \
    --inference_epoch=1 \
    --projector_path="llama3.2-1b-instruct_hidden32_sample1.0_lr1e-05" \
    --projector_p=32
```
- We test 5 inference methods in this toolkit: `icl`, `prag`, `prag_combine`, `dyprag`, and `dyprag_combine`.
- All running parameters used in inference can be found in `configs/PRAG/inference` and `configs/DyPRAG/inference`, and you can directly run those scripts to reproduce the results.
- The inference process generates three files for each sub-dataset:
  - `config.json`: the configuration of the inference run, including the model name, dataset, learning rate, etc.
  - `predict.json`: the predicted answer for each question in the dataset, with per-question evaluation results such as F1 and EM scores.
  - `result.txt`: the overall evaluation results, such as average F1 and average EM.
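For reference, the per-question F1 in `predict.json` is presumably the standard token-level F1 used in open-domain QA evaluation, which can be computed like this:

```python
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```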
We conducted experiments on two datasets, PopQA and ComplexWebQuestions, using two LLMs, Llama3.2-1B and Qwen2.5-1.5B. The results are shown in the table below:
| Model | Method | PopQA (F1) | Script | ComplexWebQuestions (F1) | Script |
|---|---|---|---|---|---|
| Llama3.2-1B | standard RAG (ICL) | 0.2025 | icl | 0.3762 | icl |
| Llama3.2-1B | PRAG | 0.2150 | prag | 0.3525 | prag |
| Llama3.2-1B | PRAG-combine | **0.3271** | prag_combine | **0.4024** | prag_combine |
| Llama3.2-1B | DyPRAG | 0.0937 | dyprag | 0.3633 | dyprag |
| Llama3.2-1B | DyPRAG-combine | 0.3144 | dyprag_combine | 0.3921 | dyprag_combine |
| Qwen2.5-1.5B | standard RAG (ICL) | 0.0999 | icl | 0.2823 | icl |
| Qwen2.5-1.5B | PRAG | 0.2162 | prag | 0.3082 | prag |
| Qwen2.5-1.5B | PRAG-combine | **0.2364** | prag_combine | 0.3209 | prag_combine |
| Qwen2.5-1.5B | DyPRAG | 0.0664 | dyprag | 0.3194 | dyprag |
| Qwen2.5-1.5B | DyPRAG-combine | 0.2269 | dyprag_combine | **0.3357** | dyprag_combine |
All results above are reported as F1 scores, and the best results are highlighted in bold. The running parameters used in each experiment can be found in the corresponding script shown in the table above.
```
Parametric-RAG-Toolkit/
├── configs/                  # Example configurations for PRAG & DyPRAG
├── data/                     # Data storage and preprocessing scripts
├── scripts/                  # Data download and preparation scripts
├── src/
│   ├── fewshot               # Few-shot learning samples
│   ├── retrieve              # Implementation of the BM25 retriever
│   ├── models                # Implementation of parameter injection for LLMs
│   ├── augment.py            # Data augmentation script
│   ├── encode.py             # Generate parametric knowledge (LoRA)
│   ├── train_dyprag.py       # Train the parameter translator for DyPRAG
│   ├── inference.py          # Inference using parametric knowledge for PRAG
│   ├── inference_dyprag.py   # Inference for DyPRAG
│   ├── projector.py          # Implementation of the parameter translator in DyPRAG
│   ├── root_dir_path.py      # The path where you placed this toolkit
│   ├── prompt_template.py    # Prompt templates for model generation
│   └── utils.py              # Common utilities and evaluation scripts
├── prep_elastic.py           # Build an index for the Wikipedia dump using Elasticsearch
├── requirements.txt          # Python dependencies
├── data_aug.tar.gz           # Pre-augmented data files
└── README.md                 # Documentation and usage guide
```
The Parametric RAG Toolkit is designed for flexibility and ease of extension.
To switch the base LLM:
- Choose your desired LLM from `transformers.models`.
- Copy `configuration_xxx.py` and `modeling_xxx.py` to the `models` folder and modify the import statements in `modeling_xxx.py`, similar to our `src/models/modeling_qwen2.py`.
- Modify the `forward` function of the MLP module in `modeling_xxx.py`, similar to our `src/models/modeling_qwen2.py` (lines 57-69).
- Add a new class in the `get_model_class` function in `src/utils.py` to load the new type of LLM (see the sketch after this list).
- Add a new path in the `get_model_path` function in `src/utils.py` to load the new type of LLM.
- Update the `--model_name` parameter in scripts and configuration files.
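As a rough illustration of the `get_model_class` step (the actual function in `src/utils.py` may look different), registering a new model family could follow a pattern like this:

```python
# Hypothetical sketch of extending get_model_class in src/utils.py.
def get_model_class(model_name: str):
    if "llama" in model_name:
        from models.modeling_llama import LlamaForCausalLM
        return LlamaForCausalLM
    if "qwen" in model_name:
        from models.modeling_qwen2 import Qwen2ForCausalLM
        return Qwen2ForCausalLM
    if "mistral" in model_name:  # your newly added model family
        from models.modeling_mistral import MistralForCausalLM
        return MistralForCausalLM
    raise ValueError(f"Unsupported model: {model_name}")
```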
Datasets already supported:
- 2WikiMultihopQA
- HotpotQA
- PopQA
- ComplexWebQuestions
To add a new dataset:
1. Prepare your dataset in JSON format with the structure:

   ```json
   [
     {
       "question": "your question",
       "answer": "answer text or list of acceptable answers"
     }
   ]
   ```

2. Place the file in `data/{your_dataset}`.

3. Update the data augmentation command accordingly:

   ```bash
   python src/augment.py \
       --model_name llama3.2-1b-instruct \
       --dataset your_dataset \
       --data_path data/your_dataset/ \
       --sample 300 \
       --topk 3
   ```
For example, if you want to use the StrategyQA dataset, you can download it from StrategyQA and place it in `data/strategyqa`. Then, extract the `question` and `answer` fields from `strategyqa_train.json` and put them into a JSON file such as `data/strategyqa/total.json` (a conversion sketch follows the command below). You can then run the data augmentation script like this:
```bash
python src/augment.py \
    --model_name llama3.2-1b-instruct \
    --dataset strategyqa \
    --data_path data/strategyqa/ \
    --sample 300 \
    --topk 3
```
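A minimal conversion sketch, assuming `strategyqa_train.json` is a JSON list whose entries carry `question` and `answer` fields (the field names in the actual StrategyQA dump may differ, so adjust accordingly):

```python
import json

# Flatten the raw StrategyQA file into the {question, answer} format
# expected by this toolkit (see the dataset structure above).
with open("data/strategyqa/strategyqa_train.json") as f:
    raw = json.load(f)

total = [{"question": ex["question"], "answer": ex["answer"]} for ex in raw]

with open("data/strategyqa/total.json", "w") as f:
    json.dump(total, f, indent=2, ensure_ascii=False)
```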
This toolkit divides the process clearly into two stages:

Offline stage:
- Perform data augmentation to enhance documents.
- Generate LoRA parameters that embed the external knowledge into the LLM.
- Train a parameter translator for DyPRAG.

Inference stage:

PRAG:
- Load pre-generated LoRA parameters.
- Run inference using your customized parametric knowledge.

DyPRAG:
- Use the trained parameter translator to generate LoRA parameters.
- Run inference using your customized parametric knowledge.
Detailed documentation for each script and parameter can be found within `configs` and `src`.
We welcome contributions! Please open an issue or submit a pull request if you want to extend the toolkit or suggest improvements.
If you find this toolkit helpful, please cite our work:
```bibtex
@inproceedings{su2025parametric,
  title={Parametric Retrieval-Augmented Generation},
  author={Su, Weihang and Tang, Yichen and Ai, Qingyao and Yan, Junxi and Wang, Changyue and Wang, Hongning and Ye, Ziyi and Zhou, Yujia and Liu, Yiqun},
  booktitle={SIGIR},
  year={2025}
}
```
🌟 Thank you for your interest and support! 🌟