This is the official codebase for Context-Informed Grounding Supervision (CINGS). It includes the training and inference code for reproducing our main experiments.
Install dependencies using:
pip install -r requirements.txt
We use a filtered version of the Self-RAG dataset for training.
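For reference, the sketch below shows the kind of instance this setup trains on: an instruction, a retrieved context passage, and a target response. The field names are illustrative only and are not the actual schema of the filtered Self-RAG data.

```python
# Hypothetical example instance (field names are illustrative, not the
# actual schema of the filtered Self-RAG data). During CINGS training the
# retrieved context is prepended to the instruction, and the response is
# the supervision target.
example = {
    "instruction": "Who wrote the novel Beloved?",
    "context": "Beloved is a 1987 novel by the American writer Toni Morrison.",
    "output": "Beloved was written by Toni Morrison.",
}
```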
Model checkpoints will be released soon. Stay tuned!
To train using LLaMA 3 8B as the base model:
torchrun --nnodes 1 --master_port=29100 --nproc_per_node 8 train.py \
--enable_fsdp --low_cpu_fsdp \
--training_argument configs/training_configs/llama3_train.json \
--model_name meta-llama/Llama-3.1-8B \
--token_name meta-llama/Llama-3.1-8B-Instruct \
--num_epochs 3 \
--dataset llava_llama3_selfrag_single_dataset \
--dist_checkpoint_folder llama3_basemodel \
--batch_size_training 128 \
--micro_batch_size 16 \
--loss_mask_context context \
--model_use_peft
Argument descriptions:
- `--training_argument`: Select the config matching the base model name. Check the files under `configs/training_configs`.
- `--model_name`: Base model to fine-tune.
- `--token_name`: Tokenizer name (the Instruct version, for chat-template compatibility).
- `--dataset`: Training dataset (aligned with your base model). Check the list of datasets under `configs/datasets_dpr.py`.
- `--dist_checkpoint_folder`: Folder to save checkpoints.
- `--loss_mask_context`: Choose from (see the sketch after this list):
  - `no_context`: standard instruction tuning
  - `context`: CINGS (ours)
  - `no_mask`: CINGS without context masking
- `--model_use_peft`: Use LoRA for parameter-efficient fine-tuning (remove this flag to train all parameters).
After training the language model, use the official LLaVA repo for vision-language alignment.
Update the following scripts:
- `scripts/pretrain.sh`
- `scripts/finetune.sh`

In both scripts, replace `model_name_or_path` with the checkpoint folder from the text-only training step (the `--dist_checkpoint_folder` used above).
CUDA_VISIBLE_DEVICES=0 accelerate launch inference.py \
--training_argument {training_argument}.json \
--dataset {dataset} \
--dist_checkpoint_folder {dist_checkpoint_folder} \
--val_batch_size 1 \
--add_docs \
--model_use_peft
Use the same arguments as in training. Only `--dataset` should be updated to point to your evaluation dataset (see `configs/datasets.py` for available options).
We follow the evaluation process from the official LLaVA repo. See the evaluation guide for details.
If you use this work, please cite:
@misc{lee2025contextinformedgroundingsupervision,
title={Context-Informed Grounding Supervision},
author={Hyunji Lee and Seunghyun Yoon and Yunjae Won and Hanseok Oh and Geewook Kim and Trung Bui and Franck Dernoncourt and Elias Stengel-Eskin and Mohit Bansal and Minjoon Seo},
year={2025},
eprint={2506.15480},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.15480}
}
This repository builds on Meta's LLaMA Recipes. We are grateful to the community and all contributors.