
Context-INformed Grounding Supervision (CINGS)

This is the official codebase for Context-Informed Grounding Supervision (CINGS). It includes the training and inference code for reproducing our main experiments.


πŸ› οΈ Requirements

Install dependencies using:

pip install -r requirements.txt

πŸ“¦ Data & Checkpoints

πŸ”Ή Data

We use a filtered version of the Self-RAG dataset for training.
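The filtered data builds on the training set released by the Self-RAG authors. As a sketch, you can fetch the upstream release from the Hugging Face Hub (the dataset ID selfrag/selfrag_train_data and the local path refer to the original Self-RAG release, not this repo's filtered version):

huggingface-cli download selfrag/selfrag_train_data \
  --repo-type dataset \
  --local-dir data/selfrag_train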

πŸ”Ή Checkpoints

Model checkpoints will be released soon. Stay tuned!


πŸš€ Training

πŸ“ Text-Only Domain

To train using LLaMA 3 8B as the base model:

torchrun --nnodes 1 --master_port=29100 --nproc_per_node 8 train.py \
  --enable_fsdp --low_cpu_fsdp \
  --training_argument configs/training_configs/llama3_train.json \
  --model_name meta-llama/Llama-3.1-8B \
  --token_name meta-llama/Llama-3.1-8B-Instruct \
  --num_epochs 3 \
  --dataset llava_llama3_selfrag_single_dataset \
  --dist_checkpoint_folder llama3_basemodel \
  --batch_size_training 128 \
  --micro_batch_size 16 \
  --loss_mask_context context \
  --model_use_peft

Argument descriptions:

  • --training_argument: Training config that matches your base model; see the files under configs/training_configs.

  • --model_name: Base model to fine-tune.

  • --token_name: Tokenizer name (Instruct version for chat template compatibility).

  • --dataset: Training dataset (must match your base model); see the list in configs/datasets_dpr.py.

  • --dist_checkpoint_folder: Folder to save checkpoints.

  • --loss_mask_context: Choose from the following (see the example after this list):

    • no_context – standard instruction tuning
    • context – CINGS (ours)
    • no_mask – CINGS without context masking
  • --model_use_peft: Use LoRA for parameter-efficient fine-tuning (omit this flag to train all parameters).
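For example, to train the standard instruction-tuning baseline instead of CINGS, rerun the command above with only the supervision mode changed (the checkpoint folder is renamed here just to keep runs separate):

torchrun --nnodes 1 --master_port=29100 --nproc_per_node 8 train.py \
  --enable_fsdp --low_cpu_fsdp \
  --training_argument configs/training_configs/llama3_train.json \
  --model_name meta-llama/Llama-3.1-8B \
  --token_name meta-llama/Llama-3.1-8B-Instruct \
  --num_epochs 3 \
  --dataset llava_llama3_selfrag_single_dataset \
  --dist_checkpoint_folder llama3_basemodel_no_context \
  --batch_size_training 128 \
  --micro_batch_size 16 \
  --loss_mask_context no_context \
  --model_use_peft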


πŸ–ΌοΈ Vision-Language Domain

After training the language model, use the official LLaVA repo for vision-language alignment.

Update the following scripts:

  • scripts/pretrain.sh
  • scripts/finetune.sh

In both scripts, set model_name_or_path to the checkpoint folder produced by the text-only training step (the value you passed as --dist_checkpoint_folder).
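A minimal sketch of that edit, assuming the text-only checkpoint was written to ./checkpoints/llama3_basemodel (adjust the path to your own --dist_checkpoint_folder):

# Point LLaVA's training scripts at the CINGS-trained language model.
for f in scripts/pretrain.sh scripts/finetune.sh; do
  sed -i 's|--model_name_or_path [^ ]*|--model_name_or_path ./checkpoints/llama3_basemodel|' "$f"
done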


πŸ” Inference

πŸ“ Text-Only Domain

CUDA_VISIBLE_DEVICES=0 accelerate launch inference.py \
  --training_argument {training_argument}.json \
  --dataset {dataset} \
  --dist_checkpoint_folder {dist_checkpoint_folder} \
  --val_batch_size 1 \
  --add_docs \
  --model_use_peft

Use the same arguments as training. Only --dataset should be updated to point to your evaluation dataset (see configs/datasets.py for available options).
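For example, to evaluate the LLaMA 3 run from the training section above ({eval_dataset} is a placeholder; pick an option from configs/datasets.py):

CUDA_VISIBLE_DEVICES=0 accelerate launch inference.py \
  --training_argument configs/training_configs/llama3_train.json \
  --dataset {eval_dataset} \
  --dist_checkpoint_folder llama3_basemodel \
  --val_batch_size 1 \
  --add_docs \
  --model_use_peft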


πŸ–ΌοΈ Vision-Language Domain

We follow the evaluation process from the official LLaVA repo. See the evaluation guide for details.


πŸ“– Citation

If you use this work, please cite:

@misc{lee2025contextinformedgroundingsupervision,
  title={Context-Informed Grounding Supervision},
  author={Hyunji Lee and Seunghyun Yoon and Yunjae Won and Hanseok Oh and Geewook Kim and Trung Bui and Franck Dernoncourt and Elias Stengel-Eskin and Mohit Bansal and Minjoon Seo},
  year={2025},
  eprint={2506.15480},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.15480}
}

πŸ™ Acknowledgements

This repository builds on Meta’s LLaMA Recipes. We are grateful to the community and all contributors.
