Code for the paper "DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation" (ACL 2025 Main).
RAG methods have proven effective for tasks requiring factual consistency and robust knowledge retrieval. However, large-scale RAG systems are prone to generating "hallucinated" content. This repo provides the code to run DRAG, a novel framework for distilling RAG knowledge from large-scale Language Models (LLMs) into small language models (SLMs). Our approach leverages evidence- and knowledge graph-based distillation, ensuring that the distilled model retains critical factual knowledge while significantly reducing model size and computational cost. By aligning the smaller model's predictions with a structured knowledge graph and ranked evidence, DRAG effectively mitigates hallucinations and improves factual accuracy. Experimental evaluations on multiple benchmarks demonstrate that our method outperforms prior competitive RAG methods such as MiniRAG for SLMs by up to 27.7% using the same models, while preserving high efficiency and reliability.
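At a high level, the pipeline works as follows: a large LLM generates candidate evidences for each question, the evidences are ranked (LLM ranking plus semantic ranking), knowledge-graph relationships are extracted from them and ranked the same way, and the ranked context is handed to the SLM at answer time. The toy sketch below only illustrates this data flow; the function names and the ranking heuristic are placeholders, not the repo's actual API (the real steps are the numbered scripts documented below).

```python
# Conceptual sketch of the DRAG data flow (hypothetical helpers, not the repo's API).
from typing import List, Tuple

def rank_by_overlap(items: List[str], question: str, k: int) -> List[str]:
    """Toy stand-in for the LLM + semantic ranking used by the real pipeline."""
    q_tokens = set(question.lower().split())
    return sorted(items, key=lambda s: -len(q_tokens & set(s.lower().split())))[:k]

def build_context(evidences: List[str], triples: List[Tuple[str, str, str]]) -> str:
    """Pack ranked evidences and graph triples into a prompt prefix for the SLM."""
    lines = ["Evidence:"] + [f"- {e}" for e in evidences]
    lines += ["Graph relationships:"] + [f"- ({s}, {r}, {o})" for s, r, o in triples]
    return "\n".join(lines)

# In the actual scripts a large LLM generates the evidences and triples; they are
# hard-coded here purely to show how the pieces fit together.
question = "Who wrote On the Origin of Species?"
evidences = [
    "Charles Darwin published On the Origin of Species in 1859.",
    "The book introduced the theory of evolution by natural selection.",
]
triples = [("Charles Darwin", "wrote", "On the Origin of Species")]

context = build_context(rank_by_overlap(evidences, question, k=2), triples)
print(context)  # This distilled context is what the SLM sees alongside the question.
```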
git clone https://github.com/VILA-Lab/DRAG.git
cd DRAG
pip install -r requirements.txt
- Create a `.env` file containing the API keys for all the LLMs that will be used for evidence and graph generation (a quick way to verify the keys are picked up is sketched below):
GROQ_KEY='abc'
OPENAI_KEY='def'
GEMINI_KEY='ghi'
CLAUDE_KEY='jkl'
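The scripts read these keys from the environment. To confirm they are visible to Python, a check along the following lines works; note that the use of the python-dotenv package here is an assumption on our part, and the repo may load the keys differently.

```python
# Sanity-check that the .env keys are visible to Python.
# Assumes `pip install python-dotenv`; the repo's own loading mechanism may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

for key in ("GROQ_KEY", "OPENAI_KEY", "GEMINI_KEY", "CLAUDE_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```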
- Change the following parameters in the `language_model.py` file (an illustrative sketch follows this list):
  - The desired model names for each LLM (modify the class definitions)
  - The `MAX_RETRIES` variable based on the intended maximum number of retries for API calls
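For reference, the kind of edits involved look roughly like the sketch below. Apart from `MAX_RETRIES`, the class and attribute layout is illustrative only; check `language_model.py` itself for the actual structure and model names.

```python
# Illustrative excerpt of what "set the model name per provider class" and MAX_RETRIES
# refer to. The real language_model.py may be organised differently.
import time

MAX_RETRIES = 3  # maximum number of attempts per API call

class OpenAIModel:
    MODEL_NAME = "gpt-4o"  # hypothetical example; set this to the LLM you want to use

    def generate(self, prompt: str) -> str:
        for attempt in range(MAX_RETRIES):
            try:
                return self._call_api(prompt)  # provider-specific API call
            except Exception:
                time.sleep(2 ** attempt)  # simple exponential backoff between retries
        raise RuntimeError("API call failed after MAX_RETRIES attempts")

    def _call_api(self, prompt: str) -> str:
        raise NotImplementedError("Replace with the provider's client call")
```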
- Run the following command in the terminal to execute the graph and evidence generation pipeline:
python 0_generate_all_context.py <llm-provider> <benchmark> <num-to-generate> [options]
| Argument / Option | Description |
|---|---|
| `<llm-provider>` | (Required) Name of the large LLM to be used for evidence/graph generation |
| `<benchmark>` | (Required) Name of the benchmark used for evaluation |
| `<num-to-generate>` | (Required) Number of evidences and graph relationships to generate |
| `--multithread` | Enable multithreading |

Supported `llm-provider` options:

Supported `benchmark` options:
- Verify that the output files contain the generated evidences/graphs (a quick inspection snippet follows this list):
  - `evidences_{llm-provider}_{benchmark}.csv` (source code: `1_generate_evidences.py`): Contains the output evidences for each question in the specified benchmark
  - `evidences_final_{llm-provider}_{benchmark}.csv` (source code: `2_generate_evidence_rankings.py`): Contains the evidences with their relevance order based on LLM ranking, semantic ranking, and combined (LLM + semantic) ranking
  - `graph_{llm-provider}_{benchmark}.csv` (source code: `3_generate_graph.py`): Contains the graph relationships for each question in the specified benchmark, generated from the previously generated evidences
  - `graph_final_{llm-provider}_{benchmark}.csv` (source code: `4_generate_graph_rankings.py`): Contains the graph relationships with their relevance order based on LLM ranking, semantic ranking, and combined (LLM + semantic) ranking
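A quick way to confirm the four CSVs were written and populated is to load them with pandas, as sketched below. The provider and benchmark values are hypothetical placeholders; substitute whatever you passed to the generation script. No particular column names are assumed.

```python
# Inspect the generated evidence/graph CSVs (assumes pandas is installed).
import pandas as pd

llm_provider = "openai"  # hypothetical placeholder: use the provider you ran
benchmark = "triviaqa"   # hypothetical placeholder: use the benchmark you ran

for name in (
    f"evidences_{llm_provider}_{benchmark}.csv",
    f"evidences_final_{llm_provider}_{benchmark}.csv",
    f"graph_{llm_provider}_{benchmark}.csv",
    f"graph_final_{llm_provider}_{benchmark}.csv",
):
    df = pd.read_csv(name)
    print(f"{name}: {len(df)} rows, columns = {list(df.columns)}")
```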
- Optionally, run `5_generate_responses_no_context.py` to generate responses from the small LM without evidence/graph context, and `6_generate_responses.py` to generate responses with evidence and/or graph context (a schematic of the difference is shown below). Change the model versions in `language_model.py` to reflect the intended SLMs before running these scripts.

  NOTE: In our paper, we used Harness for response generation; this framework also provides evaluation.
We welcome contributions - please feel free to open an issue or a pull request if you have any suggestions or improvements.
@misc{chen2025dragdistillingragslms,
title={DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation},
author={Jennifer Chen and Aidar Myrzakhan and Yaxin Luo and Hassaan Muhammad Khan and Sondos Mahmoud Bsharat and Zhiqiang Shen},
year={2025},
eprint={2506.01954},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.01954},
}