Skip to content

A Systematic Study of the Role of Data Quality and Alignment for Fine-tuning LLMs for Enhanced Autoformalization.

License

Notifications You must be signed in to change notification settings

brando90/data-quality-alignment-sft-af

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

A Systematic Study of the Role of Data Quality and Alignment for Fine-tuning LLMs for Enhanced Autoformalization

Paper link here: A Systematic Study of the Role of Data Quality and Alignment for Fine-tuning LLMs for Enhanced Autoformalization - Krrish Chawla, Aryan Sahai, Mario DePavia, Brando Miranda

Please cite us if you find our work useful:

@article{,
   abstract = {This study explores the role of data quality, particularly alignment, in fine-tuning Large Language Models (LLMs) for the task of autoformalization. Contrary to the conventional emphasis on dataset size, our research highlights the importance of data alignment-the similarity between training data and target domain. Through our experiments, we demonstrate a negative correlation between data alignment and model perplexity loss. These findings suggest a re-evaluation of LLM training approaches, emphasizing quality and relevance over quantity, especially in specialized applications such as autoformalization.},
   author = {Krrish Chawla and Mario Depavia and Aryan Sahai and Brando Miranda},
   title = {A SYSTEMATIC STUDY OF THE ROLE OF DATA QUALITY AND ALIGNMENT FOR FINE-TUNING LLMS FOR ENHANCED AUTO-FORMALIZATION},
}

Appendix:

Old Research Journal

About

A Systematic Study of the Role of Data Quality and Alignment for Fine-tuning LLMs for Enhanced Autoformalization.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published