Preparing Models

Due to the nature of knowledge distillation, only models from the same vocabulary family are compatible for distillation. For more info, see: Vocabulary Families

The models you pick as teachers and the student must be from the same vocab family. To easily check which models you can distill, take a look at this tool made specifically for that purpose: LLM info lookup
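As a rough local sanity check (in addition to the lookup tool), you can compare tokenizer vocabularies with the `transformers` library. This is only a heuristic sketch, not the project's official compatibility check, and the model IDs below are placeholders:

```python
# Rough heuristic: models in the same vocab family should expose
# (near-)identical tokenizer vocabularies. Model IDs are placeholders.
from transformers import AutoTokenizer

def same_vocab_family(model_a: str, model_b: str) -> bool:
    tok_a = AutoTokenizer.from_pretrained(model_a)
    tok_b = AutoTokenizer.from_pretrained(model_b)
    # Identical token -> id mappings are a strong hint of a shared family.
    return tok_a.get_vocab() == tok_b.get_vocab()

print(same_vocab_family("org/teacher-model", "org/student-model"))
```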

Once you've settled on the teachers and the student you want to use, download either the full FP16 weights or exl quants of the teachers to collect the data from them. The student must be a full FP16 PyTorch model!
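If you're pulling models from the Hugging Face Hub, `huggingface_hub` can fetch full snapshots for you. A minimal sketch, assuming the repo IDs and local paths below are replaced with your own:

```python
# Download teacher and student snapshots from the Hugging Face Hub.
# Repo IDs and local directories are placeholders -- substitute your own.
from huggingface_hub import snapshot_download

# A teacher can be full FP16 weights or an exl quant repo.
snapshot_download(repo_id="org/teacher-model", local_dir="teachers/teacher-model")

# The student must be a full FP16 PyTorch model.
snapshot_download(repo_id="org/student-model", local_dir="student/student-model")
```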

Next, place all the teachers in one folder and specify its path in the config under the corresponding field. Alternatively, you can specify the path to just one teacher under that same field.
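For reference, a quick way to double-check that every subfolder in your teachers directory actually contains model weights is sketched below. The folder name is a placeholder, and the real config field name should be taken from the example config shipped with the repo:

```python
# Illustrative check that each teacher subfolder contains weights.
# "teachers" is a placeholder path -- use whatever folder you set in the config.
from pathlib import Path

teachers_dir = Path("teachers")
for model_dir in sorted(p for p in teachers_dir.iterdir() if p.is_dir()):
    has_weights = any(model_dir.glob("*.safetensors")) or any(model_dir.glob("*.bin"))
    print(f"{model_dir.name}: {'ok' if has_weights else 'no weights found'}")
```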
