Preparing Models
Due to the nature of knowledge distillation, only models from the same vocabulary family are compatible for distillation; see Vocabulary Families for more information.
The teachers and the student you pick must all belong to the same vocab family. To quickly check which models can be distilled together, take a look at the tool made specifically for that purpose: LLM info lookup.
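If you want a rough local sanity check on top of the lookup tool, the minimal sketch below compares the tokenizer vocabularies of a teacher and the student with Hugging Face `transformers`. The model identifiers are placeholders, and an exact vocabulary match is stricter than the family requirement, but a large mismatch is a clear sign the models are not compatible.

```python
# Minimal sketch (not the LLM info lookup tool): compare tokenizer vocabularies
# of a teacher and the student before distilling. Model ids are placeholders.
from transformers import AutoTokenizer

teacher_tok = AutoTokenizer.from_pretrained("path/or/hub-id-of-teacher")
student_tok = AutoTokenizer.from_pretrained("path/or/hub-id-of-student")

# Models in the same vocab family should map (almost) the same tokens to the same ids.
if teacher_tok.get_vocab() == student_tok.get_vocab():
    print("Vocabularies match - models look compatible for distillation.")
else:
    print("Vocabulary mismatch - double-check that both models share a vocab family.")
```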
Once you've settled on the teachers and the student you want to use, download either the full FP16 weights or exl quants of the teachers to collect data from them. The student must be a full FP16 PyTorch model!
Next, place all the teachers in one folder and specify that folder in the config under the corresponding field; alternatively, you can specify the path to a single teacher in that same field.
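As an illustration of the expected layout, the sketch below shows a hypothetical teachers folder and lists the model directories the pipeline would see there. The folder path and the config field name (`teacher_models_folder` here) are assumptions for the example; use the paths and field names from your own config.

```python
# Hypothetical layout for the teachers folder referenced in the config, e.g.:
#
#   /models/teachers/
#   ├── teacher-model-A/   (full FP16 or exl quant)
#   ├── teacher-model-B/
#   └── teacher-model-C/
#
# The snippet just lists the teacher directories found in that folder.
from pathlib import Path

teachers_folder = Path("/models/teachers")  # placeholder; match your config field, e.g. teacher_models_folder
for model_dir in sorted(p for p in teachers_folder.iterdir() if p.is_dir()):
    print("Found teacher:", model_dir.name)
```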