Deep neural networks have proved highly successful in both industry and research in recent years. Their success can largely be attributed to their capacity to encode vast amounts of data across billions of model parameters. However, the high computational complexity and the massive storage requirements make it difficult to deploy these cumbersome deep models on resource-constrained devices such as embedded systems and mobile phones. To address latency, accuracy, and computational needs at inference time, various model compression techniques such as quantization and pruning have been proposed. One widely used and actively researched compression method is Knowledge Distillation (KD). In this project, we combine knowledge distillation via Neuron Selectivity Transfer (NST) [4] with the teaching assistant technique [3] to improve the performance of a student model that is roughly 10 times smaller than the teacher. We compare this approach against a model trained from scratch and against plain NST-KD (explained in section 3), and show that our model achieves better accuracy.
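
Below is a minimal sketch, assuming a PyTorch implementation, of the NST objective from [4]: the squared Maximum Mean Discrepancy (MMD) with a degree-2 polynomial kernel between the per-channel spatial activation patterns of teacher and student feature maps. The function name `nst_loss` and the tensor shapes are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch of the NST loss (not the repository's actual code).
import torch
import torch.nn.functional as F


def nst_loss(f_teacher: torch.Tensor, f_student: torch.Tensor) -> torch.Tensor:
    """NST loss between feature maps of the same spatial size.

    f_teacher: (B, C_t, H, W) teacher feature map
    f_student: (B, C_s, H, W) student feature map
    """
    # Flatten spatial dimensions and l2-normalize each channel's activation pattern.
    ft = F.normalize(f_teacher.flatten(2), dim=2)   # (B, C_t, H*W)
    fs = F.normalize(f_student.flatten(2), dim=2)   # (B, C_s, H*W)

    # Degree-2 polynomial kernel k(x, y) = (x . y)^2 over all channel pairs.
    k_tt = torch.bmm(ft, ft.transpose(1, 2)).pow(2).mean(dim=(1, 2))
    k_ss = torch.bmm(fs, fs.transpose(1, 2)).pow(2).mean(dim=(1, 2))
    k_ts = torch.bmm(ft, fs.transpose(1, 2)).pow(2).mean(dim=(1, 2))

    # Squared MMD between the two sets of neuron activation patterns,
    # averaged over the batch.
    return (k_tt + k_ss - 2.0 * k_ts).mean()


# The training objective in [4] adds this term to the usual cross-entropy,
# e.g. loss = F.cross_entropy(logits_s, labels) + lam * nst_loss(feat_t, feat_s),
# where lam is a weighting hyperparameter.
```

In the teaching assistant setting [3], the same objective would be applied in two stages: an intermediate-size assistant is first distilled from the teacher, and the student is then distilled from the assistant.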