The aim of this laboratory is to demonstrate the pruning and clustering of deep learning models. For this task, we will use the TensorFlow Model Optimization Toolkit and the Neural Network Intelligence (NNI) toolkit.
- Go over the `pruning_clustering_experiments` script and check what it does (especially check the `optimize_model` method that prepares structures for model optimization and fine-tuning).
- At the end of the aforementioned `optimize_model` function there is a `compress_and_fine_tune` method that will need to be completed in the following tasks - it will handle model optimization and fine-tuning.
- [2pt] Finish the implementation of the `TFMOTOptimizedModel` class:
    - This class is supposed to first optimize and fine-tune the Keras model using the TensorFlow Model Optimization Toolkit, and then compile it to TensorFlow Lite in FP32 mode for inference purposes.
    - Implement the `prepare_model`, `preprocess_input`, `run_inference` and `postprocess_outputs` methods (a generic inference sketch follows this item).

      NOTE: If you completed the `l02_quantization` tasks, just set the parent class of `TFMOTOptimizedModel` to `FP32Model` from `dl_in_iot_course.l02_quantization.quantization_experiments` - all methods for preparing a model, running inference and processing inputs and outputs will be reused.
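For reference, below is a minimal, generic sketch of FP32 TFLite inference with `tf.lite.Interpreter`; the model path is a placeholder and the actual method signatures should follow the base model class used in the course code:

```python
import numpy as np
import tensorflow as tf

# Load the compiled TFLite model and allocate tensors (prepare_model)
interpreter = tf.lite.Interpreter(model_path='model-fp32.tflite')  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Set the (already float32) input tensor (preprocess_input)
X = np.zeros(input_details[0]['shape'], dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], X)

# Run inference (run_inference)
interpreter.invoke()

# Read the FP32 outputs - no dequantization is needed (postprocess_outputs)
outputs = interpreter.get_tensor(output_details[0]['index'])
```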
- [4pt] Finish the implementation of the model clustering in the `ClusteredModel` class (a minimal sketch follows this item):
    - Load the Keras model.
    - Create a clustered model - use a proper `tfmot` method for adding clustering to the model, use `self.num_clusters` to set the number of clusters, and use linear centroid initialization.
    - Fine-tune the model - compile and fit the model using the `self.optimizer`, `self.loss`, `self.metrics`, `self.traindataset`, `self.epochs` and `self.validdataset` objects created in the `optimize_model` method.
    - Convert the model to an FP32 TFLite model and save it to `self.modelpath`.
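A minimal sketch of these steps, assuming `tfmot` is the `tensorflow_model_optimization` import and that the path of the input Keras model is held in a hypothetical `self.original_model_path` attribute (use the actual attribute prepared in `optimize_model`); treat it as a starting point, not the exact expected implementation:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def compress_and_fine_tune(self):
    # Load the Keras model (hypothetical attribute holding the input model path)
    model = tf.keras.models.load_model(self.original_model_path)

    # Wrap the model with clustering, using linear centroid initialization
    clustered = tfmot.clustering.keras.cluster_weights(
        model,
        number_of_clusters=self.num_clusters,
        cluster_centroids_init=tfmot.clustering.keras.CentroidInitialization.LINEAR,
    )

    # Fine-tune the clustered model with the objects prepared in optimize_model
    clustered.compile(optimizer=self.optimizer, loss=self.loss, metrics=self.metrics)
    clustered.fit(self.traindataset, epochs=self.epochs, validation_data=self.validdataset)

    # Remove the clustering wrappers and convert to an FP32 TFLite model
    final_model = tfmot.clustering.keras.strip_clustering(clustered)
    converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
    with open(self.modelpath, 'wb') as f:
        f.write(converter.convert())
```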
- [4pt] Finish the implementation of the model pruning in the `PrunedModel` class (a minimal sketch follows this item):
    - The `compress_and_fine_tune` method comes with the number of epochs and a simple pruning schedule.
    - Load the Keras model.
    - Create a pruned model - use `prune_low_magnitude` pruning along with the `sched` schedule.
    - Fine-tune the model - compile and fit the model using the `self.optimizer`, `self.loss`, `self.metrics`, `self.traindataset`, `self.epochs` and `self.validdataset` objects created in the `optimize_model` method.
    - Remember to set up proper callbacks for pruning.
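A minimal sketch of these steps, again with the input model path as a hypothetical `self.original_model_path` attribute and `sched` being the pruning schedule already provided in the method:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def compress_and_fine_tune(self):
    # Load the Keras model (hypothetical attribute holding the input model path)
    model = tf.keras.models.load_model(self.original_model_path)

    # Wrap the model with low-magnitude pruning driven by the given schedule
    pruned = tfmot.sparsity.keras.prune_low_magnitude(model, pruning_schedule=sched)

    # Fine-tune the pruned model; UpdatePruningStep is the callback required
    # for the pruning wrappers to update their masks during training
    pruned.compile(optimizer=self.optimizer, loss=self.loss, metrics=self.metrics)
    pruned.fit(
        self.traindataset,
        epochs=self.epochs,
        validation_data=self.validdataset,
        callbacks=[tfmot.sparsity.keras.UpdatePruningStep()],
    )

    # Strip the pruning wrappers and convert to an FP32 TFLite model
    final_model = tfmot.sparsity.keras.strip_pruning(pruned)
    converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
    with open(self.modelpath, 'wb') as f:
        f.write(converter.convert())
```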
- In the main script, uncomment all already supported classes and run it (it will definitely take some time):

  ```
  python3 -m dl_in_iot_course.l03_pruning_clustering.pruning_clustering_experiments \
      --model-path models/pet-dataset-tensorflow.h5 \
      --dataset-root build/pet-dataset/ \
      --results-path build/results
  ```

  In the `build/results` directory, the script will create:

    - `<prefix>-metrics.md` file - contains basic metrics, such as accuracy, precision, sensitivity or G-Mean, along with inference time
    - `<prefix>-confusion-matrix.png` file - contains a visualization of the confusion matrix for the model evaluation.

  Those files will be created for:

    - `clustered-<num_clusters>-fp32` - the clustered model with `num_clusters` clusters.
    - `pruned-<sparsity>-fp32` - the pruned model.

  NOTE: To download the dataset, add the `--download-dataset` flag. The dataset downloaded for the previous task can be reused.

  NOTE: If the evaluation takes too long, reduce the test dataset size by setting `--test-dataset-fraction` to a lower value, but mention this in the Summary note.
- Write a small summary for the experiments containing:
    - Performance and quality data for each experiment:
        - The computed metrics,
        - Confusion matrix.
    - [2pt] Write the deflation percentage along with the sizes of the models compressed with the ZIP tool. In the directory containing the `.tflite` files, you can run (check sharkdp/fd for details):

      ```
      fd -t f -e tflite -x zip {.}.zip {}
      ```

      NOTE: in Singularity/Docker environments the `fd` tool is named `fdfind`.
    - Answers to the questions:
        - [1pt] In their best variants, how do quantization, pruning and clustering compare to each other (both performance- and quality-wise)?
        - [1pt] How do the sizes of the compressed models compare to the size of the TFLite FP32 model (how many times is each of the models smaller than the uncompressed FP32 solution)?
        - [1pt] What does deflation mean in the ZIP tool and how does it correspond to the observed model optimizations?
        - [1pt] Which of the compression methods gives the smallest and best-performing model (is there a solution dominating on both factors)?

  The summary should be put in the project's `summaries` directory - follow the README.md in that directory for details.

- NOTE: each of the required code snippets should take around 30 lines, usually fewer.

Additional factors:

- [2pt] Git history quality
In this task, we will work on a much simpler model trained on the Fashion MNIST dataset. Since NNI supports structured pruning mainly in PyTorch, we will switch to that framework for this task.
- Read the structured_pruning script thoroughly.
- NOTE: You DO NOT NEED to train the model, use the existing model `fashion-mnist-classifier.pth` from the `models` directory.
- [1pt] Create a traced optimizer (check the `nni.trace` method), use the Adam optimizer with `TRAINING_LEARNING_RATE` (as in the training optimizer); a minimal sketch follows this item.
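A minimal sketch of the traced optimizer, assuming `model` and `TRAINING_LEARNING_RATE` are already defined in the script:

```python
import nni
import torch

# nni.trace records the constructor and its arguments, so NNI can re-create
# the optimizer internally while evaluating pruning decisions
traced_optimizer = nni.trace(torch.optim.Adam)(
    model.parameters(), lr=TRAINING_LEARNING_RATE
)
```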
- [1pt] Formulate the configuration list for the `ActivationAPoZRank` pruner - we want to prune both 2D convolutions and linear layers.
- [1pt] What is more, NNI by default prunes ALL layers of a given type, even the output ones - EXCLUDE the final linear layer from the pruning schedule; a sketch of such a configuration list follows this item.
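A possible configuration list covering both points above; the sparsity value and the name of the final linear layer (here `output`) are assumptions, adjust them to the actual model:

```python
config_list = [
    {
        # Prune 2D convolutions and linear layers
        'op_types': ['Conv2d', 'Linear'],
        'total_sparsity': 0.5,  # assumed target sparsity
    },
    {
        # Exclude the final linear layer from pruning (assumed layer name)
        'op_names': ['output'],
        'exclude': True,
    },
]
```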
- [1pt] Define the `ActivationAPoZRank` pruner using the `model`, the defined configuration list, the `trainer` method, the traced optimizer and the criterion. Set `training_batches` to 1000. I highly recommend using the legacy version of the pruner (documented in ActivationAPoZRank v2.8); a minimal sketch follows this item.
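A minimal sketch of the pruner definition, assuming `trainer` and `criterion` are the training function and loss already present in the script; the exact import path may differ between NNI versions:

```python
from nni.compression.pytorch.pruning import ActivationAPoZRankPruner

# Ranks filters by the Average Percentage of Zeros (APoZ) in their activations;
# training_batches controls how many batches are used to collect activations
pruner = ActivationAPoZRankPruner(
    model,
    config_list,
    trainer,
    traced_optimizer,
    criterion,
    training_batches=1000,
)
```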
- [1pt] The `pruner.compress()` method will compute the pruning mask, and the additional prints will show the pruning status - please include the logs from the terminal for those printouts.
- [2pt] Speed up the model using the `ModelSpeedup.speedup_model` function. The expected dummy input for the network should have a 1x1x28x28 shape (use `torch.randn(shape).to(model.device)`); a sketch of the compress-and-speedup flow follows this item.
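A sketch of the compress-and-speedup flow, assuming the NNI 2.x API (constructor arguments of `ModelSpeedup` may differ slightly between versions):

```python
import torch
from nni.compression.pytorch.speedup import ModelSpeedup

# Compute the pruning masks (this is where the pruning status is printed)
_, masks = pruner.compress()

# Detach the pruner wrappers before physically rewriting the model
pruner._unwrap_model()

# Physically remove the pruned channels so the model actually shrinks
dummy_input = torch.randn(1, 1, 28, 28).to(model.device)
ModelSpeedup(model, dummy_input, masks).speedup_model()
```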
- [3pt] Fine-tune the model (a minimal sketch follows this item):
    - Define the optimizer (use Adam with `FINETUNE_LEARNING_RATE`).
    - Train the model using `model.train_model` for `FINE_TUNE_EPOCHS` epochs.
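A minimal sketch of the fine-tuning step; the signature of `model.train_model` is an assumption, check the actual one in the script:

```python
import torch

# Fine-tune the pruned (and sped-up) model to recover accuracy
finetune_optimizer = torch.optim.Adam(model.parameters(), lr=FINETUNE_LEARNING_RATE)

# Assumed argument order - adjust to the real train_model signature
model.train_model(finetune_optimizer, criterion, FINE_TUNE_EPOCHS)
```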
- [2pt] Implement the `convert_to_onnx` method to save the model in the ONNX format (a minimal sketch follows this item).
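A minimal sketch of the ONNX export, assuming `convert_to_onnx` receives the model and the target path; the argument names are placeholders:

```python
import torch

def convert_to_onnx(model, onnx_path):
    # Export the pruned PyTorch model to ONNX using a dummy 1x1x28x28 input
    dummy_input = torch.randn(1, 1, 28, 28).to(model.device)
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        input_names=['input'],
        output_names=['output'],
        opset_version=13,
    )
```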
- [2pt] Implement the `convert_onnx_to_tflite` method to convert the ONNX model to an FP32 TensorFlow Lite model (a minimal sketch follows this item).
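A minimal sketch of the conversion, assuming the `onnx-tf` package is used for the ONNX-to-TensorFlow step (other conversion paths are possible); paths and argument names are placeholders:

```python
import onnx
import tensorflow as tf
from onnx_tf.backend import prepare

def convert_onnx_to_tflite(onnx_path, tflite_path, saved_model_dir='build/onnx-saved-model'):
    # ONNX -> TensorFlow SavedModel
    onnx_model = onnx.load(onnx_path)
    prepare(onnx_model).export_graph(saved_model_dir)

    # SavedModel -> FP32 TensorFlow Lite model
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    with open(tflite_path, 'wb') as f:
        f.write(converter.convert())
```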
- [1pt] Use the Netron tool to visualize the network.
- In summary:
    - [1pt] Include the shape of the model before pruning (the script logs provide it, as well as other data), along with its accuracy and inference time
    - [1pt] Include the pruning logs collected around `pruner.compress()`
    - [1pt] Include the shape of the model after pruning, along with its accuracy and inference time before fine-tuning
    - [1pt] Include the accuracy and inference time of the model after fine-tuning
    - [1pt] Include the fine-tuned PyTorch model in the summary directory (use Git LFS, check the `git lfs install` and `git lfs track` commands, you need to apply them before adding the file and committing it)
    - [1pt] Include the ONNX file with the pruned PyTorch model (use Git LFS)
    - [1pt] Include the TFLite file with the pruned PyTorch model (use Git LFS)
    - [1pt] In the summary, include the visualization of the graph using Netron
The command should be executed more or less as follows:

```
python3 -m dl_in_iot_course.l03_pruning_clustering.structured_pruning_experiments \
    --input-model models/fashion-mnist-classifier.pth \
    --backup-model backup-model.pth \
    --final-model final-model.pth \
    --dataset-path fashion-dataset \
    --onnx-model model.onnx \
    --tflite-model model.tflite
```
Additional factors:

- [2pt] Git history quality