A repository containing a native C implementation of convolutional neural network (ConvNet) and multi-layer perceptron (MLP) models for integer-only inference. Model parameters are quantized to 8-bit integers, and floating-point values are replaced with a fixed-point representation.
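To make the fixed-point idea concrete, here is a minimal sketch (not necessarily the repository's exact scheme) of approximating a floating-point scale factor by an integer multiplier and a bit shift, so that rescaling an int32 accumulator needs only integer arithmetic:

```python
import math

def scale_to_fixed_point(scale: float, n_bits: int = 16):
    """Approximate scale as multiplier / 2**shift, with multiplier held in n_bits."""
    mantissa, exponent = math.frexp(scale)  # scale = mantissa * 2**exponent, mantissa in [0.5, 1)
    multiplier = round(mantissa * (1 << (n_bits - 1)))
    shift = (n_bits - 1) - exponent
    return multiplier, shift

def requantize(acc: int, multiplier: int, shift: int) -> int:
    """Integer-only rescaling of an accumulator: approximately acc * scale."""
    return (acc * multiplier) >> shift
```

For example, `scale_to_fixed_point(0.01216)` returns `(25501, 21)`, so `(acc * 25501) >> 21` approximates `acc * 0.01216` without any floating-point operations.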
The repository contains:
- scripts for training models with PyTorch,
- post-training quantization of model parameters to 8-bit integers,
- writing of the relevant parameters into C source files,
- interfacing with the C code for integer-only inference via ctypes.
The ideas presented in this tutorial were used to quantize models and write inference-only C code to deploy a deep reinforcement learning algorithm on a network interface card (NIC) in Tessler et al. 2021 [1].
Quantization is based on NVIDIA's pytorch-quantization toolkit, which is part of TensorRT:
https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization
pytorch-quantization allows for more sophisticated quantization methods than what is presented here. For more details, see Wu et al. 2020 [2].
NOTE: pytorch-quantization requires a GPU and will not work without one.
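For orientation, post-training calibration with pytorch-quantization looks roughly like the sketch below. The tiny model and random calibration data are placeholders; the repository's own quantization script may differ in detail:

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()  # must run before model creation: patches torch.nn layers

# Tiny stand-in model; the repository quantizes its trained MLP/ConvNet instead
model = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10)).cuda()
calib_loader = [(torch.randn(32, 784),) for _ in range(8)]  # placeholder data

# Switch quantizers to calibration mode and collect activation statistics
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for (x,) in calib_loader:
        model(x.cuda())

# Freeze the collected ranges and re-enable quantized execution
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()
```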
The C code is structured to have separate files for the MLP and ConvNet models. It is located within the `src` directory, in which:
- `nn_math` source and header files contain the relevant mathematical functions
- `nn` source and header files contain the relevant layers to create the neural network models
- `mlp` source and header files contain the MLP architecture to run for inference
- `convnet` source and header files contain the ConvNet architecture to run for inference
- `mlp_params` source and header files are generated via `scripts/create_mlp_c_params.py` and contain the network weights, scale factors, and other relevant constants for the MLP model (a sketch of this generation step follows the list)
- `convnet_params` source and header files are generated via `scripts/create_convnet_c_params.py` and contain the network weights, scale factors, and other relevant constants for the ConvNet model
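As a rough illustration of what the parameter-generation scripts do, the sketch below quantizes one weight matrix and writes it out as a C array. The helper name, the symmetric per-tensor scale, and the random stand-in weights are all illustrative, not the scripts' actual code:

```python
import numpy as np

def emit_int8_array(f, name: str, tensor: np.ndarray) -> None:
    """Write a tensor as a flat C int8 array definition (hypothetical helper)."""
    values = ", ".join(str(int(v)) for v in tensor.flatten())
    f.write(f"const int8_t {name}[{tensor.size}] = {{{values}}};\n")

# Stand-in for a trained weight matrix; the real scripts read the quantized model
w1 = np.random.randn(4, 3).astype(np.float32)
w1_scale = float(np.abs(w1).max()) / 127.0  # symmetric per-tensor scale factor
q_w1 = np.clip(np.round(w1 / w1_scale), -127, 127).astype(np.int8)

with open("mlp_params.c", "w") as f:
    f.write('#include "mlp_params.h"\n\n')
    emit_int8_array(f, "W1", q_w1)
```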
The repository was tested using gcc.
To compile and generate shared libraries that can be called from Python using ctypes, run the following commands:
```
gcc -Wall -fPIC -c mlp_params.c mlp.c nn_math.c nn.c
gcc -shared mlp_params.o mlp.o nn_math.o nn.o -o mlp.so
gcc -Wall -fPIC -c convnet_params.c convnet.c nn_math.c nn.c
gcc -shared convnet_params.o convnet.o nn_math.o nn.o -o convnet.so
```
- `src/train_mlp.py` and `src/train_convnet.py` are used to train an MLP/ConvNet model using PyTorch
- `src/quantize_with_package.py` is used to quantize the models using the pytorch-quantization package
- `src/create_mlp_c_params.py` and `src/create_convnet_c_params.py` create the header and source C files with the relevant constants (network parameters, scale factors, and more) required to run the C code
- `src/test_mlp_c.py` and `src/test_convnet_c.py` run inference on the models, using ctypes to interface with the C code from Python (see the sketch below)
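The ctypes bridge in those test scripts follows the usual shared-library pattern. Below is a minimal sketch that loads the `mlp.so` built above; the exported function name and signature (`run_mlp` taking an int8 input buffer and an int32 output buffer) are assumptions for illustration, not the repository's actual interface:

```python
import ctypes
import numpy as np

lib = ctypes.CDLL("./mlp.so")  # shared library produced by the gcc commands above

def c_inference(x: np.ndarray) -> int:
    """Run integer-only inference on one quantized input vector."""
    x = np.ascontiguousarray(x, dtype=np.int8)
    logits = np.zeros(10, dtype=np.int32)
    # Hypothetical signature: void run_mlp(const int8_t *in, int32_t *out)
    lib.run_mlp(x.ctypes.data_as(ctypes.POINTER(ctypes.c_int8)),
                logits.ctypes.data_as(ctypes.POINTER(ctypes.c_int32)))
    return int(logits.argmax())
```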
Training the MLP model:

```
Epoch: 1 - train loss: 0.35650 validation loss: 0.20097
Epoch: 2 - train loss: 0.14854 validation loss: 0.13693
Epoch: 3 - train loss: 0.10302 validation loss: 0.11963
Epoch: 4 - train loss: 0.07892 validation loss: 0.11841
Epoch: 5 - train loss: 0.06072 validation loss: 0.09850
Epoch: 6 - train loss: 0.04874 validation loss: 0.09466
Epoch: 7 - train loss: 0.04126 validation loss: 0.09458
Epoch: 8 - train loss: 0.03457 validation loss: 0.10938
Epoch: 9 - train loss: 0.02713 validation loss: 0.09077
Epoch: 10 - train loss: 0.02135 validation loss: 0.09448
Evaluating model on test data
Accuracy: 97.450%
Evaluating integer-only C model on test data
Accuracy: 97.27%
```
Training the ConvNet model:

```
Epoch: 1 - train loss: 0.37127 validation loss: 0.12948
Epoch: 2 - train loss: 0.09653 validation loss: 0.08608
Epoch: 3 - train loss: 0.07089 validation loss: 0.07480
Epoch: 4 - train loss: 0.05846 validation loss: 0.06347
Epoch: 5 - train loss: 0.05044 validation loss: 0.05909
Epoch: 6 - train loss: 0.04567 validation loss: 0.05466
Epoch: 7 - train loss: 0.04071 validation loss: 0.05099
Epoch: 8 - train loss: 0.03668 validation loss: 0.05336
Epoch: 9 - train loss: 0.03543 validation loss: 0.04965
Epoch: 10 - train loss: 0.03164 validation loss: 0.04883
Evaluating model on test data
Accuracy: 98.620%
Evaluating integer-only C model on test data
Accuracy: 98.58%
```
[1] Tessler, C., Shpigelman, Y., Dalal, G., Mandelbaum, A., Kazakov, D. H., Fuhrer, B., Chechik, G., & Mannor, S. (2021). Reinforcement Learning for Datacenter Congestion Control. http://arxiv.org/abs/2102.09337
[2] Wu, H., Judd, P., Zhang, X., Isaev, M., & Micikevicius, P. (2020). Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation. http://arxiv.org/abs/2004.09602