Go into the Training directory. Create a Python environment and install the requirements file.
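For example (a minimal sketch, assuming a venv-based setup and that the requirements file is named requirements.txt):

```sh
cd Training
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```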
Copy the tinyml_contest_data_training folder into the directory.
Please remember to also get the submodule.
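If the repository was cloned without its submodules, they can be fetched with the standard git command:

```sh
git submodule update --init --recursive
```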
Then simply run:

```sh
python training/train.py
```

(Note that we are still in the Training directory; training is a subdirectory.)
A new folder saved_models is created, where the model files are stored.
You need to select your optimal model.
We are using our QAT repository https://github.com/embedded-machine-learning/FastQATforPOTRescaler to get a power-of-two rescaler. The corresponding paper should be published as part of DSD23; as it is not available yet, see https://jantsch.se/AxelJantsch/papers/2023/DanielSchnoell-DSD.pdf.
Open the notebook netron_tens/nunpy_to_c_QAT.ipynb. Select the model in the first cell, then simply run the whole notebook; a new file wights_Quant.hpp is created. This file needs to be copied into the C project.
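For example (assuming the notebook writes the file next to itself; the exact destination inside CProject may differ in your setup):

```sh
cp netron_tens/wights_Quant.hpp CProject/
```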
Inside CProject, open TESTMODEL in MDK5.
Inside the MDK5 project, the following options need to be set:

- Arm Compiler Version 6.18 (essential)
- C compiler flags: -Omin
- Activate Link Time Optimization
- Use Oz
- Language C: gnu11
- Language C++: gnu++17 (community)
Please double-check these settings with the following images:
The original solution is in submission.zip. The end of the contest got a bit stressful, so bugs were introduced. They changed how the compiler interpreted the code, which led to a suboptimal translation. The update removes these inaccuracies while, of course, not modifying the results of the neural network. The biggest difference is the use of template layers rather than the classical C-style, type-dynamic functions, which improves readability. By using constant weights, as originally intended, and easily interpretable code, the compiler can optimize better.
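To illustrate the difference, here is a minimal sketch (not the actual layer code from the repository; the names fully_connected and kWeights are hypothetical) contrasting a classical type-dynamic C function with a template layer whose types and sizes are known at compile time:

```cpp
#include <cstddef>
#include <cstdint>

// Classical C style: types and sizes are only known at run time,
// hidden behind void pointers, which blocks many compiler optimizations.
void fully_connected_dynamic(const void* in, void* out, const void* weights,
                             size_t in_len, size_t out_len, int type_tag);

// Template layer: every type and dimension is fixed at compile time,
// so the compiler can unroll loops and fold the constant weights.
template <typename TIn, typename TOut, size_t IN, size_t OUT>
void fully_connected(const TIn (&in)[IN], TOut (&out)[OUT],
                     const TIn (&weights)[OUT][IN]) {
    for (size_t o = 0; o < OUT; ++o) {
        int32_t acc = 0;
        for (size_t i = 0; i < IN; ++i) {
            acc += static_cast<int32_t>(in[i]) * static_cast<int32_t>(weights[o][i]);
        }
        out[o] = static_cast<TOut>(acc);
    }
}

// Constant weights end up in RO-data instead of RW-data.
constexpr int8_t kWeights[2][4] = {{1, -2, 3, -4}, {5, -6, 7, -8}};

// Usage: int8_t input[4] = {...}; int8_t output[2];
//        fully_connected(input, output, kWeights);
```

Note that each distinct instantiation of such a template produces its own function body, which is why the updated build contains two template instances where the old build had one shared function.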
Program size and latency, submission.zip vs. update:

| Metric  | submission.zip | Update   | Δ        | Note |
|---------|----------------|----------|----------|------|
| Code    | 9458 B         | 9518 B   | +60 B    | Data-type change; two template instances rather than one function |
| RO-data | 3446 B         | 3530 B   | +84 B    | Data moved from RW to RO; one data type changed |
| RW-data | 236 B          | 20 B     | -216 B   | From int32 to int8 |
| ZI-data | 10492 B        | 10492 B  | ±0       | |
| Latency | 1.52 ms        | 1.48 ms  | -0.04 ms | No void pointers, fully known data types; compile-time dynamic, run-time static |
The code size change is actually very interesting. About 36 bytes come from the change of the data type used for the right shift (with the ugly functions). It now needs to load quarter words, which do not natively exist (to the best of our knowledge). But the change of type reduces the size of the data by 66 bytes (3×22 B). (This code overhead also applies to 16-bit integers, even though a native half-word load exists.)