Copyright (c) 2023-2024 Antmicro
This project provides a Zephyr library for the Kenning runtime API, along with an application for model evaluation. Its aim is to simplify adoption and switching between existing runtime implementations.
This repository provides:
- kenning_inference_lib - a Zephyr library providing generic wrapper methods for loading models and running inference, regardless of their underlying implementation
- kenning-zephyr-runtime app (app) - a Zephyr application used by Kenning for evaluating models and runtimes on devices
- demo application (demo_app) - a Zephyr application that uses kenning_inference_lib to run gesture recognition on sample data
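As a rough illustration of the flow the library wraps, consider the sketch below; the function names are hypothetical placeholders, not the actual API - see demo_app/src/main.c and the kenning_inference_lib headers for the real calls:

/* Hypothetical prototypes, for illustration only; not the real API. */
extern int model_init(void);
extern int model_load_weights(const unsigned char *model, unsigned int size);
extern int model_load_input(const unsigned char *input, unsigned int size);
extern int model_run(void);
extern int model_get_output(unsigned char *output, unsigned int size);

/* Typical flow: initialize the runtime selected at build time,
 * load the model, then feed inputs and read back predictions. */
int run_once(const unsigned char *model, unsigned int model_size,
             const unsigned char *input, unsigned int input_size,
             unsigned char *output, unsigned int output_size)
{
    if (model_init() != 0) { return -1; }
    if (model_load_weights(model, model_size) != 0) { return -1; }
    if (model_load_input(input, input_size) != 0) { return -1; }
    if (model_run() != 0) { return -1; }
    return model_get_output(output, output_size);
}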
This is the minimal set of steps to build the runtime and run the demo application or the Kenning inference server.
The easiest way to obtain an environment with all dependencies is to use the prepared Docker image:
mkdir zephyr-workspace && cd zephyr-workspace
docker run --rm -it -v $(pwd):$(pwd) -w $(pwd) ghcr.io/antmicro/kenning-zephyr-runtime:latest /bin/bash
Now, clone this repository, set up the Zephyr environment and modules, and activate the Python virtual environment:
git clone https://github.com/antmicro/kenning-zephyr-runtime
cd kenning-zephyr-runtime/
./scripts/prepare_zephyr_env.sh
./scripts/prepare_modules.sh
source .venv/bin/activate
At this point, you should be able to build the demo app and run it in Renode:
west build -p always -b stm32f746g_disco demo_app -- -DEXTRA_CONF_FILE=tvm.conf
west build -t board-repl
python ./scripts/run_renode.py
The output should look similar to the one shown in the demo app section.
To build the Kenning inference server app, run:
west build -p always -b stm32f746g_disco app -- -DEXTRA_CONF_FILE=tvm.conf
Then, execute Kenning to compile the model, run the benchmark and generate a report:
kenning optimize test report \
    --json-cfg ./kenning-scenarios/renode-zephyr-tvm-magic-wand-inference.json \
    --measurements ./results-tvm.json \
    --report-path ./report-tvm.md \
    --report-types performance classification renode_stats \
    --to-html \
    --verbosity INFO
The report will be saved as report-tvm/report-tvm.html.
This section contains instructions for preparing Zephyr and building the runtime.
The Docker environment with all the necessary components is defined in the Dockerfile. The built image can be pulled with:
docker pull ghcr.io/antmicro/kenning-zephyr-runtime:latest
Alternatively, you can build the image yourself with:
docker build -t kenning-zephyr-runtime:local .
To be able to build and use the project, you need the following dependencies:
- Zephyr dependencies
- jq
- curl
- west
- patch
- CMake
On Debian-based Linux distributions, install the dependencies as follows:
sudo apt update
sudo apt install -y --no-install-recommends ccache curl device-tree-compiler dfu-util file \
g++-multilib gcc gcc-multilib git jq libmagic1 libsdl2-dev make ninja-build \
python3-dev python3-pip python3-setuptools python3-tk python3-wheel python3-venv \
mono-complete wget xxd xz-utils patch
First off, create a workspace directory and clone the repository:
mkdir zephyr-workspace && cd zephyr-workspace
git clone https://github.com/antmicro/kenning-zephyr-runtime.git
cd kenning-zephyr-runtime
After entering the project's directory, initialize a Zephyr workspace with:
./scripts/prepare_zephyr_env.sh
source .venv/bin/activate
This will:
- Download (if necessary) and set up the Zephyr SDK
- Download necessary toolchains
- Set up a Python virtual environment with the necessary dependencies
The source .venv/bin/activate command can be reused to load the necessary environment before launching the commands mentioned later in this README.
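For example, when returning to the project in a fresh shell:

cd zephyr-workspace/kenning-zephyr-runtime
source .venv/bin/activate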
Now, prepare additional modules:
./scripts/prepare_modules.sh
To build the Kenning Zephyr runtime, select a supported machine learning runtime and a board.
west build --board <board> app -- -DEXTRA_CONF_FILE=<runtime>.conf
You can provide one of the supported runtimes in <runtime>, e.g. tvm for microTVM or tflite for TFLite Micro (both are used throughout this README).
The project was tested on several boards, including stm32f746g_disco, nrf52840dongle and hifive_unleashed.
Check the Adding support for more boards section for information on how to add a new target device.
The binary built by west build can be found in build/zephyr/zephyr.elf.
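You can quickly inspect the produced binary, e.g. to confirm the target architecture, with the standard file utility:

file build/zephyr/zephyr.elf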
Use pip to install Kenning with Renode support enabled:
pip install --upgrade pip
pip install "kenning[tvm,tensorflow,reports,renode] @ git+https://github.com/antmicro/kenning.git"
The pyrenode3 module requires Renode to be installed in order to work.
The easiest way is to use the latest Renode package and store its location in the PYRENODE_PKG variable:
wget https://builds.renode.io/renode-latest.pkg.tar.xz
export PYRENODE_PKG=`pwd`/renode-latest.pkg.tar.xz
For other configuration options, check the pyrenode3 README.md.
Kenning provides:
- Model optimization and compilation
- Evaluation of a model on a target device:
  - Sending the model to the device over UART (e.g. an execution graph or a TFLite FlatBuffer)
  - Sending input data for running inference on the model
  - Collecting output data from the model, and evaluating the quality and performance of the model on the target device with the selected runtime
- Report rendering, including comparison reports for various runtimes, boards, models and applied optimizations.
With Kenning, we can also evaluate the runtime by simulating the device in Renode. This allows us to:
- Verify model behavior without the need for physical hardware
- Check model and runtime performance and correctness in Continuous Integration pipelines without the actual device in the loop
- Check model and runtime performance on platforms under development
- Obtain more detailed metrics regarding device usage, e.g. a histogram of executed instructions
The switch between Renode and actual hardware is seamless - both communicate with Kenning using UART.
This section demonstrates how to build the project and evaluate a model for recognizing gestures on stm32f746g_disco.
First off, build the kenning-zephyr-runtime app for stm32f746g_disco with the TFLite Micro configuration:
west build -p always -b stm32f746g_disco app -- -DEXTRA_CONF_FILE=tflite.conf
Then, evaluate the model in Renode using a sample scenario located in kenning-scenarios/renode-zephyr-tflite-magic-wand-inference.json and generate a report with performance and quality metrics:
kenning optimize test report \
    --json-cfg kenning-scenarios/renode-zephyr-tflite-magic-wand-inference.json \
    --measurements results.json \
    --report-path reports/stm32-renode-tflite-magic-wand/report.md \
    --to-html \
    --verbosity INFO
The model performance report in Markdown will be available under reports/stm32-renode-tflite-magic-wand/report.md.
The HTML version of the report will be accessible under reports/stm32-renode-tflite-magic-wand/report/report.html.
To build the kenning-zephyr-runtime app with the microTVM runtime, set -DEXTRA_CONF_FILE to tvm.conf, e.g. by executing:
west build -p always -b stm32f746g_disco app -- -DEXTRA_CONF_FILE=tvm.conf
Evaluate the model using the sample scenario located in kenning-scenarios/renode-zephyr-tvm-magic-wand-inference.json:
kenning optimize test report \
    --json-cfg kenning-scenarios/renode-zephyr-tvm-magic-wand-inference.json \
    --measurements results.json \
    --report-path reports/stm32-renode-tvm-magic-wand/report.md \
    --to-html \
    --verbosity INFO
This step requires Kenning to be installed. Follow the steps in Installing Kenning with Renode to install it.
The microTVM backend requires the TVM ops used by the model to be compiled into the runtime.
By default, the runtime is compiled with the Magic Wand model ops, but it is possible to use ops from any model.
To do so, provide the additional config variable CONFIG_KENNING_MODEL_PATH, which should contain the path to the model.
This can be either a path to a local file or a URL to a model hosted online, for example at https://dl.antmicro.com/kenning/ (e.g. https://dl.antmicro.com/kenning/models/classification/magic_wand.h5).
The supported model formats are:
- ONNX (.onnx),
- Keras (.h5),
- PyTorch (.pt, .pth),
- TFLite (.tflite).
You can set this variable in prj.conf or add it to west build as follows (remember to wrap the path in escaped quotes, \"):
west build -p always -b stm32f746g_disco app -- \
-DEXTRA_CONF_FILE=tvm.conf \
-DCONFIG_KENNING_MODEL_PATH=\"https://dl.antmicro.com/kenning/models/classification/magic_wand.h5\"
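The equivalent entry in prj.conf is a plain Kconfig assignment, where no shell escaping is needed:

CONFIG_KENNING_MODEL_PATH="https://dl.antmicro.com/kenning/models/classification/magic_wand.h5"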
Kenning Zephyr Runtime uses LLEXT (Zephyr's Linkable Loadable Extensions) to support hot-swapping ML runtimes. A runtime can be built separately from the project and loaded into an already running Kenning Zephyr Runtime application.
Build kenning-zephyr-runtime with LLEXT support using:
west build -p always -b stm32f746g_disco app -- -DEXTRA_CONF_FILE=llext.conf
Then, build the TVM extension:
west build app -t llext-tvm -- -DEXTRA_CONF_FILE="llext.conf;llext_tvm.conf"
Evaluate the model using the scenario located in kenning-scenarios/renode-zephyr-tvm-llext-magic-wand-inference.json:
kenning optimize test report \
    --json-cfg kenning-scenarios/renode-zephyr-tvm-llext-magic-wand-inference.json \
    --measurements results.json \
    --report-path reports/stm32-renode-tvm-llext-magic-wand/report.md \
    --to-html \
    --verbosity INFO
Alternatively, the build and evaluation can be done in a single step:
kenning optimize test report \
    --json-cfg kenning-scenarios/renode-zephyr-auto-tvm-llext-magic-wand-inference.json \
    --measurements results.json \
    --report-path reports/stm32-renode-auto-tvm-llext-magic-wand/report.md \
    --to-html \
    --verbosity INFO
Kenning can also evaluate the runtime running on a physical device.
To do so, flash the device and replace RenodeRuntime in the Kenning evaluation scenarios with the proper runtime.
Build the runtime for nrf52840dongle (let's use TFLite Micro in this example):
west build -p always -b nrf52840dongle app -- -DEXTRA_CONF_FILE=tflite.conf
Flash the Kenning runtime onto the device by following the instructions in the Zephyr documentation.
Finally, evaluate the model and generate a report with performance and quality metrics:
kenning optimize test report \
    --json-cfg kenning-scenarios/zephyr-tflite-magic-wand-inference.json \
    --measurements results.json \
    --report-types performance classification \
    --report-path reports/nrf-tflite-magic-wand/report.md \
    --to-html \
    --verbosity INFO
Build the runtime for stm32f746g_disco (let's use microTVM in this example):
west build -p always -b stm32f746g_disco app -- -DEXTRA_CONF_FILE=tvm.conf
Flash the connected device with the kenning-zephyr-runtime app:
west flash
Evaluate the model and generate a report with performance and quality metrics:
kenning optimize test report \
    --json-cfg kenning-scenarios/zephyr-tvm-magic-wand-inference.json \
    --measurements results.json \
    --report-types performance classification \
    --report-path reports/stm32-tvm-magic-wand/report.md \
    --to-html \
    --verbosity INFO
The Kenning inference library present in this repository can also be used in actual applications, not only in the evaluation process in Kenning.
The application in demo_app demonstrates how to use Kenning Zephyr Runtime in an actual, simple use case: it takes a model recognizing gestures (wing, ring, slope and negative, trained on the Magic Wand dataset) and compiles it with the selected runtime.
The app goes through the delivered inputs, runs inference and prints the output.
With the build environment configured as described in the Cloning the project and preparing the environment section, you can build the demo_app as follows:
- using the microTVM runtime:
west build -p always -b hifive_unleashed demo_app -- -DEXTRA_CONF_FILE=tvm.conf
- using the TFLite Micro runtime:
west build -p always -b hifive_unleashed demo_app -- -DEXTRA_CONF_FILE=tflite.conf
After building the application with a board specified, we can either flash the hardware with it, or simulate it in Renode.
To simulate it in Renode, generate the board's repl platform file using:
west build -t board-repl
The result can be found under ./build/<board_name>.repl.
Finally, run the demo with:
python ./scripts/run_renode.py
The output should look like this:
Starting Renode simulation. Press CTRL+C to exit.
*** Booting Zephyr OS build zephyr-v3.5.0-5385-g415cb65e3f48 ***
__nop function is not yet supported.
I: model output: [wing: 1.000000, ring: 0.000000, slope: 0.000000, negative: 0.000000]
I: model output: [wing: 0.000000, ring: 0.000000, slope: 0.000000, negative: 1.000000]
I: model output: [wing: 0.000000, ring: 0.000000, slope: 1.000000, negative: 0.000000]
I: model output: [wing: 1.000000, ring: 0.000000, slope: 0.000000, negative: 0.000000]
I: model output: [wing: 0.000000, ring: 0.997457, slope: 0.000000, negative: 0.002543]
I: model output: [wing: 0.000000, ring: 0.000000, slope: 1.000000, negative: 0.000000]
I: model output: [wing: 1.000000, ring: 0.000000, slope: 0.000000, negative: 0.000000]
I: model output: [wing: 1.000000, ring: 0.000000, slope: 0.000000, negative: 0.000000]
I: model output: [wing: 1.000000, ring: 0.000000, slope: 0.000000, negative: 0.000000]
I: model output: [wing: 0.000000, ring: 0.000000, slope: 1.000000, negative: 0.000000]
I: model output: [wing: 0.000000, ring: 0.000000, slope: 0.000000, negative: 1.000000]
I: inference done
It is also possible to build demo_app with a custom model.
To do so, edit model_struct in demo_app/src/main.c to match the model's IO specification.
Then, provide the model input in demo_app/src/input_data.h and the model path using the CONFIG_KENNING_MODEL_PATH config variable (similarly to Building runtime with microTVM backend using custom model):
west build -p always -b stm32f746g_disco demo_app -- \
-DEXTRA_CONF_FILE=tvm.conf \
-DCONFIG_KENNING_MODEL_PATH=\"https://dl.antmicro.com/kenning/models/classification/magic_wand.h5\"
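For orientation, an IO specification typically covers input/output counts, lengths and element types; the sketch below is purely illustrative (the field names are hypothetical, not the real definition), so match it against the actual model_struct in demo_app/src/main.c:

/* Purely illustrative; field names are hypothetical placeholders,
 * not the real model_struct from demo_app/src/main.c. */
struct model_io_spec {
    size_t num_inputs;      /* number of input tensors, e.g. 1 */
    size_t input_length;    /* elements per input, e.g. 128 * 3 for Magic Wand windows */
    size_t num_outputs;     /* number of output tensors, e.g. 1 */
    size_t output_length;   /* elements per output, e.g. 4 gesture classes */
};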
Adapting kenning-zephyr-runtime to new boards is straightforward.
As long as the underlying runtime implementation supports a given board without additional configuration, adapting the application to a new board boils down to picking a UART for communication with the Kenning application running on the host.
This UART is expected to be aliased kcomms in the application.
The alias can be set in the overlay file under app/boards/<board_name>.overlay, where <board_name> is the name of the board in Zephyr, as passed via the --board flag of west build:
/ {
aliases {
kcomms = &uart0;
};
};
It is crucial that the selected UART isn't used anywhere else (e.g. as zephyr,console).
Some boards may also require additional configuration options.
These should be placed in app/boards/<board_name>.conf.
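For example, a board-specific fragment could enable interrupt-driven UART; the option below is a standard Zephyr Kconfig symbol, shown only as an illustrative assumption of what a board might need:

CONFIG_UART_INTERRUPT_DRIVEN=y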