We demonstrate machine learning model deployment on the MT3620 Azure Sphere using Apache TVM. The samples range from a simple `a + b` example to a `Conv2D` operation, and finally a Keyword Spotting (KWS) model developed by ARM.
- Linux machine
- MT3620 Azure Sphere board
- micro USB cable
- MT3620 Ethernet Shield (only for tuning)
- Linux with Azure Sphere SDK (follow the Azure Sphere documentation to set up the SDK and device)
- Python 3.6+
- TensorFlow
- Clone this repository (use `git clone --recursive` to clone submodules)
- Install TVM
  - NOTE: Ensure you enable LLVM by setting `set(USE_LLVM ON)`. (This repository has been tested against LLVM-10)
  - NOTE: Checkout `f5b02fdb1b5a7b6be79df97035ec1c3b80e3c665` before installation.
- Set up a virtual environment:
$ python3 -m venv _venv
$ . _venv/bin/activate
$ pip3 install -r requirements.txt -c constraints.txt
$ export PYTHONPATH=$(pwd)/python:$PYTHONPATH
$ export PYTHONPATH=$(pwd)/3rdparty/ML_KWS:$PYTHONPATH
- Connect the Azure Sphere board to the PC with a micro USB cable.
- In the current directory, run `make connect` to connect to the device. (This requires `sudo` access.)
- Enable development by running the `make enable_development` command.
- Optional: Follow these steps to enable network capability:
  - Disconnect the device and attach the network shield.
  - Set up a static IP: Address 192.168.0.10, Netmask 24, Gateway 192.168.0.1
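
As a quick sanity check of the static address settings, the following sketch uses Python's standard `ipaddress` module to confirm that the board address and the gateway sit in the same /24 network. The variable names are illustrative and are not part of this repository.

```python
import ipaddress

# Static settings from the step above; a netmask of 24 means 255.255.255.0.
board = ipaddress.ip_interface("192.168.0.10/24")
gateway = ipaddress.ip_address("192.168.0.1")

print(board.network)             # 192.168.0.0/24
print(board.netmask)             # 255.255.255.0
print(gateway in board.network)  # True
```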
The basic sample is an `a + b` operation. In this example, we deploy a simple operation on Azure Sphere using the C runtime from TVM. To deploy it, follow these instructions:
$ make delete_a7
$ make cleanall
$ make test
$ make program
After programming, the Azure Sphere reads the TVM graph and parameters from flash and creates the runtime. It then reads the input data from flash, passes it to the TVM Relay model, and compares the output with the expected output computed on an x86 machine. If the results match, LED1 on the Azure Sphere turns green.
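
The comparison step can be illustrated on the host side. The sketch below is not the firmware code (which runs on TVM's C runtime). It assumes the output buffers hold little-endian float32 values and that a small tolerance is acceptable. The function names, the buffer layout, and the tolerance are all assumptions made for illustration.

```python
import math
import struct

def unpack_floats(raw: bytes) -> list:
    """Decode a buffer of little-endian float32 values (assumed layout)."""
    count = len(raw) // 4
    return list(struct.unpack("<%df" % count, raw[: count * 4]))

def outputs_match(actual: bytes, expected: bytes, tol: float = 1e-4) -> bool:
    """Return True when both buffers hold the same values within tol."""
    a, e = unpack_floats(actual), unpack_floats(expected)
    return len(a) == len(e) and all(
        math.isclose(x, y, rel_tol=tol, abs_tol=tol) for x, y in zip(a, e)
    )

# Hypothetical a + b result for a = [1, 2, 3] and b = [10, 20, 30].
expected = struct.pack("<3f", 11.0, 22.0, 33.0)
actual = struct.pack("<3f", 11.0, 22.0, 33.00001)

# On the board, a match is what turns LED1 green.
led_green = outputs_match(actual, expected)
print(led_green)
```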
The next sample is a `Conv2D` operation. To run this example, follow the previous instructions and use `conv2d` instead of `test`. If you want to use network capabilities, use `-DAS_NETWORKING=1`. Make sure to follow the previous instructions on connecting the Ethernet shield to the Azure Sphere and setting up the network.
Azure Sphere provides debugging capabilities over the micro USB connection with no extra hardware requirements. To use the debugger, open Visual Studio Code in the current directory and follow the instructions. To enable debugging in the samples, follow these steps:
- Build the sample with the option `-DAS_DEBUG=1`, or change it in the config file.
- Use the Start Debugging option in VS Code and look for the output window.
We deploy KWS, a TensorFlow model developed by ARM, on the Azure Sphere Cortex-A7 core using TVM. This requires several steps, which are explained in the following subsections. To see the final deployment quickly, run the commands below to deploy the KWS model on Azure Sphere. In this deployment, we use a Relay-quantized KWS DS-CNN model. We build this model in TVM along with one of the WAV files in samples as input data. Then we run the model on Azure Sphere and compare the TVM output with the expected result from x86. If the results match, a green LED lights up on the board.
$ make delete_a7
$ make cleanall
$ make kws
$ make program
The following subsections explain how we achieve this deployment in more detail.
KWS models were originally developed in TensorFlow. Here we focus on the pre-trained DS-CNN models provided by ARM. To import the model and perform Relay quantization, run the following command. It saves the Relay module as a pickle file, which we can use to build the runtime.
$ python3 -m model.kws.kws --export --quantize --global-scale 4.0 -o build
Here is the output:
INFO: Quantizing...
INFO: Global Scale: 4.0
INFO: build/module_gs_4.0.pickle saved!
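
To give an intuition for what a global scale of 4.0 means, here is a simplified sketch of symmetric fixed-scale quantization. It is not TVM's exact implementation. The 8-bit signed width, the clipping range, and the function names are assumptions chosen for illustration.

```python
def quantize(x: float, global_scale: float = 4.0, nbit: int = 8) -> int:
    """Map a real value to a signed nbit integer using one fixed scale."""
    levels = 2 ** (nbit - 1)       # 128 for 8-bit signed values
    step = global_scale / levels   # real value represented by one integer step
    q = round(x / step)
    return max(-(levels - 1), min(levels - 1, q))  # clip to [-127, 127]

def dequantize(q: int, global_scale: float = 4.0, nbit: int = 8) -> float:
    """Map the integer back to its approximate real value."""
    return q * global_scale / 2 ** (nbit - 1)

print(4.0 / 128)                   # 0.03125, the size of one step
print(quantize(1.0))               # 32
print(dequantize(quantize(1.0)))   # 1.0
print(quantize(10.0))              # 127, values beyond the scale saturate
```

Under these assumptions, a larger global scale widens the representable range at the cost of coarser steps, which is the trade-off behind choosing a value such as 4.0.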
To test the accuracy of the quantized model, run the following command. It loads the Relay module, runs 1000 audio samples from the KWS dataset, and reports the accuracy.
$ python3 -m model.kws.kws --test 1000 --module build/module_gs_4.0.pickle
This task takes a few minutes the first time because it downloads the dataset. Here is the output:
INFO: testing 1000 samples
Accuracy for 1000 samples: 0.907
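
The reported number is read here as top-1 accuracy, meaning the fraction of samples whose predicted label equals the true label. The sketch below shows that calculation. It assumes this is the metric the script reports, and the function name and the synthetic label lists are illustrative only.

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Synthetic data: 907 correct predictions out of 1000 samples.
labels = ["yes"] * 1000
predictions = ["yes"] * 907 + ["no"] * 93
print(top1_accuracy(predictions, labels))  # 0.907
```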
Now we can build the TVM runtime graph using this module. This command uses the saved quantized model to build the runtime graph with no tuning.
$ python3 -m build_model --keyword --module build/module_gs_4.0.pickle -o build
Here is the output:
INFO: keyword_model.o saved!
INFO: keyword_graph.bin saved!
INFO: keyword_graph.json saved!
INFO: keyword_params.bin saved!
...
INFO: sample audio file used: python/model/kws/samples/silence.wav
INFO: keyword_data.bin saved!
INFO: keyword_output.bin saved!
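
Before programming the board, it can help to confirm that the build step produced the files named in the log. The sketch below checks for them. It assumes the files land directly in the output directory passed with `-o`, and the helper name is hypothetical. Files hidden behind the `...` in the log are not checked.

```python
from pathlib import Path

# File names taken from the log output above.
EXPECTED = [
    "keyword_model.o",
    "keyword_graph.bin",
    "keyword_graph.json",
    "keyword_params.bin",
    "keyword_data.bin",
    "keyword_output.bin",
]

def missing_artifacts(build_dir):
    """Return the expected artifact names that are not present in build_dir."""
    root = Path(build_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_artifacts("build")
    print("all artifacts present" if not missing else f"missing: {missing}")
```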
We deployed an end-to-end demo of the Keyword Spotting model on Azure Sphere. We implemented audio pre-processing and the microphone interface on the Cortex-M4 as a partner application, and TVM on the Cortex-A7.
- Connect a microphone with an analog interface to the Azure Sphere ADC interface (we used MAX4466). Follow the instructions from the partner app.
  - NOTE: If you don't have a microphone, you can deploy `DEMO1` from the partner app, which reads pre-recorded data from memory.
- Follow the steps in apps/kws_mic/README.md to deploy the partner app on Cortex-M4. You can choose `DEMO1` (pre-loaded .wav file) or `DEMO2` (recorded live from microphone).
- Deploy the TVM runtime application on Cortex-A7:
$ make cleanall
$ make kws_demo
$ make program
- If you push `button B`, the board acquires one second of speech from the microphone and shows the resulting label on four LEDs. Here are the LED colors for each label.

| Label | Yes | No | Up | Down | Left | Right |
|-------|-----|----|----|------|------|-------|
| LEDs | ⚫⚫💚💚 | ⚫⚫🔴🔴 | ⚫⚫💚⚫ | ⚫⚫🔵⚫ | ⚪⚫⚫⚫ | ⚫⚫⚫⚪ |

| Label | On | Off | Stop | Go | Silence | Unknown |
|-------|----|-----|------|----|---------|---------|
| LEDs | ⚪⚫⚪⚪ | 🔴⚫⚫⚫ | 🔴⚫🔴🔴 | 💚⚫💚💚 | ⚫⚫⚫🔵 | ⚫⚫⚫💚 |
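
The label-to-LED mapping above can be written as a lookup table. The sketch below picks the highest-scoring class and returns its LED pattern. The firmware itself is not written in Python, so this is only an illustration. The class order in `LABELS` and the reading of ⚫ as off and ⚪ as white are assumptions, and all names are hypothetical.

```python
# Assumed class order: silence and unknown first, then the ten keywords.
LABELS = ["silence", "unknown", "yes", "no", "up", "down",
          "left", "right", "on", "off", "stop", "go"]

# Four LED colors per label, left to right, transcribed from the LED listing.
LED_PATTERNS = {
    "yes":     ("off", "off", "green", "green"),
    "no":      ("off", "off", "red", "red"),
    "up":      ("off", "off", "green", "off"),
    "down":    ("off", "off", "blue", "off"),
    "left":    ("white", "off", "off", "off"),
    "right":   ("off", "off", "off", "white"),
    "on":      ("white", "off", "white", "white"),
    "off":     ("red", "off", "off", "off"),
    "stop":    ("red", "off", "red", "red"),
    "go":      ("green", "off", "green", "green"),
    "silence": ("off", "off", "off", "blue"),
    "unknown": ("off", "off", "off", "green"),
}

def leds_for_scores(scores):
    """Return the LED pattern for the highest-scoring class."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return LED_PATTERNS[LABELS[best]]

# Hypothetical model output in which "yes" has the highest score.
scores = [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.0, 0.0, 0.01]
print(leds_for_scores(scores))
```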
Here are some of the references used in this project: