Copyright (C) 2021, Axis Communications AB, Lund, Sweden. All Rights Reserved.
This example focuses on object detection on an Axis camera equipped with an Edge TPU, but it can also easily be configured to run on CPU or on ARTPEC-8 cameras (DLPU). A pretrained model, MobileNet SSD v2 (COCO), is used to detect the location of 90 different types of objects. The model is downloaded through the Dockerfile from the google-coral repository. The detected objects are saved in the /tmp folder for further use.
The following instructions can be executed to quickly run the example:

- Compile the ACAP application:

  docker build --build-arg ARCH=<ARCH> --build-arg CHIP=<CHIP> --tag obj_detect:1.0 .
  docker cp $(docker create obj_detect:1.0):/opt/app ./build

  where the values are:

  - `<CHIP>` is the chip type. Supported values are `artpec8`, `cpu` and `edgetpu`.
  - `<ARCH>` is the architecture. Supported values are `armv7hf` (default) and `aarch64`.
- Find the ACAP application `.eap` file: build/object_detection_app_1_0_0_<ARCH>.eap
- Install and start the ACAP application on your camera through the camera web GUI.
- SSH to the camera.
- View its log to see the ACAP application output:

  tail -f /var/volatile/log/info.log | grep object_detection
The overall principle is similar to the vdo-larod example. In this example, the original stream has a resolution of 1920x1080, while MobileNet SSD COCO requires an input size of 300x300, so two different streams are set up: one for the MobileNet model, and another used to crop a higher resolution JPG image.
Two functions are used to obtain a proper resolution. The `chooseStreamResolution` function selects the smallest suitable stream resolution and assigns it to `streamWidth` and `streamHeight`.
unsigned int streamWidth = 0;
unsigned int streamHeight = 0;
chooseStreamResolution(args.width, args.height, &streamWidth, &streamHeight);
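A minimal sketch of that selection logic is shown below, assuming a plain array of candidate resolutions instead of the actual VDO channel query; `pickSmallestFitting` and `Resolution` are hypothetical names for illustration, not part of the example's code:

```c
#include <stddef.h>

/* Illustrative type for a candidate stream resolution. */
typedef struct {
    unsigned int w;
    unsigned int h;
} Resolution;

/* Pick the smallest candidate resolution that still covers the
 * requested width and height. The real helper queries the available
 * VDO channel resolutions rather than taking an array. */
static void pickSmallestFitting(unsigned int reqW, unsigned int reqH,
                                const Resolution* candidates, size_t n,
                                unsigned int* outW, unsigned int* outH) {
    unsigned long long bestArea = ~0ULL;
    for (size_t i = 0; i < n; i++) {
        if (candidates[i].w < reqW || candidates[i].h < reqH) {
            continue; /* too small to cover the requested size */
        }
        unsigned long long area =
            (unsigned long long) candidates[i].w * candidates[i].h;
        if (area < bestArea) {
            bestArea = area;
            *outW = candidates[i].w;
            *outH = candidates[i].h;
        }
    }
}
```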
Then the `createImgProvider` function is used to return an `ImgProvider` with the selected output format.
provider = createImgProvider(streamWidth, streamHeight, 2, VDO_FORMAT_YUV);
The original resolution `args.raw_width` x `args.raw_height` is used to crop a higher resolution image.
provider_raw = createImgProvider(rawWidth, rawHeight, 2, VDO_FORMAT_YUV);
Then, similar to the tensorflow-to-larod example, the larod interface needs to be set up. The `setupLarod` function creates a connection to larod and selects the hardware on which to run the model.
int larodModelFd = -1;
larodConnection* conn = NULL;
larodModel* model = NULL;
setupLarod(chipString, larodModelFd, &conn, &model);
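As a rough idea of what a `setupLarod`-style helper does, the following sketch connects to larod, resolves the device named by `chipString` and loads the model from the already opened file descriptor. This is an assumption of the helper's body using the larod 3 API, not the example's exact implementation; the model name string is illustrative:

```c
#include <larod.h>
#include <stdbool.h>
#include <syslog.h>

/* Sketch: connect to larod, look up the requested device and load the
 * model from a file descriptor. Error handling is simplified. */
static bool setupLarodSketch(const char* chipString, int modelFd,
                             larodConnection** conn, larodModel** model) {
    larodError* error = NULL;

    if (!larodConnect(conn, &error)) {
        syslog(LOG_ERR, "Could not connect to larod: %s", error->msg);
        larodClearError(&error);
        return false;
    }

    const larodDevice* dev = larodGetDevice(*conn, chipString, 0, &error);
    if (!dev) {
        syslog(LOG_ERR, "Could not get device %s: %s", chipString, error->msg);
        larodClearError(&error);
        return false;
    }

    /* LAROD_ACCESS_PRIVATE keeps the loaded model private to this
     * connection; the name string is only used for bookkeeping. */
    *model = larodLoadModel(*conn, modelFd, dev, LAROD_ACCESS_PRIVATE,
                            "object_detection", NULL, &error);
    if (!*model) {
        syslog(LOG_ERR, "Could not load model: %s", error->msg);
        larodClearError(&error);
        return false;
    }
    return true;
}
```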
The `createAndMapTmpFile` function creates temporary files to store the input and output tensors.
char CONV_INP_FILE_PATTERN[] = "/tmp/larod.in.test-XXXXXX";
char CONV_OUT1_FILE_PATTERN[] = "/tmp/larod.out1.test-XXXXXX";
char CONV_OUT2_FILE_PATTERN[] = "/tmp/larod.out2.test-XXXXXX";
char CONV_OUT3_FILE_PATTERN[] = "/tmp/larod.out3.test-XXXXXX";
char CONV_OUT4_FILE_PATTERN[] = "/tmp/larod.out4.test-XXXXXX";
void* larodInputAddr = MAP_FAILED;
void* larodOutput1Addr = MAP_FAILED;
void* larodOutput2Addr = MAP_FAILED;
void* larodOutput3Addr = MAP_FAILED;
void* larodOutput4Addr = MAP_FAILED;
int larodInputFd = -1;
int larodOutput1Fd = -1;
int larodOutput2Fd = -1;
int larodOutput3Fd = -1;
int larodOutput4Fd = -1;
createAndMapTmpFile(CONV_INP_FILE_PATTERN, rawWidth * rawHeight * CHANNELS, &larodInputAddr, &larodInputFd);
createAndMapTmpFile(CONV_OUT1_FILE_PATTERN, TENSOR1SIZE, &larodOutput1Addr, &larodOutput1Fd);
createAndMapTmpFile(CONV_OUT2_FILE_PATTERN, TENSOR2SIZE, &larodOutput2Addr, &larodOutput2Fd);
createAndMapTmpFile(CONV_OUT3_FILE_PATTERN, TENSOR3SIZE, &larodOutput3Addr, &larodOutput3Fd);
createAndMapTmpFile(CONV_OUT4_FILE_PATTERN, TENSOR4SIZE, &larodOutput4Addr, &larodOutput4Fd);
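What such a helper typically does is standard POSIX: create a unique file in /tmp from the XXXXXX pattern, grow it to the wanted size and memory-map it, so that larod can access the tensor data through the file descriptor while the application reads and writes it through the mapped address. A sketch under that assumption (`createAndMapTmpFileSketch` is illustrative, not the example's exact code):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: create a temporary file of the given size and map it into
 * memory for shared read/write access. */
static bool createAndMapTmpFileSketch(char* fileName, size_t fileSize,
                                      void** mappedAddr, int* convFd) {
    int fd = mkstemp(fileName); /* replaces the XXXXXX in fileName */
    if (fd < 0) {
        return false;
    }
    if (ftruncate(fd, (off_t) fileSize) < 0) { /* size the file */
        close(fd);
        return false;
    }
    void* addr = mmap(NULL, fileSize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return false;
    }
    *mappedAddr = addr;
    *convFd = fd;
    return true;
}
```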
For the crop stream, another temporary file is created.
char CROP_FILE_PATTERN[] = "/tmp/crop.test-XXXXXX";
void* cropAddr = MAP_FAILED;
int cropFd = -1;
createAndMapTmpFile(CROP_FILE_PATTERN, rawWidth * rawHeight * CHANNELS, &cropAddr, &cropFd);
The `larodCreateModelInputs` and `larodCreateModelOutputs` functions create input and output tensors that match the model.
size_t numInputs = 0;
size_t numOutputs = 0;
inputTensors = larodCreateModelInputs(model, &numInputs, &error);
outputTensors = larodCreateModelOutputs(model, &numOutputs, &error);
The `larodSetTensorFd` function then maps each tensor to the corresponding file descriptor to allow I/O.
larodSetTensorFd(inputTensors[0], larodInputFd, &error);
larodSetTensorFd(outputTensors[0], larodOutput1Fd, &error);
larodSetTensorFd(outputTensors[1], larodOutput2Fd, &error);
larodSetTensorFd(outputTensors[2], larodOutput3Fd, &error);
larodSetTensorFd(outputTensors[3], larodOutput4Fd, &error);
Finally, the `larodCreateJobRequest` function creates an inference request for the model.
infReq = larodCreateJobRequest(model, inputTensors, numInputs, outputTensors, numOutputs, NULL, &error);
Using the `getLastFrameBlocking` function, a buffer containing the latest image is retrieved from the `ImgProvider` created earlier. The `vdo_buffer_get_data` function is then used to extract NV12 data from the buffer.
VdoBuffer* buf = getLastFrameBlocking(provider);
uint8_t* nv12Data = (uint8_t*) vdo_buffer_get_data(buf);
Axis cameras output frames in the NV12 YUV format. As this format is normally not used as input to deep learning models, a conversion to e.g. RGB might be needed. This is done by creating a pre-processing job request `ppReq` with the `larodCreateJobRequest` function, this time using the pre-processing model `ppModel` and its tensors (`ppInputTensors` and `ppOutputTensors`, created the same way as above), together with the crop parameters in `cropMap`.
ppReq = larodCreateJobRequest(ppModel, ppInputTensors, ppNumInputs, ppOutputTensors, ppNumOutputs, cropMap, &error);
The image data is then converted from NV12 to interleaved uint8_t RGB format by running the `larodRunJob` function on the pre-processing job request `ppReq` defined above.
larodRunJob(conn, ppReq, &error);
Running the `larodRunJob` function on the inference request `infReq` saves the predictions from MobileNet to the specified addresses.
larodRunJob(conn, infReq, &error);
There are four outputs from the object detection model, and each object's location is described in the form [top, left, bottom, right].
float* locations = (float*) larodOutput1Addr;
float* classes = (float*) larodOutput2Addr;
float* scores = (float*) larodOutput3Addr;
float* numberofdetections = (float*) larodOutput4Addr;
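As a minimal sketch of how these buffers can be traversed, assuming the standard SSD post-processing layout where detection `i` contributes `classes[i]`, `scores[i]` and `locations[4*i .. 4*i+3]`:

```c
/* Sketch (assumed SSD output layout): iterate over the detections and
 * keep those whose score passes the confidence threshold. */
int numDetections = (int) numberofdetections[0];
for (int i = 0; i < numDetections; i++) {
    if (scores[i] < args.threshold / 100.0) {
        continue; /* below the confidence threshold, skip */
    }
    float top    = locations[4 * i];
    float left   = locations[4 * i + 1];
    float bottom = locations[4 * i + 2];
    float right  = locations[4 * i + 3];
    /* Log the detection and crop it from the frame, as shown below. */
}
```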
If the score is higher than the threshold `args.threshold / 100.0`, the results are logged with the `syslog` function, and the object is cropped and saved as a JPG by the `crop_interleaved`, `set_jpeg_configuration`, `buffer_to_jpeg` and `jpeg_to_file` functions.
syslog(LOG_INFO, "Object %d: Classes: %s - Scores: %f - Locations: [%f,%f,%f,%f]",
i, class_name[(int) classes[i]], scores[i], top, left, bottom, right);
unsigned char* crop_buffer = crop_interleaved(cropAddr, rawWidth, rawHeight, CHANNELS,
crop_x, crop_y, crop_w, crop_h);
buffer_to_jpeg(crop_buffer, &jpeg_conf, &jpeg_size, &jpeg_buffer);
jpeg_to_file(file_name, jpeg_buffer, jpeg_size);
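The box returned by the model is normalized to [0, 1], so the crop rectangle must first be scaled to pixel coordinates on the high-resolution frame. A sketch of that conversion, where the clamping details are an assumption:

```c
/* Sketch (assumption): scale the normalized [top, left, bottom, right]
 * box to pixels on the rawWidth x rawHeight frame and clamp it to the
 * image bounds before cropping. */
if (top < 0)  top = 0;
if (left < 0) left = 0;
unsigned int crop_x = (unsigned int) (left * rawWidth);
unsigned int crop_y = (unsigned int) (top * rawHeight);
unsigned int crop_w = (unsigned int) ((right - left) * rawWidth);
unsigned int crop_h = (unsigned int) ((bottom - top) * rawHeight);
if (crop_x + crop_w > rawWidth)  crop_w = rawWidth - crop_x;
if (crop_y + crop_h > rawHeight) crop_h = rawHeight - crop_y;
```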
An ACAP application contains a manifest file defining the package configuration. The file is named `manifest.json.<CHIP>` and can be found in the `app` directory. Depending on the chip type (see below), the Dockerfile copies the file to the required name `manifest.json`. The noteworthy attribute for this tutorial is `runOptions`, which allows arguments to be passed to the application; these are handled here by the `argparse` lib. The argument order, defined by `app/argparse.c`, is `<model_path input_resolution_width input_resolution_height output_size_in_bytes raw_video_resolution_width raw_video_resolution_height threshold>`.
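For illustration, a `runOptions` entry might look like the following sketch; the model path and all numbers are hypothetical and must match your model, tensor sizes and stream setup:

```json
{
    "schemaVersion": "1.0",
    "acapPackageConf": {
        "setup": {
            "appName": "object_detection",
            "vendor": "Axis Communications",
            "version": "1.0.0",
            "runMode": "never",
            "runOptions": "/usr/local/packages/object_detection/model.tflite 300 300 80028 1920 1080 50"
        }
    }
}
```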
In the Dockerfile, a `.tflite` model file corresponding to the chosen chip is downloaded and added to the ACAP application via the `-a` flag of the `acap-build` command.
The application is built to specification by the `Makefile` and `manifest.json` in the `app` directory. Standing in the application directory, run:
Note: Depending on the network your local build machine is connected to, you may need to add proxy settings for Docker. See Proxy in build time.
docker build --build-arg ARCH=<ARCH> --build-arg CHIP=<CHIP> --tag obj_detect:1.0 .
docker cp $(docker create obj_detect:1.0):/opt/app ./build
where the parameters are:

- `<CHIP>` is the chip type. Supported values are `artpec8`, `cpu` and `edgetpu`.
- `<ARCH>` is the architecture. Supported values are `armv7hf` (default) and `aarch64`.
Note that the selected architecture and chip must match the targeted device.
The installable `.eap` file is found under:
build/object_detection_app_1_0_0_<ARCH>.eap
Browse to the application page of the Axis device:
http://<AXIS_DEVICE_IP>/index.html#apps
- Click on the tab `Apps` in the device GUI
- Enable the `Allow unsigned apps` toggle
- Click the `(+ Add app)` button to upload the application file
- Browse to the newly built ACAP application, depending on architecture:
  - `object_detection_app_1_0_0_aarch64.eap`
  - `object_detection_app_1_0_0_armv7hf.eap`
- Click `Install`
- Run the application by enabling the `Start` switch
In the Apps view of the camera, press the icon for your ACAP application. A window will pop up which allows you to start the application. Press the Start icon to run the algorithm.
With the algorithm started, we can view the output either by pressing `App log` in the same window, or by connecting to the device with SSH and viewing the log with the following command:
tail -f /var/volatile/log/info.log | grep object_detection
Depending on the selected chip, different output is received. The label file is used to identify objects. In the system log the chip is sometimes only mentioned as a string; the names map as follows:
| Chips | Larod 1 (int) | Larod 3 (string) |
|---|---|---|
| CPU with TensorFlow Lite | 2 | cpu-tflite |
| Google TPU | 4 | google-edge-tpu-tflite |
| Ambarella CVFlow (NN) | 6 | ambarella-cvflow |
| ARTPEC-8 DLPU | 12 | axis-a8-dlpu-tflite |
There are four outputs from the MobileNet SSD v2 (COCO) model: the number of detections, classes, scores, and locations, shown below. The four location numbers stand for [top, left, bottom, right]. Note that the saved images are currently overwritten continuously, so if the last frame produces fewer detections than earlier frames, not all saved images are from the last frame's detections.
[ INFO ] object_detection[645]: Object 1: Classes: 2 car - Scores: 0.769531 - Locations: [0.750146,0.086451,0.894765,0.299347]
[ INFO ] object_detection[645]: Object 2: Classes: 2 car - Scores: 0.335938 - Locations: [0.005453,0.101417,0.045346,0.144171]
[ INFO ] object_detection[645]: Object 3: Classes: 2 car - Scores: 0.308594 - Locations: [0.109673,0.005128,0.162298,0.050947]
Detected objects with a score higher than the threshold are also saved in the /tmp folder as `.jpg` files.