Note: I'm in the process of migrating this repo to JetPack 3.1, so it's not going to work at this time.
This is a fork of NVIDIA's deep learning inference library (jetson-inference). If you haven't used that yet, I strongly advise you to start there; it can be obtained on GitHub. Most everything here was copied from it and then mutilated by someone who hacks together some code once every 5 years or so, so best practices are not exactly followed.
The main purpose of this fork is to test out pipelining DetectNet and ImageNet, where DetectNet is used to detect the presence of a type of object (car, boat, plane), and an ImageNet model is used to further classify the detected object (what make/model of car, what type of plane, etc.).
This repository is kept as close to jetson-inference as possible, adding only a few routines that were needed. The ImageNet and DetectNet examples should work as they did.
I added two example demos. One is dualnet-camera, which combines image detection and recognition. The other is a very simplified live-camera blackjack game.
Provided along with this repo are TensorRT-enabled examples of running GoogleNet/AlexNet on a live camera feed for image recognition, and pedestrian detection networks with localization capabilities (i.e. that provide bounding boxes).
The latest source can be obtained from GitHub and compiled onboard Jetson TX1/TX2.
note: this branch is verified against JetPack 2.3 / L4T R24.2 aarch64 (Ubuntu 16.04 LTS)
To obtain the repository, navigate to a folder of your choosing on the Jetson. First, make sure git and cmake are installed locally:
sudo apt-get install git cmake
Then clone the jetson-inference repo:
git clone http://github.com/S4WRXTTCS/jetson-inference
When cmake is run, a special pre-installation script (CMakePreBuild.sh) is run and will automatically install any dependencies.
cd jetson-inference
mkdir build
cd build
cmake ../
Make sure you are still in the jetson-inference/build directory created above.
cd jetson-inference/build # omit if pwd is already /build from above
make
Depending on architecture, the package will be built to either armhf or aarch64, with the following directory structure:
|-build
   \aarch64    (64-bit)
      \bin        where the sample binaries are built to
      \include    where the headers reside
      \lib        where the libraries are built to
   \armhf     (32-bit)
      \bin        where the sample binaries are built to
      \include    where the headers reside
      \lib        where the libraries are built to

On the Jetson TX1/TX2, binaries reside in aarch64/bin, headers in aarch64/include, and libraries in aarch64/lib.
There are multiple types of deep learning networks available, including recognition, detection/localization, and soon segmentation. The first deep learning capability to highlight is image recognition using an 'imageNet' that's been trained to identify similar objects.
The imageNet object accepts an input image and outputs the probability for each class. Having been trained on the ImageNet database of 1000 objects, the standard AlexNet and GoogleNet networks are downloaded during the cmake configuration step above.
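As a rough illustration of that API, the console sample boils down to something like the following. This is only a minimal sketch modelled on the imagenet-console example; the image path and the chosen network are placeholders, and the exact code in the repo may differ.

#include <cstdio>
#include "imageNet.h"
#include "loadImage.h"

int main()
{
    // load GoogleNet with TensorRT (AlexNet could be selected instead)
    imageNet* net = imageNet::Create(imageNet::GOOGLENET);
    if( !net )
        return 1;

    // load the test image into shared CPU/GPU memory
    float* imgCPU  = NULL;
    float* imgCUDA = NULL;
    int width = 0, height = 0;
    if( !loadImageRGBA("orange_0.jpg", (float4**)&imgCPU, (float4**)&imgCUDA, &width, &height) )
        return 1;

    // classify the image and print the top class with its confidence
    float confidence = 0.0f;
    const int imgClass = net->Classify(imgCUDA, width, height, &confidence);

    if( imgClass >= 0 )
        printf("classified as '%s' (class #%i) with %.2f%% confidence\n",
               net->GetClassDesc(imgClass), imgClass, confidence * 100.0f);

    delete net;
    return 0;
}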
After building, first make sure your terminal is located in the aarch64/bin directory:
$ cd jetson-inference/build/aarch64/bin
Then, classify an example image with the imagenet-console program. imagenet-console accepts 2 command-line arguments: the path to the input image and the path to the output image (with the class overlay printed).
$ ./imagenet-console orange_0.jpg output_0.jpg
$ ./imagenet-console granny_smith_1.jpg output_1.jpg
Next, we will use imageNet to classify a live video feed from the Jetson onboard camera.
Similar to the last example, the realtime image recognition demo is located in /aarch64/bin and is called imagenet-camera. It runs on a live camera stream and, depending on user arguments, loads googlenet or alexnet with TensorRT.
$ ./imagenet-camera googlenet # to run using googlenet
$ ./imagenet-camera alexnet # to run using alexnet
The frames per second (FPS), the classified object name from the video, and the confidence of the classified object are printed to the OpenGL window title bar. By default the application can recognize up to 1000 different types of objects, since GoogleNet and AlexNet were trained on the ILSVRC12 ImageNet database, which contains 1000 classes of objects. The mapping of names for the 1000 types of objects can be found in the repo under data/networks/ilsvrc12_synset_words.txt
note: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the DEFAULT_CAMERA define at the top of imagenet-camera.cpp to reflect the /dev/video V4L2 device of your USB camera. The webcam it has been tested with is the Logitech C920.
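For reference, the define in question looks roughly like this (the value and comment wording shown here are illustrative and may differ slightly between versions):

// near the top of imagenet-camera.cpp
#define DEFAULT_CAMERA -1    // -1 = onboard CSI camera; >= 0 = index of a /dev/video V4L2 device
                             // e.g. set it to 1 to use a USB webcam that shows up as /dev/video1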
The previous image recognition examples output class probabilities representing the entire input image. The second deep learning capability to highlight is detecting multiple objects, and finding where in the video those objects are located (i.e. extracting their bounding boxes). This is performed using a 'detectNet' - or object detection / localization network.
The detectNet object accepts a 2D image as input and outputs a list of coordinates of the detected bounding boxes. Three example detection network models are automatically downloaded during the repo source configuration (a minimal usage sketch follows the list below):
- ped-100 (single-class pedestrian detector)
- multiped-500 (multi-class pedestrian + baggage detector)
- facenet-120 (single-class facial recognition detector)
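The detection API follows the same pattern as imageNet. Here is a minimal sketch along the lines of the detectnet-console sample; the chosen model, function name, and the assumption that imgCUDA/width/height were loaded with loadImageRGBA() are illustrative, not the exact repo code.

#include <cstdio>
#include "detectNet.h"
#include "cudaMappedMemory.h"

// imgCUDA/width/height are assumed to have been loaded with loadImageRGBA(),
// as in the imageNet sketch above
void detectExample( float* imgCUDA, int width, int height )
{
    detectNet* net = detectNet::Create(detectNet::PEDNET_MULTI);
    if( !net )
        return;

    // allocate output arrays for the bounding boxes and per-box confidences
    const uint32_t maxBoxes = net->GetMaxBoundingBoxes();
    const uint32_t classes  = net->GetNumClasses();

    float* bbCPU   = NULL;  float* bbCUDA   = NULL;
    float* confCPU = NULL;  float* confCUDA = NULL;
    cudaAllocMapped((void**)&bbCPU,   (void**)&bbCUDA,   maxBoxes * sizeof(float4));
    cudaAllocMapped((void**)&confCPU, (void**)&confCUDA, maxBoxes * classes * sizeof(float));

    // run detection; numBoxes is updated with the number of objects found
    int numBoxes = maxBoxes;
    if( net->Detect(imgCUDA, width, height, bbCPU, &numBoxes, confCPU) )
    {
        for( int n = 0; n < numBoxes; n++ )
        {
            const float* bb = bbCPU + (n * 4);   // left, top, right, bottom
            printf("box %i  (%.0f, %.0f)  (%.0f, %.0f)\n", n, bb[0], bb[1], bb[2], bb[3]);
        }
    }

    delete net;
}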
To process test images with detectNet and TensorRT, use the detectnet-console program. detectnet-console accepts command-line arguments representing the path to the input image and the path to the output image (with the bounding box overlays rendered). Some test images are included with the repo:
$ ./detectnet-console peds-007.png output-7.png
To change the network that detectnet-console uses, modify detectnet-console.cpp (beginning at line 33):
detectNet* net = detectNet::Create( detectNet::PEDNET_MULTI ); // uncomment to enable one of these
//detectNet* net = detectNet::Create( detectNet::PEDNET );
//detectNet* net = detectNet::Create( detectNet::FACENET );
Then to recompile, navigate to the jetson-inference/build directory and run make.
When using the multiped-500 model (PEDNET_MULTI), for images containing luggage or baggage in addition to pedestrians, the 2nd object class is rendered with a green overlay.
$ ./detectnet-console peds-008.png output-8.png
Similar to the previous example, detectnet-camera runs the object detection networks on the live video feed from the Jetson onboard camera. Launch it from the command line along with the type of desired network:
$ ./detectnet-camera multiped # run using multi-class pedestrian/luggage detector
$ ./detectnet-camera ped-100 # run using original single-class pedestrian detector
$ ./detectnet-camera facenet # run using facial recognition network
$ ./detectnet-camera cardnet # run using Playing Card detection network
$ ./detectnet-camera # by default, program will run using multiped
note: to achieve maximum performance while running detectnet, increase the Jetson TX1 clock limits by running the script:
sudo ~/jetson_clocks.sh
note: by default, the Jetson's onboard CSI camera will be used as the video source. If you wish to use a USB webcam instead, change the DEFAULT_CAMERA define at the top of detectnet-camera.cpp to reflect the /dev/video V4L2 device of your USB camera. The webcam it has been tested with is the Logitech C920.
The dualnet-camera demo combines detection (DetectNet) and recognition (ImageNet) on the live video feed from the Jetson onboard camera. Launch it from the command line along with the desired networks, where the first argument is the DetectNet network and the second is the ImageNet network.
$ ./dualnet-camera cardnet alexnet_54cards # run using PlayingCard detection and PlayingCard recognition
$ ./dualnet-camera # by default, runs using PlayingCard detection and PlayingCard recognition
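I haven't reproduced dualnet-camera.cpp here, but the idea is simply to feed each DetectNet bounding box into the ImageNet classifier. The following is a hypothetical sketch of that loop, not the actual demo source: cropDetection() is a made-up stand-in for whatever crop/copy step the demo performs, and the buffers are assumed to be allocated as in the detectNet sketch earlier.

#include <cstdio>
#include "detectNet.h"
#include "imageNet.h"

// hypothetical helper (declaration only): copies one bounding-box region of the
// RGBA frame into its own buffer so the classifier only sees that object
float* cropDetection( float* rgba, int width, int height, const float* bb );

// Sketch only -- not the actual dualnet-camera.cpp source.
void classifyDetections( detectNet* detector, imageNet* classifier,
                         float* imgCUDA, int width, int height,
                         float* bbCPU, float* confCPU, int maxBoxes )
{
    int numBoxes = maxBoxes;
    if( !detector->Detect(imgCUDA, width, height, bbCPU, &numBoxes, confCPU) )
        return;

    for( int n = 0; n < numBoxes; n++ )
    {
        const float* bb = bbCPU + (n * 4);              // left, top, right, bottom
        const int cropW = (int)(bb[2] - bb[0]);
        const int cropH = (int)(bb[3] - bb[1]);

        float* cropCUDA = cropDetection(imgCUDA, width, height, bb);   // hypothetical helper

        // classify just the detected region with the second network
        float conf = 0.0f;
        const int cls = classifier->Classify(cropCUDA, cropW, cropH, &conf);

        if( cls >= 0 )
            printf("box %i -> %s (%.1f%%)\n", n, classifier->GetClassDesc(cls), conf * 100.0f);
    }
}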
$ ./blackjack-camera # by default, program will run using the correct networks
By default, it uses the USB camera at device 1. To change this, you'll need to change the DEFAULT_CAMERA define at the top of blackjack-camera.cpp to reflect the /dev/video V4L2 device of your USB camera. The webcam it has been tested with is the Logitech C920. The onboard camera can be used, but isn't advised.
To play the game, have the camera facing down towards the table. Half of the image is the computer's playing area, and half is the human's side. Simply deal a card to the computer side, and then to the human side. The computer will tell you when it wants to hit or stand. To tell the computer that you want to stand, simply use the Red Joker to signal that you're staying. As of now the game is pretty limited in that it doesn't know that an Ace can take different values. It's only intended as a demonstration of what's possible by combining ImageNet and DetectNet.
If you find that it's not recognizing cards correctly, move the camera up or down. It also struggles with cards that are too close together: DetectNet detects them as a single card, which throws everything off. You also can't have overlapping cards.
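To make the Ace limitation concrete, here is an illustrative hand-value calculation (not the actual blackjack-camera.cpp logic) in which every recognized rank maps to a single fixed value, so an Ace can never fall back to counting as 1:

#include <string>
#include <vector>

// Illustrative only: each rank has one fixed value, which is the limitation
// described above -- a hand like Ace + 9 + 5 busts at 25 even though counting
// the Ace as 1 would make it 15.
int cardValue(const std::string& rank)
{
    if (rank == "ace")                                        return 11;
    if (rank == "king" || rank == "queen" || rank == "jack")  return 10;
    return std::stoi(rank);    // "2" .. "10"
}

int handValue(const std::vector<std::string>& ranks)
{
    int total = 0;
    for (const auto& r : ranks)
        total += cardValue(r);
    return total;
}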
Here is what it should look like. This image shows an 11x17 piece of paper I used as the playing table, with outlines for the cards, but this isn't needed.
If you need to retrain the DetectNet-based CardNet or the ImageNet AlexNet_54cards model, you can add the necessary data to the following datasets and then retrain them in DIGITS 5.0.
The DetectNet training data is here https://drive.google.com/file/d/0B8dR1eAmu3fTR3l4WkNtR0dqS0E/view?usp=sharing
The ImageNet training data is here https://drive.google.com/file/d/0B8dR1eAmu3fTcG1mZVN4OHFNTU0/view?usp=sharing