Commit 0aced03
===

initial commit

jaybdub committed Mar 2, 2018 (0 parents)
Showing 29 changed files with 4,024 additions and 0 deletions.
CMakeLists.txt: 13 additions, 0 deletions
@@ -0,0 +1,13 @@
cmake_minimum_required(VERSION 3.1)
project(trt_image_classification)

find_package(OpenCV REQUIRED)
find_package(CUDA REQUIRED)

include_directories(${CMAKE_SOURCE_DIR})

set(CUDA_NVCC_FLAGS --std=c++11)
set(CMAKE_CXX_STANDARD 11)

add_subdirectory(examples)
add_subdirectory(src)
INSTALL.md: 34 additions, 0 deletions
@@ -0,0 +1,34 @@
Installation
===

1. Flash the Jetson TX2 using JetPack 3.2. Be sure to install:
   * CUDA 9.0
   * OpenCV4Tegra
   * cuDNN
   * TensorRT 3.0
2. Install TensorFlow on the Jetson TX2.
   1. ...
   2. ...
3. Install the UFF converter on the Jetson TX2.
   1. Download the TensorRT 3.0.4 tar package for Ubuntu 16.04 and CUDA 9.0 from https://developer.nvidia.com/nvidia-tensorrt-download.
   2. Extract the archive:

      ```
      tar -xzf TensorRT-3.0.4.Ubuntu-16.04.3.x86_64.cuda-9.0.cudnn7.0.tar.gz
      ```

   3. Install the UFF Python package using pip (a quick import check follows these steps):

      ```
      sudo pip install TensorRT-3.0.4/uff/uff-0.2.0-py2.py3-none-any.whl
      ```

4. Clone and build this project:

```
git clone --recursive https://gitlab-master.nvidia.com/jwelsh/trt_image_classification.git
cd trt_image_classification
mkdir build
cd build
cmake ..
make
cd ..
```
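
As a quick sanity check (our own suggestion, not part of the original steps), confirm that the Python packages installed above import cleanly:

```
# Quick sanity check: the TensorRT and UFF Python packages
# installed in steps 1-3 should import without errors.
import tensorrt as trt
import uff

print('TensorRT and UFF imports OK')
```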
LICENSE.md: 25 additions, 0 deletions
@@ -0,0 +1,25 @@
Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
README.md: 78 additions, 0 deletions
@@ -0,0 +1,78 @@
TensorFlow->TensorRT Image Classification
===

This repository contains examples, scripts, and code for image classification using TensorFlow models
(from [the TF-slim model zoo](https://github.com/tensorflow/models/tree/master/research/slim#Pretrained))
converted to TensorRT. Converting TensorFlow models to TensorRT offers significant performance
gains on the Jetson TX2, as shown [below](#default_models).

<a name="quick_start"></a>
## Quick Start

1. Follow the [installation guide](INSTALL.md).
2. Download the pretrained TensorFlow models and example images.

```
source scripts/download_models.sh
source scripts/download_images.sh
```
3. Convert the pretrained models to frozen graphs.
```
python scripts/models_to_frozen_graphs.py
```
4. Convert the frozen graphs to optimized TensorRT engines (a sketch of the conversion flow follows this list).
```
python scripts/frozen_graphs_to_plans.py
```
5. Execute the Inception V1 model on a single image.
```
./build/examples/classify_image/classify_image data/images/gordon_setter.jpg data/plans/inception_v1.plan data/imagenet_labels_1001.txt input InceptionV1/Logits/SpatialSqueeze inception
```
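
As background for steps 3 and 4, the sketch below shows the rough frozen-graph-to-plan flow using the TensorRT 3 Python API. The exact calls, file paths, and workspace size here are illustrative assumptions; the authoritative versions live in **scripts/models_to_frozen_graphs.py** and **scripts/frozen_graphs_to_plans.py**.

```
import uff
import tensorrt as trt
from tensorrt.parsers import uffparser

# Convert a frozen TensorFlow graph to UFF, naming the output node
# listed in the table below.
uff_model = uff.from_tensorflow_frozen_model(
    'data/frozen_graphs/inception_v1.pb',
    ['InceptionV1/Logits/SpatialSqueeze'])

# Register the network bindings with the UFF parser (inputs are CHW).
parser = uffparser.create_uff_parser()
parser.register_input('input', (3, 224, 224), 0)
parser.register_output('InceptionV1/Logits/SpatialSqueeze')

# Build an optimized engine and serialize it to a .plan file.
logger = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
engine = trt.utils.uff_to_trt_engine(logger, uff_model, parser, 1, 1 << 25)
trt.utils.write_engine_to_file('data/plans/inception_v1.plan',
                               engine.serialize())
```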
For more details, read through the [examples README](examples/README.md).

<a name="default_models"></a>
## Default Models

The table below lists details for the default models ported from the TensorFlow slim model zoo.

| <sub>Model</sub> | <sub>Input Size</sub> | <sub>TensorRT (TX2 / Half)</sub> | <sub>TensorRT (TX2 / Float)</sub> | <sub>TensorFlow (TX2 / Float)</sub> | <sub>Input Name</sub> | <sub>Output Name</sub> | <sub>Preprocessing Fn.</sub> |
|--- |:---:|:---:|:---:|:---:|---|---|---|
| <sub>inception_v1</sub> | <sub>224x224</sub> | <sub>7.98ms</sub> | <sub>12.8ms</sub> | <sub>27.6ms</sub> | <sub>input</sub> | <sub>InceptionV1/Logits/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>inception_v3</sub> | <sub>299x299</sub> | <sub>26.3ms</sub> | <sub>46.1ms</sub> | <sub>98.4ms</sub> | <sub>input</sub> | <sub>InceptionV3/Logits/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>inception_v4</sub> | <sub>299x299</sub> | <sub>52.1ms</sub> | <sub>88.2ms</sub> | <sub>176ms</sub> | <sub>input</sub> | <sub>InceptionV4/Logits/Logits/BiasAdd</sub> | <sub>inception</sub> |
| <sub>inception_resnet_v2</sub> | <sub>299x299</sub> | <sub>53.0ms</sub> | <sub>98.7ms</sub> | <sub>168ms</sub> | <sub>input</sub> | <sub>InceptionResnetV2/Logits/Logits/BiasAdd</sub> | <sub>inception</sub> |
| <sub>resnet_v1_50</sub> | <sub>224x224</sub> | <sub>15.7ms</sub> | <sub>27.1ms</sub> | <sub>63.9ms</sub> | <sub>input</sub> | <sub>resnet_v1_50/SpatialSqueeze</sub> | <sub>vgg</sub> |
| <sub>resnet_v1_101</sub> | <sub>224x224</sub> | <sub>29.9ms</sub> | <sub>51.8ms</sub> | <sub>107ms</sub> | <sub>input</sub> | <sub>resnet_v1_101/SpatialSqueeze</sub> | <sub>vgg</sub> |
| <sub>resnet_v1_152</sub> | <sub>224x224</sub> | <sub>42.6ms</sub> | <sub>78.2ms</sub> | <sub>157ms</sub> | <sub>input</sub> | <sub>resnet_v1_152/SpatialSqueeze</sub> | <sub>vgg</sub> |
| <sub>resnet_v2_50</sub> | <sub>299x299</sub> | <sub>27.5ms</sub> | <sub>44.4ms</sub> | <sub>92.2ms</sub> | <sub>input</sub> | <sub>resnet_v2_50/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>resnet_v2_101</sub> | <sub>299x299</sub> | <sub>49.2ms</sub> | <sub>83.1ms</sub> | <sub>160ms</sub> | <sub>input</sub> | <sub>resnet_v2_101/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>resnet_v2_152</sub> | <sub>299x299</sub> | <sub>74.6ms</sub> | <sub>124ms</sub> | <sub>230ms</sub> | <sub>input</sub> | <sub>resnet_v2_152/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>mobilenet_v1_0p25_128</sub> | <sub>128x128</sub> | <sub>2.67ms</sub> | <sub>2.65ms</sub> | <sub>15.7ms</sub> | <sub>input</sub> | <sub>MobilenetV1/Logits/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>mobilenet_v1_0p5_160</sub> | <sub>160x160</sub> | <sub>3.95ms</sub> | <sub>4.00ms</sub> | <sub>16.9ms</sub> | <sub>input</sub> | <sub>MobilenetV1/Logits/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>mobilenet_v1_1p0_224</sub> | <sub>224x224</sub> | <sub>12.9ms</sub> | <sub>12.9ms</sub> | <sub>24.4ms</sub> | <sub>input</sub> | <sub>MobilenetV1/Logits/SpatialSqueeze</sub> | <sub>inception</sub> |
| <sub>vgg_16</sub> | <sub>224x224</sub> | <sub>38.2ms</sub> | <sub>79.2ms</sub> | <sub>171ms</sub> | <sub>input</sub> | <sub>vgg_16/fc8/BiasAdd</sub> | <sub>vgg</sub> |
<!--| inception_v2 | 224x224 | 10.3ms | 16.9ms | 38.3ms | input | InceptionV2/Logits/SpatialSqueeze | inception |-->
<!--| vgg_19 | 224x224 | 97.3ms | OOM | input | vgg_19/fc8/BiasAdd | vgg |-->

The recorded times include data transfer to the GPU, network execution, and data transfer back from the GPU; they do not include preprocessing. See **scripts/test_tf.py**, **scripts/test_trt.py**, and **src/test/test_trt.cu** for implementation details. To reproduce the timings, run
```
python scripts/test_tf.py
python scripts/test_trt.py
```
The timing results will be written to **data/test_output_tf.txt** and **data/test_output_trt.txt**. Note that you must download and convert the models (as in the quick start) before running the benchmark scripts.
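
The "Preprocessing Fn." column above refers to the two input conventions used by the TF-slim models. A minimal sketch of each convention (our own illustration, not code from this repository):

```
import numpy as np

def preprocess_inception(image):
    # Inception-style models expect RGB pixels scaled to [-1, 1].
    return 2.0 * (image.astype(np.float32) / 255.0) - 1.0

def preprocess_vgg(image):
    # VGG-style models expect float RGB pixels with the per-channel
    # ImageNet mean subtracted (and no rescaling).
    mean = np.array([123.68, 116.78, 103.94], dtype=np.float32)
    return image.astype(np.float32) - mean
```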
