<!--
# Copyright 2020 - 2021 MONAI Consortium
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)

Deploying MONAI Code via Triton Python Backend
================================================

This simple demo introduces a standard way for model developers to incorporate Python-based projects into Triton. It also shows how users and developers can easily deploy MONAI inference code for field testing. Finally, the code demonstrates a method for low-latency classification and validation inference with Triton.

The steps below describe how to set up a model repository, pull the Triton container, launch the Triton inference server, and then send inference requests to the running server.

This demo and its description borrow heavily from the [Triton Python Backend](https://github.com/triton-inference-server/python_backend) repo. The demo assumes you have at least one GPU.

Get The Demo Source Code
-------------------------------
Pull down the demo repository and start with the [Quick Start](#quick-start) guide.

```
$ git clone https://github.com/Project-MONAI/tutorials.git
```
# Python Backend

This demo uses the Triton backend for Python. The goal of the Python backend is to let you serve models written in Python through the Triton Inference Server without having to write any C++ code. We will use it to demonstrate implementing MONAI code inside Triton.

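
Before diving in, it helps to know the shape of a Python backend model file. The condensed sketch below shows the interface a `model.py` implements; the tensor names are placeholders, and the real `model.py` in this demo would add its MONAI pre-processing and TorchScript inference inside `execute`. See the [Usage](#usage) section for the authoritative description.

```python
# Condensed sketch of the python_backend interface that a model.py implements.
# Tensor names ("INPUT0"/"OUTPUT0") are placeholders and must match config.pbtxt.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Optional: load the network and MONAI transforms once per model instance.
        self.model = None  # e.g. torch.jit.load("covid19_model.ts") in the real model.py

    def execute(self, requests):
        responses = []
        for request in requests:
            data = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
            # ... MONAI pre-processing and inference on `data` would run here ...
            result = data  # placeholder for the real classification output
            out_tensor = pb_utils.Tensor("OUTPUT0", result)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Optional: clean up resources when the model is unloaded.
        pass
```
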
## User Documentation

* [Quick Start](#quick-start)
* [Examples](#examples)
* [Usage](#usage)
* [Model Config File](#model-config-file)
* [Error Handling](#error-handling)

## Quick Start

1. Build the Triton container image and copy the model repository files using the provided shell script:

```
$ ./triton_build.sh
```
2. Run the Triton container image in a background terminal using the provided shell script.
The supplied script starts the demo container with Triton and exposes the three ports on localhost that the application needs to send inference requests.
```
$ ./triton_run_local.sh
```
3. Install the environment for the client.
The client environment should have Python 3 and the necessary packages installed:
```
$ python3 -m pip install -r requirements.txt
```
4. Install the other dependent libraries for the Python Triton client, which are available as Python packages (a quick connectivity check using `tritonclient` is shown after these steps):
```
$ pip install nvidia-pyindex
$ pip install tritonclient[all]
```
5. Run the client program.
The [client](./client/client.py) program takes an optional file input and performs classification to determine whether the study shows COVID-19 or not. See the [NVIDIA COVID-19 Classification](https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_covid19_3d_ct_classification) example in NGC for more background.
```
$ python -u client/client.py [filename/directory]
```
The program then returns the classification result. (The default input for the client is `client/test_data/prostate_24.nii.gz`.)
```
$ Classification result: ['Prostate segmented']
```
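
If the client cannot reach the server, a quick connectivity check against the HTTP endpoint exposed in step 2 can help isolate the problem. This is a minimal sketch that assumes the default `localhost:8000` HTTP port and the `monai_covid` model name used by this demo's model repository:

```python
# Sanity-check connectivity to the running Triton server (assumes localhost:8000).
import tritonclient.http as httpclient

triton = httpclient.InferenceServerClient(url="localhost:8000")
print("Server live: ", triton.is_server_live())
print("Server ready:", triton.is_server_ready())
print("Model ready: ", triton.is_model_ready("monai_covid"))
```
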
## Examples
The example demonstrates running the Triton Python Backend on a single image classification problem.
1. First, a Dockerfile and build script are used to build a container that runs the Triton service and to copy the model-specific files into the container.
```dockerfile
# use desired Triton container as base image for our app
FROM nvcr.io/nvidia/tritonserver:21.04-py3

# create model directory in container
RUN mkdir -p /models/monai_covid/1

# install project-specific dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
RUN rm requirements.txt

# copy contents of model project into model repo in container image
COPY models/monai_covid/config.pbtxt /models/monai_covid
COPY models/monai_covid/1/model.py /models/monai_covid/1

ENTRYPOINT [ "tritonserver", "--model-repository=/models"]
```
Note: The Triton service expects a certain directory structure, discussed in [Model Config File](#model-config-file), to load the model definitions; see the layout sketch below.

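For reference, the model repository assembled by the Dockerfile above has the following layout inside the container (the model name is the directory name, with one numbered version sub-directory):
```
/models
└── monai_covid
    ├── config.pbtxt
    └── 1
        └── model.py
```
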
2. Next, the container with the Triton service runs as a service (in the background or in a separate terminal for the demo). In this example, port `8000` is exposed for client HTTP communication.
```bash
demo_app_image_name="monai_triton:demo"
docker run --shm-size=128G --rm -p 127.0.0.1:8000:8000 -p 127.0.0.1:8001:8001 -p 127.0.0.1:8090:8002 ${demo_app_image_name}
```
3. See [Model Config File](#model-config-file) for the file structure Triton expects.
- Modify the models/monai_prostrate/1/model.py file to satisfy any model configuration requirements while keeping the required components in the model definition. See the [Usage](#usage) section for background.
- In the models/monai_prostrate/1/config.pbtxt file, configure the number of GPUs and which ones are used.
e.g. Using two available GPUs and two parallel versions of the model per GPU:
```
instance_group [
  {
    kind: KIND_GPU
    count: 2
    gpus: [ 0, 1 ]
  }
]
```
e.g. Using three of four available GPUs and four parallel versions of the model per GPU:
```
instance_group [
  {
    kind: KIND_GPU
    count: 4
    gpus: [ 0, 1, 3 ]
  }
]
```
Other configurations, such as dynamic batching and the corresponding batch sizes, can also be set; an illustrative snippet follows. See the [Triton model configuration documentation](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md) for more information.

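e.g. Enabling dynamic batching (the values below are illustrative only and are not taken from this demo's config.pbtxt):
```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```
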
- Finally, be sure to include the model definition files (e.g., the TorchScript `*.ts` file) in the directory structure. In this example, a COVID-19 classification model built with PyTorch is used.
```
covid19_model.ts
```
The Dockerfile copies the model definition structure into the Triton container. When the container is run, the Python backend implementation pulls the covid19_model.ts file from Google Drive for the demo. The container should therefore be rebuilt after any modifications to the GPU configuration or model configuration for the example.

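For context, a TorchScript file such as `covid19_model.ts` is typically produced by tracing (or scripting) a trained PyTorch/MONAI network and saving the result. The sketch below is illustrative only: the `DenseNet121` architecture and the input shape are stand-ins, not necessarily the network used by this demo.
```python
# Illustrative export of a PyTorch/MONAI network to TorchScript.
# The network and input shape are placeholders, not the demo's actual model.
import torch
from monai.networks.nets import DenseNet121

net = DenseNet121(spatial_dims=3, in_channels=1, out_channels=2)
net.eval()
example = torch.rand(1, 1, 192, 192, 64)  # illustrative (N, C, H, W, D) volume
traced = torch.jit.trace(net, example)
torch.jit.save(traced, "covid19_model.ts")
```
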
4. A Python [client](./client/client.py) program configures the model and makes an HTTP request to the Triton service. Note: Triton also supports other interfaces, such as gRPC.
The client reads an input image, converted from NIfTI to a byte array, for classification.

- In this example, a model trained to detect COVID-19 is given an image either with or without COVID-19.
```python
filename = 'client/test_data/volume-covid19-A-0000.nii.gz'
```
- The client calls the Triton service using the external port configured previously.
```python
with httpclient.InferenceServerClient("localhost:8000") as client:
```
- The Triton inference response is returned:
```python
response = client.infer(model_name,
                        inputs,
                        request_id=str(uuid4().hex),
                        outputs=outputs,)

result = response.get_response()
```
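
The snippets above omit how `inputs` and `outputs` are constructed; one possible sketch for sending the raw NIfTI bytes is shown below. The tensor names (`INPUT0`/`OUTPUT0`) and the `BYTES` datatype are assumptions for illustration and must match the deployed config.pbtxt.

```python
# Illustrative construction of the inputs/outputs passed to client.infer().
# "INPUT0"/"OUTPUT0" and the BYTES datatype are assumptions; match them to config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

with open(filename, "rb") as f:                   # `filename` as defined above
    raw = f.read()

image_bytes = np.array([raw], dtype=np.object_)   # wrap the raw bytes for a BYTES tensor
inputs = [httpclient.InferInput("INPUT0", list(image_bytes.shape), "BYTES")]
inputs[0].set_data_from_numpy(image_bytes, binary_data=True)
outputs = [httpclient.InferRequestedOutput("OUTPUT0", binary_data=False)]
```
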
-------
## Usage
[See Triton Inference Server/python_backend documentation](https://github.com/triton-inference-server/python_backend#usage)
## Model Config File
[See Triton Inference Server/python_backend documentation](https://github.com/triton-inference-server/python_backend#model-config-file)
## Error Handling
[See Triton Inference Server/python_backend documentation](https://github.com/triton-inference-server/python_backend#error-handling)