PA Migration: Doc Updates #105

Merged: 2 commits, Aug 6, 2024

### Measuring Performance

Having made some improvements to the model's serving capabilities by enabling `dynamic batching` and the use of `multiple model instances`, the next step is to measure the impact of these features. To that end, the Triton Inference Server comes packaged with the [Performance Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md) which is a tool specifically designed to measure performance for Triton Inference Servers. For ease of use, it is recommended that users run this inside the same container used to run client code in Part 1 of this series.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:yy.mm-py3-sdk bash
```
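Performance Analyzer is the tool to use for detailed latency and throughput reports. Purely as an illustration of the kind of measurement it automates, here is a crude Python timing loop using the `tritonclient` package; the model name, input name, and shape are placeholders for whatever model you have deployed, and this sketch is not a substitute for `perf_analyzer`.
```
import time

import numpy as np
import tritonclient.http as httpclient

# Placeholders: substitute the name, input tensor, and shape of your model.
MODEL_NAME = "my_model"
INPUT_NAME = "input__0"
INPUT_SHAPE = [1, 3, 224, 224]

client = httpclient.InferenceServerClient(url="localhost:8000")
data = np.random.rand(*INPUT_SHAPE).astype(np.float32)

latencies = []
for _ in range(100):
    infer_input = httpclient.InferInput(INPUT_NAME, INPUT_SHAPE, "FP32")
    infer_input.set_data_from_numpy(data)
    start = time.perf_counter()
    client.infer(MODEL_NAME, inputs=[infer_input])
    latencies.append(time.perf_counter() - start)

# perf_analyzer reports far richer statistics (percentiles, concurrency sweeps).
print(f"avg latency over {len(latencies)} requests: "
      f"{1000 * sum(latencies) / len(latencies):.2f} ms")
```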
Quick_Deploy/OpenVINO/README.md: 103 additions & 103 deletions

# Deploying ONNX, PyTorch and TensorFlow Models with the OpenVINO Backend

This README demonstrates how to deploy simple ONNX, PyTorch and TensorFlow models on Triton Inference Server using the [OpenVINO backend](https://github.com/triton-inference-server/openvino_backend).


## Deploying an ONNX Model
### 1. Build the model repository and download the ONNX model.
```
mkdir -p model_repository/densenet_onnx/1
wget -O model_repository/densenet_onnx/1/model.onnx \
https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx
```

### 2. Create a new file named `config.pbtxt`
```
name: "densenet_onnx"
backend: "openvino"
default_model_filename: "model.onnx"
name: "densenet_onnx"
backend: "openvino"
default_model_filename: "model.onnx"
```

### 3. Place the `config.pbtxt` file in the model repository; the structure should look as follows:
```
model_repository
|
+-- densenet_onnx
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.onnx
```

Note: The Triton Inference Server reads the configuration and model files from this directory structure, which must follow the required layout. Do not place any folders or files in the model repository other than the needed model files.

### 4. Run the Triton Inference Server
```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
```
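Optionally, you can confirm that the server and the model are up before querying them. A minimal sketch using the `tritonclient` Python package (an assumption here: it is installed locally, e.g. with `pip install tritonclient[http]`):
```
import tritonclient.http as httpclient

# Quick readiness check for the server and the densenet_onnx model.
client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())
print("model ready :", client.is_model_ready("densenet_onnx"))
```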

### 5. Download the Triton Client code `client.py` from GitHub to a place you want to run the Triton Client from.
```
wget https://raw.githubusercontent.com/triton-inference-server/tutorials/main/Quick_Deploy/ONNX/client.py
```

### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. And last, we send an inference request to the Triton Inference Server. A minimal sketch of such a client is shown after the commands below.
```
docker run -it --rm --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
```
pip install torchvision
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python3 client.py
```
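For reference, a minimal sketch of those three steps with the `tritonclient` HTTP API is shown below. The tensor names, shape, and random input are illustrative placeholders; the downloaded `client.py` contains the actual preprocessing and names used in this tutorial.
```
import numpy as np
import tritonclient.http as httpclient

# 1. Set up a connection with the Triton Inference Server.
client = httpclient.InferenceServerClient(url="localhost:8000")

# 2. Specify the names of the input and output layers of the model.
#    The names, shape, and random data are placeholders; see client.py for
#    the real preprocessing and tensor names.
infer_input = httpclient.InferInput("data_0", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))
infer_output = httpclient.InferRequestedOutput("fc6_1", class_count=5)

# 3. Send an inference request and read the result.
response = client.infer("densenet_onnx", inputs=[infer_input], outputs=[infer_output])
print(response.as_numpy("fc6_1"))
```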

### 7. Output
```
['11.549026:92' '11.232335:14' '7.528014:95' '6.923391:17' '6.576575:88']
```
The output format here is `<confidence_score>:<classification_index>`. To learn how to map these to the label names and more, refer to our [documentation](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md). The full client code is available in `client.py`.
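If you want to map these strings to label names yourself, a small sketch is shown below; the labels file is a hypothetical ImageNet class list (one name per line) and is not part of this tutorial.
```
# Map "<confidence_score>:<classification_index>" strings to label names.
# Assumes a hypothetical labels file with one ImageNet class name per line.
results = ['11.549026:92', '11.232335:14', '7.528014:95']

with open("imagenet_labels.txt") as f:
    labels = [line.strip() for line in f]

for item in results:
    score, index = item.split(":")
    print(f"{labels[int(index)]}: {float(score):.3f}")
```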

## Deploying a PyTorch Model
### 1. Download and prepare the PyTorch model.
PyTorch models (.pt) will need to be converted to OpenVINO format. Create a `downloadAndConvert.py` file to download the PyTorch model and use the OpenVINO Model Converter to save a `model.xml` and `model.bin`:
```
import torchvision
import torch
import openvino as ov
model = torchvision.models.resnet50(weights='DEFAULT')
ov_model = ov.convert_model(model)
ov.save_model(ov_model, 'model.xml')
```

Install the dependencies:
```
pip install openvino
pip install torchvision
```

Run `downloadAndConvert.py`
```
python3 downloadAndConvert.py
```

To convert your own PyTorch model, refer to [Converting a PyTorch Model](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-pytorch.html)
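The tensor names used in the next step (`x` and `x.45`) come from the converted model itself. To confirm them, or to find the right names for a different model, you can inspect `model.xml` with the OpenVINO Python runtime; a short sketch, assuming `openvino` and `numpy` are installed:
```
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")

# Print the tensor names and shapes the Triton config must refer to.
for model_input in model.inputs:
    print("input :", model_input.get_any_name(), model_input.partial_shape)
for model_output in model.outputs:
    print("output:", model_output.get_any_name(), model_output.partial_shape)

# Smoke test on CPU with a random image-shaped tensor.
compiled = core.compile_model(model, "CPU")
result = compiled(np.random.rand(1, 3, 224, 224).astype(np.float32))
print(result[compiled.output(0)].shape)
```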

### 2. Create a new file named `config.pbtxt`
```
name: "resnet50 "
backend: "openvino"
max_batch_size : 0
input [
{
name: "x"
data_type: TYPE_FP32
dims: [ 3, 224, 224 ]
reshape { shape: [ 1, 3, 224, 224 ] }
}
]
output [
{
name: "x.45"
data_type: TYPE_FP32
dims: [ 1, 1000 ,1, 1]
reshape { shape: [ 1, 1000 ] }
}
name: "resnet50 "
backend: "openvino"
max_batch_size : 0
input [
{
name: "x"
data_type: TYPE_FP32
dims: [ 3, 224, 224 ]
reshape { shape: [ 1, 3, 224, 224 ] }
}
]
output [
{
name: "x.45"
data_type: TYPE_FP32
dims: [ 1, 1000 ,1, 1]
reshape { shape: [ 1, 1000 ] }
}
]
```

### 3. Place the `config.pbtxt` file in the model repository along with `model.xml` and `model.bin`; the folder structure should look as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.xml
        +-- model.bin
```

Note: The Triton Inference Server reads the configuration and model files from this directory structure, which must follow the required layout. Do not place any folders or files in the model repository other than the needed model files.

### 4. Run the Triton Inference Server
```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
```

### 5. In another terminal, download the Triton Client code `client.py` from GitHub to the place you want to run the Triton Client from.
```
wget https://raw.githubusercontent.com/triton-inference-server/tutorials/main/Quick_Deploy/PyTorch/client.py
```

In the `client.py` file, you'll need to update the model input and output names to match those expected by the OpenVINO backend, since this model differs slightly from the one in the Triton PyTorch tutorial. For example, change the input name used in the PyTorch model (`input__0`) to the name used by the OpenVINO backend (`x`), as illustrated in the sketch below.
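The exact lines vary with the version of `client.py`, but the change amounts to something like the following sketch (illustrative `tritonclient` calls, not the verbatim file contents):
```
import tritonclient.http as httpclient

# Illustrative only -- not the verbatim contents of client.py.
# The names must match the converted model, i.e. those in config.pbtxt above:
# input "x" (instead of the PyTorch tutorial's "input__0") and output "x.45".
inputs = httpclient.InferInput("x", [3, 224, 224], "FP32")   # dims from config.pbtxt
outputs = httpclient.InferRequestedOutput("x.45")
```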
### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server.
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. And last, we send an inference request to the Triton Inference Server.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
```
pip install torchvision
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python3 client.py
```

### 7. Output

## Deploying a TensorFlow Model
### 1. Download and prepare the TensorFlow model.
Export the TensorFlow model in SavedModel format:
```
docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/tensorflow:24.04-tf2-py3
```
```
python3 export.py
```

The model will need to be converted to OpenVINO format. Create a `convert.py` file to use the OpenVINO Model Converter to save a `model.xml` and `model.bin`:
```
import openvino as ov
ov_model = ov.convert_model('path_to_saved_model_dir')
ov.save_model(ov_model, 'model.xml')
```

Install the dependencies:
```
pip install openvino
```

Run `convert.py`
```
python3 convert.py
```

To convert your TensorFlow model, refer to [Converting a TensorFlow Model](https://docs.openvino.ai/2024/openvino-workflow/model-preparation/convert-model-tensorflow.html)

### 2. Create a new file named `config.pbtxt`
```pbtxt
name: "resnet50"
backend: "openvino"
max_batch_size : 0
input [
{
name: "input_1"
data_type: TYPE_FP32
dims: [-1, 224, 224, 3 ]
}
]
output [
{
name: "predictions"
data_type: TYPE_FP32
dims: [-1, 1000]
}
name: "resnet50"
backend: "openvino"
max_batch_size : 0
input [
{
name: "input_1"
data_type: TYPE_FP32
dims: [-1, 224, 224, 3 ]
}
]
output [
{
name: "predictions"
data_type: TYPE_FP32
dims: [-1, 1000]
}
]
```

### 3. Place the `config.pbtxt` file in the model repository as well as the `model.xml` and `model.bin`; the structure should look as follows:
```
model_repository
|
+-- resnet50
    |
    +-- config.pbtxt
    +-- 1
        |
        +-- model.xml
        +-- model.bin
```

Note: The Triton Inference Server reads the configuration and model files from this directory structure, which must follow the required layout. Do not place any folders or files in the model repository other than the needed model files.

### 4. Run the Triton Inference Server
```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.04-py3 tritonserver --model-repository=/models
```

### 5. In another terminal, download the Triton Client code `client.py` from GitHub to the place you want to run the Triton Client from.
```
wget https://raw.githubusercontent.com/triton-inference-server/tutorials/main/Quick_Deploy/TensorFlow/client.py
```

### 6. Run the Triton Client in the same location as the `client.py` file, install dependencies, and query the server.
Building a client requires three basic steps. First, we set up a connection with the Triton Inference Server. Second, we specify the names of the input and output layer(s) of our model. And last, we send an inference request to the Triton Inference Server.
```
docker run -it --net=host -v ${PWD}:/workspace/ nvcr.io/nvidia/tritonserver:24.04-py3-sdk bash
```
```
pip install --upgrade tensorflow
pip install image
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
python3 client.py
```
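Note that this model takes NHWC input (`input_1` with dims `[-1, 224, 224, 3]` in the config above), unlike the NCHW ONNX and PyTorch examples. The sketch below shows one way the preprocessing could look; it is illustrative only, and the downloaded `client.py` contains the actual client code.
```
import numpy as np
import tensorflow as tf

# Illustrative preprocessing for the NHWC "input_1" tensor; client.py has the
# actual client code.
img = tf.keras.preprocessing.image.load_img("img1.jpg", target_size=(224, 224))
arr = tf.keras.preprocessing.image.img_to_array(img)        # shape (224, 224, 3)
arr = tf.keras.applications.resnet50.preprocess_input(arr)  # ResNet50-specific scaling
batch = np.expand_dims(arr, axis=0)                         # shape (1, 224, 224, 3)
print(batch.shape, batch.dtype)
# `batch` would then be sent as "input_1", requesting "predictions", using the
# same tritonclient pattern as in the ONNX example above.
```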

### 7. Output
```
[b'0.301167:90' b'0.169790:14' b'0.161309:92' b'0.093105:94'
b'0.058743:136' b'0.050185:11' b'0.033802:91' b'0.011760:88'
b'0.008309:989' b'0.004927:95' b'0.004905:13' b'0.004095:317'
b'0.004006:96' b'0.003694:12' b'0.003526:42' b'0.003390:313'
...
b'0.000001:751' b'0.000001:685' b'0.000001:408' b'0.000001:116'
b'0.000001:627' b'0.000001:933' b'0.000000:661' b'0.000000:148']
```