0.4.0 release (#215)
Update tools for UIF 1.2
Update quickstart to wait for the server
Update readme links
Add shape to ImageInferenceRequest
Bump to 0.4.0
Bump up to ROCm 5.6.1
Exclude Py3.6 from wheels

Signed-off-by: Varun Sharma <[email protected]>
varunsh-xilinx authored Sep 7, 2023
1 parent 92666f5 commit 9fde9bb
Showing 35 changed files with 161 additions and 72 deletions.
41 changes: 38 additions & 3 deletions CHANGELOG.rst
@@ -28,6 +28,39 @@ Unreleased
Added
^^^^^

* N/A

Changed
^^^^^^^

* N/A

Deprecated
^^^^^^^^^^

* N/A

Removed
^^^^^^^

* N/A

Fixed
^^^^^

* N/A

Security
^^^^^^^^

* N/A

:github:`0.4.0 <Xilinx/inference-server/releases/tag/v0.4.0>` - 2023-09-07
--------------------------------------------------------------------------

Added
^^^^^

* An example MLPerf app using the inference server API (:pr:`129`)
* Google Benchmark for writing performance-tracking tests (:pr:`147`)
* Custom memory storage classes in the memory pool (:pr:`166`)
@@ -37,7 +70,8 @@ Added
* Tests with FP16 (:pr:`189` and :pr:`203`)
* Versioned models (:pr:`190`)
* Expand benchmarking with the MLPerf app (:pr:`197`) and add the data to the docs (:pr:`198`)

* Custom environment configuration per test (:pr:`214`)
* VCK5000 test (:pr:`214`)

Changed
^^^^^^^
@@ -57,7 +91,7 @@ Changed
* Close dynamically opened libraries (:pr:`186`)
* Replace Jaeger exporter with OTLP (:pr:`187`)
* Change STRING type to BYTES and shape type from uint64 to int64 (:pr:`190`)
* Rename ONNX file to MXR correctly (:pr:`202`)
* Include the correct tensor name in ModelMetadata in the XModel backend (:pr:`207`)

Deprecated
^^^^^^^^^^
@@ -67,7 +101,7 @@ Deprecated
Removed
^^^^^^^

* N/A
* Python 3.6 support (:pr:`215`)

Fixed
^^^^^
@@ -78,6 +112,7 @@ Fixed
* Fix building with different CMake options (:pr:`170`)
* Fix wheel generation with vcpkg (:pr:`191`)
* Load models at startup correctly (:pr:`195`)
* Fix handling MIGraphX models with dots in the names (:pr:`202`)

Security
^^^^^^^^
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -91,6 +91,12 @@ endif()

list(APPEND VCPKG_MANIFEST_FEATURES "testing")

# In CMake 3.27+, find_package uses <PACKAGE_NAME>_ROOT variables. We're using
# AKS_ROOT in the environment currently.
if(${CMAKE_VERSION} VERSION_GREATER "3.27")
cmake_policy(SET CMP0144 OLD)
endif()

# set the project name
project(
amdinfer
6 changes: 3 additions & 3 deletions README.rst
@@ -38,14 +38,14 @@ The AMD Inference Server is integrated with the following libraries out of the g
* TensorFlow and PyTorch models with `ZenDNN <https://developer.amd.com/zendnn/>`__ on CPUs (optimized for AMD CPUs)
* ONNX models with `MIGraphX <https://github.com/ROCmSoftwarePlatform/AMDMIGraphX>`__ on AMD GPUs
* XModel models with `Vitis AI <https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html>`__ on AMD FPGAs
* A graph of computation, including pre- and post-processing, can be written using `AKS <https://github.com/Xilinx/Vitis-AI/tree/v3.0/src/AKS>`__ on AMD FPGAs for end-to-end inference
* A graph of computation, including pre- and post-processing, can be written using `AKS <https://github.com/Xilinx/Vitis-AI/tree/bbd45838d4a93f894cfc9f232140dc65af2398d1/src/AKS>`__ on AMD FPGAs for end-to-end inference

Quick Start Deployment and Inference
------------------------------------

The following example demonstrates how to deploy the server locally and run a sample inference.
This example runs on the CPU and does not require any special hardware.
You can see a more detailed version of this example in the `quickstart <https://xilinx.github.io/inference-server/main/quickstart_inference.html>`__.
You can see a more detailed version of this example in the `quickstart <https://xilinx.github.io/inference-server/main/quickstart.html>`__.

.. code-block:: bash
@@ -80,7 +80,7 @@ Learn more

The documentation for the AMD Inference Server is available `online <https://xilinx.github.io/inference-server/>`__.

Check out the quickstart guides online to help you get started based on your use case(s): `inference <https://xilinx.github.io/inference-server/main/quickstart_inference.html>`__, `deployment <https://xilinx.github.io/inference-server/main/quickstart_deployment.html>`__ and `development <https://xilinx.github.io/inference-server/main/quickstart_development.html>`__.
Check out the `quickstart <https://xilinx.github.io/inference-server/main/quickstart.html>`__ online to help you get started.

Support
-------
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.4.0-dev
0.4.0
8 changes: 4 additions & 4 deletions docker/generate.py
@@ -316,13 +316,13 @@ def get_xrm_xrt_packages(package_manager):
if package_manager == "apt":
return textwrap.dedent(
"""\
&& wget --quiet -O xrt.deb https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.354_20.04-amd64-xrt.deb \\
&& wget --quiet -O xrt.deb https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.418_20.04-amd64-xrt.deb \\
&& wget --quiet -O xrm.deb https://www.xilinx.com/bin/public/openDownload?filename=xrm_202220.1.5.212_20.04-x86_64.deb \\"""
)
elif package_manager == "yum":
return textwrap.dedent(
"""\
&& wget --quiet -O xrt.rpm https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.354_7.8.2003-x86_64-xrt.rpm \\
&& wget --quiet -O xrt.rpm https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.418_7.8.2003-x86_64-xrt.rpm \\
&& wget --quiet -O xrm.rpm https://www.xilinx.com/bin/public/openDownload?filename=xrm_202220.1.5.212_7.8.2003-x86_64.rpm \\"""
)
raise ValueError(f"Unknown base image type: {package_manager}")
@@ -576,8 +576,8 @@ def install_dev_packages(manager: PackageManager, core):


def install_migraphx(manager: PackageManager, custom_backends):
migraphx_apt_repo = 'echo "deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/5.4.1/ ubuntu main" > /etc/apt/sources.list.d/rocm.list'
migraphx_yum_repo = '"[ROCm]\\nname=ROCm\\nbaseurl=https://repo.radeon.com/rocm/yum/5.4.1/\\nenabled=1\\ngpgcheck=1\\ngpgkey=https://repo.radeon.com/rocm/rocm.gpg.key" > /etc/yum.repos.d/rocm.repo'
migraphx_apt_repo = 'echo "deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/5.6.1/ ubuntu main" > /etc/apt/sources.list.d/rocm.list'
migraphx_yum_repo = '"[ROCm]\\nname=ROCm\\nbaseurl=https://repo.radeon.com/rocm/yum/5.6.1/\\nenabled=1\\ngpgcheck=1\\ngpgkey=https://repo.radeon.com/rocm/rocm.gpg.key" > /etc/yum.repos.d/rocm.repo'

if manager.name == "apt":
add_repo = (
1 change: 1 addition & 0 deletions docs/backends/vitis_ai.rst
@@ -44,6 +44,7 @@ While not every model is tested on every FPGA, the Vitis AI backend has run at l

Alveo,U250,DPUCADF8H
Versal,VCK5000,DPUCVDX8H
Alveo,V70,DPUCV2DX8G

Other devices and DPUs may also work but are currently untested.

5 changes: 4 additions & 1 deletion docs/conf.py
@@ -169,6 +169,9 @@ def hide_private_module(app, what, name, obj, options, signature, return_annotat

# strip leading $ from bash code blocks
copybutton_prompt_text = "$ "
copybutton_here_doc_delimiter = "EOF"
# selecting the literal block doesn't work to show the copy button correctly
# copybutton_selector = ":is(div.highlight pre, pre.literal-block)"

# raise a warning if a cross-reference cannot be found
nitpicky = True
@@ -256,7 +259,7 @@ def hide_private_module(app, what, name, obj, options, signature, return_annotat

html_context["languages"] = [("en", "/" + "inference-server/" + version + "/")]

versions = ["0.1.0", "0.2.0", "0.3.0"]
versions = ["0.1.0", "0.2.0", "0.3.0", "0.4.0"]
versions.append("main")
html_context["versions"] = []
for version in versions:
6 changes: 3 additions & 3 deletions docs/dependencies.rst
@@ -180,7 +180,7 @@ The following packages are installed from Github.
:github:`protocolbuffers/protobuf`,3.19.4,BSD-3,Dynamically linked by amdinfer-server and Vitis libraries\ :superscript:`a 0`
:github:`fpagliughi/sockpp`,e5c51b5,BSD-3,Dynamically linked by amdinfer-server :superscript:`a 0`
:github:`gabime/spdlog`,1.8.2,MIT,Statically linked by amdinfer-server for logging\ :superscript:`a 0`
:github:`Xilinx/Vitis-AI`,3.0,Apache 2.0,VART is dynamically linked by amdinfer-server\ :superscript:`a 1`
:github:`Xilinx/Vitis-AI`,3.5,Apache 2.0,VART is dynamically linked by amdinfer-server\ :superscript:`a 1`
:github:`wg/wrk`,4.1.0,modified Apache 2.0,Executable used for benchmarking amdinfer-server\ :superscript:`d 0`

Others
@@ -203,8 +203,8 @@ The following packages are installed from Xilinx.
:header: Name,Version,License,Usage
:widths: auto

:xilinxDownload:`XRM <xrm_202120.1.3.29_18.04-x86_64.deb>`,1.3.29,Apache 2.0,Used for FPGA resource management\ :superscript:`a 1`
:xilinxDownload:`XRT <xrt_202120.2.12.427_18.04-amd64-xrt.deb>`,2.12.427,Apache 2.0,Used for communicating to the FPGA\ :superscript:`a 1`
:xilinxDownload:`XRM <xrm_202220.1.5.212_20.04-x86_64.deb>`,1.5.212,Apache 2.0,Used for FPGA resource management\ :superscript:`a 1`
:xilinxDownload:`XRT <xrt_202220.2.14.418_20.04-amd64-xrt.deb>`,2.14.418,Apache 2.0,Used for communicating to the FPGA\ :superscript:`a 1`

AMD
^^^
6 changes: 3 additions & 3 deletions docs/dry.rst
@@ -62,15 +62,15 @@ In this case, the endpoint is defined in the model's configuration file in the r
.. code-tab:: console CPU

# this image is not available on Dockerhub yet but you can build it yourself from the repository
$ docker pull amdih/serve:uif1.1_zendnn_amdinfer_0.4.0
$ docker pull amdih/serve:uif1.2_zendnn_amdinfer_0.4.0

.. code-tab:: text GPU

# this image is not available on Dockerhub yet but you can build it yourself from the repository
$ docker pull amdih/serve:uif1.1_migraphx_amdinfer_0.4.0
$ docker pull amdih/serve:uif1.2_migraphx_amdinfer_0.4.0

.. code-tab:: console FPGA

# this image is not available on Dockerhub yet but you can build it yourself from the repository
$ docker pull amdih/serve:uif1.1_vai_amdinfer_0.4.0
$ docker pull amdih/serve:uif1.2_vai_amdinfer_0.4.0
-docker_pull_deployment_images
30 changes: 19 additions & 11 deletions docs/quickstart.rst
@@ -81,17 +81,19 @@ The CPU version has no special hardware requirements to run so you can always ru

.. code-tab:: console FPGA

# this example assumes a U250. If you're using a different board, download the appropriate model for your board instead
$ wget -O vitis.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=resnet_v1_50_tf-u200-u250-r2.5.0.tar.gz
$ tar -xzf vitis.tar.gz "resnet_v1_50_tf/resnet_v1_50_tf.xmodel"
$ mkdir -p ./model_repository/resnet50/1
$ mv ./resnet_v1_50_tf/resnet_v1_50_tf.xmodel ./model_repository/resnet50/1/

For the models used here, their corresponding ``config.toml`` should be placed in the chosen model repository (``./model_repository/resnet50/``):
For the models used here, you can save their corresponding ``config.toml`` to the correct path with:

.. tabs::

.. code-tab:: toml CPU
.. code-tab:: shell CPU

cat <<EOF > "./model_repository/resnet50/config.toml"
name = "resnet50"
platform = "tensorflow_graphdef"

@@ -104,9 +106,11 @@ For the models used here, their corresponding ``config.toml`` should be placed i
name = "resnet_v1_50/predictions/Reshape_1"
datatype = "FP32"
shape = [1000]
EOF

.. code-tab:: text GPU
.. code-tab:: shell GPU

cat <<EOF > "./model_repository/resnet50/config.toml"
name = "resnet50"
platform = "onnx_onnxv1"

Expand All @@ -119,9 +123,11 @@ For the models used here, their corresponding ``config.toml`` should be placed i
name = "output"
datatype = "FP32"
shape = [1000]
EOF

.. code-tab:: console FPGA
.. code-tab:: shell FPGA

cat <<EOF > "./model_repository/resnet50/config.toml"
name = "resnet50"
platform = "vitis_xmodel"

Expand All @@ -134,6 +140,7 @@ For the models used here, their corresponding ``config.toml`` should be placed i
name = "output"
datatype = "INT8"
shape = [1000]
EOF

The name must match the name of the model directory: it defines the endpoint that will be used for inference.
The platform identifies the type of the model and determines the file extension of the model file.
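
The layout described above can be sanity-checked with a few lines of Python before starting the server. This is an illustrative sketch rather than code from this repository; it assumes only the ``./model_repository/resnet50`` directories created in the earlier steps.

from pathlib import Path

# Check that each model follows the layout described above:
# model_repository/<name>/config.toml and model_repository/<name>/1/<model file>
repo = Path("./model_repository")
for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
    config = model_dir / "config.toml"
    version_dir = model_dir / "1"
    model_files = sorted(f.name for f in version_dir.iterdir()) if version_dir.is_dir() else []
    print(f"{model_dir.name}: config exists={config.exists()}, version 1 files={model_files}")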
@@ -173,15 +180,15 @@ The flags used in this sample command are:

.. code-tab:: console CPU

$ docker run -d --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.1_zendnn_amdinfer_0.4.0
$ docker run -d --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.2_zendnn_amdinfer_0.4.0

.. code-tab:: console GPU

$ docker run -d --device /dev/kfd --device /dev/dri --volume $(pwd)/model_repository:/mnt/models:rw --publish 127.0.0.1::8998 --publish 127.0.0.1::50051 amdih/serve:uif1.1_migraphx_amdinfer_0.4.0
$ docker run -d --device /dev/kfd --device /dev/dri --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.2_migraphx_amdinfer_0.4.0

.. code-tab:: console FPGA

$ docker run -d --device /dev/dri --device /dev/xclmgmt<id> --volume $(pwd)/model_repository:/mnt/models:rw --publish 127.0.0.1::8998 --publish 127.0.0.1::50051 amdih/serve:uif1.1_vai_amdinfer_0.4.0
$ docker run -d --device /dev/dri --device /dev/xclmgmt<id> --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.2_vai_amdinfer_0.4.0

The endpoints for each model will be the name of the model in the ``config.toml``, which should match the name of the parent directory in the model repository.
In this example, it would be "resnet50".
@@ -195,7 +202,7 @@ Server deployment summary
After setting up the server as above, you have the following information:

* IP address: 127.0.0.1 since the server is running on the same machine where you will run the inference
* Ports: 8998 and 50051 for HTTP and gRPC, respectively. If you used ``--publish``, your port numbers may be different and you can check them with ``docker ps``.
* Ports: 8998 and 50051 for HTTP and gRPC, respectively. If you used ``--publish`` in the ``docker run`` command to remap the ports, your port numbers may be different and you can check them with ``docker ps``.
* Endpoint: "resnet50" since that is the model name used in the model repository and in the configuration file

The rest of this example will use these values in the sample code so substitute your own values if they are different.
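
Putting those values together, a minimal liveness check in Python might look like the following. This is a sketch, not part of the repository: it assumes the ``amdinfer`` Python package is installed and that the server was started with one of the ``docker run`` commands above. ``HttpClient`` and ``serverLive`` are the same calls used by the example scripts changed in this commit.

import time

import amdinfer

# values from the deployment summary above; substitute your own if different
server_addr = "http://127.0.0.1:8998"
endpoint = "resnet50"

client = amdinfer.HttpClient(server_addr)

# poll until the HTTP endpoint responds, similar in spirit to the new --wait flag
while not client.serverLive():
    time.sleep(1)
print(f"Server is live; requests can be sent to the '{endpoint}' model")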
@@ -239,21 +246,22 @@ These results are post-processed and the top 5 labels for the image are printed.
.. parsed-literal::
$ wget :amdinferRawFull:`examples/resnet50/tfzendnn.py`
$ python3 tfzendnn.py --ip 127.0.0.1 --grpc-port 50051 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt
$ python3 tfzendnn.py --ip 127.0.0.1 --grpc-port 50051 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt --wait
.. group-tab:: GPU

.. parsed-literal::
$ wget :amdinferRawFull:`examples/resnet50/migraphx.py`
$ python3 migraphx.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt
# This will take some time initially as MIGraphX will compile the ONNX model to MXR
$ python3 migraphx.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt --wait
.. group-tab:: FPGA

.. parsed-literal::
$ wget :amdinferRawFull:`examples/resnet50/vitis.py`
$ python3 vitis.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt
$ python3 vitis.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt --wait
After running the script, you should get output similar to the following.
The exact output may be slightly different depending on whether you used the CPU, GPU, or FPGA version of the example.
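
For reference, the post-processing mentioned above amounts to a top-k selection over the model's 1000 output scores. The snippet below is a generic illustration using NumPy, not the exact code in the example scripts.

import numpy as np

def top5(scores, labels):
    # scores: 1000-element output vector from the model; labels: list of class names
    indices = np.argsort(scores)[::-1][:5]
    return [(labels[i], float(scores[i])) for i in indices]

# usage sketch: top5(output_scores, open("imagenet_classes.txt").read().splitlines())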
4 changes: 3 additions & 1 deletion examples/resnet50/migraphx.cpp
@@ -198,7 +198,9 @@ int main(int argc, char* argv[]) {
amdinfer::HttpClient client{server_addr};

std::optional<amdinfer::Server> server;
if (args.ip == "127.0.0.1" && !client.serverLive()) {
if (args.wait) {
// if wait is true, skip ahead to waiting for the server to become ready
} else if (args.ip == "127.0.0.1" && !client.serverLive()) {
std::cout << "No server detected. Starting locally...\n";
server.emplace();
server.value().startHttp(args.http_port);
5 changes: 4 additions & 1 deletion examples/resnet50/migraphx.py
@@ -181,8 +181,11 @@ def main(args):

server_addr = f"http://{args.ip}:{args.http_port}"
client = amdinfer.HttpClient(server_addr)
if args.wait:
# if wait is true, skip ahead to waiting for the server to become ready
pass
# start the server locally if it isn't already up and the IP address is the localhost
if args.ip == "127.0.0.1" and not client.serverLive():
elif args.ip == "127.0.0.1" and not client.serverLive():
print("No server detected. Starting locally...")
server = amdinfer.Server()
server.startHttp(args.http_port)
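
The comment added to ``migraphx.py`` points at a later "wait for the server to become ready" step that this hunk does not show. A condensed, hypothetical sketch of the whole startup flow after this change, using only calls visible in this diff plus a simple polling loop, could look like this:

import time

import amdinfer

def get_client(args):
    client = amdinfer.HttpClient(f"http://{args.ip}:{args.http_port}")
    server = None
    if args.wait:
        # with --wait, never start a local server; just wait for an external one
        pass
    elif args.ip == "127.0.0.1" and not client.serverLive():
        print("No server detected. Starting locally...")
        server = amdinfer.Server()
        server.startHttp(args.http_port)
    # in both cases, block until the server answers liveness checks
    while not client.serverLive():
        time.sleep(1)
    return client, server  # keep the server object alive for the local case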
6 changes: 6 additions & 0 deletions examples/resnet50/resnet.py
@@ -111,6 +111,12 @@ def parse_args():
help="Name of the output node",
)

parser.add_argument(
"--wait",
action="store_true",
help="Don't start the server automatically and wait for it indefinitely",
)

args = parser.parse_args()

if (not args.image) or (not args.labels):
3 changes: 3 additions & 0 deletions examples/resnet50/resnet50.hpp
@@ -45,6 +45,7 @@ struct Args {
int output_classes = kOutputClasses;
std::string input_node;
std::string output_node;
bool wait = false;
};

inline Args parseArgs(int argc, char** argv) {
@@ -80,6 +81,8 @@ inline Args parseArgs(int argc, char** argv) {
cxxopts::value(args.top))
("output-classes", "Number of output classes for this model",
cxxopts::value(args.output_classes))
("wait", "Don't start the server automatically and wait for it indefinitely",
cxxopts::value(args.wait))
("help", "Print help");
// clang-format on

4 changes: 3 additions & 1 deletion examples/resnet50/tfzendnn.cpp
@@ -190,7 +190,9 @@ int main(int argc, char* argv[]) {

std::optional<amdinfer::Server> server;
// +start protocol
if (args.ip == "127.0.0.1" && !client.serverLive()) {
if (args.wait) {
// if wait is true, skip ahead to waiting for the server to become ready
} else if (args.ip == "127.0.0.1" && !client.serverLive()) {
std::cout << "No server detected. Starting locally...\n";
server.emplace();
server.value().startGrpc(args.grpc_port);
5 changes: 4 additions & 1 deletion examples/resnet50/tfzendnn.py
@@ -173,8 +173,11 @@ def main(args):

server_addr = f"{args.ip}:{args.grpc_port}"
client = amdinfer.GrpcClient(server_addr)
if args.wait:
# if wait is true, skip ahead to waiting for the server to become ready
pass
# start the server locally if it isn't already up and the IP address is the localhost
if args.ip == "127.0.0.1" and not client.serverLive():
elif args.ip == "127.0.0.1" and not client.serverLive():
print("No server detected. Starting locally...")
server = amdinfer.Server()
server.startGrpc(args.grpc_port)