0.4.0 release (#215)
Update tools for UIF 1.2
Update quickstart to wait for the server
Update readme links
Add shape to ImageInferenceRequest
Bump to 0.4.0
Bump up to ROCm 5.6.1
Exclude Py3.6 from wheels

Signed-off-by: Varun Sharma <[email protected]>
varunsh-xilinx authored Sep 7, 2023
1 parent 92666f5 commit 9fde9bb
Showing 35 changed files with 161 additions and 72 deletions.
41 changes: 38 additions & 3 deletions CHANGELOG.rst
@@ -28,6 +28,39 @@ Unreleased
Added
^^^^^

* N/A

Changed
^^^^^^^

* N/A

Deprecated
^^^^^^^^^^

* N/A

Removed
^^^^^^^

* N/A

Fixed
^^^^^

* N/A

Security
^^^^^^^^

* N/A

:github:`0.4.0 <Xilinx/inference-server/releases/tag/v0.4.0>` - 2023-09-07
--------------------------------------------------------------------------

Added
^^^^^

* An example MLPerf app using the inference server API (:pr:`129`)
* Google Benchmark for writing performance-tracking tests (:pr:`147`)
* Custom memory storage classes in the memory pool (:pr:`166`)
@@ -37,7 +70,8 @@ Added
* Tests with FP16 (:pr:`189` and :pr:`203`)
* Versioned models (:pr:`190`)
* Expand benchmarking with the MLPerf app (:pr:`197`) and add the data to the docs (:pr:`198`)

* Custom environment configuration per test (:pr:`214`)
* VCK5000 test (:pr:`214`)

Changed
^^^^^^^
@@ -57,7 +91,7 @@ Changed
* Close dynamically opened libraries (:pr:`186`)
* Replace Jaeger exporter with OTLP (:pr:`187`)
* Change STRING type to BYTES and shape type from uint64 to int64 (:pr:`190`)
* Rename ONNX file to MXR correctly (:pr:`202`)
* Include the correct tensor name in ModelMetadata in the XModel backend (:pr:`207`)

Deprecated
^^^^^^^^^^
@@ -67,7 +101,7 @@ Deprecated
Removed
^^^^^^^

* N/A
* Python 3.6 support (:pr:`215`)

Fixed
^^^^^
@@ -78,6 +112,7 @@ Fixed
* Fix building with different CMake options (:pr:`170`)
* Fix wheel generation with vcpkg (:pr:`191`)
* Load models at startup correctly (:pr:`195`)
* Fix handling MIGraphX models with dots in the names (:pr:`202`)

Security
^^^^^^^^
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -91,6 +91,12 @@ endif()

list(APPEND VCPKG_MANIFEST_FEATURES "testing")

# In CMake 3.27+, find_package uses <PACKAGE_NAME>_ROOT variables. We're using
# AKS_ROOT in the environment currently.
if(${CMAKE_VERSION} VERSION_GREATER "3.27")
cmake_policy(SET CMP0144 OLD)
endif()

# set the project name
project(
amdinfer
6 changes: 3 additions & 3 deletions README.rst
@@ -38,14 +38,14 @@ The AMD Inference Server is integrated with the following libraries out of the g
* TensorFlow and PyTorch models with `ZenDNN <https://developer.amd.com/zendnn/>`__ on CPUs (optimized for AMD CPUs)
* ONNX models with `MIGraphX <https://github.com/ROCmSoftwarePlatform/AMDMIGraphX>`__ on AMD GPUs
* XModel models with `Vitis AI <https://www.xilinx.com/products/design-tools/vitis/vitis-ai.html>`__ on AMD FPGAs
* A graph of computation, including pre- and post-processing, can be written using `AKS <https://github.com/Xilinx/Vitis-AI/tree/v3.0/src/AKS>`__ on AMD FPGAs for end-to-end inference
* A graph of computation, including pre- and post-processing, can be written using `AKS <https://github.com/Xilinx/Vitis-AI/tree/bbd45838d4a93f894cfc9f232140dc65af2398d1/src/AKS>`__ on AMD FPGAs for end-to-end inference

Quick Start Deployment and Inference
------------------------------------

The following example demonstrates how to deploy the server locally and run a sample inference.
This example runs on the CPU and does not require any special hardware.
You can see a more detailed version of this example in the `quickstart <https://xilinx.github.io/inference-server/main/quickstart_inference.html>`__.
You can see a more detailed version of this example in the `quickstart <https://xilinx.github.io/inference-server/main/quickstart.html>`__.

.. code-block:: bash
@@ -80,7 +80,7 @@ Learn more

The documentation for the AMD Inference Server is available `online <https://xilinx.github.io/inference-server/>`__.

Check out the quickstart guides online to help you get started based on your use case(s): `inference <https://xilinx.github.io/inference-server/main/quickstart_inference.html>`__, `deployment <https://xilinx.github.io/inference-server/main/quickstart_deployment.html>`__ and `development <https://xilinx.github.io/inference-server/main/quickstart_development.html>`__.
Check out the `quickstart <https://xilinx.github.io/inference-server/main/quickstart.html>`__ online to help you get started.

Support
-------
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.4.0-dev
0.4.0
8 changes: 4 additions & 4 deletions docker/generate.py
@@ -316,13 +316,13 @@ def get_xrm_xrt_packages(package_manager):
if package_manager == "apt":
return textwrap.dedent(
"""\
&& wget --quiet -O xrt.deb https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.354_20.04-amd64-xrt.deb \\
&& wget --quiet -O xrt.deb https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.418_20.04-amd64-xrt.deb \\
&& wget --quiet -O xrm.deb https://www.xilinx.com/bin/public/openDownload?filename=xrm_202220.1.5.212_20.04-x86_64.deb \\"""
)
elif package_manager == "yum":
return textwrap.dedent(
"""\
&& wget --quiet -O xrt.rpm https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.354_7.8.2003-x86_64-xrt.rpm \\
&& wget --quiet -O xrt.rpm https://www.xilinx.com/bin/public/openDownload?filename=xrt_202220.2.14.418_7.8.2003-x86_64-xrt.rpm \\
&& wget --quiet -O xrm.rpm https://www.xilinx.com/bin/public/openDownload?filename=xrm_202220.1.5.212_7.8.2003-x86_64.rpm \\"""
)
raise ValueError(f"Unknown base image type: {package_manager}")
@@ -576,8 +576,8 @@ def install_dev_packages(manager: PackageManager, core):


def install_migraphx(manager: PackageManager, custom_backends):
migraphx_apt_repo = 'echo "deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/5.4.1/ ubuntu main" > /etc/apt/sources.list.d/rocm.list'
migraphx_yum_repo = '"[ROCm]\\nname=ROCm\\nbaseurl=https://repo.radeon.com/rocm/yum/5.4.1/\\nenabled=1\\ngpgcheck=1\\ngpgkey=https://repo.radeon.com/rocm/rocm.gpg.key" > /etc/yum.repos.d/rocm.repo'
migraphx_apt_repo = 'echo "deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/5.6.1/ ubuntu main" > /etc/apt/sources.list.d/rocm.list'
migraphx_yum_repo = '"[ROCm]\\nname=ROCm\\nbaseurl=https://repo.radeon.com/rocm/yum/5.6.1/\\nenabled=1\\ngpgcheck=1\\ngpgkey=https://repo.radeon.com/rocm/rocm.gpg.key" > /etc/yum.repos.d/rocm.repo'

if manager.name == "apt":
add_repo = (
1 change: 1 addition & 0 deletions docs/backends/vitis_ai.rst
@@ -44,6 +44,7 @@ While not every model is tested on every FPGA, the Vitis AI backend has run at l

Alveo,U250,DPUCADF8H
Versal,VCK5000,DPUCVDX8H
Alveo,V70,DPUCV2DX8G

Other devices and DPUs may also work but are currently untested.

5 changes: 4 additions & 1 deletion docs/conf.py
@@ -169,6 +169,9 @@ def hide_private_module(app, what, name, obj, options, signature, return_annotat

# strip leading $ from bash code blocks
copybutton_prompt_text = "$ "
copybutton_here_doc_delimiter = "EOF"
# selecting the literal block doesn't work to show the copy button correctly
# copybutton_selector = ":is(div.highlight pre, pre.literal-block)"

# raise a warning if a cross-reference cannot be found
nitpicky = True
@@ -256,7 +259,7 @@ def hide_private_module(app, what, name, obj, options, signature, return_annotat

html_context["languages"] = [("en", "/" + "inference-server/" + version + "/")]

versions = ["0.1.0", "0.2.0", "0.3.0"]
versions = ["0.1.0", "0.2.0", "0.3.0", "0.4.0"]
versions.append("main")
html_context["versions"] = []
for version in versions:
6 changes: 3 additions & 3 deletions docs/dependencies.rst
@@ -180,7 +180,7 @@ The following packages are installed from Github.
:github:`protocolbuffers/protobuf`,3.19.4,BSD-3,Dynamically linked by amdinfer-server and Vitis libraries\ :superscript:`a 0`
:github:`fpagliughi/sockpp`,e5c51b5,BSD-3,Dynamically linked by amdinfer-server :superscript:`a 0`
:github:`gabime/spdlog`,1.8.2,MIT,Statically linked by amdinfer-server for logging\ :superscript:`a 0`
:github:`Xilinx/Vitis-AI`,3.0,Apache 2.0,VART is dynamically linked by amdinfer-server\ :superscript:`a 1`
:github:`Xilinx/Vitis-AI`,3.5,Apache 2.0,VART is dynamically linked by amdinfer-server\ :superscript:`a 1`
:github:`wg/wrk`,4.1.0,modified Apache 2.0,Executable used for benchmarking amdinfer-server\ :superscript:`d 0`

Others
@@ -203,8 +203,8 @@ The following packages are installed from Xilinx.
:header: Name,Version,License,Usage
:widths: auto

:xilinxDownload:`XRM <xrm_202120.1.3.29_18.04-x86_64.deb>`,1.3.29,Apache 2.0,Used for FPGA resource management\ :superscript:`a 1`
:xilinxDownload:`XRT <xrt_202120.2.12.427_18.04-amd64-xrt.deb>`,2.12.427,Apache 2.0,Used for communicating to the FPGA\ :superscript:`a 1`
:xilinxDownload:`XRM <xrm_202220.1.5.212_20.04-x86_64.deb>`,1.5.212,Apache 2.0,Used for FPGA resource management\ :superscript:`a 1`
:xilinxDownload:`XRT <xrt_202220.2.14.418_20.04-amd64-xrt.deb>`,2.14.418,Apache 2.0,Used for communicating to the FPGA\ :superscript:`a 1`

AMD
^^^
6 changes: 3 additions & 3 deletions docs/dry.rst
@@ -62,15 +62,15 @@ In this case, the endpoint is defined in the model's configuration file in the r
.. code-tab:: console CPU

# this image is not available on Dockerhub yet but you can build it yourself from the repository
$ docker pull amdih/serve:uif1.1_zendnn_amdinfer_0.4.0
$ docker pull amdih/serve:uif1.2_zendnn_amdinfer_0.4.0

.. code-tab:: text GPU

# this image is not available on Dockerhub yet but you can build it yourself from the repository
$ docker pull amdih/serve:uif1.1_migraphx_amdinfer_0.4.0
$ docker pull amdih/serve:uif1.2_migraphx_amdinfer_0.4.0

.. code-tab:: console FPGA

# this image is not available on Dockerhub yet but you can build it yourself from the repository
$ docker pull amdih/serve:uif1.1_vai_amdinfer_0.4.0
$ docker pull amdih/serve:uif1.2_vai_amdinfer_0.4.0
-docker_pull_deployment_images
30 changes: 19 additions & 11 deletions docs/quickstart.rst
@@ -81,17 +81,19 @@ The CPU version has no special hardware requirements to run so you can always ru

.. code-tab:: console FPGA

# this example assumes a U250. If you're using a different board, download the appropriate model for your board instead
$ wget -O vitis.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=resnet_v1_50_tf-u200-u250-r2.5.0.tar.gz
$ tar -xzf vitis.tar.gz "resnet_v1_50_tf/resnet_v1_50_tf.xmodel"
$ mkdir -p ./model_repository/resnet50/1
$ mv ./resnet_v1_50_tf/resnet_v1_50_tf.xmodel ./model_repository/resnet50/1/

For the models used here, their corresponding ``config.toml`` should be placed in the chosen model repository (``./model_repository/resnet50/``):
For the models used here, you can save their corresponding ``config.toml`` to the correct path with:

.. tabs::

.. code-tab:: toml CPU
.. code-tab:: shell CPU

cat <<EOF > "./model_repository/resnet50/config.toml"
name = "resnet50"
platform = "tensorflow_graphdef"

@@ -104,9 +106,11 @@ For the models used here, their corresponding ``config.toml`` should be placed i
name = "resnet_v1_50/predictions/Reshape_1"
datatype = "FP32"
shape = [1000]
EOF

.. code-tab:: text GPU
.. code-tab:: shell GPU

cat <<EOF > "./model_repository/resnet50/config.toml"
name = "resnet50"
platform = "onnx_onnxv1"

Expand All @@ -119,9 +123,11 @@ For the models used here, their corresponding ``config.toml`` should be placed i
name = "output"
datatype = "FP32"
shape = [1000]
EOF

.. code-tab:: console FPGA
.. code-tab:: shell FPGA

cat <<EOF > "./model_repository/resnet50/config.toml"
name = "resnet50"
platform = "vitis_xmodel"

Expand All @@ -134,6 +140,7 @@ For the models used here, their corresponding ``config.toml`` should be placed i
name = "output"
datatype = "INT8"
shape = [1000]
EOF

The name must match the name of the model directory: it defines the endpoint that will be used for inference.
The platform identifies the type of the model and determines the file extension of the model file.
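
The layout described above can be sanity-checked with a few lines of Python before starting the server. This is an illustrative sketch rather than code from this repository; it assumes only the ``./model_repository/resnet50`` directories created in the earlier steps.

from pathlib import Path

# Check that each model follows the layout described above:
# model_repository/<name>/config.toml and model_repository/<name>/1/<model file>
repo = Path("./model_repository")
for model_dir in sorted(p for p in repo.iterdir() if p.is_dir()):
    config = model_dir / "config.toml"
    version_dir = model_dir / "1"
    model_files = sorted(f.name for f in version_dir.iterdir()) if version_dir.is_dir() else []
    print(f"{model_dir.name}: config exists={config.exists()}, version 1 files={model_files}")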
@@ -173,15 +180,15 @@ The flags used in this sample command are:

.. code-tab:: console CPU

$ docker run -d --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.1_zendnn_amdinfer_0.4.0
$ docker run -d --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.2_zendnn_amdinfer_0.4.0

.. code-tab:: console GPU

$ docker run -d --device /dev/kfd --device /dev/dri --volume $(pwd)/model_repository:/mnt/models:rw --publish 127.0.0.1::8998 --publish 127.0.0.1::50051 amdih/serve:uif1.1_migraphx_amdinfer_0.4.0
$ docker run -d --device /dev/kfd --device /dev/dri --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.2_migraphx_amdinfer_0.4.0

.. code-tab:: console FPGA

$ docker run -d --device /dev/dri --device /dev/xclmgmt<id> --volume $(pwd)/model_repository:/mnt/models:rw --publish 127.0.0.1::8998 --publish 127.0.0.1::50051 amdih/serve:uif1.1_vai_amdinfer_0.4.0
$ docker run -d --device /dev/dri --device /dev/xclmgmt<id> --volume $(pwd)/model_repository:/mnt/models:rw --net=host amdih/serve:uif1.2_vai_amdinfer_0.4.0

The endpoints for each model will be the name of the model in the ``config.toml``, which should match the name of the parent directory in the model repository.
In this example, it would be "resnet50".
@@ -195,7 +202,7 @@ Server deployment summary
After setting up the server as above, you have the following information:

* IP address: 127.0.0.1 since the server is running on the same machine where you will run the inference
* Ports: 8998 and 50051 for HTTP and gRPC, respectively. If you used ``--publish``, your port numbers may be different and you can check them with ``docker ps``.
* Ports: 8998 and 50051 for HTTP and gRPC, respectively. If you used ``--publish`` in the ``docker run`` command to remap the ports, your port numbers may be different and you can check them with ``docker ps``.
* Endpoint: "resnet50" since that is the model name used in the model repository and in the configuration file

The rest of this example will use these values in the sample code so substitute your own values if they are different.
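
Putting those values together, a minimal liveness check in Python might look like the following. This is a sketch, not part of the repository: it assumes the ``amdinfer`` Python package is installed and that the server was started with one of the ``docker run`` commands above. ``HttpClient`` and ``serverLive`` are the same calls used by the example scripts changed in this commit.

import time

import amdinfer

# values from the deployment summary above; substitute your own if different
server_addr = "http://127.0.0.1:8998"
endpoint = "resnet50"

client = amdinfer.HttpClient(server_addr)

# poll until the HTTP endpoint responds, similar in spirit to the new --wait flag
while not client.serverLive():
    time.sleep(1)
print(f"Server is live; requests can be sent to the '{endpoint}' model")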
@@ -239,21 +246,22 @@ These results are post-processed and the top 5 labels for the image are printed.
.. parsed-literal::
$ wget :amdinferRawFull:`examples/resnet50/tfzendnn.py`
$ python3 tfzendnn.py --ip 127.0.0.1 --grpc-port 50051 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt
$ python3 tfzendnn.py --ip 127.0.0.1 --grpc-port 50051 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt --wait
.. group-tab:: GPU

.. parsed-literal::
$ wget :amdinferRawFull:`examples/resnet50/migraphx.py`
$ python3 migraphx.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt
# This will take some time initially as MIGraphX will compile the ONNX model to MXR
$ python3 migraphx.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt --wait
.. group-tab:: FPGA

.. parsed-literal::
$ wget :amdinferRawFull:`examples/resnet50/vitis.py`
$ python3 vitis.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt
$ python3 vitis.py --ip 127.0.0.1 --http-port 8998 --endpoint resnet50 --image ./dog-3619020_640.jpg --labels ./imagenet_classes.txt --wait
After running the script, you should get output similar to the following.
The exact output may be slightly different depending on whether you used the CPU, GPU, or FPGA version of the example.
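
For reference, the post-processing mentioned above amounts to a top-k selection over the model's 1000 output scores. The snippet below is a generic illustration using NumPy, not the exact code in the example scripts.

import numpy as np

def top5(scores, labels):
    # scores: 1000-element output vector from the model; labels: list of class names
    indices = np.argsort(scores)[::-1][:5]
    return [(labels[i], float(scores[i])) for i in indices]

# usage sketch: top5(output_scores, open("imagenet_classes.txt").read().splitlines())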
4 changes: 3 additions & 1 deletion examples/resnet50/migraphx.cpp
@@ -198,7 +198,9 @@ int main(int argc, char* argv[]) {
amdinfer::HttpClient client{server_addr};

std::optional<amdinfer::Server> server;
if (args.ip == "127.0.0.1" && !client.serverLive()) {
if (args.wait) {
// if wait is true, skip ahead to waiting for the server to become ready
} else if (args.ip == "127.0.0.1" && !client.serverLive()) {
std::cout << "No server detected. Starting locally...\n";
server.emplace();
server.value().startHttp(args.http_port);
5 changes: 4 additions & 1 deletion examples/resnet50/migraphx.py
@@ -181,8 +181,11 @@ def main(args):

server_addr = f"http://{args.ip}:{args.http_port}"
client = amdinfer.HttpClient(server_addr)
if args.wait:
# if wait is true, skip ahead to waiting for the server to become ready
pass
# start the server locally if it isn't already up and the IP address is the localhost
if args.ip == "127.0.0.1" and not client.serverLive():
elif args.ip == "127.0.0.1" and not client.serverLive():
print("No server detected. Starting locally...")
server = amdinfer.Server()
server.startHttp(args.http_port)
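
The comment added to ``migraphx.py`` points at a later "wait for the server to become ready" step that this hunk does not show. A condensed, hypothetical sketch of the whole startup flow after this change, using only calls visible in this diff plus a simple polling loop, could look like this:

import time

import amdinfer

def get_client(args):
    client = amdinfer.HttpClient(f"http://{args.ip}:{args.http_port}")
    server = None
    if args.wait:
        # with --wait, never start a local server; just wait for an external one
        pass
    elif args.ip == "127.0.0.1" and not client.serverLive():
        print("No server detected. Starting locally...")
        server = amdinfer.Server()
        server.startHttp(args.http_port)
    # in both cases, block until the server answers liveness checks
    while not client.serverLive():
        time.sleep(1)
    return client, server  # keep the server object alive for the local case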
6 changes: 6 additions & 0 deletions examples/resnet50/resnet.py
@@ -111,6 +111,12 @@ def parse_args():
help="Name of the output node",
)

parser.add_argument(
"--wait",
action="store_true",
help="Don't start the server automatically and wait for it indefinitely",
)

args = parser.parse_args()

if (not args.image) or (not args.labels):
3 changes: 3 additions & 0 deletions examples/resnet50/resnet50.hpp
@@ -45,6 +45,7 @@ struct Args {
int output_classes = kOutputClasses;
std::string input_node;
std::string output_node;
bool wait = false;
};

inline Args parseArgs(int argc, char** argv) {
@@ -80,6 +81,8 @@ inline Args parseArgs(int argc, char** argv) {
cxxopts::value(args.top))
("output-classes", "Number of output classes for this model",
cxxopts::value(args.output_classes))
("wait", "Don't start the server automatically and wait for it indefinitely",
cxxopts::value(args.wait))
("help", "Print help");
// clang-format on

4 changes: 3 additions & 1 deletion examples/resnet50/tfzendnn.cpp
@@ -190,7 +190,9 @@ int main(int argc, char* argv[]) {

std::optional<amdinfer::Server> server;
// +start protocol
if (args.ip == "127.0.0.1" && !client.serverLive()) {
if (args.wait) {
// if wait is true, skip ahead to waiting for the server to become ready
} else if (args.ip == "127.0.0.1" && !client.serverLive()) {
std::cout << "No server detected. Starting locally...\n";
server.emplace();
server.value().startGrpc(args.grpc_port);
5 changes: 4 additions & 1 deletion examples/resnet50/tfzendnn.py
@@ -173,8 +173,11 @@ def main(args):

server_addr = f"{args.ip}:{args.grpc_port}"
client = amdinfer.GrpcClient(server_addr)
if args.wait:
# if wait is true, skip ahead to waiting for the server to become ready
pass
# start the server locally if it isn't already up and the IP address is the localhost
if args.ip == "127.0.0.1" and not client.serverLive():
elif args.ip == "127.0.0.1" and not client.serverLive():
print("No server detected. Starting locally...")
server = amdinfer.Server()
server.startGrpc(args.grpc_port)