
Python API Beta #299

Merged
merged 91 commits into main from nnshah1-python-api on Jan 11, 2024
Changes from 83 commits
Commits
91 commits
8848ae8
initial commit of python wrapper api
nnshah1 Dec 4, 2023
1ce4899
adding initial api level test
nnshah1 Dec 4, 2023
bd44a33
adding api unit test
nnshah1 Dec 4, 2023
5f2c3c0
update to make cupy optional
nnshah1 Dec 4, 2023
76baf08
updated with formatting fixes
nnshah1 Dec 5, 2023
8b52407
documentation updates
nnshah1 Dec 5, 2023
fe0368a
updated documentation
nnshah1 Dec 5, 2023
513abaa
updates
nnshah1 Dec 5, 2023
9e1d813
updating documentation
nnshah1 Dec 5, 2023
f1fe520
incremental typing improvements
nnshah1 Dec 6, 2023
318ebe9
updates for latest - fix cancel
nnshah1 Dec 7, 2023
02b4125
updated cmake commands for consistency
nnshah1 Dec 7, 2023
759dabf
updated based on coding style
nnshah1 Dec 7, 2023
b5782e7
adding parameter setting and error handling
nnshah1 Dec 7, 2023
9efd589
copyright headers and tweaks to makefile
nnshah1 Dec 8, 2023
fe3cb7a
added metric
nnshah1 Dec 8, 2023
b9b5885
adding support for blocking unload
nnshah1 Dec 9, 2023
68ecf20
updating naming
nnshah1 Dec 9, 2023
e16d50d
Merge branch 'main' into nnshah1-python-api
nnshah1 Dec 17, 2023
8594e4a
updated to clean up and make reexport explicit and support defaultmem…
nnshah1 Dec 18, 2023
29f7af5
added additional types for dlpack to triton data type
nnshah1 Dec 19, 2023
d1ce2f3
Adding type hints for c extension
Jan 2, 2024
5a27d81
incremental updates for testing [WIP]
Jan 2, 2024
3f29f9d
updated to remove unnecessary prefix
nnshah1 Jan 2, 2024
ac5d6e9
incremental refactoring to create dlpack from tensor
nnshah1 Jan 3, 2024
4afc2f1
refactor memory allocation - testing updates in progress
nnshah1 Jan 5, 2024
be0ddb0
update to avoid setting cuda ipc handle until supported
nnshah1 Jan 5, 2024
dcd432d
update to ignore stream synchronization for now
nnshah1 Jan 5, 2024
bd3411a
updated to catch and print on exception in capsule deleter
nnshah1 Jan 5, 2024
7e161eb
updates for providing array module support
nnshah1 Jan 5, 2024
5b03ef0
updates for formatting
nnshah1 Jan 5, 2024
8b91235
minor fix just for formatting
nnshah1 Jan 5, 2024
04dad3f
add version string
nnshah1 Jan 6, 2024
756a802
guard against uninstalled package
nnshah1 Jan 6, 2024
021841d
removing output_array_module to simplify interface
nnshah1 Jan 6, 2024
3f85f38
normalizing names of creation methods
nnshah1 Jan 7, 2024
b474ffd
updated error handling and removed from and to ndarray
nnshah1 Jan 7, 2024
8e2e40a
added to host method for convenience
nnshah1 Jan 7, 2024
d93b7df
renaming to be consistent in methods moving to and from binding level
nnshah1 Jan 7, 2024
8d06e4e
tweaking for linting
nnshah1 Jan 7, 2024
f8b4dea
updating names of unused arguments with _
nnshah1 Jan 7, 2024
e48d2ad
update to add numpy and cupy as required and optional dependencies
nnshah1 Jan 8, 2024
15c3822
splitting api into multiple modules for ease of documentation and nav…
nnshah1 Jan 8, 2024
19c8b8a
updating with documentation for model
nnshah1 Jan 8, 2024
99c049f
updated with documentation
nnshah1 Jan 8, 2024
81d67ed
updating with documentation for request
nnshah1 Jan 8, 2024
c77978d
adding tensor with documentation
nnshah1 Jan 8, 2024
2d3fb24
updating and separating allocators into separate module
nnshah1 Jan 8, 2024
6152fe2
updates for consistency
nnshah1 Jan 8, 2024
9464740
documentation in progress
nnshah1 Jan 9, 2024
edbda0f
updating imports
nnshah1 Jan 9, 2024
c872ea2
updating imports
nnshah1 Jan 9, 2024
d446ba0
updated documentation
nnshah1 Jan 9, 2024
c3ab92f
updating init.py to refer to new modules
nnshah1 Jan 9, 2024
4f06899
updated to make Tensor.from_object private and add Tensor.from_bytes_…
nnshah1 Jan 9, 2024
e04fa67
updating imports to avoid circular imports and change to absolute
nnshah1 Jan 9, 2024
862d0c7
begin refactor of tests
nnshah1 Jan 9, 2024
8d2c3f8
updates for naming and remove tensor name from allocator api
nnshah1 Jan 9, 2024
d1394a5
updated naming and example
nnshah1 Jan 9, 2024
e7d441d
updated with basic tests
nnshah1 Jan 9, 2024
45cdf46
updating with basic test model
nnshah1 Jan 9, 2024
dc21fa9
updated
nnshah1 Jan 9, 2024
7c87bbc
removing TRITONSERVER_Server objects from public constructors
nnshah1 Jan 10, 2024
8e9247c
adding dlpack stream synchronization support based on discussion
nnshah1 Jan 10, 2024
6c2ad73
adding additional testing support for data types
nnshah1 Jan 10, 2024
d9696b9
adding test for string array (WIP)
nnshah1 Jan 10, 2024
a8fd94c
change default raise on error
nnshah1 Jan 10, 2024
ca117ce
updated with allocation tests and additional dlpack synchronization
nnshah1 Jan 10, 2024
1e121fe
updated removing state tests
nnshah1 Jan 10, 2024
ee73821
update copyright
nnshah1 Jan 10, 2024
f406ae2
update test model to use model config parameters
nnshah1 Jan 10, 2024
47c2d3c
Update python/tritonserver/_api/_allocators.py
nnshah1 Jan 10, 2024
5c7bfe3
Update python/tritonserver/_api/_allocators.py
nnshah1 Jan 10, 2024
250c2f9
update for consistency
nnshah1 Jan 10, 2024
8d25b00
Update python/tritonserver/_api/_model.py
nnshah1 Jan 10, 2024
333e08b
Update python/tritonserver/_api/_dlpack.py
nnshah1 Jan 10, 2024
20b9aa5
Update python/tritonserver/_api/_dlpack.py
nnshah1 Jan 10, 2024
3e85540
Update python/tritonserver/_api/_model.py
nnshah1 Jan 10, 2024
e983404
Update python/tritonserver/_api/_model.py
nnshah1 Jan 10, 2024
9e1cc13
Update python/tritonserver/_api/_request.py
nnshah1 Jan 10, 2024
1d60e9b
removing placeholders for examples
nnshah1 Jan 10, 2024
121808f
removing parameter from config file - is passed in via explicit load
nnshah1 Jan 10, 2024
8f6dce1
Merge branch 'main' into nnshah1-python-api
nnshah1 Jan 10, 2024
fa43784
Update python/test/test_api.py
nnshah1 Jan 10, 2024
befe2a3
adding model load retry count to options
nnshah1 Jan 10, 2024
0d95d7b
adding type hint for model load retry count
nnshah1 Jan 10, 2024
cbf37e0
changing to copy over directories instead of using file glob
nnshah1 Jan 11, 2024
750c655
added test dependencies to wheel, removed typing_extensions
nnshah1 Jan 11, 2024
8c51e67
updated to skip tests requiring torch
nnshah1 Jan 11, 2024
ed78404
update for formatting
nnshah1 Jan 11, 2024
731a722
updating description to Triton Inference Server In-Process Python API
nnshah1 Jan 11, 2024
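Before the file-by-file diff, a short orientation: the commits above add an in-process Python API (tritonserver) wrapping the core TRITONSERVER C API. The sketch below is distilled from the tests added in this PR (python/test/test_api.py, shown further down); treat it as illustrative rather than the final documented surface, since several commits above were still normalizing names. The model repository path and the single-argument load() call are assumptions for the example.

import numpy
import tritonserver

# Start an in-process server over a local model repository, using
# explicit model control as the tests below do.
options = tritonserver.Options(
    model_repository="/workspace/models",  # assumed path
    model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)
server = tritonserver.Server(options).start(wait_until_ready=True)
server.load("test")

# infer() yields responses; outputs interoperate via DLPack.
fp16_input = numpy.random.rand(1, 100).astype(dtype=numpy.float16)
for response in server.model("test").infer(
    inputs={"fp16_input": fp16_input}, output_memory_type="cpu"
):
    fp16_output = numpy.from_dlpack(response.outputs["fp16_output"])

server.stop()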
19 changes: 14 additions & 5 deletions python/CMakeLists.txt
@@ -30,7 +30,8 @@ add_subdirectory(tritonserver)
file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/TRITON_VERSION ${TRITON_VERSION})
configure_file(../LICENSE LICENSE.txt COPYONLY)
configure_file(setup.py setup.py @ONLY)
-file(COPY test/test_binding.py DESTINATION ./test/.)
+file(GLOB TESTS test/*.py)
+file(COPY ${TESTS} DESTINATION ./test/.)

set(WHEEL_DEPENDS
${CMAKE_CURRENT_BINARY_DIR}/TRITON_VERSION
@@ -58,12 +59,20 @@ add_custom_target(
"${wheel_stamp_file}"
)


+# Wheel
+file(GLOB WHEEL "${CMAKE_CURRENT_BINARY_DIR}/generic/triton*.whl")
install(
-CODE "file(GLOB _Wheel \"${CMAKE_CURRENT_BINARY_DIR}/generic/triton*.whl\")"
-CODE "file(INSTALL \${_Wheel} DESTINATION \"${CMAKE_INSTALL_PREFIX}/python\")"
+FILES
+${WHEEL}
+DESTINATION "${CMAKE_INSTALL_PREFIX}/python"
)

-# Test
+# Tests
+file(GLOB TESTS "${CMAKE_CURRENT_BINARY_DIR}/test/*.py")
install(
-CODE "file(INSTALL ${CMAKE_CURRENT_BINARY_DIR}/test/test_binding.py DESTINATION \"${CMAKE_INSTALL_PREFIX}/python\")"
+FILES
+${TESTS}
+DESTINATION "${CMAKE_INSTALL_PREFIX}/python"
)
6 changes: 4 additions & 2 deletions python/build_wheel.py
@@ -93,11 +93,13 @@ def sed(pattern, replace, source, dest=None):
print("=== Building in: {}".format(os.getcwd()))
print("=== Using builddir: {}".format(FLAGS.whl_dir))
print("Adding package files")

mkdir(os.path.join(FLAGS.whl_dir, "tritonserver"))
shutil.copy("tritonserver/__init__.py", os.path.join(FLAGS.whl_dir, "tritonserver"))

# Type checking marker file indicating support for type checkers.
# https://peps.python.org/pep-0561/
shutil.copy("tritonserver/py.typed", os.path.join(FLAGS.whl_dir, "tritonserver"))
cpdir("tritonserver/_c", os.path.join(FLAGS.whl_dir, "tritonserver", "_c"))
cpdir("tritonserver/_api", os.path.join(FLAGS.whl_dir, "tritonserver", "_api"))
PYBIND_LIB = os.path.basename(FLAGS.binding_path)
shutil.copyfile(
FLAGS.binding_path,
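The py.typed file copied above is the PEP 561 marker: shipping it in the wheel tells type checkers such as mypy to trust the package's inline annotations and the generated _c/*.pyi stubs rather than treating tritonserver as untyped. A hedged sketch of what that enables in downstream code (Tensor and DataType appear in the tests in this PR; the data_type attribute is an assumption for illustration):

import tritonserver

def require_fp32(tensor: tritonserver.Tensor) -> tritonserver.Tensor:
    # With py.typed present, a checker validates this annotation against
    # the bundled stubs. The attribute name below is assumed.
    assert tensor.data_type == tritonserver.DataType.FP32
    return tensor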
15 changes: 13 additions & 2 deletions python/setup.py
@@ -61,14 +61,23 @@ def get_tag(self):
data_files = [
("", ["LICENSE.txt"]),
]
-platform_package_data = [os.environ["TRITON_PYBIND"]]

+# Type checking marker file indicating support for type checkers.
+# https://peps.python.org/pep-0561/
+# Type hints for c extension generated by mypy
+platform_package_data = [
+    os.environ["TRITON_PYBIND"],
+    "py.typed",
+    "_c/__init__.pyi",
+    "_c/triton_bindings.pyi",
+]

setup(
name="tritonserver",
version=VERSION,
author="NVIDIA Inc.",
author_email="[email protected]",
description="Python API of the Triton In-Process Server",
description="Triton Inference Server Python API",
license="BSD",
url="https://developer.nvidia.com/nvidia-triton-inference-server",
classifiers=[
@@ -95,4 +104,6 @@ def get_tag(self):
zip_safe=False,
cmdclass={"bdist_wheel": bdist_wheel},
data_files=data_files,
+install_requires=["numpy"],
+extras_require={"GPU": ["cupy-cuda12x"], "all": ["cupy-cuda12x"]},
)
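With these packaging changes numpy becomes a hard dependency while GPU support stays opt-in: a plain pip install tritonserver pulls only numpy, and pip install "tritonserver[GPU]" (or [all]) adds cupy-cuda12x. Code that must run on both kinds of install can use the same guarded-import pattern the new tests adopt; the helper below is hypothetical, added only to show the fallback:

# Guarded import, mirroring test_api.py: cupy exists only when the
# GPU/all extra was installed, so degrade to CPU gracefully.
try:
    import cupy
except ImportError:
    cupy = None

def preferred_output_memory_type() -> str:
    # Hypothetical helper, not part of this PR.
    return "gpu" if cupy is not None else "cpu"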
252 changes: 252 additions & 0 deletions python/test/test_api.py
@@ -0,0 +1,252 @@
# Copyright 2023-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import asyncio
import json
import queue
import time
import unittest

import numpy
import pytest
import tritonserver

try:
import cupy
except ImportError:
cupy = None

try:
import torch
except ImportError:
torch = None

server_options = tritonserver.Options(
server_id="TestServer",
model_repository="/workspace/test/test_api_models",
log_verbose=0,
log_error=True,
exit_on_error=True,
strict_model_config=False,
model_control_mode=tritonserver.ModelControlMode.EXPLICIT,
)


class ModelTests(unittest.TestCase):
def test_create_request(self):
server = tritonserver.Server(server_options).start(wait_until_ready=True)

request = server.models()["test"].create_request()

request = tritonserver.InferenceRequest(server.model("test"))


class AllocatorTests(unittest.TestCase):
@pytest.mark.skipif(torch is None, reason="Skipping torch interop checks, torch not installed")
def test_allocate_on_cpu_and_reshape(self):
allocator = tritonserver.default_memory_allocators[tritonserver.MemoryType.CPU]

memory_buffer = allocator.allocate(
memory_type=tritonserver.MemoryType.CPU, memory_type_id=0, size=200
)

cpu_array = memory_buffer.owner

self.assertEqual(memory_buffer.size, 200)

fp32_size = int(memory_buffer.size / 4)

tensor = tritonserver.Tensor(
tritonserver.DataType.FP32, shape=[fp32_size], memory_buffer=memory_buffer
)

cpu_fp32_array = numpy.from_dlpack(tensor)
self.assertEqual(cpu_array.ctypes.data, cpu_fp32_array.ctypes.data)
self.assertEqual(cpu_fp32_array.dtype, numpy.float32)
self.assertEqual(cpu_fp32_array.nbytes, 200)

torch_fp32_tensor = torch.from_dlpack(tensor)
self.assertEqual(torch_fp32_tensor.dtype, torch.float32)
self.assertEqual(torch_fp32_tensor.data_ptr(), cpu_array.ctypes.data)
self.assertEqual(torch_fp32_tensor.nbytes, 200)

@pytest.mark.skipif(cupy is None, reason="Skipping gpu memory, cupy not installed")
@pytest.mark.skipif(torch is None, reason="Skipping torch interop checks, torch not installed")
def test_allocate_on_gpu_and_reshape(self):
if cupy is None:
return

allocator = tritonserver.default_memory_allocators[tritonserver.MemoryType.GPU]

memory_buffer = allocator.allocate(
memory_type=tritonserver.MemoryType.GPU, memory_type_id=0, size=200
)

gpu_array = memory_buffer.owner

gpu_array = cupy.empty([10, 20], dtype=cupy.uint8)
memory_buffer = tritonserver.MemoryBuffer.from_dlpack(gpu_array)

self.assertEqual(memory_buffer.size, 200)

fp32_size = int(memory_buffer.size / 4)

tensor = tritonserver.Tensor(
tritonserver.DataType.FP32, shape=[fp32_size], memory_buffer=memory_buffer
)

gpu_fp32_array = cupy.from_dlpack(tensor)
self.assertEqual(
gpu_array.__cuda_array_interface__["data"][0],
gpu_fp32_array.__cuda_array_interface__["data"][0],
)
self.assertEqual(gpu_fp32_array.dtype, cupy.float32)
self.assertEqual(gpu_fp32_array.nbytes, 200)

torch_fp32_tensor = torch.from_dlpack(tensor)
self.assertEqual(torch_fp32_tensor.dtype, torch.float32)
self.assertEqual(
torch_fp32_tensor.data_ptr(), gpu_array.__cuda_array_interface__["data"][0]
)
self.assertEqual(torch_fp32_tensor.nbytes, 200)


class TensorTests(unittest.TestCase):
@pytest.mark.skipif(cupy is None, reason="Skipping gpu memory, cupy not installed")
def test_cpu_to_gpu(self):
if cupy is None:
return
cpu_array = numpy.random.rand(1, 3, 100, 100).astype(numpy.float32)
cpu_tensor = tritonserver.Tensor.from_dlpack(cpu_array)
gpu_tensor = cpu_tensor.to_device("gpu:0")
gpu_array = cupy.from_dlpack(gpu_tensor)

self.assertEqual(gpu_array.device, cupy.cuda.Device(0))

numpy.testing.assert_array_equal(cpu_array, gpu_array.get())

memory_buffer = tritonserver.MemoryBuffer.from_dlpack(gpu_array)

self.assertEqual(
gpu_array.__cuda_array_interface__["data"][0], memory_buffer.data_ptr
)

@pytest.mark.skipif(
torch is None, reason="Skipping gpu memory, torch not installed"
)
@pytest.mark.skipif(cupy is None, reason="Skipping gpu memory, cupy not installed")
def test_gpu_tensor_from_dl_pack(self):
if cupy is None or torch is None:
return
cupy_array = cupy.ones([100]).astype(cupy.float64)
tensor = tritonserver.Tensor.from_dlpack(cupy_array)
torch_tensor = torch.from_dlpack(cupy_array)

self.assertEqual(torch_tensor.data_ptr(), tensor.data_ptr)
self.assertEqual(torch_tensor.nbytes, tensor.size)
self.assertEqual(torch_tensor.__dlpack_device__(), tensor.__dlpack_device__())

@pytest.mark.skipif(torch is None, reason="Skipping test, torch not installed")
def test_tensor_from_numpy(self):
cpu_array = numpy.random.rand(1, 3, 100, 100).astype(numpy.float32)
tensor = tritonserver.Tensor.from_dlpack(cpu_array)
torch_tensor = torch.from_dlpack(tensor)
numpy.testing.assert_array_equal(torch_tensor.numpy(), cpu_array)
self.assertEqual(torch_tensor.data_ptr(), cpu_array.ctypes.data)


class ServerTests(unittest.TestCase):
def test_not_started(self):
server = tritonserver.Server()
with self.assertRaises(tritonserver.InvalidArgumentError):
server.ready()

def test_invalid_option_type(self):
server = tritonserver.Server(server_id=1)
with self.assertRaises(TypeError):
server.start()

server = tritonserver.Server(model_repository=1)
with self.assertRaises(TypeError):
server.start()

def test_invalid_repo(self):
with self.assertRaises(tritonserver.InternalError):
tritonserver.Server(model_repository="foo").start()

def test_ready(self):
server = tritonserver.Server(server_options).start()
self.assertTrue(server.ready())


class InferenceTests(unittest.TestCase):
def test_basic_inference(self):
server = tritonserver.Server(server_options).start(wait_until_ready=True)

self.assertTrue(server.ready())

server.load(
"test",
{
"config": json.dumps(
{
"backend": "python",
"parameters": {"decoupled": {"string_value": "False"}},
}
)
},
)

fp16_input = numpy.random.rand(1, 100).astype(dtype=numpy.float16)

for response in server.model("test").infer(
inputs={"fp16_input": fp16_input},
output_memory_type="cpu",
raise_on_error=True,
):
fp16_output = numpy.from_dlpack(response.outputs["fp16_output"])
numpy.testing.assert_array_equal(fp16_input, fp16_output)

for response in server.model("test").infer(
inputs={"fp16_input": fp16_input},
output_memory_type="gpu",
):
fp16_output = cupy.from_dlpack(response.outputs["fp16_output"])
self.assertEqual(fp16_input[0][0], fp16_output[0][0])

for response in server.model("test").infer(
inputs={"string_input": [["hello"]]},
output_memory_type="gpu",
):
text_output = response.outputs["string_output"].to_string_array()
self.assertEqual(text_output[0][0], "hello")

for response in server.model("test").infer(
inputs={"string_input": tritonserver.Tensor.from_string_array([["hello"]])},
output_memory_type="gpu",
):
text_output = response.outputs["string_output"].to_string_array()
self.assertEqual(text_output[0][0], "hello")
server.stop()
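The TensorTests above all reduce to one zero-copy contract: any DLPack producer or consumer (numpy, cupy, torch) can view a tritonserver.Tensor's memory without copying. A condensed CPU-only restatement, using only names that appear in these tests:

import numpy
import tritonserver

# Wrap a numpy array via DLPack, then view it back; both views share
# one buffer, as TensorTests.test_tensor_from_numpy asserts.
cpu_array = numpy.random.rand(2, 4).astype(numpy.float32)
tensor = tritonserver.Tensor.from_dlpack(cpu_array)
round_trip = numpy.from_dlpack(tensor)
assert round_trip.ctypes.data == cpu_array.ctypes.data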