Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Python binding of Triton server C API #248

Closed
wants to merge 7 commits into from

Conversation

GuanLuo
Copy link
Contributor

@GuanLuo GuanLuo commented Aug 28, 2023

This PR provides the low level binding for Python user to interact with Triton library within the same process, however, this binding is not intended for Python user to use directly but to enable the development of the Python wrapper. Python user should use the wrapper mentioned because it encapsualtes many concepts that are not intuitive to Python user but inevitably leaked into the binding.

Most part of the binding is straight forward mapping of the C API, however, the PyResponseAllocator, PyTrace, PyInferenceRequest and PyInferenceResponse requires extra attention because they are objects that involves callbacks and ownership transfer. Also, below is the snippet from .cc file to help reviewers navigate things that are different from the C equivalent of the binding.

// This binding is merely used to map Triton C API into Python equivalent,
// and therefore, the naming will be the same as the one used in corresponding
// sections. However, there are a few exceptions to better transit to Python:
// Structs:
//  * Triton structs are encapsulated in a thin wrapper to isolate raw pointer
//    operations which is not supported in pure Python. A thin 'PyWrapper' base
//    class is defined with common utilities
//  * Trival getters and setters are grouped to be a Python class property.
//    However, this creates asymmetry that some APIs are called like function
//    while some like member variables. So I am open to expose getter / setter
//    if it may be more intuitive.
//  * The wrapper is only served as communication between Python and C, it will
//    be unwrapped when control reaches C API and the C struct will be wrapped
//    when control reaches Python side. Python binding user should respect the
//    "ownership" and lifetime of the wrapper in the same way as described in
//    the C API. Python binding user must not assume the same C struct will
//    always be referred through the same wrapper object.
// Enums:
//  * In C API, the enum values are prefixed by the enum name. The Python
//    equivalent is an enum class and thus the prefix is removed to avoid
//    duplication, i.e. Python user may specify a value by
//    'TRITONSERVER_ResponseCompleteFlag.FINAL'.
// Functions / Callbacks:
//  * Output parameters are converted to return value. APIs that return an error
//    will be thrown as an exception. The same applies to callbacks.
//  ** Note that in the C API, the inference response may carry an error object
//     that represent an inference failure. The equivalent Python API will raise
//     the corresponding exception if the response contains error object.
//  * The function parameters and return values are exposed in Python style,
//    for example, object pointer becomes py::object, C array and length
//    condenses into Python array.

CMakeLists.txt Show resolved Hide resolved
python/binding/tritonserver_pybind.cc Show resolved Hide resolved
// forward declaration
class PyServer;

class PyInferenceRequest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to breakdown this file to one class per file (e.g. PyInferenceRequest, PyServerOptions, ...)?


protected:
TritonStruct* triton_object_{nullptr};
bool owned_{false};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering whether it is possible to do the same thing with shared_ptr and custom deleters. For example,

class PyParameter {
  ...
  std::shared_ptr<TRITONSERVER_Parameter, TRITONSERVER_ParameterDelete> pointer_;
};

We would have to allow COPY_AND_ASSIGN for this to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may work for objects that doesn't involve release of ownership, but for request object, user may perform the following and cause TRITONSERVER_InferenceRequest being deleted:

req = triton_bindings.TRITONSERVER_InferenceRequest(...) # and initialize
req_2 = req
server.infer_async(req)
del req
del req_2

In current implementation of infer_async, it informs req that it no longer owns the Triton object, but such information is not passed to req_2 and it will delete the Triton object while it is needed for inference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think in Python it is just reference counting the same object so req_2 and req should be in sync. Although, just because I am less familiar with binding code in general,I still prefer to keep DISALLOW_COPY_AND_ASSIGN to make sure the ownership is controlled within c++ side.

python/binding/tritonserver_pybind.cc Show resolved Hide resolved
python/binding/tritonserver_pybind.cc Show resolved Hide resolved
@GuanLuo GuanLuo changed the title Python binding of Triton server C API [DO NOT MERGE] Python binding of Triton server C API Sep 13, 2023
@GuanLuo GuanLuo closed this Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants