-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python binding of Triton server C API #265
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It mostly looks good. Will take a closer look at the testing/bindings in a follow up review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall!
It doesn't have to be part of this PR but It would be great to have a valgrind test to make sure that the common functions (e.g., async_infer, model_load, tracing) do not have a memory leak.
if trace.id not in user_object: | ||
user_object[trace.id] = [] | ||
|
||
# not owning trace, so must read property out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that if the user stores the trace object it will become invalid?
For example, if I used user_object[trace.id]=trace would it result in segfault if I access trace after returning from callback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the current implementation it will be invalid as the py::object is passed as reference to a local variable (wrapper of Triton trace).
To make your example valid, what we can do is to store the trace wrapper in the trace callback resource (seen_traces
) when seen and remove from it when release.
server.infer_async(request, trace) | ||
|
||
# [FIXME] WAR due to trace lifecycle is tied to response in Triton core, | ||
# trace reference should drop on response send.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding correctly, this WAR is mainly for testing purposes (i.e., to make sure that the trace object really gets deleted)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the coupling is within core logic and we need to make change there to decouple the lifecycle. This is not obvious in direct C API usage because users typically delete response on response complete callback which triggers trace release callback, in Python the time of deletion is less-deterministic and can cause confusion when trace release is being expected.
self.assertEqual(len(res.output_classification_label(0, 1)), 0) | ||
# [FIXME] keep alive behavior is not established between response | ||
# and server, so must explicitly handle the destruction order for now. | ||
del res |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if response is not deleted here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triton response holds a shared pointer of the model and will cause server not exiting properly (waiting for model to fully unload). Not establishing keep_alive
makes Python garbage collects server before response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the PR due to time constraints but I think we should address this since it affects basic usage of the low-level Python bindings.
version=VERSION, | ||
author="NVIDIA Inc.", | ||
author_email="[email protected]", | ||
description="Python API of the Triton In-Process Server", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Python bindings for TritonServer C-API"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this describes the Python package tritonserver
? Python binding is meant to be lower level and not directly exposed.
python/test/test_binding.py
Outdated
# wrap in lambda function to avoid early evaluation that raises | ||
# exception before assert | ||
self.assertRaises(triton_bindings.TritonError, | ||
lambda: request.correlation_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this raising an exception because request.correlation_id_string was assigned before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in core it checks the type of stored correlation id and returns error if mismatch
* Add Python binding for Triton server API * Use uintptr_t for pointer. Address exception name. Interact with GIL * Fix lifetime dependency between request and server * Better variable name. Pass by reference to avoid extra copy construct * Add wheel build script. Address comment * Format * Avoid core referring to PyBind in backend build * Using generator expression to build on Windows * Specify rpath for runtime library lookup. Fix bug * Address comment. Format
* Add Python binding for Triton server API * Use uintptr_t for pointer. Address exception name. Interact with GIL * Fix lifetime dependency between request and server * Better variable name. Pass by reference to avoid extra copy construct * Add wheel build script. Address comment * Format * Avoid core referring to PyBind in backend build * Using generator expression to build on Windows * Specify rpath for runtime library lookup. Fix bug * Address comment. Format
Duplicated from #248 and developed on top of it, because the original PR is targeting the Python wrapper branch which is not finalized and merged into main branch. Please refer to #248 for context of the discussing below.
Remaining item for this PR:
Most of the comment in #248 has been addressed, but there are unresolved discussions and they will be resolved in separated PRs:
PyXXX
objects (discussion), because for most of the objects, the ownership is fixed.New issues to be resolved in separated PRs:
PyTritonXXXCallback
, which doesn't take into the account that the request can be re-used without re-configuring the callback setting. In that case, the callback resource is not valid for the sequential inferences.del res
in testing to make sure server exit properly). Ideally such dependency is handled internally like the dependency between request and server.