How to access inputs in TRITONSERVER_InferenceRequest #5499

avickars · 2023-03-13T18:01:54Z


Is your feature request related to a problem? Please describe.

We would like to access the inputs we attached to an inference request from inside the request release callback. However, there doesn't appear to be any documentation on how to do this: in tritonserver.h, TRITONSERVER_InferenceRequest is declared on line 56, but we have no idea what its struct members are, so we don't know how to access the inputs. Specifically, we want to perform the memory deallocation in the release callback at line 249 of server/src/simple.cc (commit db3d08b):

TRITONSERVER_InferenceRequest* request, const uint32_t flags, void* userp)

Describe the solution you'd like

Documentation on how to do the above (e.g. what the members of the TRITONSERVER_InferenceRequest struct are).

Describe alternatives you've considered

In the meantime, we have a structure that holds the locations of the request inputs, and we can perform the deallocation using that.
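The workaround described above can be sketched as follows. This is a minimal, self-contained illustration, not Triton's own code: the struct TRITONSERVER_InferenceRequest below is a local stand-in for the opaque handle in tritonserver.h, kRequestReleaseAll mirrors TRITONSERVER_REQUEST_RELEASE_ALL, and InputBuffer, RequestInputs and InferRequestRelease are hypothetical names. The idea is to record every input buffer you attach to the request and pass that record as the release callback's userp, so the callback can free the buffers without ever looking inside the request.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Local stand-in for the opaque handle declared in tritonserver.h, so this
// sketch compiles on its own. Real code includes tritonserver.h instead.
struct TRITONSERVER_InferenceRequest {};

// Mirrors the value of TRITONSERVER_REQUEST_RELEASE_ALL in tritonserver.h.
constexpr uint32_t kRequestReleaseAll = 1;

// Hypothetical bookkeeping: one entry per input buffer attached to a request.
struct InputBuffer {
  std::string name;
  void* base;
  std::size_t byte_size;
};

// Hypothetical per-request record, handed to Triton as the release userp.
struct RequestInputs {
  std::vector<InputBuffer> buffers;

  // Allocates a buffer for input `name` and records it. In real code, `base`
  // is what you would pass to TRITONSERVER_InferenceRequestAppendInputData.
  void* Allocate(const std::string& name, std::size_t byte_size) {
    void* base = std::malloc(byte_size);
    if (base != nullptr) {
      buffers.push_back({name, base, byte_size});
    }
    return base;
  }

  // Frees every recorded buffer and forgets it, so a second call is harmless.
  void FreeAll() {
    for (InputBuffer& b : buffers) {
      std::free(b.base);
    }
    buffers.clear();
  }
};

// Release callback with the same parameter list as the one in simple.cc.
// Real code would register it with TRITONSERVER_InferenceRequestSetReleaseCallback,
// passing a RequestInputs* as the userp argument.
void InferRequestRelease(
    TRITONSERVER_InferenceRequest* request, const uint32_t flags, void* userp)
{
  if ((flags & kRequestReleaseAll) == 0) {
    return;  // Not fully released yet: keep the input buffers alive.
  }
  auto* inputs = static_cast<RequestInputs*>(userp);
  inputs->FreeAll();
  // If the request is not reused, real code would also call
  // TRITONSERVER_InferenceRequestDelete(request) here and check its error.
  (void)request;
}
```

In real code the record would typically be allocated per request and registered together with the callback, so the buffers are freed only once Triton reports that it has released the request.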


Answered by dyastremsky

Mar 13, 2023


dyastremsky · 2023-03-13T23:07:34Z · Collaborator

You typically wouldn't modify that yourself. The structs are declared in the tritonserver.h header file, but they're opaque handles, more like aliases that you pass to calls in the C API. Those calls are documented in the header file, like the request delete call (TRITONSERVER_InferenceRequestDelete) that you'd use for that memory deallocation.

Feel free to use the more complex servers in the same folder, such as multi_server and the GRPC server, as references for how to use the API as well; they include that delete call and others.


avickars · 2023-03-14T15:18:56Z · Author

Thanks for the response, @dyastremsky! What you're saying makes total sense and I agree. The only issue is that with TRITONSERVER_InferenceRequestDelete, we aren't defining anywhere how the memory deallocation is done, so my assumption is that it doesn't do it. Or does it do the memory deallocation for us?

For instance, in tritonserver.h (https://github.com/triton-inference-server/core/blob/24e2d3afd9ceb7f328cbe2b754ba5d4e144622fb/include/triton/core/tritonserver.h#L538) we define how the response memory allocation/deallocation is performed. I didn't see anything like that for TRITONSERVER_InferenceRequestDelete, so I assumed it only deallocates the request object itself (i.e. the TRITONSERVER_InferenceRequest) but not the memory allocated for the input.

Could you clarify whether TRITONSERVER_InferenceRequestDelete actually deallocates the input memory? Just from looking at simple.cc, my guess is that it doesn't, since you define some custom memory deallocation there. So my original question remains: when we get control of the TRITONSERVER_InferenceRequest back, I'd love to use it to deallocate the memory allocated for the input (before calling TRITONSERVER_InferenceRequestDelete), but I can't, because I don't know how to access the inputs on the TRITONSERVER_InferenceRequest object (i.e. we don't know what its members are). For reference, I did look at the multi_server and GRPC server code, and as far as I can tell it's the same issue there.

Thanks!!


dyastremsky · Mar 14, 2023 · Collaborator

You can see in the implementation file (tritonserver.cc) how the API is implemented. For example, TRITONSERVER_InferenceRequestDelete is here.

Triton uses a server-client architecture. The server would not need to explicitly deallocate the memory for inputs. You can see in the example clients (like the image client, starting here) that the client passes in inputs that it allocates and deallocates itself. As far as I know, the server core and backends would manage the memory associated with those inputs and outputs. We follow scope-bound resource management (also known as "resource acquisition is initialization", or RAII), so it wouldn't make sense for the server core to allocate memory for inputs and then expect the user to deallocate it somewhere. Calling TRITONSERVER_InferenceRequestDelete and managing the input allocation/deallocation on the client side should be sufficient.

CC: @GuanLuo
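The scope-bound ownership described in this reply can also be applied on the client side. The following is a minimal C++ sketch under stated assumptions, not Triton's own code: the request struct is a local stand-in for the opaque handle in tritonserver.h, kRequestReleaseAll mirrors TRITONSERVER_REQUEST_RELEASE_ALL, RequestInputOwner, ToUserp and InferRequestRelease are hypothetical names, and it assumes the input buffers must stay valid until Triton releases the request. The client moves a unique_ptr that owns the buffers into the release userp, and the callback re-adopts it so the buffers are freed by destructors rather than by hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for the opaque handle declared in tritonserver.h.
struct TRITONSERVER_InferenceRequest {};

// Mirrors the value of TRITONSERVER_REQUEST_RELEASE_ALL in tritonserver.h.
constexpr uint32_t kRequestReleaseAll = 1;

// Hypothetical RAII owner: it shares ownership of every input buffer of one
// request, so destroying the owner frees any buffer nobody else still holds.
class RequestInputOwner {
 public:
  void Add(std::string name, std::shared_ptr<std::vector<char>> buffer) {
    inputs_.emplace_back(std::move(name), std::move(buffer));
  }
  std::size_t size() const { return inputs_.size(); }

 private:
  std::vector<std::pair<std::string, std::shared_ptr<std::vector<char>>>> inputs_;
};

// Gives up unique ownership so the pointer can travel through Triton as an
// opaque userp; InferRequestRelease takes ownership back.
void* ToUserp(std::unique_ptr<RequestInputOwner> owner) {
  return owner.release();
}

// Release callback with the same parameter list as the one in simple.cc.
void InferRequestRelease(
    TRITONSERVER_InferenceRequest* request, const uint32_t flags, void* userp)
{
  if ((flags & kRequestReleaseAll) == 0) {
    return;  // Not fully released yet: the owner stays alive.
  }
  // Re-adopting the pointer means the owner, and the buffers it holds, are
  // destroyed when this scope ends.
  std::unique_ptr<RequestInputOwner> owner(
      static_cast<RequestInputOwner*>(userp));
  (void)request;
}
```

Nothing is freed manually here: leaving the callback scope runs the destructors. Real code that does not reuse the request would also call TRITONSERVER_InferenceRequestDelete in the callback.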

avickars · Mar 14, 2023 · Author

Got it! Thanks, @dyastremsky, your help was very much appreciated!
