Description
Hello,
I am seeking advice on best practices for tracking every input and prediction handled by a model served with Triton Inference Server; that is, for each request the model serves, I would like to record the input data together with the corresponding prediction.
I have reviewed the documentation on Triton Server Trace, but it is not clear to me whether this feature can capture prediction outputs in addition to timing information. You can find the documentation here: Triton Server Trace Documentation.
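For context, this is roughly what I have tried so far, based on my reading of the trace protocol extension in the docs. It is only a sketch: the setting keys and values (`trace_file`, `trace_rate`, `trace_level`) reflect my best understanding and may not match every Triton version.

```python
# Sketch: adjusting Triton trace settings at runtime via the trace
# protocol extension. Assumes a Triton build that exposes this API;
# the setting values below are illustrative, not verified.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Inspect the current global trace settings.
print(client.get_trace_settings())

# Trace every request ("trace_rate": "1") at timestamp-level detail,
# written to trace.json on the server side.
client.update_trace_settings(
    settings={
        "trace_file": "trace.json",
        "trace_rate": "1",
        "trace_level": ["TIMESTAMPS"],
    }
)
```

As far as I can tell this records timestamps, which is why I am unsure whether trace output can cover the predictions themselves.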
Additionally, I am concerned about the impact of tracking on request latency. I am aware that ML serving platforms such as Seldon Core often rely on technologies like Knative and Kafka to store tracking information, but it is not clear how such approaches could be integrated with Triton without compromising performance.
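To keep tracking off the request path, the direction I have been considering is buffering records in memory and writing them from a background thread. A minimal sketch of that idea (the function name `record_interaction` and the JSON-lines sink are my own placeholders; the writer could just as well be a Kafka producer):

```python
# Sketch: non-blocking interaction logging. Records are enqueued on the
# hot path and persisted by a background thread, so inference latency is
# not tied to I/O. Names and the JSONL sink are hypothetical choices.
import json
import queue
import threading

log_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

def writer_loop() -> None:
    """Drain the queue to a JSON-lines file, off the request path."""
    with open("interactions.jsonl", "a") as f:
        while True:
            record = log_queue.get()
            f.write(json.dumps(record) + "\n")
            f.flush()

threading.Thread(target=writer_loop, daemon=True).start()

def record_interaction(inputs: dict, prediction: dict) -> None:
    """Enqueue without blocking; drop the record if the queue is full."""
    try:
        log_queue.put_nowait({"inputs": inputs, "prediction": prediction})
    except queue.Full:
        pass  # Prefer dropping a record over adding latency.
```

I am unsure whether this kind of client-side buffering is the recommended pattern or whether Triton offers something server-side that achieves the same thing.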
I would appreciate recommendations on:
- How to effectively track inputs and predictions in Triton Inference Server (a rough client-side sketch of what I have in mind follows this list).
- Whether Triton Server Trace can be used for this purpose, and if so, how.
- Alternative methods or best practices for tracking interactions in Triton while maintaining low latency.
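For reference, this is the kind of client-side wrapper I have in mind, reusing `record_interaction` from the sketch above. The model and tensor names (`my_model`, `INPUT0`, `OUTPUT0`) are placeholders for my actual model configuration, and I am not sure this is the idiomatic way to do it:

```python
# Sketch: wrapping each inference call so both the input and the
# prediction are captured. Model and tensor names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def tracked_infer(data: np.ndarray) -> np.ndarray:
    infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)

    result = client.infer(
        model_name="my_model",
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
    )
    prediction = result.as_numpy("OUTPUT0")

    # Hand both sides of the interaction to the async logger sketched above.
    record_interaction(
        inputs={"INPUT0": data.tolist()},
        prediction={"OUTPUT0": prediction.tolist()},
    )
    return prediction
```

My worry is that doing this in every client duplicates logic and misses requests from clients I do not control, which is why a server-side mechanism would be preferable if one exists.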
Thank you for your assistance.