Description
Hello,
I am seeking advice on best practices for tracking every input and prediction handled by a model served with Triton Inference Server; that is, for each request the model serves, I would like to record the input data together with the corresponding prediction.
I have reviewed the documentation on Triton Server Trace, but it is not clear to me whether this feature can capture prediction outputs in addition to timing information. You can find the documentation here: Triton Server Trace Documentation.
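For context, this is roughly what I have tried so far, based on my reading of the trace protocol extension in the docs. It is only a sketch: the setting keys and values (`trace_file`, `trace_rate`, `trace_level`) reflect my best understanding and may not match every Triton version.

```python
# Sketch: adjusting Triton trace settings at runtime via the trace
# protocol extension. Assumes a Triton build that exposes this API;
# the setting values below are illustrative, not verified.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Inspect the current global trace settings.
print(client.get_trace_settings())

# Trace every request ("trace_rate": "1") at timestamp-level detail,
# written to trace.json on the server side.
client.update_trace_settings(
    settings={
        "trace_file": "trace.json",
        "trace_rate": "1",
        "trace_level": ["TIMESTAMPS"],
    }
)
```

As far as I can tell this records timestamps, which is why I am unsure whether trace output can cover the predictions themselves.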
Additionally, I am concerned about the impact of tracking on request latency. I am aware that ML serving platforms such as Seldon Core often rely on technologies like Knative and Kafka to store tracking information, but it is not clear how such approaches could be integrated with Triton without compromising performance.
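To keep tracking off the request path, the direction I have been considering is buffering records in memory and writing them from a background thread. A minimal sketch of that idea (the function name `record_interaction` and the JSON-lines sink are my own placeholders; the writer could just as well be a Kafka producer):

```python
# Sketch: non-blocking interaction logging. Records are enqueued on the
# hot path and persisted by a background thread, so inference latency is
# not tied to I/O. Names and the JSONL sink are hypothetical choices.
import json
import queue
import threading

log_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

def writer_loop() -> None:
    """Drain the queue to a JSON-lines file, off the request path."""
    with open("interactions.jsonl", "a") as f:
        while True:
            record = log_queue.get()
            f.write(json.dumps(record) + "\n")
            f.flush()

threading.Thread(target=writer_loop, daemon=True).start()

def record_interaction(inputs: dict, prediction: dict) -> None:
    """Enqueue without blocking; drop the record if the queue is full."""
    try:
        log_queue.put_nowait({"inputs": inputs, "prediction": prediction})
    except queue.Full:
        pass  # Prefer dropping a record over adding latency.
```

I am unsure whether this kind of client-side buffering is the recommended pattern or whether Triton offers something server-side that achieves the same thing.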
I would appreciate recommendations on:
- How to effectively track inputs and predictions in Triton Inference Server (a rough client-side sketch of what I have in mind follows this list).
- Whether Triton Server Trace can be used for this purpose, and if so, how.
- Alternative methods or best practices for tracking interactions in Triton while maintaining low latency.
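For reference, this is the kind of client-side wrapper I have in mind, reusing `record_interaction` from the sketch above. The model and tensor names (`my_model`, `INPUT0`, `OUTPUT0`) are placeholders for my actual model configuration, and I am not sure this is the idiomatic way to do it:

```python
# Sketch: wrapping each inference call so both the input and the
# prediction are captured. Model and tensor names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def tracked_infer(data: np.ndarray) -> np.ndarray:
    infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)

    result = client.infer(
        model_name="my_model",
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("OUTPUT0")],
    )
    prediction = result.as_numpy("OUTPUT0")

    # Hand both sides of the interaction to the async logger sketched above.
    record_interaction(
        inputs={"INPUT0": data.tolist()},
        prediction={"OUTPUT0": prediction.tolist()},
    )
    return prediction
```

My worry is that doing this in every client duplicates logic and misses requests from clients I do not control, which is why a server-side mechanism would be preferable if one exists.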
Thank you for your assistance.