I have some Grafana graphs using Triton's Prometheus metrics, and it appears that since a semi-recent update, nv_inference_count no longer includes a gpu_uuid label (I see only "model" and "version"). I have a graph showing the number of inferences per GPU, which no longer works.
Hi @chriscarollo, have you used the tritonserver --model-control-mode EXPLICIT ... (or POLL) feature to dynamically load/unload models before? I believe there may be a known inconsistency: models loaded at startup have no GPU_ID label on non-GPU metrics, while models loaded dynamically after the server has started do have GPU_ID labels applied to those non-GPU metrics.
Please let me know if you can consistently identify or reproduce this behavior one way or the other.
I'm actually using model-control-mode POLL, and it does appear that my gpu_id labels came back after it detected new versions. So maybe it's only an issue on initial startup?
Hi @chriscarollo, this is a known issue and has a proposed resolution in this PR: triton-inference-server/core#321. Please chime in on the discussion with your use case, impact, etc.
Hi, this bug is affecting us. We recently switched from model-control-mode EXPLICIT to model-control-mode NONE, and unfortunately this change broke our Grafana dashboards 😞
Is there an estimated timeline for a fix?