feat: Report histogram metrics to Triton metrics server #56
Conversation
Force-pushed from 4c74307 to 20acebe
Could you possibly target the initial metrics branch?
self.histogram_time_to_first_token = (
    self.histogram_time_to_first_token_family.Metric(
        labels=labels,
        buckets=[
The buckets here are just an example taken from the vLLM repo's metrics.py. I think we want to let users define the bucket intervals. That would also help the unit tests, since the observed values are pretty small when the prompts are simple. What is the best practice for allowing customizable buckets?
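One hypothetical way to do this, sketched below: accept an optional list of boundaries and fall back to a shipped default. The class shape, the ttft_buckets argument, and the default values are assumptions for illustration, not the backend's actual API; the histogram MetricFamily kind is assumed from the companion python_backend PR.

# Hypothetical sketch only: runs inside Triton's Python backend, where
# triton_python_backend_utils is importable. The argument name `ttft_buckets`
# and the default boundaries below are placeholders.
import triton_python_backend_utils as pb_utils

DEFAULT_TTFT_BUCKETS = [0.001, 0.01, 0.05, 0.1, 0.5, 1.0, 2.5, 5.0, 10.0]

class TritonMetrics:
    def __init__(self, labels, ttft_buckets=None):
        self.histogram_time_to_first_token_family = pb_utils.MetricFamily(
            name="vllm:time_to_first_token_seconds",
            description="Histogram of time to first token in seconds.",
            kind=pb_utils.MetricFamily.HISTOGRAM,
        )
        # A unit test that only observes small values can pass its own
        # boundaries; everyone else gets the shipped defaults.
        self.histogram_time_to_first_token = (
            self.histogram_time_to_first_token_family.Metric(
                labels=labels,
                buckets=ttft_buckets if ttft_buckets is not None else DEFAULT_TTFT_BUCKETS,
            )
        )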
@oandreeva-nv See the explanation in the comment above.
I think that if we ship utils/metrics.py as part of a supported backend, we need to ship a defined set of buckets anyway, at least as a default. Since we ship this as a Python script, users can always adjust it on their side. And since these values correspond to the vLLM side of things, I think it's worth adding a comment with a permalink, so that we can easily refer to the original source and adjust.
Updated
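For reference, a minimal sketch of the approach described above: ship the vLLM-derived boundaries as a named default with a comment pointing back to the source. The URL is illustrative rather than a pinned permalink, and only the boundaries visible in the sample output further down are taken from this PR.

# Default boundaries for vllm:time_per_output_token_seconds.
# Keep in sync with vLLM's own metric definitions, e.g.
# https://github.com/vllm-project/vllm/blob/main/vllm/engine/metrics.py
# (illustrative link; pin a permalink to the exact revision being mirrored).
TIME_PER_OUTPUT_TOKEN_BUCKETS = [
    0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0,
    # ... further boundaries elided here; see the linked source.
]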
class VLLMTritonMetricsTest(TestResultCollector):
    def setUp(self):
        self.triton_client = grpcclient.InferenceServerClient(url="localhost:8001")
        self.tritonserver_ipaddr = os.environ.get("TRITONSERVER_IPADDR", "localhost")
Why is an additional env var needed when localhost is also hard-coded for the inference client?
This env var is used so the tests also work on Windows, because localhost doesn't currently work in our Windows tests (cc @fpetrini15). But I agree the hard-coded cases should be swapped to use the shared variable.
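A small sketch of that cleanup, assuming the repo's existing test_util.TestResultCollector helper is importable: read the address once in setUp and reuse it for the gRPC client instead of hard-coding localhost.

import os

import tritonclient.grpc as grpcclient
from test_util import TestResultCollector  # repo test helper, assumed importable


class VLLMTritonMetricsTest(TestResultCollector):
    def setUp(self):
        # Shared address works on Windows CI, where plain "localhost" does not.
        self.tritonserver_ipaddr = os.environ.get("TRITONSERVER_IPADDR", "localhost")
        # Reuse it for the gRPC inference endpoint as well.
        self.triton_client = grpcclient.InferenceServerClient(
            url=f"{self.tritonserver_ipaddr}:8001"
        )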
@oandreeva-nv Target?
@yinggeh, i.e., instead of merging this branch into main, set the target merge branch as
@oandreeva-nv Is it better now?
@yinggeh I think you would need to resolve conflicts as well. Basically, it is still better to rebase this branch on top of
@oandreeva-nv Do I need to resolve conflicts for
Depending on the commit.
Force-pushed from 20acebe to 4b91f8c
Rebased.
Force-pushed from 4b91f8c to 2810d3f
Force-pushed from b24fef2 to 20184c3
Force-pushed from 993f0c7 to ce0cf0f
Force-pushed from ce0cf0f to 9534298
README.md (outdated)
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.05"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.075"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.1"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.15"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.2"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.3"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.4"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.5"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.75"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="1"} 15
Suggested change: keep these lines as-is and append "..." to indicate that the rest of the sample output is truncated.
Updated
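For context, output like the sample above can be reproduced against a running server by scraping Triton's Prometheus metrics endpoint; the sketch below assumes the default metrics port 8002 and a locally running server.

import requests

# Fetch the Prometheus-format metrics and print the histogram lines shown above.
text = requests.get("http://localhost:8002/metrics", timeout=10).text
for line in text.splitlines():
    if line.startswith("vllm:time_per_output_token_seconds"):
        print(line)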
@@ -125,6 +125,15 @@
        # vllm:generation_tokens_total
        self.assertEqual(metrics_dict["vllm:generation_tokens_total"], 48)

        # vllm:time_to_first_token_seconds
        self.assertEqual(metrics_dict["vllm:time_to_first_token_seconds_count"], 3)
        self.assertTrue(metrics_dict["vllm:time_to_first_token_seconds_sum"] > 0)
Check notice: Code scanning / CodeQL: Imprecise assert
self.assertEqual(metrics_dict["vllm:time_to_first_token_seconds_bucket"], 3) | ||
# vllm:time_per_output_token_seconds | ||
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_count"], 45) | ||
self.assertTrue(metrics_dict["vllm:time_per_output_token_seconds_sum"] > 0) |
Check notice: Code scanning / CodeQL: Imprecise assert
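One way to address the two CodeQL notices, sketched as a drop-in change inside the same test method (not part of this PR): assertGreater reports the observed value on failure, unlike assertTrue(x > 0).

# Equivalent but more informative assertions for the histogram sums.
self.assertGreater(metrics_dict["vllm:time_to_first_token_seconds_sum"], 0)
self.assertGreater(metrics_dict["vllm:time_per_output_token_seconds_sum"], 0)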
Note: new commits should dismiss approvals, but it looks like that setting wasn't applied in this repo. I will update the settings, but please keep that in mind 🙏
@rmccorm4 Thanks. Btw, I merged to the wrong branch... It was initially set to merge to
Sample histogram output
What does the PR do?
Support histogram metric type and add tests.
Checklist
- PR title follows the <commit_type>: <Title> convention.
- Commit Type: conventional commit type checked and the corresponding label added to the GitHub PR.
Related PRs:
triton-inference-server/python_backend#374
triton-inference-server/core#386
triton-inference-server/server#7525
Where should the reviewer start?
n/a
Test plan:
n/a
17487728
Caveats:
n/a
Background
Customer requested histogram metrics from vLLM.
Related Issues:
n/a