
feat: Report histogram metrics to Triton metrics server #56

Conversation

yinggeh
Contributor

@yinggeh yinggeh commented Aug 7, 2024

Sample histogram output

# HELP vllm:time_to_first_token_seconds Histogram of time to first token in seconds.
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_count{model="vllm_opt",version="1"} 3
vllm:time_to_first_token_seconds_sum{model="vllm_opt",version="1"} 0.0002238750457763672
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.001"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.005"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.01"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.02"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.04"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.06"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.08"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.1"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.25"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="0.75"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="1"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="2.5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="7.5"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="10"} 3
vllm:time_to_first_token_seconds_bucket{model="vllm_opt",version="1",le="+Inf"} 3
# HELP vllm:time_per_output_token_seconds Histogram of time per output token in seconds.
# TYPE vllm:time_per_output_token_seconds histogram
vllm:time_per_output_token_seconds_count{model="vllm_opt",version="1"} 45
vllm:time_per_output_token_seconds_sum{model="vllm_opt",version="1"} 0.002027750015258789
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.01"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.025"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.05"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.075"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.1"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.15"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.2"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.3"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.4"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.5"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="0.75"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="1"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="2.5"} 45
vllm:time_per_output_token_seconds_bucket{model="vllm_opt",version="1",le="+Inf"} 45
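For readers unfamiliar with the Prometheus exposition format above: bucket counters are cumulative, so each `le="<bound>"` line counts every observation less than or equal to that bound, `le="+Inf"` always equals `_count`, and `_sum` is the running total of observed values. A minimal stdlib-only sketch of this bookkeeping (not the code from this PR, just an illustration of the semantics):

```python
from bisect import bisect_left


class CumulativeHistogram:
    """Sketch of Prometheus-style histogram bookkeeping.

    Each bucket counter is cumulative: the counter for an upper bound
    counts every observation <= that bound, and the implicit +Inf
    bucket always equals the total observation count.
    """

    def __init__(self, buckets):
        self.bounds = sorted(buckets)                # finite upper bounds
        self.counts = [0] * (len(self.bounds) + 1)   # extra slot is +Inf
        self.sum = 0.0

    def observe(self, value):
        self.sum += value
        # Increment every bucket whose upper bound is >= value
        # (bisect_left finds the first such bound, so `le` is inclusive).
        for i in range(bisect_left(self.bounds, value), len(self.counts)):
            self.counts[i] += 1

    @property
    def count(self):
        return self.counts[-1]  # the +Inf bucket holds the total
```

With the three small TTFT observations from the sample above, every finite bucket ends up equal to the total count, which is exactly why each `le` line reads `3`.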

What does the PR do?

Support histogram metric type and add tests.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • feat

Related PRs:

triton-inference-server/python_backend#374
triton-inference-server/core#386
triton-inference-server/server#7525

Where should the reviewer start?

n/a

Test plan:

n/a

  • CI Pipeline ID:
    17487728

Caveats:

n/a

Background

Customer requested histogram metrics from vLLM.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

n/a

@yinggeh yinggeh added the enhancement New feature or request label Aug 7, 2024
@yinggeh yinggeh self-assigned this Aug 7, 2024
@yinggeh yinggeh force-pushed the yinggeh-DLIS-7113-support-histogram-metric-type branch from 4c74307 to 20acebe Compare August 7, 2024 05:20
@yinggeh yinggeh changed the title feat: Add histogram metric type feat: Collect histogram metrics from vLLM Aug 7, 2024
@oandreeva-nv
Collaborator

Could you possibly target the initial metrics branch yinggeh-DLIS-7061-add-vllm-metrics? It would be easier to review this way.

src/model.py Outdated
self.histogram_time_to_first_token = (
self.histogram_time_to_first_token_family.Metric(
labels=labels,
buckets=[
Contributor Author

The buckets here are just an example taken from metrics.py in the vLLM repo. I think we want to let users define the interval buckets. That would also help the unit tests, since the observed values are quite small when the prompts are simple. What is the best practice for allowing customizable buckets?

Contributor Author

@oandreeva-nv Explanation in the comment above.

Collaborator

I think that if we ship utils/metrics.py as part of a supported backend, we need to ship a defined set of buckets anyway, at least as a default. Since we ship this as a Python script, users can always adjust it on their side. And since these values come from the vLLM side, I think it's worth adding a comment with a permalink, so that we can easily refer to the original source and adjust.

Contributor Author

Updated
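The default-buckets-plus-permalink pattern agreed on above might look like the sketch below. The constant name `TTFT_BUCKETS` is mine, and the finite bounds are transcribed from the sample TTFT output at the top of this PR; a real implementation would carry a permalink to the exact vLLM source line these values came from, as suggested in review.

```python
# Default time-to-first-token buckets (seconds). These mirror the
# defaults in metrics.py in the vLLM repo; a permalink to the exact
# source revision should go here so the values can be kept in sync.
# Users running this as a Python script can adjust the list directly.
TTFT_BUCKETS = [
    0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08,
    0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0,
]
```

Shipping the list as a plain module-level constant keeps it trivially overridable without any extra configuration machinery.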

class VLLMTritonMetricsTest(TestResultCollector):
def setUp(self):
self.triton_client = grpcclient.InferenceServerClient(url="localhost:8001")
self.tritonserver_ipaddr = os.environ.get("TRITONSERVER_IPADDR", "localhost")

Why is an additional env var needed when localhost is also hard-coded for the inference client?

Contributor

@rmccorm4 rmccorm4 Aug 7, 2024

This env var exists so the tests also work on Windows, because localhost doesn't currently work in our Windows tests (cc @fpetrini15), but I agree the hard-coded cases should be swapped to use the shared variable.
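The shared-variable pattern discussed here can be sketched as below. Only the URL construction is shown (building the actual client needs the tritonclient package and a reachable server); ports 8001 (gRPC) and 8002 (metrics) are Triton's defaults.

```python
import os

# Read the server address once from the env var, falling back to
# localhost, so the gRPC endpoint and the metrics endpoint always
# agree. This replaces the hard-coded "localhost:8001" in setUp().
host = os.environ.get("TRITONSERVER_IPADDR", "localhost")
grpc_url = f"{host}:8001"                    # e.g. for grpcclient.InferenceServerClient(url=grpc_url)
metrics_url = f"http://{host}:8002/metrics"  # Prometheus text endpoint
```

On Windows CI, setting `TRITONSERVER_IPADDR` then redirects both endpoints consistently with no further code changes.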

ci/L0_backend_vllm/metrics_test/vllm_metrics_test.py Outdated
src/model.py Outdated
@yinggeh
Contributor Author

yinggeh commented Aug 8, 2024

Could you possibly target the initial metrics branch yinggeh-DLIS-7061-add-vllm-metrics? It would be easier to review this way.

@oandreeva-nv Target?

@oandreeva-nv
Collaborator

@yinggeh i.e., instead of merging this branch into main, set the target merge branch to yinggeh-DLIS-7061-add-vllm-metrics. This PR significantly overlaps with the gauge/counter metrics PR, so it would be easier to review.

@yinggeh yinggeh changed the base branch from main to yinggeh-DLIS-7061-add-vllm-metrics August 8, 2024 00:26
@yinggeh
Contributor Author

yinggeh commented Aug 8, 2024

@yinggeh i.e., instead of merging this branch into main, set the target merge branch to yinggeh-DLIS-7061-add-vllm-metrics. This PR significantly overlaps with the gauge/counter metrics PR, so it would be easier to review.

@oandreeva-nv Is it better now?

@oandreeva-nv
Collaborator

@yinggeh I think you would need to resolve conflicts as well. Basically, it is still better to rebase this branch on top of yinggeh-DLIS-7061-add-vllm-metrics. If not now, then the same conflicts will appear when yinggeh-DLIS-7061-add-vllm-metrics is merged into main.

@yinggeh
Contributor Author

yinggeh commented Aug 8, 2024

@yinggeh I think you would need to resolve conflicts as well. Basically, it is still better to rebase this branch on top of yinggeh-DLIS-7061-add-vllm-metrics. If not now, then the same conflicts will appear when yinggeh-DLIS-7061-add-vllm-metrics is merged into main.

@oandreeva-nv Do I need to resolve conflicts for yinggeh-DLIS-7113-support-histogram-metric-type each time yinggeh-DLIS-7061-add-vllm-metrics has a new commit?

@oandreeva-nv
Collaborator

Depending on the commit.

@yinggeh yinggeh force-pushed the yinggeh-DLIS-7113-support-histogram-metric-type branch from 20acebe to 4b91f8c Compare August 8, 2024 23:43
@yinggeh
Contributor Author

yinggeh commented Aug 8, 2024

Depending on the commit.

Rebased.

@yinggeh yinggeh force-pushed the yinggeh-DLIS-7113-support-histogram-metric-type branch from 4b91f8c to 2810d3f Compare August 12, 2024 18:04
@yinggeh yinggeh force-pushed the yinggeh-DLIS-7113-support-histogram-metric-type branch from b24fef2 to 20184c3 Compare August 14, 2024 18:00
@yinggeh yinggeh force-pushed the yinggeh-DLIS-7113-support-histogram-metric-type branch 2 times, most recently from 993f0c7 to ce0cf0f Compare August 15, 2024 09:07
@yinggeh yinggeh force-pushed the yinggeh-DLIS-7113-support-histogram-metric-type branch from ce0cf0f to 9534298 Compare August 15, 2024 21:11
@yinggeh yinggeh changed the base branch from yinggeh-DLIS-7061-add-vllm-metrics to main August 16, 2024 10:28
@yinggeh yinggeh changed the base branch from main to yinggeh-DLIS-7061-add-vllm-metrics August 16, 2024 10:31
README.md Outdated
Comment on lines 253 to 262
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.05"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.075"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.1"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.15"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.2"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.3"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.4"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.5"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="0.75"} 15
vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="1"} 15

Suggested change: truncate the long bucket listing in the README example, keeping the final line and appending a truncation marker:

vllm:time_per_output_token_seconds_bucket{model="vllm_model",version="1",le="1"} 15
...

Contributor Author

Updated

@@ -125,6 +125,15 @@
# vllm:generation_tokens_total
self.assertEqual(metrics_dict["vllm:generation_tokens_total"], 48)

# vllm:time_to_first_token_seconds
self.assertEqual(metrics_dict["vllm:time_to_first_token_seconds_count"], 3)
self.assertTrue(metrics_dict["vllm:time_to_first_token_seconds_sum"] > 0)

Check notice — Code scanning / CodeQL: Imprecise assert. assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
self.assertEqual(metrics_dict["vllm:time_to_first_token_seconds_bucket"], 3)
# vllm:time_per_output_token_seconds
self.assertEqual(metrics_dict["vllm:time_per_output_token_seconds_count"], 45)
self.assertTrue(metrics_dict["vllm:time_per_output_token_seconds_sum"] > 0)

Check notice — Code scanning / CodeQL: Imprecise assert. assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
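The CodeQL fix is a one-line swap. A sketch of how the flagged assertion might read afterwards (the class name and `metrics_dict` value here are illustrative, using the sample sum from the output above, not the actual test file):

```python
import unittest


class VLLMMetricsAssertions(unittest.TestCase):
    def test_time_to_first_token_sum(self):
        metrics_dict = {"vllm:time_to_first_token_seconds_sum": 0.000223875}
        # assertGreater reports both operands on failure
        # (e.g. "0.0 not greater than 0"), whereas
        # assertTrue(value > 0) can only report "False is not true".
        self.assertGreater(
            metrics_dict["vllm:time_to_first_token_seconds_sum"], 0
        )
```

The same substitution applies to the `_sum` assertion for `vllm:time_per_output_token_seconds` flagged by the second notice.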
@yinggeh yinggeh changed the title feat: Collect histogram metrics from vLLM feat: Report histogram metrics to Triton metrics server Aug 16, 2024
@yinggeh yinggeh merged commit f15658e into yinggeh-DLIS-7061-add-vllm-metrics Aug 16, 2024
3 checks passed
@rmccorm4
Contributor

Note: new commits should dismiss approvals, but it looks like that setting wasn't applied in this repo -- will update the settings, but please keep that in mind 🙏

@yinggeh
Contributor Author

yinggeh commented Aug 16, 2024

@rmccorm4 Thanks. Btw I merged to the wrong branch... It was initially set to merge to yinggeh-DLIS-7061-add-vllm-metrics to help reviewers skip duplicate changes. I am working on new PRs to merge the changes into main.

@fpetrini15

@yinggeh @rmccorm4 Repo rules have been updated to dismiss stale approvals when new commits are pushed. In the future, please do let the operations team know if the repo rules appear to be incorrect.
