Add first two LLM test guides #396
Conversation
Looks great!
The only issue is that it is not clear that two different experiments are described here. Can you add two headers, one for first-token latency and one for token-to-token (T2T) latency?
Force-pushed from 7d979cf to b44ab4e
src/c++/perf_analyzer/docs/examples/calculate_avg_first_token_latency.py: Fixed (resolved)
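For reference, this script computes the average first-token latency, i.e. the time from sending a request to the arrival of the first response. A minimal sketch of that calculation, assuming the profile export is a JSON file with per-request send and response timestamps in nanoseconds (field names such as `experiments`, `requests`, `timestamp`, and `response_timestamps` are illustrative assumptions, not a confirmed perf_analyzer schema):

```python
import json

def avg_first_token_latency(export_path: str) -> float:
    """Average time from request send to first response, in milliseconds."""
    with open(export_path) as f:
        data = json.load(f)

    latencies_ns = []
    for experiment in data["experiments"]:          # assumed top-level key
        for request in experiment["requests"]:      # assumed per-request records
            sent_ns = request["timestamp"]          # request send time (ns)
            first_ns = request["response_timestamps"][0]  # first token arrival (ns)
            latencies_ns.append(first_ns - sent_ns)

    return sum(latencies_ns) / len(latencies_ns) / 1e6  # ns -> ms

if __name__ == "__main__":
    print(f"Avg first-token latency: {avg_first_token_latency('profile_export.json'):.2f} ms")
```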
src/c++/perf_analyzer/docs/examples/calculate_avg_token_to_token_latency.py: Fixed (resolved)
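Similarly, a sketch of the token-to-token (T2T) latency calculation, averaging the gap between consecutive response timestamps within each request. The same caveat applies: the field names are assumptions, not a confirmed export schema.

```python
import json

def avg_token_to_token_latency(export_path: str) -> float:
    """Average gap between consecutive response timestamps, in milliseconds."""
    with open(export_path) as f:
        data = json.load(f)

    gaps_ns = []
    for experiment in data["experiments"]:      # assumed top-level key
        for request in experiment["requests"]:  # assumed per-request records
            stamps = request["response_timestamps"]
            # Pairwise differences between consecutive token arrivals.
            gaps_ns += [b - a for a, b in zip(stamps, stamps[1:])]

    return sum(gaps_ns) / len(gaps_ns) / 1e6  # ns -> ms
```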
Some minor comments, otherwise looks good!
src/c++/perf_analyzer/docs/examples/calculate_avg_first_token_latency.py (resolved)
src/c++/perf_analyzer/docs/examples/calculate_avg_token_to_token_latency.py (resolved)
Force-pushed from f84cc02 to 4281537
LGTM! 🚀
Comment to make sure triton-inference-server/tutorials#46 is taken into account for how this guide works.
Add guides for the prefill and generation steps of LLMs.