Add input vs output token latency #17

juberti · 2024-05-02T20:53:07Z

We already have the data to compute this. Input TPS should be (96 * output TPS), I think.
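A minimal sketch of how the two rates could be separated from per-request timing data, assuming each measurement records token counts, time to first token (TTFT), and total time. The `Measurement` class and its field names are hypothetical, not the benchmark's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical field names; not taken from the benchmark's real schema.
    input_tokens: int
    output_tokens: int
    ttft_s: float   # time to first token, in seconds
    total_s: float  # total request time, in seconds


def input_tps(m: Measurement) -> float:
    """Approximate input (prefill) rate: input tokens / time to first token."""
    return m.input_tokens / m.ttft_s


def output_tps(m: Measurement) -> float:
    """Output (decode) rate: tokens after the first / time after the first token."""
    return (m.output_tokens - 1) / (m.total_s - m.ttft_s)
```

Note that TTFT also includes network and queueing latency, so this estimate is a lower bound on the provider's actual input rate.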

juberti · 2024-07-29T19:03:17Z

Now collecting these measurements for Groq. We may be able to infer them for other providers based on hardware batch size limits (e.g. the suggestion above). An H100 should yield 23 ktoken/sec on llama-3-8b at bs 1, and an A100 10 ktoken/sec.
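A rough sketch of the inference for providers without direct measurements, assuming the 96x ratio suggested earlier in the thread holds and that the ktoken/sec figures refer to input (prefill) throughput. Both function names are hypothetical.

```python
def estimate_input_tps(output_tps: float, ratio: float = 96.0) -> float:
    """Estimate input TPS from measured output TPS.

    The default ratio of 96 is the unverified guess from this thread.
    """
    return output_tps * ratio


def prefill_seconds(input_tokens: int, input_tps: float) -> float:
    """Time spent processing the prompt at a given input rate."""
    return input_tokens / input_tps


# Example: a 1000-token prompt at the 23 ktoken/sec H100 figure
h100_prefill = prefill_seconds(1000, 23_000)  # roughly 0.043 s
```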
