Now collecting these measurements for Groq. We may be able to infer them for other providers based on hardware batch size limits (e.g. the suggestion above). An H100 should yield 23 ktoken/sec on llama-3-8b at bs 1; an A100 should yield 10 ktoken/sec.
We already have the data to compute this. Input TPS should be (96 * output TPS), I think.
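
A minimal sketch of that estimate, in case it helps when filling in the table. The factor of 96 is the unverified guess from the comment above, not a measured constant, and the function name and the sample output TPS value are hypothetical placeholders rather than real data:

```python
# Assumed ratio of input TPS to output TPS, taken from the guess above.
# This is NOT a measured value; replace it once real measurements exist.
ASSUMED_INPUT_TO_OUTPUT_RATIO = 96


def estimate_input_tps(output_tps: float,
                       ratio: float = ASSUMED_INPUT_TO_OUTPUT_RATIO) -> float:
    """Estimate input tokens/sec from a measured output tokens/sec."""
    if output_tps < 0:
        raise ValueError("output_tps must be non-negative")
    return ratio * output_tps


if __name__ == "__main__":
    # Hypothetical output TPS, for illustration only.
    sample_output_tps = 250.0
    print(f"estimated input TPS: {estimate_input_tps(sample_output_tps):.0f}")
```

Keeping the ratio as a parameter makes it easy to swap in a per-provider value if the 96x assumption turns out not to hold everywhere.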