Description
Okay, this is very ambitious, but it would be pretty cool if there were a way to estimate tokens per second. For example, when searching Hugging Face, each search result would show a tokens-per-second estimate.
This will probably require a statistical model or even a NN. It would take the phone's specs into account.
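Yeah, I'm thinking of a small model (basically just a statistical model) that takes phone specs and LLM specs as input and outputs a normal distribution (i.e., a mean and variance) over tokens per second. You would train it by minimizing the cross entropy, which for a Gaussian output is the negative log-likelihood of the measured speeds.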
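To make the idea concrete, here is a minimal sketch of what such a model could look like, using only the Python standard library. Everything in it is an assumption for illustration: the two input features (a scaled RAM size and a scaled model size), the linear form of the model, the learning rate, and above all the training data, which is randomly generated rather than measured on real phones. It predicts a mean and a log-variance and fits both by gradient descent on the Gaussian negative log-likelihood.

```python
import math
import random


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var))."""
    return 0.5 * (math.log(2 * math.pi) + log_var
                  + (y - mean) ** 2 / math.exp(log_var))


def init_params(n_features):
    # (mean weights, mean bias, log-variance weights, log-variance bias)
    return [0.0] * n_features, 0.0, [0.0] * n_features, 0.0


def predict(params, x):
    """Two linear heads: one for the mean, one for the log-variance."""
    w, b, v, c = params
    mean = sum(wi * xi for wi, xi in zip(w, x)) + b
    log_var = sum(vi * xi for vi, xi in zip(v, x)) + c
    return mean, log_var


def mean_nll(params, data):
    return sum(gaussian_nll(y, *predict(params, x)) for x, y in data) / len(data)


def fit(data, n_features, lr=0.02, epochs=2000):
    """Full-batch gradient descent on the average Gaussian NLL."""
    w, b, v, c = init_params(n_features)
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0] * n_features, 0.0
        gv, gc = [0.0] * n_features, 0.0
        for x, y in data:
            mean, log_var = predict((w, b, v, c), x)
            var = math.exp(log_var)
            err = y - mean
            d_mean = -err / var                    # dNLL / dmean
            d_log_var = 0.5 * (1.0 - err * err / var)  # dNLL / dlog_var
            for i, xi in enumerate(x):
                gw[i] += d_mean * xi
                gv[i] += d_log_var * xi
            gb += d_mean
            gc += d_log_var
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        v = [vi - lr * g / n for vi, g in zip(v, gv)]
        b -= lr * gb / n
        c -= lr * gc / n
    return w, b, v, c


# Synthetic stand-in data (NOT real benchmarks): x = [ram_scaled, model_size_scaled],
# y = a made-up "tokens per second" with Gaussian noise.
random.seed(0)
data = []
for _ in range(100):
    x = [random.random(), random.random()]
    y = 3.0 + 2.0 * x[0] - 1.0 * x[1] + random.gauss(0.0, 0.5)
    data.append((x, y))

params = fit(data, n_features=2)
mean, log_var = predict(params, [0.8, 0.3])
print(f"predicted tokens/s: {mean:.2f} +/- {math.exp(0.5 * log_var):.2f}")
```

Predicting the log-variance instead of the variance keeps the variance positive without any constraint. A real version would need measured tokens-per-second data across devices, and probably a log-scaled target, since speeds are positive and span a wide range.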