Text-generation-inference (TPU) container fixes #65
Comments
This can be separated into several smaller tasks. I'll list them here to track progress.
I have now fixed the health issue. The problem was an incorrect CachedBatch serialization. Progress is in the debug-tgi-ie branch.
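For context, the health route performs a tiny generation and relies on the CachedBatch message being serialized consistently between server and router, so a field mismatch breaks it even when generation itself works. A minimal round-trip sketch of that idea, using a simplified dataclass stand-in rather than the real protobuf message (field names here are illustrative, not the actual schema):

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CachedBatch:
    # Simplified stand-in for the CachedBatch message exchanged with the
    # router; field names are illustrative, not the real proto schema.
    batch_id: int
    request_ids: List[int]
    size: int
    max_tokens: int


def serialize(batch: CachedBatch) -> dict:
    # Every field the router expects must survive the round trip; dropping
    # or renaming one is the kind of mismatch that fails the health check.
    return asdict(batch)


def deserialize(payload: dict) -> CachedBatch:
    return CachedBatch(**payload)


# Round-trip sanity check, usable as a unit test for the wire format.
batch = CachedBatch(batch_id=0, request_ids=[0], size=1, max_tokens=16)
assert deserialize(serialize(batch)) == batch
```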
Daily update: warmup now works on the branch, and truncation works too. I am currently working on increasing the supported input length by bucketing prefill inputs.
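The point of bucketing on TPU is to bound the number of input shapes the XLA compiler ever sees, so prefill lengths are rounded up to a fixed set of buckets instead of compiling a new graph per sequence length. A minimal sketch of the idea, with bucket sizes chosen purely for illustration:

```python
# Fixed set of prefill lengths the compiler will see; sizes are illustrative.
BUCKETS = [16, 32, 64, 128, 256, 512, 1024]


def bucket_length(input_length: int) -> int:
    # Round an input length up to the smallest bucket that fits it, so the
    # TPU only ever compiles one graph per bucket rather than per length.
    for bucket in BUCKETS:
        if input_length <= bucket:
            return bucket
    # Inputs longer than the largest bucket would need truncation.
    return BUCKETS[-1]


assert bucket_length(20) == 32
assert bucket_length(512) == 512
```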
I have almost fixed everything: truncation works as it should, and bucketing and warmup are in place. But I also introduced a bug, because I padded incorrectly when bucketing prefills. I will fix that tomorrow.
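Padding to the bucketed length is exactly where such a bug tends to hide: if the padding side or the attention mask is wrong, the prefill is silently corrupted. A hedged sketch of one correct way to do it, assuming left padding as is common for decoder-only models (the branch may well do this differently):

```python
def pad_to_bucket(input_ids, bucket, pad_token_id):
    # Left-pad token ids to the bucketed length and build a matching
    # attention mask so the padding positions are ignored. Getting the
    # mask or the padding side wrong corrupts the prefill.
    pad_len = bucket - len(input_ids)
    padded_ids = [pad_token_id] * pad_len + list(input_ids)
    attention_mask = [0] * pad_len + [1] * len(input_ids)
    return padded_ids, attention_mask


ids, mask = pad_to_bucket([101, 42, 7], bucket=8, pad_token_id=0)
assert ids == [0, 0, 0, 0, 0, 101, 42, 7]
assert mask == [0, 0, 0, 0, 0, 1, 1, 1]
```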
As part of supporting TPUs in Inference Endpoints, and for a better user experience:
cc @tengomucho @mfuntowicz