
Text-generation-inference (TPU) container fixes #65

Closed
Michellehbn opened this issue Jun 27, 2024 · 4 comments

@Michellehbn
Member

As part of the support of TPU in Inference Endpoints and for a better user experience:

  • Resolve hanging on the server side
  • Fix the very small generation length

cc @tengomucho @mfuntowicz

@tengomucho tengomucho self-assigned this Jun 27, 2024
@tengomucho
Collaborator

tengomucho commented Jun 28, 2024

This can be separated into several smaller tasks. I'll list them here to track progress.

  • if "health" is called before any prefill call, it hangs.
  • "warmup" fails. This is because it tries to do prefill with a very long sequence (ignoring truncate info in the request), and apparently that ends up in a RESOURCE_EXHAUSTED error.
  • I believe "warmup" isn't really doing anything to make prefill/decode smoother afterwards in TPU, and it's just taking time compiling and making the inference, filling TPU memory uselessly. We might consider better handling warmup calls.
  • sequence length and parameters are low. We should investigate if we could increase this by bucketing or what is causing this issue.
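For the bucketing idea in the last point, here is a minimal sketch of what length bucketing typically looks like: each prefill sequence is padded up to the nearest bucket size, so the XLA compiler only ever sees a small set of distinct shapes instead of recompiling for every input length. The bucket sizes and the padding token below are illustrative assumptions, not the values used in the actual branch.

```python
# Hypothetical length-bucketing sketch (bucket sizes and pad token are
# assumptions for illustration, not the repository's actual values).

BUCKETS = [8, 16, 32, 64, 128, 256, 512]
PAD_TOKEN_ID = 0  # assumed padding token id

def bucket_length(seq_len: int) -> int:
    """Return the smallest bucket that fits seq_len."""
    for bucket in BUCKETS:
        if seq_len <= bucket:
            return bucket
    # Sequences longer than the largest bucket would be truncated upstream.
    return BUCKETS[-1]

def pad_to_bucket(token_ids: list[int]) -> list[int]:
    """Right-pad the sequence with PAD_TOKEN_ID up to its bucket size."""
    target = bucket_length(len(token_ids))
    return token_ids + [PAD_TOKEN_ID] * (target - len(token_ids))
```

The point of rounding up rather than padding to one fixed maximum is that short requests stay cheap while the number of compiled graphs stays bounded by the number of buckets.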

I have now fixed the health issue. The problem was an incorrect CachedBatch serialization. Progress is in branch debug-tgi-ie.

@tengomucho
Collaborator

Daily update: warmup now works on the branch, and truncate works too. I am currently working on increasing the input length, trying to do that by bucketing prefilled inputs.

@tengomucho
Collaborator

I have almost fixed everything: truncate works as it should, and bucketing and warmup are in place. But I also introduced a bug, because I padded incorrectly when bucketing prefills. I will fix that tomorrow.
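The padding bug mentioned above is a classic pitfall when bucketing prefills: the attention mask must be padded consistently with the input ids, otherwise the model attends to pad positions and the output degrades. A minimal sketch of the correct pairing, with the function name and the left-padding choice being assumptions for illustration only:

```python
# Hypothetical sketch: pad a prefill up to a bucket size together with a
# matching attention mask. Getting the mask (or the pad side) wrong is the
# kind of bug described above. Names here are illustrative assumptions.

PAD_TOKEN_ID = 0  # assumed padding token id

def pad_prefill(token_ids: list[int], bucket: int) -> tuple[list[int], list[int]]:
    """Left-pad ids to `bucket` length and return (ids, attention_mask)."""
    pad = bucket - len(token_ids)
    ids = [PAD_TOKEN_ID] * pad + token_ids
    mask = [0] * pad + [1] * len(token_ids)  # 0 = ignore pad, 1 = real token
    return ids, mask
```

Whether padding goes on the left or the right depends on the model's position handling, but ids and mask must always be padded on the same side and to the same length.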

@tengomucho
Collaborator

Fixes are in #67, #68, and #69. Closing this!
