Few more Inference Endpoints fixes #69

Merged
merged 8 commits into from
Jul 8, 2024

tengomucho
Collaborator

What does this PR do?

  • Fix clearing a request by ID (it was crashing the server).
  • Raise an error when there are too many requests (this should never happen, but it is better to handle it explicitly).
  • Add more prefill lengths to warmup. Warmup takes longer, but inference is faster for shorter prompts, at least until we find a better fix for bucketing and padding not working as expected.
  • Set the image version to 0.1.2 (ready for release).
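
As a rough illustration of the first three points, here is a minimal sketch. It is not the code from this PR: `SlotPool`, `warmup_prefill_lengths`, the `ValueError` type, and the powers-of-two length scheme are all hypothetical names and assumptions chosen for the example. It shows a fixed-size pool that rejects extra requests with an explicit error, a `clear` that tolerates an unknown ID, and one possible way to enumerate several prefill lengths for warmup.

```python
from typing import Dict, List, Optional


class SlotPool:
    """Hypothetical fixed-size pool of request slots (not the PR's actual class)."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self._slots: Dict[int, str] = {}  # request_id -> prompt

    def add(self, request_id: int, prompt: str) -> None:
        # Fail loudly when the pool is full instead of corrupting state.
        if len(self._slots) >= self.max_requests:
            raise ValueError(
                f"Too many requests: {len(self._slots) + 1} > {self.max_requests}"
            )
        self._slots[request_id] = prompt

    def clear(self, request_id: Optional[int] = None) -> None:
        # Without an ID, release every slot. With an ID, release only that
        # slot; an unknown ID is treated as a no-op rather than an error.
        if request_id is None:
            self._slots.clear()
        else:
            self._slots.pop(request_id, None)

    def active_ids(self) -> List[int]:
        return sorted(self._slots)


def warmup_prefill_lengths(max_length: int, min_length: int = 32) -> List[int]:
    """Prefill lengths to warm up: powers-of-two steps from min_length,
    always ending with max_length (an assumed scheme, for illustration)."""
    lengths = []
    length = min_length
    while length < max_length:
        lengths.append(length)
        length *= 2
    lengths.append(max_length)
    return lengths
```

With this sketch, `warmup_prefill_lengths(1024)` yields six lengths, so warmup would run six prefill passes instead of one. That is the trade-off described above: a longer warmup in exchange for shorter prompts not being padded to the maximum length.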

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tengomucho tengomucho marked this pull request as draft July 8, 2024 06:44
Compiled model results are not always very good. While this should be
investigated further later on, the current solution is simply to use the
non-compiled version. As a result, some tests generate different
results, so their expectations have been updated accordingly.
@tengomucho tengomucho marked this pull request as ready for review July 8, 2024 08:18
@tengomucho tengomucho merged commit 7cce24c into main Jul 8, 2024
4 checks passed
@tengomucho tengomucho deleted the debug-tgi-ie-pt3 branch July 8, 2024 08:30