local TGI model gives http request error of a different model #2804
Comments
Hi @lifeng-jin, sorry for the inconvenience. This problem is due to the fact that, currently, `base_url` is only taken into account by `chat_completion`; `text_generation` uses the client's `model` instead. You can pass the URL as `model` when instantiating the client:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="http://localhost:8082/v1/",
)
output = client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
```

which will solve your issue.
Thanks @Wauplin. I tried this exact solution, but the request still fails.
Then it's because this route doesn't exist on TGI. Try it without the `/v1`.
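For reference, a minimal sketch of the corrected instantiation without the `/v1` suffix (the port `8082` is taken from the snippet above; prompt and parameters mirror it):

```python
from huggingface_hub import InferenceClient

# Point the client at the TGI server root rather than the OpenAI-compatible /v1
# prefix: TGI serves its native text-generation route at the root, while /v1 is
# reserved for the OpenAI-style chat endpoints.
client = InferenceClient(model="http://localhost:8082")
output = client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
print(output.generated_text)
```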
Thanks again @Wauplin, this is super helpful, and now I can get outputs from the model. However, when I run the call with `details=True`, the decoder input / prefill is not returned in the output, unlike the example shown in the docs. Could you please help?
This has something to do with TGI, not the InferenceClient. Better to open a separate issue in the TGI repo instead (i.e. I don't have the answer^^)
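One possible lead, not confirmed in this thread: `text_generation` only asks the backend for the prefill when `decoder_input_details` is set, so a sketch like the following may return it (assuming a TGI backend and the client from above):

```python
# Sketch: request the decoder input details explicitly. Per the huggingface_hub
# docstring, details=True must also be set for decoder_input_details to be
# taken into account.
output = client.text_generation(
    "The huggingface_hub library is ",
    max_new_tokens=12,
    details=True,
    decoder_input_details=True,
)
print(output.details.prefill)  # expected to hold the input (prefill) tokens
```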
Describe the bug
I have served a llama-3-8b-instruct model locally with TGI. It ran with no issues. I created an InferenceClient with the base_url and did a chat completion. It also ran smoothly. I then tried to use text_generation with the same client, and got this crazy error:

```
HfHubHTTPError: 401 Client Error: Unauthorized for url: https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407 (Request ID: ce_pSj)
```

I tried a few different models, and the error remained the same. The client was fine with chat_completion, but not with text_generation.
Reproduction
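A minimal sketch reconstructing the steps described above (the port, exact model ID, and launch setup are assumptions, not taken from the report):

```python
from huggingface_hub import InferenceClient

# Assumes TGI is already serving a Llama-3-8B-Instruct model locally, e.g. on port 8082.
client = InferenceClient(base_url="http://localhost:8082/v1/")

# Works: chat_completion takes base_url into account.
client.chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=12,
)

# Fails: text_generation ignores base_url; with no `model` set, the request
# falls back to the hosted Inference API and a recommended model, yielding
# HfHubHTTPError: 401 ... /models/mistralai/Mistral-Nemo-Instruct-2407
client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
```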
Logs
System info