Issues: huggingface/text-generation-inference
- no prefill when decoder_input_details=True from InferenceClient (#2973, opened Jan 30, 2025 by lifeng-jin)
- Incorrect Tokenization Likely Because of Diacritics in OpenChat and LLaMA 3.2 (TGI v3.0.2 and v2.2.0) (#2969, opened Jan 30, 2025 by biba10)
- Structured output doesn't work with OpenAI endpoint (#2959, opened Jan 27, 2025 by Stealthwriter)
- Running Qwen2-VL-2B-Instruct on TGI is giving an error (#2955, opened Jan 27, 2025 by ashwani-bhat)
- CUDA Out of memory when using the benchmarking tool with batch size greater than 1 (#2952, opened Jan 24, 2025 by mborisov-bi)
- Serverless Inference API OpenAI /v1/chat/completions route broken (#2946, opened Jan 23, 2025 by pelikhan)
- RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model (#2944, opened Jan 23, 2025 by edesalve)
- text-generation-inference:3.0.1 Docker container timeout on image fetching from FastAPI static files (#2930, opened Jan 21, 2025 by dinoelT)
- Mangled generation for string sequences containing <space>'m with Llama 3.1 (#2927, opened Jan 20, 2025 by tomjorquera)
- AttributeError: no attribute 'model' when using llava-next with lora-adapters (#2926, opened Jan 20, 2025 by derkleinejakob)
- Does TGI support image resize for qwen2-vl pipeline? (#2920, opened Jan 16, 2025 by AHEADer)
- CUDA: an illegal memory access was encountered with Mistral FP8 Marlin kernels on NVIDIA driver 535.216.01 (AWS SageMaker Real-time Inference) (#2915, opened Jan 15, 2025 by dwyatte)
- Slow when using response format with JSON schemas with 8+ optional properties (#2902, opened Jan 11, 2025 by TwirreM)
- Support response_format: {"type": "json_object"} without any constrained schema (#2899, opened Jan 10, 2025 by lhoestq)
- Automatic Calculation of Sequence Length in TGI v3 Leads to Unrealistic Values Before CUDA OOM (#2897, opened Jan 10, 2025 by biba10)
- Prefill operation can be significantly slower in TGI v3 vs TGI v2 (#2896, opened Jan 10, 2025 by biba10)