Issues: huggingface/text-generation-inference
- no prefill when decoder_input_details=True from InferenceClient (#2973, opened Jan 30, 2025 by lifeng-jin)
- Incorrect Tokenization Likely Because of Diacritics in OpenChat and LLaMA 3.2 (TGI v3.0.2 and v2.2.0) (#2969, opened Jan 30, 2025 by biba10)
- Structured output doesn't work with OpenAI endpoint (#2959, opened Jan 27, 2025 by Stealthwriter)
- Running Qwen2-VL-2B-Instruct on TGI is giving an error (#2955, opened Jan 27, 2025 by ashwani-bhat)
- CUDA Out of memory when using the benchmarking tool with batch size greater than 1 (#2952, opened Jan 24, 2025 by mborisov-bi)
- Serverless Inference API OpenAI /v1/chat/completions route broken (#2946, opened Jan 23, 2025 by pelikhan)
- RuntimeError: Cannot load 'awq' weight when running Qwen2-VL-72B-Instruct-AWQ model (#2944, opened Jan 23, 2025 by edesalve)
- text-generation-inference:3.0.1 Docker container timeout on image fetching from FastAPI static files (#2930, opened Jan 21, 2025 by dinoelT)
- Mangled generation for string sequences containing <space>'m with Llama 3.1 (#2927, opened Jan 20, 2025 by tomjorquera)
- AttributeError: no attribute 'model' when using llava-next with lora-adapters (#2926, opened Jan 20, 2025 by derkleinejakob)
- Does TGI support image resize for qwen2-vl pipeline? (#2920, opened Jan 16, 2025 by AHEADer)
- CUDA: an illegal memory access was encountered with Mistral FP8 Marlin kernels on NVIDIA driver 535.216.01 (AWS SageMaker Real-time Inference) (#2915, opened Jan 15, 2025 by dwyatte)
- Slow when using response format with JSON schemas with 8+ optional properties (#2902, opened Jan 11, 2025 by TwirreM)
- Support response_format: {"type": "json_object"} without any constrained schema (#2899, opened Jan 10, 2025 by lhoestq)
- Automatic Calculation of Sequence Length in TGI v3 Leads to Unrealistic Values Before CUDA OOM (#2897, opened Jan 10, 2025 by biba10)
- Prefill operation can be significantly slower in TGI v3 vs TGI v2 (#2896, opened Jan 10, 2025 by biba10)