
Input validation error: inputs tokens + max_new_tokens must be <= 4096 in Qwen2-VL-7B-Instruct #2763

Closed
NEWbie0709 opened this issue Jan 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@NEWbie0709

Describe the bug

I encountered an issue when using the Hugging Face Inference API with the Qwen2-VL-7B-Instruct model. Despite providing valid input, the API returned an error indicating that the token count exceeds the limit.

{
    "error": {
        "message": "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 8740 `inputs` tokens and 500 `max_new_tokens`",
        "http_status_code": 422
    }
}
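The arithmetic behind the error: this deployment caps `inputs` tokens plus `max_new_tokens` at 4096, and the image alone expands to roughly 8740 input tokens, so 8740 + 500 = 9240 is rejected before generation starts. A minimal client-side guard could look like this (a sketch with hypothetical names, not part of any library):

```python
CONTEXT_LIMIT = 4096  # total token budget reported by this deployment


def fits_budget(input_tokens: int, max_new_tokens: int,
                limit: int = CONTEXT_LIMIT) -> bool:
    """True if inputs + max_new_tokens stays within the server's limit."""
    return input_tokens + max_new_tokens <= limit


# The failing request from the error message: 8740 + 500 = 9240 > 4096
print(fits_budget(8740, 500))  # False
```

The check only helps once you know the prompt's token count; with image inputs, that count depends on how the model tokenizes the image, which is exactly what is inflated here.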

Reproduction

Run the following curl command:

curl 'https://api-inference.huggingface.co/models/Qwen/Qwen2-VL-7B-Instruct/v1/chat/completions' \
  -H 'Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
  -H 'Content-Type: application/json' \
  --data '{
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 500,
    "stream": false
  }'
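Since Qwen2-VL's image token count scales with image resolution, one possible workaround while the server-side issue is open is to downscale the image locally and send it as a base64 data URL instead of the remote URL. A sketch using Pillow (the `shrink_to_data_url` helper and the 512-pixel cap are my own assumptions, not an official recommendation):

```python
# Hypothetical workaround: shrink the image before sending so it expands
# to fewer input tokens. Helper name and max_side value are illustrative.
import base64
import io

from PIL import Image


def shrink_to_data_url(img: Image.Image, max_side: int = 512) -> str:
    """Downscale so the longest side is <= max_side, return a JPEG data URL."""
    scale = max_side / max(img.size)
    if scale < 1:
        img = img.resize((int(img.width * scale), int(img.height * scale)))
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return "data:image/jpeg;base64," + base64.b64encode(buf.getvalue()).decode()
```

The returned string would be placed in the request's `image_url.url` field in place of the Britannica URL. This only reduces the input token count; whether it drops below the 4096 budget depends on how aggressively the image is downscaled.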

Logs

{"error":"Model Qwen/Qwen2-VL-2B-Instruct is currently loading","estimated_time":176.71885681152344}
{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 8740 `inputs` tokens and 500 `max_new_tokens`","error_type":"validation"}

System info

- huggingface_hub version: 0.26.5
- Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
- Python version: 3.10.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /home/yeow/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.5.1
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.10.3
- aiohttp: 3.11.10
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/yeow/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/yeow/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/yeow/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /home/yeow/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@NEWbie0709 added the bug label on Jan 20, 2025
@hanouticelina
Contributor

Hello @NEWbie0709,
This is probably the same issue as the one mentioned in #2760, and it is definitely an issue on the TGI side rather than in huggingface_hub. I suggest checking the related TGI issue text-generation-inference#2923, as other users are experiencing the same problem with images consuming more tokens than they should.

@hanouticelina closed this as not planned on Jan 20, 2025