
Input validation error: inputs tokens + max_new_tokens must be <= 4096 in Qwen2-VL-7B-Instruct #2763

Closed
NEWbie0709 opened this issue Jan 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@NEWbie0709

Describe the bug

I encountered an issue when using the Hugging Face Inference API with the Qwen2-VL-7B-Instruct model. Despite providing valid input, the API returned an error indicating that the token count exceeds the limit.

{
    "error": {
        "message": "Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 8740 `inputs` tokens and 500 `max_new_tokens`",
        "http_status_code": 422
    }
}
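The arithmetic behind the error: this deployment caps `inputs` tokens plus `max_new_tokens` at 4096, and the image alone expands to roughly 8740 input tokens, so 8740 + 500 = 9240 is rejected before generation starts. A minimal client-side guard could look like this (a sketch with hypothetical names, not part of any library):

```python
CONTEXT_LIMIT = 4096  # total token budget reported by this deployment


def fits_budget(input_tokens: int, max_new_tokens: int,
                limit: int = CONTEXT_LIMIT) -> bool:
    """True if inputs + max_new_tokens stays within the server's limit."""
    return input_tokens + max_new_tokens <= limit


# The failing request from the error message: 8740 + 500 = 9240 > 4096
print(fits_budget(8740, 500))  # False
```

The check only helps once you know the prompt's token count; with image inputs, that count depends on how the model tokenizes the image, which is exactly what is inflated here.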

Reproduction

Run the following curl command:

curl 'https://api-inference.huggingface.co/models/Qwen/Qwen2-VL-7B-Instruct/v1/chat/completions' \
  -H 'Authorization: Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
  -H 'Content-Type: application/json' \
  --data '{
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image in one sentence."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 500,
    "stream": false
  }'
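Since Qwen2-VL's image token count scales with image resolution, one possible workaround while the server-side issue is open is to downscale the image locally and send it as a base64 data URL instead of the remote URL. A sketch using Pillow (the `shrink_to_data_url` helper and the 512-pixel cap are my own assumptions, not an official recommendation):

```python
# Hypothetical workaround: shrink the image before sending so it expands
# to fewer input tokens. Helper name and max_side value are illustrative.
import base64
import io

from PIL import Image


def shrink_to_data_url(img: Image.Image, max_side: int = 512) -> str:
    """Downscale so the longest side is <= max_side, return a JPEG data URL."""
    scale = max_side / max(img.size)
    if scale < 1:
        img = img.resize((int(img.width * scale), int(img.height * scale)))
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return "data:image/jpeg;base64," + base64.b64encode(buf.getvalue()).decode()
```

The returned string would be placed in the request's `image_url.url` field in place of the Britannica URL. This only reduces the input token count; whether it drops below the 4096 budget depends on how aggressively the image is downscaled.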

Logs

{"error":"Model Qwen/Qwen2-VL-2B-Instruct is currently loading","estimated_time":176.71885681152344}
{"error":"Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 8740 `inputs` tokens and 500 `max_new_tokens`","error_type":"validation"}

System info

- huggingface_hub version: 0.26.5
- Platform: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
- Python version: 3.10.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /home/yeow/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.5.1
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.4.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.10.3
- aiohttp: 3.11.10
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/yeow/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/yeow/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/yeow/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /home/yeow/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10
@NEWbie0709 added the bug label on Jan 20, 2025
@hanouticelina
Contributor

Hello @NEWbie0709,
This is probably the same issue as the one mentioned in #2760, and it is definitely an issue on the TGI side rather than in huggingface_hub. I suggest checking the related TGI issue text-generation-inference#2923, as other users are experiencing the same problem with images consuming more tokens than they should.

@hanouticelina closed this as not planned on Jan 20, 2025