Error following the huggingface hosting instructions. #8

Open
derekalia opened this issue Jan 22, 2025 · 6 comments

I followed the provided Notion doc for deploying on HF -> https://juniper-switch-f10.notion.site/GUI-Model-Deployment-Guide-17b5350241e280058e98cea60317de71

I used the 72B-DPO model for deployment -> https://huggingface.co/bytedance-research/UI-TARS-72B-DPO

I added all the correct configurations but always got back this error:

raise self._make_status_error_from_response(err.response) from None
openai.UnprocessableEntityError: Error code: 422 - {'error': 'Input validation error: inputs tokens + max_new_tokens must be <= 32768. Given: 880370 inputs tokens and 0 max_new_tokens', 'error_type': 'validation'}

Basically the input is too large and I need to be under the 32k token limit.

I also used the resize_image function to reduce the image size, but this didn't really affect my image, due to its already small size (1200 px by 800 px).
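For reference, the downscaling I mean is along these lines (a minimal Pillow sketch; the guide's actual resize_image may differ):

from PIL import Image

def resize_image(path: str, max_side: int = 1024) -> Image.Image:
    # Downscale so the longest side is at most max_side pixels; never enlarge.
    img = Image.open(path)
    scale = max_side / max(img.size)
    if scale < 1:
        img = img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
    return img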

I also tried using one of the provided images from the hosted space on HF -> https://huggingface.co/spaces/Aheader/gui_test_app/blob/main/examples/solitaire.png

This one is even larger than my image, and it didn't work either.

Any ideas why it's not working with your provided setup?

AHEADer (Collaborator) commented Jan 23, 2025

You may have too many images in your prompt. I suggest keeping at most 5 images, along with all the text, as the input.
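One way to do that is to drop all but the most recent screenshots from the message content before sending it (a minimal sketch, assuming chat.completions-style content parts; keep_last_n_images is a hypothetical helper):

def keep_last_n_images(content_parts, n=5):
    # content_parts: a list of {"type": "text" | "image_url", ...} dicts.
    image_idxs = [i for i, part in enumerate(content_parts) if part["type"] == "image_url"]
    drop = set(image_idxs[:-n])  # indexes of all but the last n images
    return [part for i, part in enumerate(content_parts) if i not in drop]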

derekalia (Author) commented Jan 23, 2025

I'm sending 1 image.

Also, I just tried to deploy the 7B model and that worked. Looks like there is an issue with the 72B model hosted on HF.

AHEADer (Collaborator) commented Jan 23, 2025

Can you paste your request script here so I can try to reproduce and fix it?

derekalia (Author) commented Jan 23, 2025

I'm just running the code that was provided in the example doc.

Also, can I ask what prompt you are giving your LLM? Your space gets good results, but every time I run it locally or on HF the results are not that good.

import base64

from openai import OpenAI


instruction = "find the search bar"
screenshot_path = "imgs/shoot.png"


client = OpenAI(base_url="my hf endpoint here", api_key="my key here")
model = "tgi"

prompt = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nYou are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. \n\n    ## Output Format\n    ```\n    Action_Summary: ...\n    Action: ...\n    ```\n\n    ## Action Space\n    click(start_box=‘<|box_start|>(x1,y1)<|box_end|>’)\nlong_press(start_box=‘<|box_start|>(x1,y1)<|box_end|>’, time=‘’)\ntype(content=‘’)\nscroll(direction=‘down or up or right or left’)\nopen_app(app_name=‘’)\nnavigate_back()\nnavigate_home()\nWAIT()\nfinished() # Submit the task regardless of whether it succeeds or fails.\n\n    ## Note\n    - Use English in `Action_Summary` part.\n    \n\n    ## User Instruction\n"

# Base64-encode the screenshot so it can be sent inline as a data URL
with open(screenshot_path, "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt + instruction},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_string}"}},

            ],
        },
    ],
)
print(response.choices[0].message.content)

AHEADer (Collaborator) commented Jan 23, 2025

Hi,
My space prompt is "Output only the coordinate of one box in your response." and it's just for grounding. I think you can remove the <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n part from your prompt, since it will be auto-generated by the chat_template on the cloud platform. I've updated our README.
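Concretely, that amounts to stripping the prefix from the prompt in your script (a minimal sketch, assuming the same prompt variable as above):

# The chat completions endpoint applies the model's chat_template itself,
# so sending the template markers manually duplicates them.
chat_template_prefix = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n"
if prompt.startswith(chat_template_prefix):
    prompt = prompt[len(chat_template_prefix):]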
By the way, I cannot find any quota on Hugging Face cloud to deploy a 72B model; I'll get back to you once I have quota.

AHEADer (Collaborator) commented Jan 24, 2025

Hi,

We have solved this problem by updating the model's config: we now use the config from an older version of transformers. It seems TGI and Inference Endpoints have some bugs reading the config produced by the newest transformers.
