
Deployment to SageMaker and/or HuggingFace Inference Endpoints Fails With Error #94

Open
averypfeiffer opened this issue Jul 17, 2024 · 5 comments

When manually deploying the model to SageMaker via a deployment script, or automatically deploying it via the Hugging Face Inference Endpoints UI, I receive the same error:

"ValueError: The checkpoint you are trying to load has model type llava_llama but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."

Lyken17 commented Jul 17, 2024

Unfortunately, we do not have SageMaker experts on our team. Could you check with the AWS team for more details, or share a script that reproduces the error locally?

averypfeiffer commented Jul 18, 2024

Absolutely! I don't believe it's a SageMaker issue; it appears to be a lack of support for the custom llava_llama config in the Transformers library.

Here is a simple script that immediately reproduces the issue when loading the model via the Hugging Face Transformers library:

from PIL import Image
from transformers import pipeline

# The ValueError is raised here, while the pipeline resolves the checkpoint's
# llava_llama model type -- before any image or question is processed.
vqa_pipeline = pipeline(
    "visual-question-answering", model="Efficient-Large-Model/VILA1.5-40b"
)

# Load an example image
image = Image.open("./test_images/einsidtoJYc-Scene-6-01.jpg")

# Example text input
text = "What is happening in this image?"

# Payload as it would be sent to an inference endpoint (unused in this local repro)
payload = {
    "inputs": {
        "question": text,
        "image": image,
    }
}

result = vqa_pipeline(image, text, top_k=1)

print(f"Question: {text}")
print(f"Answer: {result[0]['answer']}")

Lyken17 commented Aug 1, 2024

I think the problem is that we haven't tested with the VQA pipeline yet. Could you check with our official inference implementation?

JBurtn commented Aug 3, 2024

An even simpler reproduction:

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)  # the ValueError is raised here
print(config)

JBurtn commented Aug 4, 2024

I copied what I needed from run_vila.py and it worked. If you add

from VILA.llava.model import *

it should fix the llava_llama issue. It still complains about missing weights (even with use_safetensors=False) if you try the AWQ versions, though.
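
(For later readers, a minimal sketch of this workaround, assuming the VILA repo has been cloned and is importable as VILA, e.g. by putting its parent directory on PYTHONPATH; the wildcard import should register the custom llava_llama classes with Transformers' Auto* factories, after which the earlier repro should no longer raise.)

from VILA.llava.model import *  # should register llava_llama with AutoConfig/AutoModel

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.model_type)  # expected: "llava_llama" instead of a ValueError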
