
Deployment to SageMaker and/or HuggingFace Inference Endpoints Fails With Error #94

Open
averypfeiffer opened this issue Jul 17, 2024 · 5 comments

When manually deploying the model to SageMaker via a deployment script, or automatically deploying it via the Hugging Face Inference Endpoints UI, I receive the same error:

"ValueError: The checkpoint you are trying to load has model type llava_llama but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."

Lyken17 commented Jul 17, 2024

Unfortunately, we do not have SageMaker experts on our team. Could you check with the AWS team for more details, or share a script that reproduces the error locally?

averypfeiffer commented Jul 18, 2024

Absolutely! I don't believe it's a SageMaker issue; it appears to be a lack of support for the custom llava_llama config in the Transformers library.

Here is a simple script that immediately reproduces the issue when loading the model via the Hugging Face Transformers library:

from PIL import Image
from transformers import pipeline

# The ValueError is raised here, while the pipeline resolves the checkpoint's
# llava_llama model type -- before any image or question is processed.
vqa_pipeline = pipeline(
    "visual-question-answering", model="Efficient-Large-Model/VILA1.5-40b"
)

# Load an example image
image = Image.open("./test_images/einsidtoJYc-Scene-6-01.jpg")

# Example text input
text = "What is happening in this image?"

# Payload as it would be sent to an inference endpoint (unused in this local repro)
payload = {
    "inputs": {
        "question": text,
        "image": image,
    }
}

result = vqa_pipeline(image, text, top_k=1)

print(f"Question: {text}")
print(f"Answer: {result[0]['answer']}")

Lyken17 commented Aug 1, 2024

I think the problem is that we haven't tested with the VQA pipeline yet. Could you check with our official inference implementation?

JBurtn commented Aug 3, 2024

An even simpler reproduction:

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)  # the ValueError is raised here
print(config)

JBurtn commented Aug 4, 2024

I copied what I needed from run_vila.py and it worked. If you add

from VILA.llava.model import *

it should fix the llava_llama issue. It still complains about missing weights (even with use_safetensors=False) if you try the AWQ versions, though.
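
(For later readers, a minimal sketch of this workaround, assuming the VILA repo has been cloned and is importable as VILA, e.g. by putting its parent directory on PYTHONPATH; the wildcard import should register the custom llava_llama classes with Transformers' Auto* factories, after which the earlier repro should no longer raise.)

from VILA.llava.model import *  # should register llava_llama with AutoConfig/AutoModel

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config.model_type)  # expected: "llava_llama" instead of a ValueError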
