Use Llama3.2 Image model to only test text input and do not repeat questions #361

Open
Fujiaoji opened this issue Nov 16, 2024 · 0 comments


Describe the bug

Hi, I am using Llama-3.2-11B-Vision-Instruct to test text-only input. I have two questions:

  1. I am not sure whether it is correct to use the image model for plain text in this way, so I want to double check.
  2. The output always repeats the input_text_info in the response; I want the model to give me the answers directly rather than echoing the input.

My goal is to ask the model some questions via a prompt and have it answer based on my description (the context, not the prompt), without repeating the questions. For example, the prompt looks like this:

suppose you are an expert of history, please answer the following questions based on the given text
1. state: <response>
2. Confidence_Score: <How confident are you when identifying the brand on a scale of 1 to 5, 5 being absolutely confident, 1 being not confident>
3. Supporting_Evidence:

The context looks like this and is very long:

Knoxville is a city of Tennessee, ......, and so on.
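For clarity, input_text_info is simply the prompt above concatenated with this context; roughly like the following (the variable names here are just for illustration):

prompt = "suppose you are an expert of history, please answer the following questions based on the given text\n1. state: <response>\n2. Confidence_Score: <...>\n3. Supporting_Evidence:"
context = "Knoxville is a city of Tennessee, ......, and so on."
input_text_info = prompt + "\n" + context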

The example code looks like this. I set images=None and pass input_text_info (the prompt above plus the context) as the text. Is this correct?

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path=model_id,
    torch_dtype=torch.bfloat16,
    device_map=f"cuda:{gpu_id}",
    cache_dir="XXX/Llama/cache/"
)
processor = AutoProcessor.from_pretrained(model_id, cache_dir="XX/Llama/cache/")

messages = [{"role": "user", "content": [{"type": "text", "text": input_text_info}]}, {"role": "assistant", "content": ""}]
input_text = processor.apply_chat_template(messages, format = "json", add_generation_prompt=True, return_full_text=False)
inputs = processor(images=None, text=input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
answer = processor.decode(output[0]) # skip_special_tokens=True, remove_input=True
responce_info.append(answer)

The output of the code looks like this

{
       "<|start_header_id|>user<|end_header_id|>\n\nsuppose you are an expert of history, please answer the following questions based on the given text
1. state: <response>
2. Confidence_Score: <How confident are you when identifying the brand on a scale of 1 to 5, 5 being absolutely confident, 1 being not confident>
3. Supporting_Evidence:
Knoxville is a city of Tennessee, ......, and so on.
<|start_header_id|>assistant<|end_header_id|
1. state: Tennessee
2. Confidence_Score: 5
3. Supporting_Evidence: Knoxville is a city of Tennessee.}<|eot_id|>

What I want is for the model to not repeat the question and to give me the answer directly:

{<|start_header_id|>assistant<|end_header_id|
1. state: Tennessee
2. Confidence_Score: 5
3. Supporting_Evidence: Knoxville is a city of Tennessee.}<|eot_id|>}
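One workaround I am considering (not sure if this is the recommended way) is to decode only the newly generated tokens by slicing off the prompt length, assuming generate returns the prompt tokens followed by the new tokens:

prompt_len = inputs["input_ids"].shape[-1]   # number of prompt tokens
generated_ids = output[0][prompt_len:]       # keep only the newly generated tokens
answer = processor.decode(generated_ids, skip_special_tokens=True)
responce_info.append(answer)

Is that the intended approach, or is there a built-in option to drop the prompt from the decoded output?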

Runtime Environment

  • Model: meta-llama/Llama-3.2-11B-Vision-Instruct
  • Using via Hugging Face?: yes
  • OS: [eg. Linux/Ubuntu]
  • GPU VRAM: 24 GB (A30)
  • Number of GPUs: 1
  • GPU Make: NVIDIA

Thank You
