Use Llama3.2 Image model to only test text input and do not repeat questions #361

Open
Fujiaoji opened this issue Nov 16, 2024 · 0 comments


Describe the bug

Hi, I am using Llama-3.2-11B-Vision-Instruct to test text-only input. I have two questions:

  1. I am not sure whether it is correct to use the image model for plain text in this way, so I want to double check.
  2. The output always repeats the input_text_info in the response; I want the model to give me the answers directly rather than echoing the input.

My goal is to ask the model some questions via a prompt and have it answer based on my description (the context, not the prompt), without repeating the questions. For example, the prompt looks like this:

suppose you are an expert of history, please answer the following questions based on the given text
1. state: <response>
2. Confidence_Score: <How confident are you when identifying the brand on a scale of 1 to 5, 5 being absolutely confident, 1 being not confident>
3. Supporting_Evidence:

The context looks like this and is very long:

Knoxville is a city of Tennessee, ......, and so on.
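For clarity, input_text_info is simply the prompt above concatenated with this context; roughly like the following (the variable names here are just for illustration):

prompt = "suppose you are an expert of history, please answer the following questions based on the given text\n1. state: <response>\n2. Confidence_Score: <...>\n3. Supporting_Evidence:"
context = "Knoxville is a city of Tennessee, ......, and so on."
input_text_info = prompt + "\n" + context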

The example code looks like this. I set images=None and pass input_text_info (the prompt above plus the context) as the text. Is this correct?

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path=model_id,
    torch_dtype=torch.bfloat16,
    device_map=f"cuda:{gpu_id}",
    cache_dir="XXX/Llama/cache/"
)
processor = AutoProcessor.from_pretrained(model_id, cache_dir="XX/Llama/cache/")

messages = [{"role": "user", "content": [{"type": "text", "text": input_text_info}]}, {"role": "assistant", "content": ""}]
input_text = processor.apply_chat_template(messages, format = "json", add_generation_prompt=True, return_full_text=False)
inputs = processor(images=None, text=input_text, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
answer = processor.decode(output[0]) # skip_special_tokens=True, remove_input=True
responce_info.append(answer)

The output of the code looks like this

{
       "<|start_header_id|>user<|end_header_id|>\n\nsuppose you are an expert of history, please answer the following questions based on the given text
1. state: <response>
2. Confidence_Score: <How confident are you when identifying the brand on a scale of 1 to 5, 5 being absolutely confident, 1 being not confident>
3. Supporting_Evidence:
Knoxville is a city of Tennessee, ......, and so on.
<|start_header_id|>assistant<|end_header_id|
1. state: Tennessee
2. Confidence_Score: 5
3. Supporting_Evidence: Knoxville is a city of Tennessee.}<|eot_id|>

What I want is for the model to not repeat the question and to give me the answer directly:

{<|start_header_id|>assistant<|end_header_id|
1. state: Tennessee
2. Confidence_Score: 5
3. Supporting_Evidence: Knoxville is a city of Tennessee.}<|eot_id|>}
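One workaround I am considering (not sure if this is the recommended way) is to decode only the newly generated tokens by slicing off the prompt length, assuming generate returns the prompt tokens followed by the new tokens:

prompt_len = inputs["input_ids"].shape[-1]   # number of prompt tokens
generated_ids = output[0][prompt_len:]       # keep only the newly generated tokens
answer = processor.decode(generated_ids, skip_special_tokens=True)
responce_info.append(answer)

Is that the intended approach, or is there a built-in option to drop the prompt from the decoded output?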

Runtime Environment

  • Model: meta-llama/Llama-3.2-11B-Vision-Instruct
  • Using via Hugging Face?: yes
  • OS: [eg. Linux/Ubuntu]
  • GPU VRAM: 24 GB (A30)
  • Number of GPUs: 1
  • GPU Make: NVIDIA

Thank You
