
Questions Regarding OPT Model Output #758

srhouyu opened this issue Aug 18, 2024 · 0 comments
Labels: question (Further information is requested)

srhouyu commented Aug 18, 2024

Greetings,

I am currently using Python 3.11 and transformers version 4.44.0.

While experimenting with the OPT models (125M, 350M, and 1.3B), I have noticed that the outputs often consist of repetitive and unrelated sentences. I am unsure whether I am using the models correctly.

Here is the code I used with the pipeline() function:

from transformers import OPTForCausalLM
from transformers import GPT2TokenizerFast
from transformers import pipeline

model_name = 'facebook/opt-1.3b'
cache_dir = './models'

# Load the pretrained model and tokenizer, caching them locally
pretrained_model: OPTForCausalLM = OPTForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
tokenizer: GPT2TokenizerFast = GPT2TokenizerFast.from_pretrained(model_name, cache_dir=cache_dir)

# Build a text-generation pipeline on GPU 0
generator = pipeline(task='text-generation', model=pretrained_model, tokenizer=tokenizer, device=0)

prompt = "Paris is the capital"
output = generator(prompt, truncation=True, max_length=100)
print(output[0]['generated_text'])

The output I received was:

Paris is the capital of France.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being serious.
I'm being serious.  Paris is the capital of France.  It's not the capital of the world.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being serious.
I'm not sure if you're being sarcastic or not, but I'm not sure if you
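I also wondered whether this repetition is simply a consequence of greedy decoding. Below is a minimal sketch of the same pipeline call with sampling enabled; do_sample, top_p, and temperature are standard generate arguments, but the specific values are just guesses on my part and I have not tuned them:

# Same generator as above, but sampling instead of taking the argmax at each step.
# The parameter values below are only guesses.
output = generator(
    prompt,
    truncation=True,
    max_length=100,
    do_sample=True,    # sample from the token distribution
    top_p=0.9,         # nucleus sampling
    temperature=0.7,
)
print(output[0]['generated_text'])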

I also tried manual generation by feeding embeddings directly (rather than token ids), since I plan to experiment with prefix-tuning later. However, I still see a lot of repetition:

import torch
from transformers import OPTForCausalLM
from transformers import GPT2TokenizerFast

model_name = 'facebook/opt-1.3b'
cache_dir = './models'
pretrained_model: OPTForCausalLM = OPTForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
tokenizer: GPT2TokenizerFast = GPT2TokenizerFast.from_pretrained(model_name, cache_dir=cache_dir)

prompt = "Paris is the capital"
max_length = 100

prompt_ids = torch.LongTensor(tokenizer.encode(prompt))
pretrained_model.eval()
end_id = 50260  # The 'endoftext' token (see question 4 below)
out_token_ids = []
with torch.no_grad():
    # Look up the prompt embeddings directly from the decoder's embedding table
    prompt_embedding = pretrained_model.get_decoder().embed_tokens(prompt_ids)
    input_embedding = prompt_embedding.unsqueeze(0)
    # First forward pass on the embeddings, without a KV cache yet
    output = pretrained_model(inputs_embeds=input_embedding, use_cache=True)
    past_key_values = output.past_key_values
    next_token_id = output.logits[:, -1, :].argmax(dim=-1).unsqueeze(-1)
    out_token_ids.append(next_token_id.item())
    # Subsequent greedy steps, reusing the KV cache
    for i in range(max_length - 1):
        output = pretrained_model(input_ids=next_token_id, use_cache=True, past_key_values=past_key_values)
        past_key_values = output.past_key_values
        next_token_id = output.logits[:, -1, :].argmax(dim=-1).unsqueeze(-1)
        if next_token_id == end_id:
            break
        out_token_ids.append(next_token_id.item())

text = tokenizer.decode(out_token_ids)
print(prompt + text)

The output is:

Paris is the capital of France, and the city is a cultural and historical treasure. It is also a city that is full of surprises.

The city is full of museums, galleries, and monuments. It is also full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

Changing model_name from 1.3b to 125m, the output looks more like the pipeline version, again talking about 'sarcastic':

Paris is the capital of the French Republic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic
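One way I could probably narrow down question 2 below is to let generate() run the decoding loop on the same embeddings and compare it against my manual loop. This is only a sketch continuing from the script above; I believe generate() accepts inputs_embeds for decoder-only models in recent transformers versions, but I have not verified this on 4.44.0:

# Sketch: let generate() do greedy decoding from the same prompt embeddings,
# to check whether my manual KV-cache loop behaves the same way.
with torch.no_grad():
    gen_ids = pretrained_model.generate(
        inputs_embeds=input_embedding,
        max_new_tokens=100,
        do_sample=False,   # greedy, like my manual loop
    )
# With inputs_embeds, generate() should return only the newly generated tokens
print(prompt + tokenizer.decode(gen_ids[0], skip_special_tokens=True))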

I have a few questions:

  1. Am I using the model correctly?
  2. Why do the two scripts generate different outputs?
  3. Is it normal to see such repetitions in the generated text?
  4. Does the model output the endoftext token (id 50260) at all? (A small check I could run is sketched right after this list.)
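
For question 4, I could inspect what the tokenizer and model config themselves report as special tokens. These are standard transformers attributes, so I think this check is sound:

# Inspect the tokenizer's special tokens and their ids
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer.bos_token, tokenizer.bos_token_id)
print(tokenizer.all_special_tokens)
# The model config may also carry an eos_token_id
print(pretrained_model.config.eos_token_id)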

Thank you for your assistance!

Best regards
