
Questions Regarding OPT Model Output #758

srhouyu opened this issue Aug 18, 2024 · 0 comments
Labels: question (Further information is requested)

srhouyu commented Aug 18, 2024

Greetings,

I am currently using Python 3.11 and transformers version 4.44.0.

While experimenting with the OPT models (125M, 350M, and 1.3B), I have noticed that the outputs often consist of repetitive and unrelated sentences. I am unsure whether I am using the models correctly.

Here is the code I used with the pipeline() function:

from transformers import OPTForCausalLM
from transformers import GPT2TokenizerFast
from transformers import pipeline

model_name = 'facebook/opt-1.3b'
cache_dir = './models'

# Load the pretrained model and tokenizer, caching them locally
pretrained_model: OPTForCausalLM = OPTForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
tokenizer: GPT2TokenizerFast = GPT2TokenizerFast.from_pretrained(model_name, cache_dir=cache_dir)

# Build a text-generation pipeline on GPU 0
generator = pipeline(task='text-generation', model=pretrained_model, tokenizer=tokenizer, device=0)

prompt = "Paris is the capital"
output = generator(prompt, truncation=True, max_length=100)
print(output[0]['generated_text'])

The output I received was:

Paris is the capital of France.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being serious.
I'm being serious.  Paris is the capital of France.  It's not the capital of the world.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being serious.
I'm not sure if you're being sarcastic or not, but I'm not sure if you
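I also wondered whether this repetition is simply a consequence of greedy decoding. Below is a minimal sketch of the same pipeline call with sampling enabled; do_sample, top_p, and temperature are standard generate arguments, but the specific values are just guesses on my part and I have not tuned them:

# Same generator as above, but sampling instead of taking the argmax at each step.
# The parameter values below are only guesses.
output = generator(
    prompt,
    truncation=True,
    max_length=100,
    do_sample=True,    # sample from the token distribution
    top_p=0.9,         # nucleus sampling
    temperature=0.7,
)
print(output[0]['generated_text'])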

I also tried manual generation by feeding embeddings directly (rather than token ids), since I plan to experiment with prefix-tuning later. However, I still see a lot of repetition:

import torch
from transformers import OPTForCausalLM
from transformers import GPT2TokenizerFast

model_name = 'facebook/opt-1.3b'
cache_dir = './models'
pretrained_model: OPTForCausalLM = OPTForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
tokenizer: GPT2TokenizerFast = GPT2TokenizerFast.from_pretrained(model_name, cache_dir=cache_dir)

prompt = "Paris is the capital"
max_length = 100

prompt_ids = torch.LongTensor(tokenizer.encode(prompt))
pretrained_model.eval()
end_id = 50260  # The 'endoftext' token (see question 4 below)
out_token_ids = []
with torch.no_grad():
    # Look up the prompt embeddings directly from the decoder's embedding table
    prompt_embedding = pretrained_model.get_decoder().embed_tokens(prompt_ids)
    input_embedding = prompt_embedding.unsqueeze(0)
    # First forward pass on the embeddings, without a KV cache yet
    output = pretrained_model(inputs_embeds=input_embedding, use_cache=True)
    past_key_values = output.past_key_values
    next_token_id = output.logits[:, -1, :].argmax(dim=-1).unsqueeze(-1)
    out_token_ids.append(next_token_id.item())
    # Subsequent greedy steps, reusing the KV cache
    for i in range(max_length - 1):
        output = pretrained_model(input_ids=next_token_id, use_cache=True, past_key_values=past_key_values)
        past_key_values = output.past_key_values
        next_token_id = output.logits[:, -1, :].argmax(dim=-1).unsqueeze(-1)
        if next_token_id == end_id:
            break
        out_token_ids.append(next_token_id.item())

text = tokenizer.decode(out_token_ids)
print(prompt + text)

The output is:

Paris is the capital of France, and the city is a cultural and historical treasure. It is also a city that is full of surprises.

The city is full of museums, galleries, and monuments. It is also full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

The city is full of surprises.

Changing model_name from 1.3b to 125m, the output looks more like the pipeline version, again talking about 'sarcastic':

Paris is the capital of the French Republic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic.
I'm not sure if you're being sarcastic or not, but I'm not sure if you're being sarcastic
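One way I could probably narrow down question 2 below is to let generate() run the decoding loop on the same embeddings and compare it against my manual loop. This is only a sketch continuing from the script above; I believe generate() accepts inputs_embeds for decoder-only models in recent transformers versions, but I have not verified this on 4.44.0:

# Sketch: let generate() do greedy decoding from the same prompt embeddings,
# to check whether my manual KV-cache loop behaves the same way.
with torch.no_grad():
    gen_ids = pretrained_model.generate(
        inputs_embeds=input_embedding,
        max_new_tokens=100,
        do_sample=False,   # greedy, like my manual loop
    )
# With inputs_embeds, generate() should return only the newly generated tokens
print(prompt + tokenizer.decode(gen_ids[0], skip_special_tokens=True))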

I have a few questions:

  1. Am I using the model correctly?
  2. Why do the two scripts generate different outputs?
  3. Is it normal to see such repetitions in the generated text?
  4. Does the model output the endoftext token (id 50260) at all? (A small check I could run is sketched right after this list.)
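
For question 4, I could inspect what the tokenizer and model config themselves report as special tokens. These are standard transformers attributes, so I think this check is sound:

# Inspect the tokenizer's special tokens and their ids
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer.bos_token, tokenizer.bos_token_id)
print(tokenizer.all_special_tokens)
# The model config may also carry an eos_token_id
print(pretrained_model.config.eos_token_id)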

Thank you for your assistance!

Best regards
