how to use model.generate with smoothquant models #82

Hao-YunDeng · 2024-03-31T09:05:05Z

I ran the following:

import torch
from transformers import GPT2Tokenizer
from smoothquant.opt import Int8OPTForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained('facebook/opt-6.7b')
model_smoothquant = Int8OPTForCausalLM.from_pretrained('mit-han-lab/opt-6.7b-smoothquant', torch_dtype=torch.float16, device_map='auto').to('cuda')

text = "The quick brown fox"
input_ids = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512).input_ids.to('cuda')

generated_ids = model_smoothquant.generate(input_ids, max_length=32)

but got this error:

ValueError: The provided attention mask has length 21, but its length should be 32 (sum of the lengths of current and past inputs)

Does anyone know how to correctly use generate() with SmoothQuant models?
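For anyone reading the error message: the check behind it is plain length arithmetic, where the attention mask must cover the cached (past) positions plus the positions fed in the current step. Below is a minimal, dependency-free sketch of that check. The helper names (`pad_to_multiple`, `check_attention_mask`) are hypothetical and not part of smoothquant or transformers. The scenario at the bottom is only a guess, not something I have verified: it assumes the prompt tokenizes to 5 tokens and that the int8 decoder pads every input to a multiple of 16. Under those assumptions the numbers come out as exactly 21 and 32.

```python
def pad_to_multiple(length: int, multiple: int = 16) -> int:
    """Round a length up to the next multiple (hypothetical padding step)."""
    return -(-length // multiple) * multiple


def expected_mask_length(past_len: int, current_len: int) -> int:
    """Length the mask must have: cached positions plus current positions."""
    return past_len + current_len


def check_attention_mask(mask_len: int, past_len: int, current_len: int) -> None:
    """Raise an error shaped like the one in this issue if the lengths disagree."""
    expected = expected_mask_length(past_len, current_len)
    if mask_len != expected:
        raise ValueError(
            f"The provided attention mask has length {mask_len}, but its length "
            f"should be {expected} (sum of the lengths of current and past inputs)"
        )


# ASSUMPTION: "The quick brown fox" tokenizes to 5 tokens (BOS + 4 words).
prompt_len = 5

# ASSUMPTION: the first forward pass pads the prompt to 16 and caches all 16.
past_len = pad_to_multiple(prompt_len)

# Second step: one new token, again padded up to 16 positions.
current_len = pad_to_multiple(1)

# The mask built by generate() covers prompt + 1 new token, then gets the
# same amount of padding as the single-token input (16 - 1 = 15).
mask_len = (prompt_len + 1) + (current_len - 1)

try:
    check_attention_mask(mask_len, past_len, current_len)
except ValueError as err:
    print(err)
```

If this guess is right, the mismatch comes from the key/value cache holding padded positions, so it may be worth testing whether `generate(..., use_cache=False)` avoids the error. That is an untested suggestion, not a confirmed fix.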
