the model before lora load and after lora load is diff #16

danieltanhx · 2023-08-31T16:38:18Z

The models in point 1 and point 2 below behave differently. I compared their generated text and the outputs are really different.

1. Right after 4-bit training:
gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)

2. After loading the LoRA adapter and merging it:
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()
gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
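One way to make the comparison repeatable is sketched below. It assumes both gen objects behave like transformers text-generation pipelines, that is, calling one with a prompt returns a list of dicts with a "generated_text" key. The helper name compare_generations is made up for illustration. Greedy decoding is requested so that sampling randomness cannot be the reason two outputs differ.

```python
def compare_generations(gen_a, gen_b, prompts):
    """Return (prompt, text_a, text_b) for every prompt where the outputs differ.

    gen_a and gen_b are assumed to behave like transformers text-generation
    pipelines: called with a prompt, they return [{"generated_text": ...}].
    """
    mismatches = []
    for prompt in prompts:
        # do_sample=False asks for greedy decoding, so sampling noise cannot
        # explain a difference between the two models.
        text_a = gen_a(prompt, do_sample=False)[0]["generated_text"]
        text_b = gen_b(prompt, do_sample=False)[0]["generated_text"]
        if text_a != text_b:
            mismatches.append((prompt, text_a, text_b))
    return mismatches
```

Calling it with the pipeline from point 1, the pipeline from point 2 and a handful of prompts gives back only the prompts where the two models disagree. An empty list means the outputs matched on those prompts.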

danieltanhx · 2023-09-01T08:54:08Z

Found the bug: please remove model = model.merge_and_unload() and reuse the original 4-bit base model instead of the fp16 base model. Details are in artidoro/qlora#254.

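import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# use_4bit, bnb_4bit_quant_type, bnb_4bit_compute_dtype, use_nested_quant,
# model_name, device_map and new_model are assumed to be defined as in the
# training script.

# Load the base model in 4-bit, the same way it was loaded for training.
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Attach the LoRA adapter to the 4-bit base model and do not merge it.
model = PeftModel.from_pretrained(base_model, new_model)
# model = model.merge_and_unload()  # MUST be removed, see artidoro/qlora#254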
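A toy sketch of why the base model precision could matter. This is only my reading of the problem: the adapter was trained against the quantized weights, so adding the same update to the fp16 weights gives a different effective model. The fake_quantize function is a hypothetical rounding stand-in, not the bitsandbytes NF4 scheme, and all the numbers are made up.

```python
# Toy illustration only: fake_quantize is a hypothetical stand-in for 4-bit
# quantization, not the bitsandbytes NF4 scheme.

def fake_quantize(weights, step=0.25):
    """Round each weight to the nearest multiple of step."""
    return [round(w / step) * step for w in weights]

def apply_adapter(weights, delta):
    """Add a LoRA-style update (already multiplied out) to the base weights."""
    return [w + d for w, d in zip(weights, delta)]

fp16_weights = [0.10, -0.37, 0.62, 0.05]     # stand-in for the fp16 base model
quant_weights = fake_quantize(fp16_weights)  # what the adapter saw in training
lora_delta = [0.02, 0.01, -0.03, 0.04]       # stand-in for the adapter update

trained_view = apply_adapter(quant_weights, lora_delta)  # adapter on 4-bit base
merged_fp16 = apply_adapter(fp16_weights, lora_delta)    # adapter merged into fp16

# The same adapter yields different effective weights on the two bases.
max_gap = max(abs(a - b) for a, b in zip(trained_view, merged_fp16))
print(f"largest weight mismatch: {max_gap:.3f}")
```

The gap here is exactly the rounding error of the toy quantizer. In the real setup the size of the mismatch depends on the model and the quantization settings.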
