I want to implement QLoRA fine-tuning with PEFT for a model whose dtype is `float32`. I load the base model with `from_pretrained("PATH", quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_use_double_quant=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16))` (note: the config must be passed via the `quantization_config=` keyword). Without setting `torch_dtype`, the model's dtype becomes `float16`, and the returned `last_hidden_state` is `float16` as well. When I instead set `torch_dtype=torch.float32`, both the model's dtype and `last_hidden_state` stay `float32`. But when I wrap the quantized model with `prepare_model_for_kbit_training()`, everything changes back to `float32`. My questions are:

- Does calling `prepare_model_for_kbit_training()` make `bnb_4bit_compute_dtype` and `torch_dtype` ineffective?
- When is it necessary to call `prepare_model_for_kbit_training()`?
- What determines the dtype of the base model and of `last_hidden_state`?

Thank you.