I trained an adapter for unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit with a high rank (1024) for continued pretraining on Japanese, using a large but noisy dataset. I then merged the adapter and pushed the model to Hugging Face with push_to_hub_merged and save_method="merged_16bit".
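For context, the merge/push step looked roughly like this (a minimal sketch assuming the FastVisionModel API; the repo name your-username/qwen2-vl-2b-ja-cpt is a placeholder):

```python
from unsloth import FastVisionModel

# Load the dynamically quantized base checkpoint.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-2B-Instruct-unsloth-bnb-4bit",
    load_in_4bit = True,
)

# High-rank adapter for continued pretraining.
model = FastVisionModel.get_peft_model(
    model,
    r = 1024,
    lora_alpha = 1024,
)

# ... continued pretraining on the noisy Japanese dataset ...

# Merge the adapter into the base weights and push as 16-bit.
model.push_to_hub_merged(
    "your-username/qwen2-vl-2b-ja-cpt",   # placeholder repo
    tokenizer,
    save_method = "merged_16bit",
)
```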
I then wanted to train on a second, higher-quality dataset, but it turns out the merged model has lost the dynamic-quant property. When I simply load it in 4-bit and train, the results are terrible again, just like the other vision models before. When I train the merged model in 16-bit, the quality is preserved and the performance is better than a model without continued pretraining.
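For reference, this is roughly how I reloaded the merged checkpoint for the second stage (sketch only; the repo name is a placeholder):

```python
from unsloth import FastVisionModel

# 4-bit load: plain on-the-fly quantization, the per-layer choices of the
# original unsloth-bnb-4bit dynamic quant are gone -> poor training results.
model_4bit, tokenizer = FastVisionModel.from_pretrained(
    "your-username/qwen2-vl-2b-ja-cpt",   # placeholder: merged 16-bit repo
    load_in_4bit = True,
)

# 16-bit load: quality of the continued-pretrained weights is preserved.
model_16bit, tokenizer = FastVisionModel.from_pretrained(
    "your-username/qwen2-vl-2b-ja-cpt",
    load_in_4bit = False,
)
```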
My question now is: is there a script I missed that would let us dynamically quantize our own merged models? Or should we use save_method="merged_4bit" when merging?
If the answer is the latter, maybe add a user warning recommending that save method when using vision models in 4-bit mode.
Alright, I did a second test with the merged_4bit and merged_4bit_forced options for the save_method parameter. Both printed in the console that the model is being merged as 16-bit, and both options show the same accuracy loss when loading in 4-bit.
So we definitely need a dynamic-quant script or a new merging method for VLMs.
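In the meantime, a possible stop-gap (untested, just a sketch, not Unsloth's actual dynamic-quant recipe) might be to re-quantize the merged 16-bit checkpoint at load time with a BitsAndBytesConfig that skips the modules the dynamic quants usually keep in higher precision. The module names below are assumptions and would need to be checked against model.named_modules() for Qwen2-VL:

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.bfloat16,
    bnb_4bit_use_double_quant = True,
    # Assumed module names -- verify against the actual model before relying on this.
    llm_int8_skip_modules = ["visual", "lm_head"],
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "your-username/qwen2-vl-2b-ja-cpt",   # placeholder: the merged 16-bit repo
    quantization_config = bnb_config,
    torch_dtype = torch.bfloat16,
)
```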