Hi there! Is it possible to train a VLM, for example Qwen2-VL-7B-Instruct, on text only, using traditional Instruction/Input/Output datasets?
I noticed:
```python
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 128,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
    # target_modules = "all-linear",
)
```
But even when passing False to finetune_vision_layers, it still requires images:
```
ValueError: Could not make batched images from ['<|im_start|>system\n<|enable_fast_answers|><|im_end|>\n<|im_start|>user\n...']
```
Full code:
```python
from unsloth import FastVisionModel
import torch

model, tokenizer = FastVisionModel.from_pretrained(
    "/ors/tmp/Qwen2.5-VL-14B-Instruct",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 128,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
    # target_modules = "all-linear",
)

from datasets import load_dataset

aura_prompt = """<|im_start|>system
<|enable_fast_answers|><|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}"""

def formatting_prompts_func(examples):
    inputs  = examples["input"]
    outputs = examples["text"]
    formatted_outputs = []
    for input_text, output_text in zip(inputs, outputs):
        text = aura_prompt.format(f"{input_text}", output_text) + "<|im_end|>"
        formatted_outputs.append(text)
    return {"text": formatted_outputs}

dataset = load_dataset("kaykyramos/aura-identity", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True)
print(dataset[0]["text"])

from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments
from datasets import concatenate_datasets

concatenate = concatenate_datasets([dataset])
concatenate = concatenate.shuffle(seed = 161800)

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = concatenate,
    dataset_text_field = "text",
    max_seq_length = 1024 * 32,
    dataset_num_proc = 24,
    packing = False,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 2,
        save_steps = 250,
        max_steps = 525,
        warmup_ratio = 0.05,
        num_train_epochs = 1,
        learning_rate = 5e-5,
        embedding_learning_rate = 1e-5,
        # max_grad_norm = 0.3,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 161800,
        output_dir = "/ors/models/LLM/continued-pretrain/outputs",
    ),
)

trainer_stats = trainer.train(resume_from_checkpoint = False)

model.save_pretrained("/ors/models/LLM/continued-pretrain/lora")
tokenizer.save_pretrained("/ors/models/LLM/continued-pretrain")
model.save_pretrained_merged("/ors/models/LLM/continued-pretrain", tokenizer, save_method = "merged_16bit")
```
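In case it helps, one untested idea (the `.tokenizer` attribute is my assumption about Hugging Face multimodal processors, not something from the Unsloth docs): `FastVisionModel.from_pretrained` returns a multimodal processor as `tokenizer`, so passing its underlying plain text tokenizer to `UnslothTrainer` might avoid the image-batching code path that raises the ValueError:

```python
# Untested sketch: use the processor's underlying text tokenizer (an assumption;
# HF multimodal processors typically expose it as `.tokenizer`).
text_tokenizer = getattr(tokenizer, "tokenizer", tokenizer)

trainer = UnslothTrainer(
    model = model,
    tokenizer = text_tokenizer,  # plain text tokenizer instead of the full processor
    train_dataset = concatenate,
    dataset_text_field = "text",
    max_seq_length = 1024 * 32,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 2,
        max_steps = 525,
        learning_rate = 5e-5,
        output_dir = "/ors/models/LLM/continued-pretrain/outputs",
    ),
)
```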
Oh that's weird, I'll get back to you on that one. https://docs.unsloth.ai/basics/vision-fine-tuning
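For reference, the vision path in those docs pairs SFTTrainer with UnslothVisionDataCollator instead of UnslothTrainer with dataset_text_field. A rough paraphrase of that pattern (adapted from the vision fine-tuning notebooks, so treat the exact arguments as approximate; `converted_dataset` here is a placeholder for a dataset of `{"messages": ...}` conversations):

```python
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model)  # switch the model into training mode

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer),  # handles images
    train_dataset = converted_dataset,  # placeholder: {"messages": [...]} samples
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 30,
        learning_rate = 2e-4,
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        optim = "adamw_8bit",
        remove_unused_columns = False,            # collator needs the raw columns
        dataset_text_field = "",                  # collator does its own formatting
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_seq_length = 2048,
        output_dir = "outputs",
    ),
)
```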
I encountered the same problem. Did you solve it? In your code, maybe the formatting_prompts_func function is not suitable.
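Something like the following might be closer (an untested sketch; the text-only message structure is an assumption on my part, since the docs only show image + text conversations):

```python
# Untested sketch: build text-only conversations with the processor's chat template
# instead of a hand-written prompt string (assumes the Qwen2-VL template accepts
# messages that contain no image entries).
def formatting_prompts_func(examples):
    texts = []
    for input_text, output_text in zip(examples["input"], examples["text"]):
        messages = [
            {"role": "user",      "content": [{"type": "text", "text": input_text}]},
            {"role": "assistant", "content": [{"type": "text", "text": output_text}]},
        ]
        texts.append(tokenizer.apply_chat_template(messages, tokenize = False))
    return {"text": texts}
```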
Nope :(
I'm still facing the same issue... I need continued pretraining for my extended model: https://huggingface.co/orion-research/Qwen2-VL-16B-DepthWise
Unless Unsloth supports it, I'll need to pretrain the full model on about 16x H100 95GB GPUs.
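A rough back-of-envelope for why (my numbers, not from the thread): mixed-precision Adam training typically needs about 16 bytes per parameter for weights, gradients, fp32 master weights, and the two optimizer moments, before counting activations:

```python
# ~16B parameter model, standard mixed-precision Adam footprint:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4) + Adam m (4) + Adam v (4)
params = 16e9
bytes_per_param = 2 + 2 + 4 + 4 + 4
print(params * bytes_per_param / 1e9, "GB")  # ≈ 256 GB before activations
```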
I'll try to understand this code better; if I succeed, I'll come back here to give you feedback.