
Train Text Only for VLMs #1436

Open
kaykyr opened this issue Dec 16, 2024 · 3 comments
Labels: unsure bug? I'm unsure

kaykyr commented Dec 16, 2024

Hi there! Is it possible to train a VLM, for example Qwen2-VL-7B-Instruct, on text only, using traditional Instruction/Input/Output datasets?

I noticed:

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 128,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
    # target_modules = "all-linear",
)

But even with finetune_vision_layers = False, training still requires images:

ValueError: Could not make batched images from ['<|im_start|>system\n<|enable_fast_answers|><|im_end|>\n<|im_start|>user\n...']
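
(A quick way to narrow this down, as an untested sketch rather than confirmed behaviour: assuming the tokenizer returned by FastVisionModel.from_pretrained acts like a standard Qwen2-VL processor, it can be called directly with text and images=None. If that call works while training fails, the trainer or collator is most likely forwarding the text strings to the image-batching step, which is what the error above suggests.)

# Untested diagnostic sketch: text-only call to the processor, no images.
batch = tokenizer(
    text = ["<|im_start|>user\nHello<|im_end|>\n"],
    images = None,
    return_tensors = "pt",
)
print(batch.keys())  # expect text-only keys such as input_ids / attention_mask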

Full code:

from unsloth import FastVisionModel
import torch

model, tokenizer = FastVisionModel.from_pretrained(
    "/ors/tmp/Qwen2.5-VL-14B-Instruct",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 128,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
    # target_modules = "all-linear",
)

from datasets import load_dataset

aura_prompt = """<|im_start|>system
<|enable_fast_answers|><|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}"""

def formatting_prompts_func(examples):
    inputs = examples["input"]
    outputs = examples["text"]
    formatted_outputs = []

    for input_text, output_text in zip(inputs, outputs):
        text = aura_prompt.format(input_text, output_text) + "<|im_end|>"
        formatted_outputs.append(text)
    
    return { "text": formatted_outputs }

dataset = load_dataset("kaykyramos/aura-identity", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)

print(dataset[0]['text'])

from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

from datasets import concatenate_datasets
concatenate = concatenate_datasets([dataset])
concatenate = concatenate.shuffle(seed=161800)

trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=concatenate,
    dataset_text_field="text",
    max_seq_length=1024 * 32,
    dataset_num_proc=24,
    packing=False,
    args=UnslothTrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        save_steps=250,
        max_steps=525,
        warmup_ratio=0.05,
        num_train_epochs=1,
        learning_rate=5e-5,
        embedding_learning_rate=1e-5,
        # max_grad_norm = 0.3,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=161800,
        output_dir="/ors/models/LLM/continued-pretrain/outputs",
    ),
)

trainer_stats = trainer.train(resume_from_checkpoint=False)

model.save_pretrained("/ors/models/LLM/continued-pretrain/lora")
tokenizer.save_pretrained("/ors/models/LLM/continued-pretrain")
model.save_pretrained_merged("/ors/models/LLM/continued-pretrain", tokenizer, save_method = "merged_16bit",)
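
One possible text-only workaround (an untested sketch, not a confirmed Unsloth API): if the tokenizer returned by FastVisionModel.from_pretrained is a multimodal Processor, its plain text tokenizer is usually reachable as .tokenizer, and handing that to the trainer should keep the image processor out of the loop entirely. The sketch reuses model, concatenate, and is_bfloat16_supported from the code above.

# Untested sketch: train with the wrapped text tokenizer so the image
# processor is never called on text-only batches. Whether this plays well
# with Unsloth's vision wrapper is exactly the open question in this issue.
text_tokenizer = getattr(tokenizer, "tokenizer", tokenizer)

trainer = UnslothTrainer(
    model = model,
    tokenizer = text_tokenizer,   # text tokenizer instead of the full processor
    train_dataset = concatenate,  # same text-only dataset built above
    dataset_text_field = "text",
    max_seq_length = 1024 * 32,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 2,
        max_steps = 525,
        learning_rate = 5e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        optim = "adamw_8bit",
        output_dir = "/ors/models/LLM/continued-pretrain/outputs",
    ),
)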

shimmyshimmer (Collaborator) commented

Oh that's weird, I'll get back to you on that one.
https://docs.unsloth.ai/basics/vision-fine-tuning

michaelzhouy commented

I encountered the same problem. Did you solve it?
In your code, maybe the formatting_prompts_func function is not suitable.
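
For comparison, Unsloth's vision fine-tuning path formats each example as a list of chat messages rather than a flat prompt string. A hedged sketch of formatting_prompts_func in that shape follows; the exact schema, and whether a sample with no image entry is accepted by the vision collator, are assumptions rather than confirmed behaviour.

def formatting_prompts_func(examples):
    # Sketch: emit chat-style messages instead of one formatted string.
    conversations = []
    for input_text, output_text in zip(examples["input"], examples["text"]):
        conversations.append([
            {"role": "user",
             "content": [{"type": "text", "text": input_text}]},
            {"role": "assistant",
             "content": [{"type": "text", "text": output_text}]},
        ])
    # No {"type": "image"} entry here, since these samples are text-only.
    return {"messages": conversations}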

shimmyshimmer added the "unsure bug? I'm unsure" label on Dec 23, 2024
kaykyr (Author) commented Dec 23, 2024

> I encountered the same problem. Did you solve it? In your code, maybe the formatting_prompts_func function is not suitable.

Nope :(

I'm still facing the same issue... I need to run continued pretraining on my extended model:
https://huggingface.co/orion-research/Qwen2-VL-16B-DepthWise

Unless Unsloth supports it, I'll need to pretrain the full model on about 16x H100 95GB.

I'll try to understand this code better; if I succeed, I'll report back here with feedback.
