
Train Text Only for VLMs #1436

Open
kaykyr opened this issue Dec 16, 2024 · 3 comments
Labels: unsure bug? I'm unsure

kaykyr commented Dec 16, 2024

Hi there! Is it possible to train a VLM, for example Qwen2-VL-7B-Instruct, on text only, using traditional Instruction/Input/Output datasets?

I noticed:

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 128,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
    # target_modules = "all-linear",
)

But even with finetune_vision_layers = False, training still requires images:

ValueError: Could not make batched images from ['<|im_start|>system\n<|enable_fast_answers|><|im_end|>\n<|im_start|>user\n...']
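
(A quick way to narrow this down, as an untested sketch rather than confirmed behaviour: assuming the tokenizer returned by FastVisionModel.from_pretrained acts like a standard Qwen2-VL processor, it can be called directly with text and images=None. If that call works while training fails, the trainer or collator is most likely forwarding the text strings to the image-batching step, which is what the error above suggests.)

# Untested diagnostic sketch: text-only call to the processor, no images.
batch = tokenizer(
    text = ["<|im_start|>user\nHello<|im_end|>\n"],
    images = None,
    return_tensors = "pt",
)
print(batch.keys())  # expect text-only keys such as input_ids / attention_mask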

Full code:

from unsloth import FastVisionModel
import torch

model, tokenizer = FastVisionModel.from_pretrained(
    "/ors/tmp/Qwen2.5-VL-14B-Instruct",
    load_in_4bit = True,
    use_gradient_checkpointing = "unsloth",
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = False,
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r = 128,
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = True,
    loftq_config = None,
    # target_modules = "all-linear",
)

from datasets import load_dataset

aura_prompt = """<|im_start|>system
<|enable_fast_answers|><|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}"""

def formatting_prompts_func(examples):
    inputs = examples["input"]
    outputs = examples["text"]
    formatted_outputs = []

    for input_text, output_text in zip(inputs, outputs):
        text = aura_prompt.format(input_text, output_text) + "<|im_end|>"
        formatted_outputs.append(text)
    
    return { "text": formatted_outputs }

dataset = load_dataset("kaykyramos/aura-identity", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)

print(dataset[0]['text'])

from unsloth import is_bfloat16_supported
from unsloth import UnslothTrainer, UnslothTrainingArguments

from datasets import concatenate_datasets
concatenate = concatenate_datasets([dataset])
concatenate = concatenate.shuffle(seed=161800)

trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=concatenate,
    dataset_text_field="text",
    max_seq_length=1024 * 32,
    dataset_num_proc=24,
    packing=False,
    args=UnslothTrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        save_steps=250,
        max_steps=525,
        warmup_ratio=0.05,
        num_train_epochs=1,
        learning_rate=5e-5,
        embedding_learning_rate=1e-5,
        # max_grad_norm = 0.3,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=161800,
        output_dir="/ors/models/LLM/continued-pretrain/outputs",
    ),
)

trainer_stats = trainer.train(resume_from_checkpoint=False)

model.save_pretrained("/ors/models/LLM/continued-pretrain/lora")
tokenizer.save_pretrained("/ors/models/LLM/continued-pretrain")
model.save_pretrained_merged("/ors/models/LLM/continued-pretrain", tokenizer, save_method = "merged_16bit",)
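
One possible text-only workaround (an untested sketch, not a confirmed Unsloth API): if the tokenizer returned by FastVisionModel.from_pretrained is a multimodal Processor, its plain text tokenizer is usually reachable as .tokenizer, and handing that to the trainer should keep the image processor out of the loop entirely. The sketch reuses model, concatenate, and is_bfloat16_supported from the code above.

# Untested sketch: train with the wrapped text tokenizer so the image
# processor is never called on text-only batches. Whether this plays well
# with Unsloth's vision wrapper is exactly the open question in this issue.
text_tokenizer = getattr(tokenizer, "tokenizer", tokenizer)

trainer = UnslothTrainer(
    model = model,
    tokenizer = text_tokenizer,   # text tokenizer instead of the full processor
    train_dataset = concatenate,  # same text-only dataset built above
    dataset_text_field = "text",
    max_seq_length = 1024 * 32,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 2,
        max_steps = 525,
        learning_rate = 5e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        optim = "adamw_8bit",
        output_dir = "/ors/models/LLM/continued-pretrain/outputs",
    ),
)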

shimmyshimmer (Collaborator) commented

Oh that's weird, I'll get back to you on that one.
https://docs.unsloth.ai/basics/vision-fine-tuning

michaelzhouy commented

I encountered the same problem. Did you solve it?
In your code, maybe the formatting_prompts_func function is not suitable.
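
For comparison, Unsloth's vision fine-tuning path formats each example as a list of chat messages rather than a flat prompt string. A hedged sketch of formatting_prompts_func in that shape follows; the exact schema, and whether a sample with no image entry is accepted by the vision collator, are assumptions rather than confirmed behaviour.

def formatting_prompts_func(examples):
    # Sketch: emit chat-style messages instead of one formatted string.
    conversations = []
    for input_text, output_text in zip(examples["input"], examples["text"]):
        conversations.append([
            {"role": "user",
             "content": [{"type": "text", "text": input_text}]},
            {"role": "assistant",
             "content": [{"type": "text", "text": output_text}]},
        ])
    # No {"type": "image"} entry here, since these samples are text-only.
    return {"messages": conversations}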

shimmyshimmer added the "unsure bug? I'm unsure" label on Dec 23, 2024
kaykyr (Author) commented Dec 23, 2024

> I encountered the same problem. Did you solve it? In your code, maybe the formatting_prompts_func function is not suitable.

Nope :(

I'm still facing the same issue... I need to run continued pretraining on my extended model:
https://huggingface.co/orion-research/Qwen2-VL-16B-DepthWise

Unless Unsloth supports it, I'll need to pretrain the full model on about 16x H100 95GB.

I'll try to understand this code better; if I succeed, I'll report back here with feedback.
