[Question] When I reproduced the second stage using finetune.sh, the processing time per image was too slow #1790

Open
TanmouTT opened this issue Dec 5, 2024 · 1 comment

Comments

TanmouTT commented Dec 5, 2024

Question

I ran finetune.sh on 8x A100 (40 GB), and it takes about 33 seconds per image.
The image download may not have been complete, so I skipped the missing images, but I don't think that is why training is so slow.
Here is the finetune.sh I use:
#!/bin/bash

deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path /home/24-zhangtan/LLaVA/vicuna-7b-v1.5 \
    --version v1 \
    --data_path /home/24-zhangtan/LLaVA/playground/data/llava_v1_5_mix665k.json \
    --image_folder /home/24-zhangtan/LLaVA/playground/data \
    --vision_tower /home/24-zhangtan/LLaVA/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter /home/24-zhangtan/LLaVA/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-7b \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to wandb

[Screenshot: training time]

Has anyone else run into this problem? How did you solve it? Looking forward to a reply!
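
For reference, here is a rough sanity check of what 33 seconds per image implies with the settings above. This is only a sketch: it assumes 8 GPUs, the batch-size flags from finetune.sh, the nominal ~665k samples in llava_v1_5_mix665k.json, and that "33 s per image" means wall-clock time divided by images processed (if the logged number is actually seconds per optimizer step rather than per image, divide the estimates by the global batch size).

#!/bin/bash
# Rough throughput check (assumed values: 8 GPUs, flags as in finetune.sh above).
NUM_GPUS=8
PER_DEVICE_BS=4          # --per_device_train_batch_size
GRAD_ACCUM=1             # --gradient_accumulation_steps
SEC_PER_IMAGE=33         # observed time per image
NUM_SAMPLES=665000       # approximate size of llava_v1_5_mix665k.json

GLOBAL_BS=$((NUM_GPUS * PER_DEVICE_BS * GRAD_ACCUM))     # 32 images per optimizer step
SEC_PER_STEP=$((GLOBAL_BS * SEC_PER_IMAGE))              # ~1056 s per step at 33 s/image
STEPS_PER_EPOCH=$((NUM_SAMPLES / GLOBAL_BS))             # ~20781 steps for one epoch
HOURS_PER_EPOCH=$((SEC_PER_STEP * STEPS_PER_EPOCH / 3600))
echo "global batch: ${GLOBAL_BS}  step time: ${SEC_PER_STEP}s  one epoch: ~${HOURS_PER_EPOCH}h"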

@quanyouyou

Same here, and mine uses --per_device_train_batch_size 12.
It goes OOM when the batch size is set to 16 on 8x A100 (40 GB).

So puzzled 😵‍💫
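
One thing worth checking, as a sketch of a common workaround rather than a confirmed fix for this setup: on 40 GB cards you can usually keep the same global batch size while lowering peak memory by trading per-device batch size for gradient accumulation.

#!/bin/bash
# Assumed numbers for illustration only:
#   8 GPUs * 16 per-device * 1 accumulation = 128 images/step  (OOMs at 40 GB)
#   8 GPUs *  4 per-device * 4 accumulation = 128 images/step  (lower peak activation memory)
# i.e. in finetune.sh use:
#   --per_device_train_batch_size 4 \
#   --gradient_accumulation_steps 4 \
echo "global batch: $((8 * 4 * 4)) images per optimizer step"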
