Hello,
I have an issue with multi-GPU performance:
- Using the recipe lora_finetune_single_device with the config mini_lora_single_device.yaml on a single 6000 Ada, I got ~5 it/s.
- Using the recipe lora_finetune_distributed with the config mini_lora.yaml on 2 x 6000 Ada, I got 1.5 s/it.
The dataset I used for fine-tuning is HuggingFaceFW/fineweb-edu-score-2.
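Roughly, the two launches look like this (a minimal sketch assuming the standard tune run CLI and the config names above; the exact config paths may differ):

```bash
# Single-GPU run (~5 it/s on one 6000 Ada)
tune run lora_finetune_single_device --config mini_lora_single_device.yaml

# Two-GPU run (1.5 s/it on 2 x 6000 Ada); --nproc_per_node is forwarded to torchrun
tune run --nproc_per_node 2 lora_finetune_distributed --config mini_lora.yaml
```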
How can I improve performance when training on multiple GPUs?