Discuss: How does DLRover improve the training performance of foundation models?

Optimization of a training job.

  • Automatic profiling and diagnosis of training performance (see the profiling sketch after this list).

  • Automatic adjustment of throughput- and resource-related hyper-parameters, such as the micro-batch size per GPU and the num_workers of the DataLoader (see the tuning sketch after this list).

  • Automatic wrap policy for FSDP to improve the performance of fully sharded data parallel training (see the FSDP sketch after this list).

  • Automatic configuration of tensor and pipeline parallelism (see the parallelism sketch after this list).

  • Autotuning hyper-parameters with DeepSpeed Autotuning: https://www.deepspeed.ai/tutorials/autotuning/ (see the config sketch after this list).

  • Colossal-Auto: https://arxiv.org/pdf/2302.02599.pdf
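
For the profiling item above, here is a minimal sketch of collecting per-step timing with torch.profiler. The step counts, the trace directory, and the train_step_fn callback are illustrative assumptions, not DLRover's actual implementation.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

def profile_training(train_step_fn, num_steps=10):
    """Profile a few training steps to locate CPU/GPU bottlenecks."""
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=schedule(wait=1, warmup=2, active=5),
        on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler_logs"),
        profile_memory=True,
    ) as prof:
        for _ in range(num_steps):
            train_step_fn()  # one forward/backward/optimizer step (user-supplied)
            prof.step()      # advance the wait/warmup/active schedule
    # Print the operators that dominate GPU time.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```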
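
For the hyper-parameter adjustment item, a hypothetical sketch that probes DataLoader throughput for a few num_workers candidates and keeps the fastest. The toy dataset, batch size, and candidate list are placeholder assumptions for illustration only.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

def best_num_workers(dataset, batch_size=32, candidates=(0, 2, 4, 8), probe_batches=50):
    """Return the num_workers value with the highest measured samples/sec."""
    throughput = {}
    for workers in candidates:
        loader = DataLoader(dataset, batch_size=batch_size, num_workers=workers)
        start = time.time()
        batches = 0
        for batches, _ in enumerate(loader, start=1):
            if batches >= probe_batches:
                break
        throughput[workers] = batches * batch_size / (time.time() - start)
    return max(throughput, key=throughput.get)

if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
    print("best num_workers:", best_num_workers(dataset))
```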
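
For the FSDP wrap-policy item, a minimal sketch using PyTorch's size-based auto-wrap policy. The ~10M-parameter threshold and the toy model are assumptions rather than values recommended by DLRover, and the default process group is assumed to be initialized already (e.g. via torchrun).

```python
import functools

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def build_fsdp_model():
    """Shard a toy model, wrapping every submodule above the size threshold."""
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])
    wrap_policy = functools.partial(
        size_based_auto_wrap_policy,
        min_num_params=10_000_000,  # wrap threshold; tune per model and GPU memory
    )
    return FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        device_id=torch.cuda.current_device(),  # shard parameters onto this GPU
    )
```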
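
For the tensor/pipeline parallelism item, a hypothetical heuristic that picks 3D-parallelism degrees from the cluster shape. The rule (tensor parallel inside a node, pipeline parallel across nodes, remaining GPUs for data parallelism) and the function name are illustrative assumptions, not DLRover's algorithm.

```python
def choose_parallelism(num_gpus, gpus_per_node=8, model_layers=48, min_layers_per_stage=4):
    """Pick tensor/pipeline/data parallel degrees for a homogeneous GPU cluster."""
    # Keep tensor parallelism within a node to use fast intra-node bandwidth.
    tensor_parallel = min(gpus_per_node, num_gpus)
    # Pipeline parallelism splits layers across nodes, bounded by a minimum
    # number of layers per stage to keep pipeline bubbles small.
    max_stages = max(1, model_layers // min_layers_per_stage)
    pipeline_parallel = min(max_stages, num_gpus // tensor_parallel)
    # The remaining GPUs replicate the model as data parallelism.
    data_parallel = num_gpus // (tensor_parallel * pipeline_parallel)
    return {"tp": tensor_parallel, "pp": pipeline_parallel, "dp": data_parallel}

print(choose_parallelism(num_gpus=64))  # e.g. {'tp': 8, 'pp': 8, 'dp': 1}
```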
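
For the DeepSpeed autotuning item, a sketch of a config that enables the autotuner, following the linked tutorial. The arg_mappings entries assume a HuggingFace-style training script, and the launch command is an example; consult the tutorial for the authoritative options.

```python
import json

# Mark the micro-batch size as "auto" so the autotuner can search over it.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "autotuning": {
        "enabled": True,
        "arg_mappings": {
            "train_micro_batch_size_per_gpu": "--per_device_train_batch_size",
            "gradient_accumulation_steps": "--gradient_accumulation_steps",
        },
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# The job is then launched through the autotuner, e.g.:
#   deepspeed --autotuning tune train.py --deepspeed ds_config.json
```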

Optimization of a GPU cluster.

Reference: the Hugging Face introduction to model parallelism: https://huggingface.co/transformers/v4.9.2/parallelism.html