Discuss: How does DLRover improve the training performance of foundational models?
Qinlong Wang edited this page Aug 17, 2023
- Automatic profiling and diagnosis of training performance.
- Automatic adjustment of throughput- and resource-related hyper-parameters, such as the micro-batch size per GPU and the `num_workers` of the DataLoader.
- Automatic wrap policy for FSDP (Fully Sharded Data Parallel) to improve the performance of sharded model training.
- Automatic configuration of tensor and pipeline parallelism.
- Autotuning hyper-parameters, as in DeepSpeed Autotuning: https://www.deepspeed.ai/tutorials/autotuning/
- Colossal-Auto: https://arxiv.org/pdf/2302.02599.pdf
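Per the DeepSpeed Autotuning tutorial linked above, autotuning is driven by a section in the DeepSpeed config file; a minimal fragment might look like the following (field names follow the tutorial, but the exact supported keys may vary by DeepSpeed version):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "autotuning": {
    "enabled": true,
    "fast": true
  }
}
```

The tutorial describes launching the tuning run via the `deepspeed` launcher's `--autotuning` flag, after which the tuner profiles candidate configurations and reports the best one found.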
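The micro-batch-size adjustment above can be sketched as a bracket-then-bisect search over candidate sizes. Everything here is a hypothetical stand-in, not DLRover's API: the `train_step` callable and the use of `MemoryError` as the OOM signal (in PyTorch the real signal would be `torch.cuda.OutOfMemoryError`) are illustrative assumptions.

```python
# Sketch: find the largest micro-batch size that fits in GPU memory,
# assuming a hypothetical train_step(micro_batch_size) that raises
# MemoryError when the batch does not fit.

def find_max_micro_batch(train_step, start=1, limit=4096):
    """Grow the micro-batch size exponentially until an OOM occurs,
    then binary-search the boundary for the largest size that fits."""
    size = start
    last_ok = 0
    # Phase 1: exponential growth to bracket the failure point.
    while size <= limit:
        try:
            train_step(size)
            last_ok = size
            size *= 2
        except MemoryError:
            break
    # Phase 2: binary search between the last success and first failure.
    lo, hi = last_ok, min(size, limit + 1)
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        try:
            train_step(mid)
            lo = mid
        except MemoryError:
            hi = mid
    return lo

# Example: emulate a GPU that fits at most 48 samples per step.
def fake_step(mbs):
    if mbs > 48:
        raise MemoryError

print(find_max_micro_batch(fake_step))  # -> 48
```

In practice each probe step would also have to restore model/optimizer state and clear the CUDA cache after an OOM, which this sketch omits.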
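For the automatic configuration of tensor and pipeline parallelism, a natural first step is enumerating the feasible 3D layouts before scoring them. This is a minimal sketch under the assumption that the only hard constraint is `dp * tp * pp == world_size`; the function and parameter names are illustrative, not DLRover's:

```python
# Sketch: enumerate candidate (data, tensor, pipeline) parallel degrees.
# The tensor-parallel degree is capped because TP traffic is the heaviest
# and is usually kept within a single node.

def candidate_layouts(world_size, max_tp=8):
    """Return all (dp, tp, pp) factorizations of world_size with tp <= max_tp."""
    layouts = []
    for tp in range(1, min(max_tp, world_size) + 1):
        if world_size % tp:
            continue
        rest = world_size // tp
        for pp in range(1, rest + 1):
            if rest % pp:
                continue
            layouts.append((rest // pp, tp, pp))
    return layouts

print(candidate_layouts(8, max_tp=4))
```

An autotuner would then rank these candidates, e.g. by profiling a few steps of each or by a cost model of communication volume.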
Reference: an introduction to model parallelism: https://huggingface.co/transformers/v4.9.2/parallelism.html