Discuss: How does DLRover improve the training performance of foundation models?

Optimization of a training job.

  • Automatic profiling and diagnosis of training performance (see the profiling sketch after this list).

  • Automatic adjustment of throughput- and resource-related hyper-parameters, such as the micro-batch size per GPU and the num_workers of the DataLoader (see the tuning sketch after this list).

  • Automatic wrap policy for FSDP to improve the performance of fully sharded data parallel training (see the FSDP sketch after this list).

  • Automatic configuration of tensor and pipeline parallelism (see the parallelism sketch after this list).

  • Autotuning hyper-parameters with DeepSpeed Autotuning: https://www.deepspeed.ai/tutorials/autotuning/ (see the config sketch after this list).

  • Colossal-Auto: https://arxiv.org/pdf/2302.02599.pdf
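
For the profiling item above, here is a minimal sketch of collecting per-step timing with torch.profiler. The step counts, the trace directory, and the train_step_fn callback are illustrative assumptions, not DLRover's actual implementation.

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

def profile_training(train_step_fn, num_steps=10):
    """Profile a few training steps to locate CPU/GPU bottlenecks."""
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=schedule(wait=1, warmup=2, active=5),
        on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler_logs"),
        profile_memory=True,
    ) as prof:
        for _ in range(num_steps):
            train_step_fn()  # one forward/backward/optimizer step (user-supplied)
            prof.step()      # advance the wait/warmup/active schedule
    # Print the operators that dominate GPU time.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```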
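
For the hyper-parameter adjustment item, a hypothetical sketch that probes DataLoader throughput for a few num_workers candidates and keeps the fastest. The toy dataset, batch size, and candidate list are placeholder assumptions for illustration only.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

def best_num_workers(dataset, batch_size=32, candidates=(0, 2, 4, 8), probe_batches=50):
    """Return the num_workers value with the highest measured samples/sec."""
    throughput = {}
    for workers in candidates:
        loader = DataLoader(dataset, batch_size=batch_size, num_workers=workers)
        start = time.time()
        batches = 0
        for batches, _ in enumerate(loader, start=1):
            if batches >= probe_batches:
                break
        throughput[workers] = batches * batch_size / (time.time() - start)
    return max(throughput, key=throughput.get)

if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))
    print("best num_workers:", best_num_workers(dataset))
```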
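
For the FSDP wrap-policy item, a minimal sketch using PyTorch's size-based auto-wrap policy. The ~10M-parameter threshold and the toy model are assumptions rather than values recommended by DLRover, and the default process group is assumed to be initialized already (e.g. via torchrun).

```python
import functools

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def build_fsdp_model():
    """Shard a toy model, wrapping every submodule above the size threshold."""
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])
    wrap_policy = functools.partial(
        size_based_auto_wrap_policy,
        min_num_params=10_000_000,  # wrap threshold; tune per model and GPU memory
    )
    return FSDP(
        model,
        auto_wrap_policy=wrap_policy,
        device_id=torch.cuda.current_device(),  # shard parameters onto this GPU
    )
```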
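
For the tensor/pipeline parallelism item, a hypothetical heuristic that picks 3D-parallelism degrees from the cluster shape. The rule (tensor parallel inside a node, pipeline parallel across nodes, remaining GPUs for data parallelism) and the function name are illustrative assumptions, not DLRover's algorithm.

```python
def choose_parallelism(num_gpus, gpus_per_node=8, model_layers=48, min_layers_per_stage=4):
    """Pick tensor/pipeline/data parallel degrees for a homogeneous GPU cluster."""
    # Keep tensor parallelism within a node to use fast intra-node bandwidth.
    tensor_parallel = min(gpus_per_node, num_gpus)
    # Pipeline parallelism splits layers across nodes, bounded by a minimum
    # number of layers per stage to keep pipeline bubbles small.
    max_stages = max(1, model_layers // min_layers_per_stage)
    pipeline_parallel = min(max_stages, num_gpus // tensor_parallel)
    # The remaining GPUs replicate the model as data parallelism.
    data_parallel = num_gpus // (tensor_parallel * pipeline_parallel)
    return {"tp": tensor_parallel, "pp": pipeline_parallel, "dp": data_parallel}

print(choose_parallelism(num_gpus=64))  # e.g. {'tp': 8, 'pp': 8, 'dp': 1}
```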
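
For the DeepSpeed autotuning item, a sketch of a config that enables the autotuner, following the linked tutorial. The arg_mappings entries assume a HuggingFace-style training script, and the launch command is an example; consult the tutorial for the authoritative options.

```python
import json

# Mark the micro-batch size as "auto" so the autotuner can search over it.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "autotuning": {
        "enabled": True,
        "arg_mappings": {
            "train_micro_batch_size_per_gpu": "--per_device_train_batch_size",
            "gradient_accumulation_steps": "--gradient_accumulation_steps",
        },
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# The job is then launched through the autotuner, e.g.:
#   deepspeed --autotuning tune train.py --deepspeed ds_config.json
```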

Optimization of a GPU cluster.

Reference: the Hugging Face introduction to model parallelism: https://huggingface.co/transformers/v4.9.2/parallelism.html