---
title: "When and How to Grow? On Efficient Pre-training via Model Growth"
booktitle: Proceedings of the 16th Asian Conference on Machine Learning
year: '2025'
volume: '260'
series: Proceedings of Machine Learning Research
month: 0
publisher: PMLR
pdf:
url:
openreview: DDI359KC7v
abstract: "The remarkable performance of GPT models has drawn widespread attention to large-scale language models, yet their enormous pre-training cost remains prohibitive. Progressive pre-training exploits the faster convergence of small models to reduce compute overhead and shows great potential for accelerating pre-training. This work studies the two key issues in progressive pre-training: the growth schedule and the growth operation. First, we estimate the optimal growth point theoretically, and we find experimentally that performing the growth operation after the model enters its convergence stage yields a high speed-up ratio. Second, we propose progressive dimensionality growth for width expansion and redundant layers for depth expansion. Progressive dimensionality growth is a smoothed operation that improves training stability. Redundant layers implement function preservation at small cost and inherit the core parameters of adjacent layers, improving the utilization of knowledge learned by the original model. Our method follows strict function preservation and produces good training dynamics. Experimental results show that our method outperforms the baselines and achieves an acceleration rate of about 1.5 times while reaching the same training quality."
layout: inproceedings
issn: 2640-3498
id: wang25a
tex_title: "When and How to Grow? On Efficient Pre-training via Model Growth"
firstpage: 95
lastpage: 110
page: 95-110
order: 95
cycles: false
bibtex_editor: Nguyen, Vu and Lin, Hsuan-Tien
editor:
- given: Vu
  family: Nguyen
- given: Hsuan-Tien
  family: Lin
bibtex_author: Wang, Jikai and Li, Juntao and Zhang, Min and Li, Zechang and Xia, Qingrong and Duan, Xinyu and Wang, Zhefeng and Huai, Baoxing
author:
- given: Jikai
  family: Wang
- given: Juntao
  family: Li
- given: Min
  family: Zhang
- given: Zechang
  family: Li
- given: Qingrong
  family: Xia
- given: Xinyu
  family: Duan
- given: Zhefeng
  family: Wang
- given: Baoxing
  family: Huai
date: 2025-01-14
address:
container-title: Proceedings of the 16th Asian Conference on Machine Learning
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 14
extras:
---
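
As context for the abstract above, here is a minimal, hypothetical sketch (not the authors' code) of the kind of function-preserving depth growth it describes: a redundant layer copies the parameters of an adjacent layer and zeroes its residual-branch output projections, so the grown model initially computes exactly the same function. The class and function names (`Block`, `grow_depth`) are illustrative assumptions.

```python
# Sketch only: function-preserving depth growth via a redundant layer.
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-LayerNorm transformer block: x + attn(ln(x)), then x + mlp(ln(x))."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

def grow_depth(layers: nn.ModuleList, index: int) -> nn.ModuleList:
    """Insert a redundant copy of layers[index] right after it.

    The new layer inherits the adjacent layer's parameters, but its
    projections into the residual stream are zeroed, so the grown stack
    computes the same function as the original one.
    """
    new_layer = copy.deepcopy(layers[index])
    nn.init.zeros_(new_layer.attn.out_proj.weight)
    nn.init.zeros_(new_layer.attn.out_proj.bias)
    nn.init.zeros_(new_layer.mlp[-1].weight)
    nn.init.zeros_(new_layer.mlp[-1].bias)
    grown = list(layers)
    grown.insert(index + 1, new_layer)
    return nn.ModuleList(grown)

if __name__ == "__main__":
    torch.manual_seed(0)
    layers = nn.ModuleList([Block() for _ in range(4)])
    x = torch.randn(2, 8, 64)

    def run(mods, inp):
        for m in mods:
            inp = m(inp)
        return inp

    before = run(layers, x)
    after = run(grow_depth(layers, 1), x)
    print(torch.allclose(before, after, atol=1e-6))  # True: function preserved
```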