| Field | Value |
|---|---|
| title | When and How to Grow? On Efficient Pre-training via Model Growth |
| booktitle | Proceedings of the 16th Asian Conference on Machine Learning |
| year | 2025 |
| volume | 260 |
| series | Proceedings of Machine Learning Research |
| month | 0 |
| publisher | PMLR |
| url | |
| openreview | DDI359KC7v |
| abstract | The remarkable performance of GPT models has drawn widespread attention to large-scale language models. Despite their strong performance, their huge pre-training cost is prohibitive. Progressive pre-training exploits the faster convergence of small models to reduce computing overhead and shows great potential for accelerating pre-training. This work studies the two key issues in progressive pre-training: the growth schedule and the growth operation. First, we estimate the optimal growth point theoretically. We then find experimentally that performing the growth operation after the model enters the convergence stage achieves a high speed-up ratio. In addition, we propose progressive dimensionality growth for width expansion and redundant layers for depth expansion. Progressive dimensionality growth is a smoothed operation that improves training stability. Redundant layers achieve function preservation at small cost and inherit the core parameters of adjacent layers, improving the utilization of the knowledge learned by the original model. Our method follows strict function preservation and produces good training dynamics. Experimental results show that our method outperforms the baselines and achieves a speed-up of about 1.5x while achieving the same training quality. |
| layout | inproceedings |
| issn | 2640-3498 |
| id | wang25a |
| tex_title | When and How to Grow? On Efficient Pre-training via Model Growth |
| firstpage | 95 |
| lastpage | 110 |
| page | 95-110 |
| order | 95 |
| cycles | false |
| bibtex_editor | Nguyen, Vu and Lin, Hsuan-Tien |
| editor | |
| bibtex_author | Wang, Jikai and Li, Juntao and Zhang, Min and Li, Zechang and Xia, Qingrong and Duan, Xinyu and Wang, Zhefeng and Huai, Baoxing |
| author | |
| date | 2025-01-14 |
| address | |
| container-title | Proceedings of the 16th Asian Conference on Machine Learning |
| genre | inproceedings |
| issued | |
| extras | |
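The abstract above describes depth expansion via redundant layers that inherit the parameters of an adjacent layer while strictly preserving the network function. The paper's actual implementation is not reproduced here; the following is a minimal, hypothetical PyTorch sketch of that general idea, assuming a simplified residual `Block` and a helper `grow_depth_function_preserving` (both names are illustrative, not from the paper). Duplicating an existing block and zeroing its output projection makes the new block compute the identity at insertion time, so the grown model initially matches the original.

```python
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """Simplified pre-norm residual block: y = x + fc_out(relu(fc_in(norm(x))))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc_in = nn.Linear(dim, hidden)
        self.fc_out = nn.Linear(hidden, dim)

    def forward(self, x):
        return x + self.fc_out(torch.relu(self.fc_in(self.norm(x))))

def grow_depth_function_preserving(blocks: nn.ModuleList, index: int) -> nn.ModuleList:
    """Insert a copy of blocks[index] right after it, with its output projection
    zeroed so the new block initially contributes nothing to the residual stream."""
    new_block = copy.deepcopy(blocks[index])   # inherit the adjacent layer's parameters
    nn.init.zeros_(new_block.fc_out.weight)    # zero the residual-branch output ...
    nn.init.zeros_(new_block.fc_out.bias)      # ... so the new block is an identity map
    grown = list(blocks)
    grown.insert(index + 1, new_block)
    return nn.ModuleList(grown)

if __name__ == "__main__":
    torch.manual_seed(0)
    blocks = nn.ModuleList([Block(16, 64) for _ in range(4)])
    x = torch.randn(2, 8, 16)

    def run(mods, h):
        for b in mods:
            h = b(h)
        return h

    before = run(blocks, x)
    after = run(grow_depth_function_preserving(blocks, 1), x)
    print(torch.allclose(before, after))  # True: depth growth preserved the function
```

Running the sketch prints `True`, confirming that outputs before and after growth coincide; training would then resume from this function-preserving initialization.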