
[Enhancement] From-scratch model pre-training #19

Open · athewsey opened this issue Jul 12, 2022 · 0 comments
Labels: enhancement (New feature or request)

This sample currently demonstrates:

  • Fine-tuning existing models for downstream tasks (NER), and
  • Continuation pre-training with unlabelled data from an existing model checkpoint.

From-scratch pre-training is considerably more resource-intensive. For example, the LayoutXLM paper describes pre-training on ~30M documents with 64 V100 GPUs (i.e. 8x p3.16xlarge or p3dn.24xlarge instances for several hours).

However, some users may still be interested in from-scratch pre-training - especially for low-resource languages or specialised domains - if tested example code were available. Please drop a 👍 or a comment if this is an enhancement that you'd find useful!
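For context, a minimal (untested) sketch of what the difference amounts to in Hugging Face `transformers` - using LayoutLM v1 as a stand-in, since the LayoutXLM/LayoutLMv2 pre-training heads aren't published in the library. From-scratch pre-training just initializes the architecture with random weights instead of loading a checkpoint; the real cost is in the data pipeline and compute:

```python
# Illustrative only: LayoutLM v1 is a stand-in, and the vocab_size shown is the
# library default - not a tested recipe for LayoutXLM-scale pre-training.
from transformers import LayoutLMConfig, LayoutLMForMaskedLM

# Continuation pre-training: start from released checkpoint weights
model = LayoutLMForMaskedLM.from_pretrained("microsoft/layoutlm-base-uncased")

# From-scratch pre-training: same architecture, randomly initialized weights.
# vocab_size would need to match a tokenizer trained on your own corpus.
config = LayoutLMConfig(vocab_size=30522)
model = LayoutLMForMaskedLM(config)
```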
