To train on your own input file, first run the command below to tokenize input.txt and generate the corresponding token files (train.bin and val.bin). By default these are written under the custom folder; move or copy them into the custom_char folder.
python data/custom/prepare.py input.txt
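For reference, the prepare step amounts to building a character-level vocabulary and writing the encoded tokens as 16-bit integers. This is a minimal stdlib-only sketch of that idea, not the actual data/custom/prepare.py (which, following nanoGPT's conventions, also saves the vocabulary mapping in a meta.pkl); the 90/10 split and uint16 encoding are assumptions based on those conventions:

```python
import sys
from array import array

def prepare(path):
    # Read the raw text and build a character-level vocabulary.
    text = open(path, encoding="utf-8").read()
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}

    # Encode every character as an integer token id.
    ids = [stoi[ch] for ch in text]

    # 90/10 train/validation split, written as unsigned 16-bit ints
    # (matching the .bin layout train.py expects).
    n = int(0.9 * len(ids))
    for name, split in (("train.bin", ids[:n]), ("val.bin", ids[n:])):
        with open(name, "wb") as f:
            array("H", split).tofile(f)
    return len(chars), len(ids)

if __name__ == "__main__" and len(sys.argv) > 1:
    vocab_size, n_tokens = prepare(sys.argv[1])
    print(f"vocab size: {vocab_size}, tokens: {n_tokens}")
```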
Then, run this command to start training. Adjust the parameters as needed.
python train.py config/train_custom.py --device=cuda --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0
After training finishes, run this command to sample generated text:
python sample.py --out_dir=out-custom-char
To calculate BLEU and ROUGE scores, run the following command (this may take a while):
python metrics.py abstractsCLEAN.txt out/filename.txt
The tables below record the run time for different parameter combinations, along with samples of the generated text. The BLEU-4 score and the ROUGE scores (2-gram and longest common subsequence) reflect how similar the generated text is to the reference input, on a scale from 0 to 1.
Varying the iteration count (block size fixed at 64):

Iterations | Block Size | Time | Result | BLEU-4 | ROUGE-2 | ROUGE-L |
---|---|---|---|---|---|---|
2000 | 64 | 1:45 | output 1 | 0.145 | 0.135 | 0.301 |
4000 | 64 | 3:24 | output 2 | 0.174 | 0.135 | 0.325 |
10000 | 64 | 8:16 | output 3 | 0.244 | 0.172 | 0.331 |
40000 | 64 | 32:25 | output 4 | 0.164 | 0.180 | 0.328 |
Varying the block size (iterations fixed at 10000):

Iterations | Block Size | Time | Result | BLEU-4 | ROUGE-2 | ROUGE-L |
---|---|---|---|---|---|---|
10000 | 64 | 8:16 | output 5 | 0.244 | 0.172 | 0.331 |
10000 | 128 | 14:20 | output 6 | 0.202 | 0.151 | 0.310 |
10000 | 256 | 59:00 | output 7 | 0.169 | 0.144 | 0.307 |
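The ROUGE-L column above is driven by the longest common subsequence (LCS) between the generated text and the reference. The sketch below shows the standard F-measure form of that metric; the word-level tokenization and function name are assumptions, and the actual metrics.py may tokenize and aggregate differently:

```python
def rouge_l(candidate, reference):
    """ROUGE-L F1 between two whitespace-tokenized strings."""
    c, r = candidate.split(), reference.split()
    # Dynamic-programming table: dp[i][j] = LCS length of c[:i] and r[:j].
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c):
        for j, rw in enumerate(r):
            if cw == rw:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)  # fraction of candidate words in the LCS
    recall = lcs / len(r)     # fraction of reference words in the LCS
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the candidate reproduces the reference word-for-word; unrelated texts score near 0.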