I tried training from scratch as explained in the README.
Training / Fine-tuning
```
pip install deepspeed==0.7.0
pip install pytorch-lightning==1.9.5
# torch 1.13.1+cu117
```
NOTE: add weight decay (0.1 or 0.01) and dropout (0.1 or 0.01) when training on a small amount of data. Try variants such as `x = x + dropout(att(x)); x = x + dropout(ffn(x))`, or `x = dropout(x + att(x)); x = dropout(x + ffn(x))`, etc.
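The dropout placements above can be sketched as a residual block. This is an illustrative example only: `att` and `ffn` here are plain `nn.Linear` stand-ins for RWKV's time-mix and channel-mix sub-layers, and `Block` is a hypothetical name, not code from this repo.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Hypothetical residual block showing the suggested dropout placements.

    `att` and `ffn` are plain Linear layers standing in for RWKV's
    time-mix and channel-mix sub-layers.
    """
    def __init__(self, dim: int, p: float = 0.1):
        super().__init__()
        self.att = nn.Linear(dim, dim)
        self.ffn = nn.Linear(dim, dim)
        self.drop = nn.Dropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Variant 1: dropout on each sub-layer's output before the residual add.
        x = x + self.drop(self.att(x))
        x = x + self.drop(self.ffn(x))
        # Variant 2 (alternative): dropout over the whole residual sum, i.e.
        #   x = self.drop(x + self.att(x)); x = self.drop(x + self.ffn(x))
        return x

# Weight decay is applied through the optimizer rather than the module:
model = Block(64)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
```

Variant 1 keeps the residual path itself undropped, which is usually the gentler regularizer; variant 2 also drops the skip connection.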
Training RWKV-4 from scratch: run train.py, which by default uses the enwik8 dataset (unzip https://data.deepai.org/enwik8.zip).
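Since enwik8 is plain text, this kind of setup typically treats it as a character-level corpus: build a vocabulary from the distinct characters and map each character to a token id. A minimal sketch of that encoding (illustrative only; the actual loader in train.py may differ, and `data` here is a short stand-in string rather than the real enwik8 file):

```python
# Character-level encoding sketch for an enwik8-style text corpus.
# `data` is a stand-in for open("enwik8", encoding="utf-8").read().
data = "hello world, hello enwik8"
vocab = sorted(set(data))                     # distinct characters
stoi = {ch: i for i, ch in enumerate(vocab)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char
encoded = [stoi[ch] for ch in data]
decoded = "".join(itos[i] for i in encoded)
assert decoded == data  # round-trip is lossless
```

The model then trains on `encoded` as a sequence of integer tokens, with `len(vocab)` as the vocabulary size.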
I changed `n_epoch = 500` to test the training functionality, but the log keeps going beyond that, as shown below. Is there a way to train for a shorter number of epochs, or some other change in configuration?
```
miniE 1 s 833 prog 20.00% : ppl 5.406813 loss 1.687660 lr 3.330213e-04: 100%|██████████| 833/833 [03:21<00:00, 4.13it/s]
miniE 2 s 1666 prog 40.00% : ppl 3.495054 loss 1.251349 lr 1.386290e-04: 100%|██████████| 833/833 [03:17<00:00, 4.21it/s]
miniE 3 s 2499 prog 60.00% : ppl 3.216982 loss 1.168444 lr 5.770800e-05: 100%|██████████| 833/833 [03:17<00:00, 4.22it/s]
miniE 4 s 3332 prog 80.00% : ppl 3.105918 loss 1.133309 lr 2.402249e-05: 100%|██████████| 833/833 [03:17<00:00, 4.22it/s]
miniE 5 s 4165 prog 100.00% : ppl 3.047873 loss 1.114444 lr 1.000000e-05: 100%|██████████| 833/833 [03:17<00:00, 4.22it/s]
miniE 6 s 4998 prog 120.00% : ppl 3.037687 loss 1.111096 lr 1.000000e-05: 100%|██████████| 833/833 [03:17<00:00, 4.22it/s]
miniE 7 s 5831 prog 140.00% : ppl 3.021025 loss 1.105596 lr 1.000000e-05: 100%|██████████| 833/833 [03:17<00:00, 4.21it/s]
miniE 8 s 6664 prog 160.00% : ppl 3.018359 loss 1.104713 lr 1.000000e-05: 100%|██████████| 833/833 [03:17<00:00, 4.21it/s]
miniE 9 s 7497 prog 180.00% : ppl 3.006846 loss 1.100892 lr 1.000000e-05: 100%|██████████| 833/833 [03:17<00:00, 4.21it/s]
miniE 10 s 8330 prog 200.00% : ppl 2.985658 loss 1.093820 lr 1.000000e-05: 100%|██████████| 833/833 [03:17<00:00, 4.21it/s]
miniE 11 s 8344 prog 200.34% : ppl 2.969897 loss 1.088527 lr 1.000000e-05: 2%|▏ | 14/833 [00:03<03:37, 3.76it/s]
```
Thank you for the response! I will also try RWKV v6.
I would also be grateful if you could provide some guidance on the following matter (#268):
"With RWKV-4, if I wish to make an encoder-decoder model, for example for use in translation, what hidden states need to be passed between the encoder and the decoder? Can you provide some guidance on this, or point to any existing work?"
I want to make an encoder-decoder model equivalent to LSTM-based encoder-decoders, where the hidden state is passed to the decoder to inform the decoding process. I would really appreciate any information on what could serve as the equivalent of an LSTM's hidden state here. Furthermore, the run and train files appear to follow different architectures; I would also like to know the high-level difference between the two.