-
An epoch is the GUI's equivalent of training steps: images * epochs = training steps, at least for batch size 1. When you hit train, it trains for however many epochs you selected, regardless of what it has trained in the past. Or that's been my experience at least. I train for 40,000 steps at minimum, so I don't understand why you're having trouble going past 1800. I usually just put in 1000 epochs, save a checkpoint every X epochs, then go to sleep/work and compare the checkpoints later. If I need it to train more, I just hit train again. I don't think the GUI has training steps selectable anymore, but I could be wrong. Maybe try updating the extension?
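To make the images * epochs arithmetic concrete, here is a minimal sketch. The function names are made up for illustration and are not part of the extension. The batch size 1 case follows the relation above; for larger batch sizes the sketch assumes one step per batch, which may not match how the extension actually counts steps.

```python
import math


def total_steps(num_images: int, epochs: int, batch_size: int = 1) -> int:
    """Steps in one run, assuming one epoch visits every image once.

    Assumption: one step per batch. With batch_size=1 this reduces to
    images * epochs.
    """
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs


def epochs_for_steps(target_steps: int, num_images: int, batch_size: int = 1) -> int:
    """Smallest number of epochs that reaches at least target_steps."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return math.ceil(target_steps / steps_per_epoch)


if __name__ == "__main__":
    # Hypothetical dataset of 40 images, batch size 1.
    print(total_steps(num_images=40, epochs=1000))
    print(epochs_for_steps(target_steps=40_000, num_images=40))
```

So with a hypothetical 40-image dataset, 1000 epochs at batch size 1 works out to the 40,000 steps mentioned above.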
-
As @minienglish1 has pointed out, previous training steps are not taken into account; they are shown for informational purposes only. This could be a VRAM issue. When this extension encounters an out-of-memory (OOM) error, it tries to reduce the batch size. With a batch size of 1, it used to reduce it to 0 and then conclude there was nothing left to do (so training was reported as "finished"). I am pretty sure this was changed a few weeks ago (because that was pretty confusing behavior), but I am not sure whether the change has landed in the main branch yet. If you encounter this again, post the log, your training settings, and the type of GPU you are using here.
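For anyone trying to picture the OOM behavior described above, here is a rough sketch of a batch-size backoff loop. This is not the extension's actual code: `FakeOOM`, `train_with_backoff`, and the halving strategy are all assumptions for illustration. The point is only that the loop should stop with an error once the minimum batch size also fails, rather than dropping to 0 and reporting training as finished.

```python
class FakeOOM(RuntimeError):
    """Hypothetical stand-in for a GPU out-of-memory error."""


def train_with_backoff(train_fn, batch_size: int, min_batch_size: int = 1):
    """Call train_fn(batch_size), shrinking the batch size after each OOM.

    Assumption: the batch size is halved on every retry. If training still
    runs out of memory at min_batch_size, raise instead of silently stopping.
    """
    while True:
        try:
            return train_fn(batch_size)
        except FakeOOM:
            if batch_size <= min_batch_size:
                raise RuntimeError(
                    f"Out of memory even at batch size {batch_size}"
                )
            batch_size = max(min_batch_size, batch_size // 2)


if __name__ == "__main__":
    def pretend_train(bs: int) -> int:
        # Hypothetical workload that only fits in memory at batch size <= 2.
        if bs > 2:
            raise FakeOOM()
        return bs

    print(train_with_backoff(pretend_train, batch_size=8))
```

With a loop like this, a run that cannot fit even at batch size 1 fails loudly, which is much easier to diagnose from the log than a "Training finished" message.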
-
I take an iterative approach to training: I use a small number of Training Steps Per Image in each iteration and start another iteration if I am not pleased with the outcome. E.g. I start with 80 s/i and, once training is complete, I can run another 20, and so on.
But now I cannot start training. I have a model with 1800 steps and I'm trying to start another 20-step iteration, but it won't run: it starts and stops, displaying the message "Training finished. Total lifetime steps: 1800".
So, is there a way to set max steps to -1 or a really high number so that I can run iterations for as long as I need?