The first epoch consumed 2 GiB of RAM, the second 5 GiB, then 10 GiB, and by the 11th epoch my memory was full. (My computer has 32 GiB of RAM.)
The issue disappeared when I commented out lines 156 to 171 in train.lua (RAM usage then stays at 1.2 GiB):
```lua
if minMeanError == nil or errors:mean() < minMeanError then
    print("\n(Saving model ...)")
    params, gradParams = nil, nil
    collectgarbage()

    -- Model is saved as CPU
    model:float()
    torch.save("data/model.t7", model)
    collectgarbage()

    if options.cuda then
        model:cuda()
    elseif options.opencl then
        model:cl()
    end
    collectgarbage()

    minMeanError = errors:mean()
end
```
So I conclude that the saving process may be the problem.
The leak seems to occur in the calls to `model:float()`. My workaround was to just save in GPU format:
```lua
if minMeanError == nil or errors:mean() < minMeanError then
    print("\n(Saving model ...)")
    params, gradParams = nil, nil
    collectgarbage()

    torch.save("data/model.t7", model)
    collectgarbage()

    minMeanError = errors:mean()
end
```
I then added `require 'cudnn'` to the top of eval.lua in order to be able to load the saved model. If you want to save the model in CPU format, you could write a quick script to load the model, call `model:float()`, and save it again.
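The conversion script described above could look something like this. This is a minimal sketch: the file paths, the output filename, and the exact set of `require`s are assumptions based on the snippets in this thread, so adjust them to your setup.

```lua
-- convert_model.lua: one-off script to convert a GPU-saved model to CPU format.
require 'torch'
require 'nn'
require 'cunn'   -- assumed: needed to deserialize a model saved in GPU format
require 'cudnn'  -- assumed: needed if the model contains cudnn modules

-- Load the GPU-format model saved by train.lua (path is an assumption).
local model = torch.load("data/model.t7")

-- Convert all parameters to CPU float tensors, then free the GPU copies.
model:float()
collectgarbage()

-- Re-save under a new name so the original GPU checkpoint is kept.
torch.save("data/model_cpu.t7", model)
```

Running this once after training sidesteps the per-epoch `model:float()` call, so any leak in the float/cuda round-trip only happens a single time instead of on every save.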
And it doesn't free the memory. I executed this bash command: