With the default setting num_cpu = 16, I ran out of my 40 GB of RAM and the process was killed by the system.
sudo cat /var/log/syslog | grep -i "killed"
kernel: [ 1522.255350] Out of memory: Killed process 15384 (python) total-vm:54111924kB, anon-rss:35495356kB, file-rss:72320kB, shmem-rss:14336kB, UID:0 pgtables:76780kB oom_score_adj:0
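For reference, the kB figures in that kernel line can be converted to GiB with a few lines of Python. This is only a sketch: the `parse_oom_fields` name and the regex are my own, and it assumes the kernel's kB means 1024 bytes.

```python
import re

# Kernel OOM line copied from the syslog output above.
OOM_LINE = (
    "Out of memory: Killed process 15384 (python) total-vm:54111924kB, "
    "anon-rss:35495356kB, file-rss:72320kB, shmem-rss:14336kB, UID:0 "
    "pgtables:76780kB oom_score_adj:0"
)


def parse_oom_fields(line: str) -> dict[str, float]:
    """Return every `name:<number>kB` field in the line, converted to GiB."""
    pairs = re.findall(r"([\w-]+):(\d+)kB", line)
    return {name: int(kb) / 1024**2 for name, kb in pairs}


fields_gib = parse_oom_fields(OOM_LINE)
for name, gib in fields_gib.items():
    print(f"{name}: {gib:.2f} GiB")
```

The anon-rss field is the one that matters here: it works out to just under 34 GiB of resident memory for the python process at the moment it was killed.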
Garbage123King changed the title from 'Is "16 cores and ~20G of RAM" in readme.md a mistake? It makes me confused.' to 'Is "16 cores and ~20G of RAM" in README.md a mistake? It makes me confused.' on Dec 7, 2023
I just found that if I start with a new folder, memory use is lower, because training begins every 2.5k steps.
But if I start from an existing folder, memory use reaches 50+ GB by the last moment of training; in that case training starts every 20480 steps.
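To put a rough number on the difference, here is a back-of-the-envelope sketch of how observation storage could scale with the number of steps collected before each training phase. Everything in it is an assumption on my side: that the rollout buffer keeps n_steps × n_envs observations in memory, that ep_length is 20480 (so ep_length // 8 is 2560, matching the "2.5k" above), and the observation shape and element size, which are hypothetical placeholders rather than values read from the repository.

```python
from math import prod


def rollout_obs_bytes(n_steps, n_envs, obs_shape, bytes_per_element):
    """Bytes needed to hold one observation per step per environment."""
    return n_steps * n_envs * prod(obs_shape) * bytes_per_element


NUM_CPU = 16              # default num_cpu mentioned in this thread
EP_LENGTH = 20480         # assumption: matches the 20480 steps seen above
OBS_SHAPE = (128, 40, 3)  # hypothetical observation shape, for illustration only
BYTES_PER_ELEMENT = 4     # assumption: observations stored as float32

small = rollout_obs_bytes(EP_LENGTH // 8, NUM_CPU, OBS_SHAPE, BYTES_PER_ELEMENT)
large = rollout_obs_bytes(EP_LENGTH, NUM_CPU, OBS_SHAPE, BYTES_PER_ELEMENT)
ratio = large / small

print(f"n_steps = {EP_LENGTH // 8}: {small / 1024**3:.2f} GiB")
print(f"n_steps = {EP_LENGTH}: {large / 1024**3:.2f} GiB")
print(f"ratio: {ratio:.0f}x")
```

Whatever the real observation shape is, the ratio between the two cases stays at 8x under these assumptions, which would be consistent with the jump in memory I saw when resuming from an existing folder.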
Garbage123King changed the title from 'Is "16 cores and ~20G of RAM" in README.md a mistake? It makes me confused.' to 'When "file_name + .zip" exists, should "model.n_steps" be ep_length // 8, the same small value used when it does not exist?' on Dec 9, 2023