when "file_name + .zip" exists, should "model.n_steps" be ep_length // 8, as small as not exists? #148

Garbage123King · 2023-12-07T15:45:25Z

With the default setting num_cpu = 16, I ran out of my 40 GB of RAM and the process was killed by the system.

sudo cat /var/log/syslog | grep -i "killed"

kernel: [ 1522.255350] Out of memory: Killed process 15384 (python) total-vm:54111924kB, anon-rss:35495356kB, file-rss:72320kB, shmem-rss:14336kB, UID:0 pgtables:76780kB oom_score_adj:0

Garbage123King · 2023-12-07T16:24:46Z

I just found that if I start with a new folder, memory usage is lower, because training runs every 2.5k steps.
But if I start from an existing folder, memory usage reaches 50+ GB just before training begins, because training then runs only every 20480 steps.
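
For a rough sense of why the step count matters: a rollout buffer holds n_steps × n_envs observations before each training pass, so its size grows linearly with n_steps. The sketch below is only an estimate. The observation shape and the 4-byte element size are placeholder assumptions, not values taken from this project, and 2560 is used as 20480 // 8 to stand in for the "every 2.5k steps" case.

```python
from math import prod


def rollout_obs_bytes(n_steps, n_envs, obs_shape, bytes_per_element=4):
    """Approximate bytes needed to store the observations of one rollout."""
    return n_steps * n_envs * prod(obs_shape) * bytes_per_element


# Hypothetical observation shape, chosen only for illustration.
OBS_SHAPE = (36, 40, 3)
NUM_CPU = 16

small = rollout_obs_bytes(2560, NUM_CPU, OBS_SHAPE)   # fresh-folder case
large = rollout_obs_bytes(20480, NUM_CPU, OBS_SHAPE)  # existing-checkpoint case

print(f"n_steps=2560:  {small / 2**30:.2f} GiB")
print(f"n_steps=20480: {large / 2**30:.2f} GiB")
print(f"ratio: {large / small:.0f}x")
```

Whatever the real observation size is, the ratio between the two cases stays the same, which is consistent with the much higher memory use seen when resuming.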

Garbage123King · 2023-12-09T09:27:40Z

file_name = 'session_e41c9eff/poke_38207488_steps'

if exists(file_name + '.zip'):
    print('\nloading checkpoint')
    model = PPO.load(file_name, env=env)
    model.n_steps = ep_length      # Should this be ep_length // 8? Or is it on purpose?
    model.n_envs = num_cpu
    model.rollout_buffer.buffer_size = ep_length
    model.rollout_buffer.n_envs = num_cpu
    model.rollout_buffer.reset()
else:
    model = PPO('CnnPolicy', env, verbose=1, n_steps=ep_length // 8, batch_size=128, n_epochs=3, gamma=0.998, tensorboard_log=sess_path)
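
One way to keep the two branches from drifting apart is to route the rollout settings through a single helper, so the loading branch and the fresh-model branch use the same n_steps. This is a minimal sketch, not the repository's code: `apply_rollout_settings` is a hypothetical helper, the stand-in objects below only mimic the attributes touched in the snippet above, and ep_length = 20480 is assumed from the "every 20480 steps" observation.

```python
from types import SimpleNamespace


def apply_rollout_settings(model, n_steps, n_envs):
    """Set the rollout-related attributes together so they cannot disagree."""
    model.n_steps = n_steps
    model.n_envs = n_envs
    model.rollout_buffer.buffer_size = n_steps
    model.rollout_buffer.n_envs = n_envs
    model.rollout_buffer.reset()  # same call as in the loading branch above
    return model


class FakeBuffer:
    """Stand-in for a rollout buffer (NOT the real class), for demonstration."""

    def __init__(self):
        self.buffer_size = 0
        self.n_envs = 0
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1


ep_length = 20480  # assumption, see the lead-in
num_cpu = 16
n_steps = ep_length // 8  # the value used when no checkpoint exists

# Stand-in for a loaded model; with a real model this would be PPO.load(...).
demo = SimpleNamespace(n_steps=0, n_envs=0, rollout_buffer=FakeBuffer())
apply_rollout_settings(demo, n_steps=n_steps, n_envs=num_cpu)
print(demo.n_steps, demo.rollout_buffer.buffer_size, demo.rollout_buffer.n_envs)
```

With a real model, the call would replace the five assignments in the loading branch. Whether ep_length // 8 or ep_length is the intended value there is exactly the open question of this issue.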

Garbage123King changed the title ~~Is "16 cores and ~20G of RAM" in readme.md a mistake? It makes me confused.~~ Is "16 cores and ~20G of RAM" in README.md a mistake? It makes me confused. Dec 7, 2023

Garbage123King closed this as completed Dec 7, 2023

Garbage123King reopened this Dec 7, 2023

Garbage123King changed the title ~~Is "16 cores and ~20G of RAM" in README.md a mistake? It makes me confused.~~ when "file_name + .zip" exists, should "model.n_steps" be ep_length // 8, as small as not exists? Dec 9, 2023

bodiya mentioned this issue Dec 14, 2023

set model params consistently on reload #151

Open

Garbage123King closed this as completed Apr 29, 2024
