BUG VPG: pi_info_old seems to be the same as pi_info #424

Open
tesla-cat opened this issue Oct 20, 2024 · 1 comment

Comments

@tesla-cat

        # Get loss and info values before update
        pi_l_old, pi_info_old = compute_loss_pi(data)    # line A
        pi_l_old = pi_l_old.item()
        v_l_old = compute_loss_v(data).item()

        # Train policy with a single step of gradient descent
        pi_optimizer.zero_grad()
        loss_pi, pi_info = compute_loss_pi(data)         # line B
        loss_pi.backward()
        mpi_avg_grads(ac.pi)    # average grads across MPI processes
        pi_optimizer.step()                              # line C
  • I think the parameter update happens at line C, right? If so, the network parameters do not change between line A and line B, so pi_info_old should be identical to pi_info.

  • Similarly, the policy parameters, obs, and act are all unchanged between the time logp_old was stored and the time logp is computed, so shouldn't those be equal as well, making approx_kl exactly zero? See compute_loss_pi below and the sketch after it.

    def compute_loss_pi(data):
        obs, act, adv, logp_old = data['obs'], data['act'], data['adv'], data['logp']

        # Policy loss
        pi, logp = ac.pi(obs, act)
        loss_pi = -(logp * adv).mean()

        # Useful extra info
        approx_kl = (logp_old - logp).mean().item()
        ent = pi.entropy().mean().item()
        pi_info = dict(kl=approx_kl, ent=ent)

        return loss_pi, pi_info
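
To make the point concrete, here is a minimal, self-contained sketch. ToyPolicy is a hypothetical stand-in for ac.pi, and the shapes and values are made up; the structure of compute_loss_pi mirrors the excerpt above. With no optimizer step between the two calls, both calls should return identical values, and kl should come out as exactly zero because logp was stored under the same parameters.

    import torch
    import torch.nn as nn
    from torch.distributions import Normal

    torch.manual_seed(0)

    # Hypothetical stand-in for ac.pi: a tiny Gaussian policy head that
    # returns a distribution and the log-prob of the given actions.
    class ToyPolicy(nn.Module):
        def __init__(self, obs_dim=3, act_dim=2):
            super().__init__()
            self.mu = nn.Linear(obs_dim, act_dim)
            self.log_std = nn.Parameter(-0.5 * torch.ones(act_dim))

        def forward(self, obs, act):
            pi = Normal(self.mu(obs), self.log_std.exp())
            return pi, pi.log_prob(act).sum(-1)

    pi_net = ToyPolicy()
    obs, act, adv = torch.randn(8, 3), torch.randn(8, 2), torch.randn(8)

    # logp is stored at collection time, with the same parameters that
    # are still in place when the update begins.
    with torch.no_grad():
        _, logp_stored = pi_net(obs, act)

    def compute_loss_pi(data):
        obs, act, adv, logp_old = data['obs'], data['act'], data['adv'], data['logp']
        pi, logp = pi_net(obs, act)
        loss_pi = -(logp * adv).mean()
        approx_kl = (logp_old - logp).mean().item()
        ent = pi.entropy().mean().item()
        return loss_pi, dict(kl=approx_kl, ent=ent)

    data = dict(obs=obs, act=act, adv=adv, logp=logp_stored)

    pi_l_old, pi_info_old = compute_loss_pi(data)   # "line A": no step taken yet
    loss_pi, pi_info = compute_loss_pi(data)        # "line B": still no step taken

    print(pi_info_old)   # kl is 0.0, since the parameters are unchanged
    print(pi_info)       # identical to pi_info_old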
@tesla-cat
Author

After looking at the PPO implementation, one can confirm that this indeed does not make sense when the policy is trained for only a single iteration: the old-vs-new comparison only becomes meaningful with multiple iterations of policy updates, as in PPO.
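
For contrast, a simplified sketch of the PPO-style policy update (not the exact Spinning Up code, which also averages kl across MPI processes and logs the early stop): after the first pi_optimizer.step(), the current logp drifts away from the stored logp_old, so approx_kl becomes informative and is used for early stopping.

    # Several gradient steps on the same batch, unlike VPG's single step.
    train_pi_iters = 80
    target_kl = 0.01

    for i in range(train_pi_iters):
        pi_optimizer.zero_grad()
        loss_pi, pi_info = compute_loss_pi(data)
        if pi_info['kl'] > 1.5 * target_kl:
            break            # stop early once the policy has moved too far
        loss_pi.backward()
        pi_optimizer.step()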

I made a repo to share a cleaned-up, minimalist version of the Spinning Up implementations:

https://github.com/tesla-cat/minRL
