Clarification on 1_simple_pg.py code #392

luxapana · 2023-04-24T06:27:59Z

Hi,
Sorry to open an issue just to get this clarified. Please let me know if there is a better forum for questions like this.

In 1_simple_pg.py, which implements simple policy gradient (PG), we pass the states and actions of the whole batch to the function compute_loss. As I understand it, the loss we must calculate should be of the form:

u = 0;
for each trajectory {
     u += log P(trajectory | Theta) * R(trajectory)
}
loss = u / #of trajectories

However, what compute_loss calculates seems to be something else:

x = 0;
for each item in the batch {
    x += log P(a|s) * R(trajectory)
}
loss = x / #of items in the batch

I am new to this and also to PyTorch, so my understanding above may not be correct.

Could someone clarify this, please?

Thanks

luxapana · 2023-04-30T19:32:28Z

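I later realized that both of the above are correct once log-probabilities are used. The probability of a trajectory is a product of per-step probabilities, and the log of a product is the sum of the individual logs, so summing log P(a|s) * R over every item in the batch gives the same total as summing over trajectories. The environment terms in the trajectory probability do not depend on Theta, so they drop out of the gradient. The two forms then differ only in the constant they divide by (number of trajectories versus number of batch items), which rescales the gradient but does not change its direction.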
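
For anyone landing here with the same question, here is a minimal sketch of that equivalence in plain Python. It is not taken from 1_simple_pg.py: the toy probabilities and returns are made up, the names `per_trajectory_sum` and `per_step_sum` are hypothetical, and it assumes the loss is the negated average of log-probability times return. Only the policy's action probabilities are modelled, since the environment terms do not depend on Theta.

```python
import math

# Hypothetical toy batch: two trajectories, each with the per-step action
# probabilities pi(a_t | s_t) and the total return R of that trajectory.
trajectories = [
    {"action_probs": [0.5, 0.25, 0.8], "ret": 3.0},
    {"action_probs": [0.9, 0.4], "ret": 2.0},
]


def per_trajectory_sum(trajs):
    """Sum over trajectories of log(prod_t pi(a_t | s_t)) * R(trajectory)."""
    total = 0.0
    for traj in trajs:
        joint_prob = math.prod(traj["action_probs"])
        total += math.log(joint_prob) * traj["ret"]
    return total


def per_step_sum(trajs):
    """Sum over every (s, a) pair of log pi(a | s) * R(trajectory of that pair)."""
    total = 0.0
    for traj in trajs:
        for p in traj["action_probs"]:
            total += math.log(p) * traj["ret"]
    return total


n_trajs = len(trajectories)
n_steps = sum(len(t["action_probs"]) for t in trajectories)

a = per_trajectory_sum(trajectories)
b = per_step_sum(trajectories)

# The two sums agree; only the normalization constant differs.
loss_per_trajectory = -a / n_trajs
loss_per_step = -b / n_steps

print("sums match:", math.isclose(a, b))
print("loss ratio equals n_steps / n_trajs:",
      math.isclose(loss_per_trajectory / loss_per_step, n_steps / n_trajs))
```

Because the two losses differ only by the fixed factor `n_steps / n_trajs` for a given batch, the choice of denominator acts like a change in the effective learning rate rather than a change in what is being optimized.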
