Hi
Sorry I had to create an issue to get this clarified. Please let me know if there are other forums where help is provided.
In 1_simple_pg.py, which implements simple PG, we pass the states and actions of the whole batch to the function compute_loss. The loss we must calculate should be of the form:
u = 0;
for each trajectory {
    u += P(trajectory | Theta) * R(trajectory)
}
loss = u / # of trajectories
However, what it calculates inside compute_loss seems to be something else:
x = 0;
for each item in the batch {
    x += P(a|s) * R(trajectory)
}
loss = x / # of items in the batch
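To make the comparison concrete, here is a minimal sketch of the two quantities in plain Python. It rests on assumptions that are not in the pseudocode above: log-probabilities are used in place of P, only the policy-dependent part of the trajectory probability is counted, and the toy data and the names `trajectories`, `per_trajectory_objective`, and `per_item_objective` are hypothetical. It is not the code from 1_simple_pg.py.

```python
import math

# Hypothetical toy batch (assumption, not taken from 1_simple_pg.py):
# each trajectory holds the per-step action probabilities pi(a_t | s_t)
# and the total return R(trajectory).
trajectories = [
    {"step_probs": [0.5, 0.25], "ret": 2.0},
    {"step_probs": [0.8, 0.5, 0.5], "ret": 1.0},
]

def per_trajectory_objective(trajs):
    """Average over trajectories of log P(trajectory) * R(trajectory).

    Only the policy terms of log P(trajectory) are included here; the
    environment-dynamics terms are left out (assumption).
    """
    total = 0.0
    for t in trajs:
        log_p_traj = sum(math.log(p) for p in t["step_probs"])
        total += log_p_traj * t["ret"]
    return total / len(trajs)

def per_item_objective(trajs):
    """Average over batch items of log pi(a|s) * R(trajectory of that item)."""
    terms = [math.log(p) * t["ret"] for t in trajs for p in t["step_probs"]]
    return sum(terms) / len(terms)

n_trajs = len(trajectories)
n_items = sum(len(t["step_probs"]) for t in trajectories)
a = per_trajectory_objective(trajectories)
b = per_item_objective(trajectories)

# Both forms add up the same terms; only the divisor differs.
print(a * n_trajs, b * n_items)
```

Under these assumptions the two forms sum the same terms and differ only in the divisor (number of trajectories versus number of items), so for a fixed batch they differ by a constant factor.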
I am new to this and also to PyTorch, so my understanding above may not be correct.
Could someone clarify the above, please?
Thanks