Clarification on 1_simple_pg.py code #392

Open
luxapana opened this issue Apr 24, 2023 · 1 comment
@luxapana

Hi,
Sorry, I had to create an issue to get this clarified. Please let me know if there is a better forum for asking for help.

In 1_simple_pg.py, which implements simple PG, we pass the states and actions of the whole batch to the function compute_loss. As I understand it, the loss we calculate should be of the form:

u = 0;
for each trial {
     u += P(trial | Theta) * R(trial)
}
loss = u / # of trials

However, what compute_loss actually calculates seems to be something else:

x = 0;
for each item in the batch {
    x += P(a|s) * R(trial)
}
loss = x / # of items in the batch

I am new to this and also to PyTorch, so my understanding above may not be correct.

Could someone clarify the above, please?

Thanks

@luxapana
Author

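I later realized that both of the above are consistent once logs are taken: the log of a product of probabilities is the same as the sum of the individual log probabilities, so log P(trial | Theta) equals the sum of log P(a|s) over the steps of that trial, up to terms that do not depend on Theta.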
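
For anyone landing here with the same question, here is a minimal sketch of that identity in plain Python. It is not the code from 1_simple_pg.py. The toy batch, the field names `step_probs` and `ret`, and both helper functions are made up for illustration, and the trial probability is assumed to contain only the policy factors P(a|s), since the environment terms do not depend on Theta.

```python
import math

# Hypothetical toy batch: two trials, each with the per-step action
# probabilities P(a|s) under the policy and a total return R(trial).
trials = [
    {"step_probs": [0.9, 0.5, 0.8], "ret": 3.0},
    {"step_probs": [0.4, 0.7], "ret": 2.0},
]


def per_trial_sum(batch):
    """Sum over trials of log P(trial) * R(trial)."""
    total = 0.0
    for trial in batch:
        # P(trial) is taken as the product of the per-step probabilities.
        trial_prob = math.prod(trial["step_probs"])
        total += math.log(trial_prob) * trial["ret"]
    return total


def per_item_sum(batch):
    """Sum over every (state, action) item of log P(a|s) * R(trial)."""
    total = 0.0
    for trial in batch:
        for p in trial["step_probs"]:
            # Each item is weighted by the return of the trial it belongs to.
            total += math.log(p) * trial["ret"]
    return total


per_trial = per_trial_sum(trials)
per_item = per_item_sum(trials)
num_trials = len(trials)
num_items = sum(len(trial["step_probs"]) for trial in trials)

print(per_trial, per_item)     # equal up to floating-point error
print(per_trial / num_trials)  # averaged over trials
print(per_item / num_items)    # averaged over batch items
```

The two sums agree, so the only remaining difference between the two forms in the question is the divisor. For a fixed batch, dividing by the number of items instead of the number of trials rescales the result by the positive constant `num_trials / num_items`, which changes the size of the gradient step but not its direction.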
