[Feature] Log each entropy for composite distributions in PPO #2707
Conversation
torchrl/objectives/ppo.py (outdated)

```python
for head_key, head_entropy in entropy.items(
    include_nested=True, leaves_only=True
):
    td_out.set("-".join(head_key), head_entropy.detach().mean())
```
Choosing under which key to log each individual factor's entropy was a bit of a headache. The way I personally use `CompositeDistribution` yields tensordicts that look like:

```
action: {
    head_1: {
        action: ...
        entropy: ...
    },
    head_2: {
        action: ...
        entropy: ...
    }
}
```

which means that using `head_key[-1]` to log each entropy is not really a viable solution (all the factor entropies would be logged under the same name, `entropy`). I'm not sure how to get a one-size-fits-all solution here, and I'm happy to hear suggestions. The current solution ensures that there are no collisions, at the price of very verbose keys (see the sketch below).
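To make the collision concrete, here is a minimal sketch using the tensordict API (head names and shapes are made up for illustration):

```python
import torch
from tensordict import TensorDict

entropy = TensorDict(
    {
        "head_1": {"entropy": torch.rand(4)},
        "head_2": {"entropy": torch.rand(4)},
    },
    batch_size=[4],
)

# Using only the last element of each nested key collides: both heads map to "entropy".
leaf_names = {key[-1] for key in entropy.keys(include_nested=True, leaves_only=True)}
print(leaf_names)  # {'entropy'}

# Joining the full nested key keeps the entries distinct, at the price of verbose names.
for head_key, head_entropy in entropy.items(include_nested=True, leaves_only=True):
    print("-".join(head_key), head_entropy.detach().mean().item())
# head_1-entropy ...
# head_2-entropy ...
```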
Wouldn't the most generic solution just be to log the entropy TD as it comes? Why do we need to rename it?
BTW, it seems to me that what you're doing here amounts to `tensordict.flatten_keys("-").detach().mean()`.
Nit: this isn't collision-safe I think (but `flatten_keys` will tell you if there are any collisions): e.g. `("key-one", "entropy")` and `("key", "one", "entropy")` will collide.
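For illustration, a small sketch of the `flatten_keys` one-liner and of the collision case (key names are made up):

```python
import torch
from tensordict import TensorDict

entropy = TensorDict(
    {
        "head_1": {"entropy": torch.rand(4)},
        "head_2": {"entropy": torch.rand(4)},
    },
    batch_size=[4],
)

# One-liner equivalent of the loop above: flatten nested keys with "-" as separator.
flat = entropy.flatten_keys("-").detach()
print(sorted(flat.keys()))  # ['head_1-entropy', 'head_2-entropy']

# Collision case: both keys below flatten to "key-one-entropy",
# so flatten_keys complains instead of silently overwriting.
colliding = TensorDict(
    {
        "key-one": {"entropy": torch.rand(4)},
        "key": {"one": {"entropy": torch.rand(4)}},
    },
    batch_size=[4],
)
try:
    colliding.flatten_keys("-")
except Exception as exc:
    print("collision:", exc)
```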
Also: are we 100% sure all keys are nested? I think so (that's how `CompositeDist` works), but maybe we could add a safeguard check here so that an error is raised if that assumption is violated (e.g. users have their own dist class that returns `{"entropy", ("nested", "entropy")}` keys).
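One possible shape for such a safeguard, sketched against a hypothetical tensordict that mixes flat and nested keys (illustrative only, not the actual implementation):

```python
import torch
from tensordict import TensorDict

# A tensordict that violates the "all entropy keys are nested" assumption.
entropy = TensorDict(
    {"entropy": torch.rand(4), "nested": {"entropy": torch.rand(4)}},
    batch_size=[4],
)

for head_key in entropy.keys(include_nested=True, leaves_only=True):
    # Nested keys come back as tuples; a plain string means a flat key.
    if not isinstance(head_key, tuple):
        raise ValueError(
            "Expected nested entropy keys (as produced by CompositeDistribution), "
            f"but got the flat key {head_key!r}."
        )
```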
I followed your recommendation and added a `composite_entropy` key to the loss td. Two remarks:
- The composite entropy is not logged under `entropy`, to avoid breaking BC (users currently expect a Tensor).
- I did not `detach()` the composite entropy; this would allow the user to compute a custom entropy bonus when using a composite entropy (e.g. a different penalty per head, see the sketch after this comment).

Wdyt?
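For context, a hypothetical sketch of what a non-detached composite entropy would enable, namely a custom per-head entropy bonus (head names and coefficients are made up; note that the merged version ends up detaching, per the discussion below):

```python
import torch
from tensordict import TensorDict

# Pretend this is the (non-detached) composite entropy coming out of the loss.
composite_entropy = TensorDict(
    {
        "head_1": {"entropy": torch.rand(4, requires_grad=True)},
        "head_2": {"entropy": torch.rand(4, requires_grad=True)},
    },
    batch_size=[4],
)

# A different entropy coefficient per head.
coefs = {("head_1", "entropy"): 0.01, ("head_2", "entropy"): 0.05}
entropy_bonus = sum(
    coefs[key] * value.mean()
    for key, value in composite_entropy.items(include_nested=True, leaves_only=True)
)
# entropy_bonus is differentiable and could be subtracted from the loss before backward().
```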
Not detaching could be slightly bc-breaking: what if I do `loss_tensordict.sum(reduce=True).backward()`?
Previously this gave the right result; now it would also backprop through the entropy. Usually, metadata in loss outputs is guaranteed to be non-differentiable, so that would be a one-off.
But I understand that it could be useful...
We could add a kwarg in the constructor (which would become a bit overloaded!)
Right. I'll detach for now, let's revisit when there is a need/ask for it?
Yes, thanks for raising it!
Thinking about it, a way of doing this could be to pass the entropy coefficient as a tensordict and do `(td_coef * td_entropy).sum(reduce=True)`.
Idea for a follow-up PR ;)
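A rough sketch of that follow-up idea, assuming a tensordict version that supports elementwise arithmetic between tensordicts and `sum(reduce=True)` (head names and coefficient values are made up):

```python
import torch
from tensordict import TensorDict

td_entropy = TensorDict(
    {
        "head_1": {"entropy": torch.rand(4)},
        "head_2": {"entropy": torch.rand(4)},
    },
    batch_size=[4],
)
# Per-head coefficients mirroring the entropy structure.
td_coef = TensorDict(
    {
        "head_1": {"entropy": torch.full((4,), 0.01)},
        "head_2": {"entropy": torch.full((4,), 0.05)},
    },
    batch_size=[4],
)

# Elementwise product, then a reduction over all entries to a single scalar.
entropy_bonus = (td_coef * td_entropy).sum(reduce=True)
print(entropy_bonus)
```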
Force-pushed from 71bd4bd to 06c3d94.
LGTM, just a couple of comments to address before we merge it.
LMK what you think would be the best way to log the entropies in the output data structure; I think flattening may be a bit surprising.
LGTM thanks!
Description

This PR enables PPO to log the entropy of each individual head of a composite policy separately. Concretely, for a composite distribution with, say, a nested discrete and a continuous head, the `td_out` is augmented with detached per-head entropy values.

Motivation and Context

This is an extremely useful debugging tool when training composite policies.
Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Checklist

Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!