How to modify rewards configuration? #86

apsarode · 2024-03-17T17:48:08Z

I am trying to run some Stable-Baselines3 models on tmrl and would like some advice from others who have used tmrl.

How are others modifying how rewards are returned? I've struggled to find much documentation on how the reward is returned from tmrl. I know that config.json contains some reward configuration, but I could use some help understanding what some of the parameters mean (e.g., how often is CONSTANT_PENALTY applied, when is CHECK_FORWARD used, etc.).

yannbouteiller · 2024-03-20T16:02:13Z

Maintainer

We should definitely document the parameters in config.json better.

You can find how the reward is computed here.

CONSTANT_PENALTY is applied at every step, but it defaults to 0 because gamma-discounting achieves a similar effect.

CHECK_FORWARD is the distance (number of forward "reward checkpoints") used to check whether the car has taken a shortcut. The higher this value, the longer the cut that is possible, at the cost of more computation to compute the reward.

MAX_STRAY is the maximum distance the car is allowed to stray away from "reward checkpoints", after which the episode gets terminated (because the car strayed too far away from the track).
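
To make the three parameters more concrete, here is a minimal sketch of a checkpoint-based reward of the kind described above. It is not tmrl's actual code: the function name `checkpoint_reward`, its arguments, and the choice to add the penalty to the reward (so a negative value penalizes each step) are assumptions made for illustration only.

```python
import math

def checkpoint_reward(pos, checkpoints, cur_idx,
                      check_forward=10, max_stray=100.0, constant_penalty=0.0):
    """Hypothetical sketch, not tmrl's implementation.

    pos: current car position (tuple of floats).
    checkpoints: list of "reward checkpoint" positions along the track.
    cur_idx: index of the last checkpoint the car was matched to.
    Returns (reward, new_idx, terminated).
    """
    # CHECK_FORWARD: only the next `check_forward` checkpoints are searched,
    # so a larger value allows longer cuts but costs more distance computations.
    last = min(cur_idx + check_forward, len(checkpoints) - 1)
    best_idx = cur_idx
    best_dist = math.dist(pos, checkpoints[cur_idx])
    for i in range(cur_idx + 1, last + 1):
        d = math.dist(pos, checkpoints[i])
        if d < best_dist:
            best_idx, best_dist = i, d

    # MAX_STRAY: if even the closest checkpoint is too far away,
    # the car has left the track and the episode is terminated.
    terminated = best_dist > max_stray
    if terminated:
        best_idx = cur_idx  # no progress is credited

    # Reward is the number of checkpoints passed this step, plus the
    # constant term (assumed additive here; 0.0 leaves the reward unchanged).
    reward = (best_idx - cur_idx) + constant_penalty
    return reward, best_idx, terminated


# Usage: a straight track with one checkpoint per unit along the x-axis.
track = [(float(x), 0.0) for x in range(20)]
print(checkpoint_reward((3.2, 0.0), track, cur_idx=0, check_forward=5))
print(checkpoint_reward((3.2, 0.0), track, cur_idx=0, check_forward=2))
print(checkpoint_reward((3.0, 50.0), track, cur_idx=0, max_stray=10.0))
```

In this sketch, a smaller `check_forward` caps how much progress a single step can earn, and a position further than `max_stray` from every checkpoint in the search window ends the episode without crediting progress.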
