Replies: 1 comment
-
We should definitely document these parameters. You can find how the reward is computed here. CONSTANT_PENALTY is applied at each step, but it defaults to 0 because gamma-discounting achieves a similar effect. CHECK_FORWARD is the number of forward "reward checkpoints" searched to check whether the car has taken a shortcut: the higher this value, the longer the shortcuts that are possible, at the cost of more computation to compute the reward. MAX_STRAY is the maximum distance the car is allowed to stray from the "reward checkpoints"; beyond it, the episode is terminated because the car has strayed too far from the track.
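To make the interaction between the three parameters more concrete, here is a minimal sketch of a checkpoint-based reward. It is only an illustration of the description above, not tmrl's actual code: the function name `step_reward`, its signature, the default values, the unscaled reward, and the choice to add CONSTANT_PENALTY (so a negative value penalizes) are all assumptions.

```python
import math

# Hypothetical defaults for illustration; the real values live in config.json.
CONSTANT_PENALTY = 0.0  # added to the reward at every step (assumed sign convention)
CHECK_FORWARD = 10      # number of checkpoints ahead of the current one to search
MAX_STRAY = 100.0       # max allowed distance to the closest searched checkpoint


def step_reward(position, checkpoints, cur_idx,
                constant_penalty=CONSTANT_PENALTY,
                check_forward=CHECK_FORWARD,
                max_stray=MAX_STRAY):
    """Return (reward, new_idx, terminated) for one environment step.

    position    -- current car position as a tuple of floats
    checkpoints -- ordered list of "reward checkpoint" positions
    cur_idx     -- index of the checkpoint reached at the previous step
    """
    # Only look at the current checkpoint and the next `check_forward` ones,
    # so a larger window allows longer shortcuts but costs more distance checks.
    last = min(cur_idx + check_forward, len(checkpoints) - 1)
    best_idx = cur_idx
    best_dist = math.dist(position, checkpoints[cur_idx])
    for i in range(cur_idx + 1, last + 1):
        d = math.dist(position, checkpoints[i])
        if d < best_dist:
            best_idx, best_dist = i, d

    # Progress is the number of checkpoints passed since the previous step.
    reward = float(best_idx - cur_idx) + constant_penalty

    # Too far from every searched checkpoint: the car has left the track.
    terminated = best_dist > max_stray
    return reward, best_idx, terminated


if __name__ == "__main__":
    track = [(float(i), 0.0) for i in range(20)]  # straight line of checkpoints
    print(step_reward((3.1, 0.0), track, cur_idx=0))    # advances to checkpoint 3
    print(step_reward((0.0, 500.0), track, cur_idx=0))  # strays, episode ends
```

In this sketch CONSTANT_PENALTY contributes to every step's reward, CHECK_FORWARD never appears in the reward itself (it only bounds the search window), and MAX_STRAY only affects termination.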
-
I am trying to implement some of the Stable-Baselines3 models on tmrl and wanted to get some advice from others who have used tmrl.
How are others modifying how rewards are returned? I've struggled to find much documentation on how the reward is returned from tmrl. I know that config.json contains some reward configurations, but I could use some help understanding what some of the parameters mean (i.e., how often CONSTANT_PENALTY gets returned, when CHECK_FORWARD is returned, etc.).