Take the Tennis env, for example: the returned info only includes the lives. Is this a valid lives count for an RPG game? I'm confused about the meaning of the returned value.
Also, I'm interested in how to check who the winner is for a sports game. There is only a DONE flag to check for the end of the game.
Thanks.
An agent only wants to maximise its rewards. For symmetric competitive games it is normally just the sum of rewards: if positive, your agent wins; otherwise, the opponent wins.
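A minimal sketch of that check, assuming you collect the per-step rewards of one episode (the helper name `episode_winner` is made up, not part of any Gym API):

```python
def episode_winner(rewards):
    """Decide the winner of a symmetric competitive episode.

    In Atari sports envs like Tennis, the agent gets +1 for each
    point it scores and -1 for each point the opponent scores, so
    the sign of the episode return tells you who won.
    """
    total = sum(rewards)
    if total > 0:
        return "agent"
    if total < 0:
        return "opponent"
    return "draw"


# Example: agent scored 2 points, opponent scored 3.
print(episode_winner([1, -1, -1, 1, -1]))  # -> opponent
```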
BTW, I'm testing the PPO algorithm on the Tennis env, and I found the reward increases to -1 and then stops rising. Does this mean my agent always loses the game? It looks like the policy has converged to a locally optimal strategy. However, the logs show that the env always stops at step 99999. I'm curious whether there is a maximum step limit for the env.
Also, is there any way to evaluate the trained model, or render the trained frames to see the agent's real performance?
From the looks of it, the optimal solution is a positive value; see page 11 of https://arxiv.org/pdf/1710.02298.pdf
I would look at some rendering of the agent playing the environment to understand what is happening
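One way to evaluate is a small rollout loop over the Gym-style `reset`/`step` interface; with Gymnasium you would build the real env with something like `gym.make("ALE/Tennis-v5", render_mode="human")` instead of the dummy env below. Everything here is a sketch under those assumptions (`DummyTennisEnv` and `evaluate` are made-up stand-ins, not library code):

```python
import random


class DummyTennisEnv:
    """Stand-in with a Gymnasium-style reset/step API."""

    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0, {"lives": 0}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = random.choice([-1.0, 0.0, 1.0])
        terminated = self.t >= self.max_steps
        truncated = False  # a TimeLimit wrapper would set this at the step cap
        return 0, reward, terminated, truncated, {"lives": 0}


def evaluate(env, policy, episodes=5):
    """Roll out the policy and report the return of each episode."""
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return returns


random.seed(0)
print(evaluate(DummyTennisEnv(), policy=lambda obs: 0))
```

Watching a few such episodes with `render_mode="human"`, and checking the sign of each episode return, shows directly whether the trained agent is winning.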