
Pong Policy Gradient - important error in the definition of the convolutional net #79

Open
TomaszRem opened this issue Apr 1, 2018 · 1 comment

Comments


TomaszRem commented Apr 1, 2018

I tried running Pong Policy Gradient for 2000 episodes on the original file with no results whatsoever. Then I boosted the reward for positive points (points scored by the learner, right side) to 20 and got this result:
[image: pong_reinforce_v1 02x20x-1]
I then boosted the learner's point reward to 100 and, after around 1500 episodes, got a slight improvement similar to the one in the picture. I ran it to 8100 episodes and there was no further improvement except for slightly smaller variance. Forgive my naivety, but having successfully run three versions of CartPole I was expecting some reasonable results.
As you can see from the picture, the variance is large, and after an improvement around episodes 800-900 the results seem stagnant.
Has anybody run it for more episodes, tried tweaking the rewards, and managed to bring the results up and the variance down?
Given the policy, should I increase the penalty when the teacher (the left opponent) scores points?
Any guidance will be appreciated. Thanks.
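For reference, here is a minimal sketch of the kind of reward boosting I experimented with. The training loop, environment name, and the agent.remember call are illustrative assumptions, not the repository's actual code:

```python
import gym

# Sketch only: scale the positive rewards the learner earns by scoring.
POSITIVE_REWARD_SCALE = 20  # I tried 20 and then 100, as described above

env = gym.make("Pong-v0")
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder for agent.act(state)
    next_state, reward, done, _ = env.step(action)
    if reward > 0:
        reward *= POSITIVE_REWARD_SCALE  # boost points scored by the learner (right side)
    # agent.remember(state, action, reward) would go here in the real script
    state = next_state
```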

@TomaszRem TomaszRem changed the title Pong Policy Gradient-ow many episodes to get results? Pong Policy Gradient-How many episodes to get result? Apr 1, 2018
@TomaszRem TomaszRem reopened this Apr 2, 2018

TomaszRem commented Apr 5, 2018

I found the reason behind my issue. The convolutional part of the neural net was wrongly defined, which is why it converged to a negative result.
Based on my earlier experience with convolutional networks, I changed the following:
model.add(Reshape((1, 80, 80), input_shape=(self.state_size,)))
to
model.add(Reshape((80, 80, 1), name="Layer1", input_shape=(self.state_size,)))
and
changed strides=(3, 3) back to the default of (1, 1).
The first change reshapes the input correctly into 80-by-80 windows instead of 1-by-80 windows; the second change was necessary because without it the network was losing some information, converging early, and no longer exploring.
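For clarity, here is a sketch of how the corrected convolutional front end might look with both changes applied. Only the Reshape target shape and the default strides come from the fix above; the kernel size, filter count, and dense layer sizes are assumptions, not the repository's exact model:

```python
from keras.models import Sequential
from keras.layers import Reshape, Conv2D, Flatten, Dense

state_size = 80 * 80   # flattened, preprocessed Pong frame
action_size = 3        # e.g. stay / up / down (assumption)

model = Sequential()
# channels-last: one 80x80 image with a single channel, not 1x80 "windows"
model.add(Reshape((80, 80, 1), name="Layer1", input_shape=(state_size,)))
# default strides=(1, 1) so the convolution does not skip over information
model.add(Conv2D(32, (6, 6), padding="same", activation="relu"))
model.add(Flatten())
model.add(Dense(64, activation="relu"))
model.add(Dense(action_size, activation="softmax"))
model.summary()
```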
Now the network looks like this:
[image: net_pong_reinforce_v1]
After only 1000 episodes it mostly wins, although with high variance, and it shows a bias toward staying in the lower part of the screen. It either needs more training or a redefinition of the act function.
[image: pong_reinforce_v1 02 01to1050x1x-1]
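For anyone who wants to experiment with the act function, this is the kind of definition I have in mind: sample the action from the softmax output instead of taking the argmax, so the agent keeps exploring. The attribute names (self.model, self.state_size, self.action_size) are assumed, not necessarily the repository's exact API:

```python
import numpy as np

def act(self, state):
    """Sketch: sample an action from the policy network's softmax output."""
    state = state.reshape([1, self.state_size])
    prob = self.model.predict(state, batch_size=1).flatten()
    # sampling (rather than argmax) keeps the policy stochastic and exploring
    action = np.random.choice(self.action_size, 1, p=prob)[0]
    return action, prob
```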
I made some more changes to the structure to speed things up, because with 1.8 million weights it was very slow on my laptop.

@TomaszRem TomaszRem changed the title Pong Policy Gradient-How many episodes to get result? Pong Policy Gradient-important error in the definition of the convolutional net. Apr 5, 2018
@TomaszRem TomaszRem reopened this Apr 5, 2018