
Benchmark random levels end early #31

Open
JohnBurden opened this issue Nov 1, 2022 · 2 comments

@JohnBurden

Hey,

Not sure if this repo is still being maintained, but I'm having some trouble with the benchmarking of the randomly generated levels (e.g. append-still). For some reason the episodes end after 1, 2, or 3 steps with a score of 0. I think something is triggering the `done` flag in the benchmark environments too early.
To reproduce this, I installed the repo locally and simply ran `./start-training.py testAgent`.

This occurs with and without wandb.

Training seems to work fine, so this is likely related to the time limit for benchmark levels or the way the benchmark levels are run.
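
To be concrete, here's roughly what I mean by ending early. This is only a sketch, not code from the repo: I'm assuming a gym-style interface, and `env` is a placeholder for one of the benchmark environments that `start-training.py` builds internally.

```python
# Minimal sketch of the behaviour I'm seeing -- not code from the repo.
# `env` is assumed to be a gym-style SafeLife benchmark environment with a
# discrete action space; how it gets constructed is internal to start-training.py.
import numpy as np

def run_one_episode(env, max_steps=1000, seed=0):
    """Step the env with random actions and report when `done` triggers."""
    rng = np.random.default_rng(seed)
    env.reset()
    score = 0.0
    for step in range(1, max_steps + 1):
        action = rng.integers(env.action_space.n)
        obs, reward, done, info = env.step(action)
        score += reward
        if done:
            # On the fixed benchmark levels this fires after only 1-3 steps.
            print(f"episode ended after {step} step(s), score={score}")
            return step, score
    print(f"episode reached the {max_steps}-step cap, score={score}")
    return max_steps, score
```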
Any help is massively appreciated!

@clwainwright
Collaborator

Hmm, I'm not able to reproduce this by running `./start-training.py testAgent`. It might be worthwhile to drop the video interval all the way down to 1 so that you get a clearer view of what's actually happening, and after that you can always set a breakpoint and step through the code to see what's going on. Remember that you can print the board state to the terminal, which can help with debugging. Sorry that's not more helpful!

Could you tell me the details of your setup? Everything compiled ok, presumably? Let me know if you're not able to figure this out tomorrow and I'll try to look into it in more detail.
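
To illustrate the sort of breakpoint I mean, here's a rough debugging sketch. It's not code that exists in the repo: the wrapper below and the gym-style four-value `step()` return are assumptions on my part.

```python
# Rough debugging sketch -- not from the repo. Wrap whatever env the benchmark
# runner is stepping; the names and the 4-tuple step() return are assumptions.
import pdb

class EarlyDoneDebugger:
    """Thin wrapper that drops into pdb when an episode ends suspiciously early."""

    def __init__(self, env, min_expected_steps=10):
        self.env = env
        self.min_expected_steps = min_expected_steps
        self._steps = 0

    def reset(self, **kwargs):
        self._steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._steps += 1
        if done and self._steps < self.min_expected_steps:
            print(f"done after only {self._steps} step(s); reward={reward}, info={info}")
            print(obs)  # dump the raw observation/board to the terminal
            pdb.set_trace()  # poke at self.env from here
        return obs, reward, done, info

    def __getattr__(self, name):
        # Delegate anything else (action_space, render, etc.) to the wrapped env.
        return getattr(self.env, name)
```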

@JohnBurden
Author

Thanks for the help!
I've managed to track the issue down to the fixed benchmark levels in the .npz file.
They load OK and look fine, but they return as done straight away. If I replace the benchmark levels with randomly generated ones using the validation seed, everything works as expected.
This suits me better as I was hoping to evaluate on randomly generated levels.

I imagine this might be the result of a different numpy version or something like that, but at the same time the levels look normal.

For reference, I'm running Ubuntu 20.04 with Python 3.9.
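
In case it helps, this is roughly how I poked at the benchmark file. The filename below is just a placeholder for whichever .npz the benchmark code loads, not the repo's actual path.

```python
# Quick inspection of a benchmark .npz archive.
# "benchmark-levels.npz" is a placeholder path, not the repo's actual filename.
import numpy as np

archive = np.load("benchmark-levels.npz", allow_pickle=True)
print("arrays in archive:", archive.files)
for name in archive.files:
    arr = archive[name]
    print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")
    # A dtype or layout mismatch here (e.g. levels saved with a different numpy
    # version) would be one explanation for levels that load but end immediately.
```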
