-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducing published score #41
Comments
@htdt Thanks, we'll fix that typo! That might have creeped in when we were cleaning up the code! On the results front, the F1 score you get after running the code is the mean across all probes. The F1 score we report (and mention) in the paper is the mean across every category (we do this to avoid certain categories dominating the mean F1 score). The averaging code is a post-processing step right now (see Table 2 at https://github.com/ankeshanand/atari-representation-learning/blob/master/notebooks/summarize_results.ipynb) but we should try to integrate it in the main code, or be explicit about the reported score! Thanks for pointing this out as well! |
@eracah Can we report the category-wise mean after running probes as well? Would be less confusing to people! |
I have added a temporary warning here: d8412cb |
clear now, makes sense! |
I think pull request #42 will fix this issue. @ankeshanand, can you check it and merge it in |
Nvm, I merged it. Should be solved now. @htdt, feel free to reopen if you have any issues |
Thank you very much for sharing this implementation!
There is a minor issue, as far as i understand, here should be
steps=args.pretraining_steps
.I tried to run the experiment with
python -m scripts.run_probe --method infonce-stdim --env-name MsPacmanNoFrameskip-v4 --pretraining-steps 100000
couple of times, mean f1 is ≈0.65. In the paper score is 0.7. Other methods also show a slightly lower score, although "supervised" one matches exactly. Am I missing some hyperparameter or it's just a typical score fluctuation?The text was updated successfully, but these errors were encountered: