Thank you for making a pull request to awesome-machine-learning
in Software Underground!
Please check that the paper you are submitting is fully reproducible. If it does not meet one of the criteria below, please add a note explaining why that criterion doesn't seem important in this case.
- Open access (available, licensed; non-negotiable)
- Open code (available, licensed)
- Open data (available, licensed)
The following criteria are judgment calls, but a reasonably attentive reading of the paper (more than 5 minutes, but less than an hour) should be enough to check the following:
- Task seems reasonable and is clearly expressed
- Dataset seems basically sound and appropriate to the task
- No obvious signs of leakage, such as improper splitting
- Data pipeline does sensible things about imputation and scaling
- Algorithm choice seems reasonable and appropriate
- Cross validation strategy seems reasonable
- Hyperparameter tuning steps are explained
- Performance metrics are explicit and seem reasonable
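To illustrate the leakage and preprocessing points above, here is a minimal sketch (NumPy only, with made-up data) of the safe pattern: split first, then fit imputation/scaling statistics on the training split only. Variable names are illustrative, not from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # hypothetical feature matrix

# Split BEFORE computing any statistics. Fitting a scaler (or imputer) on
# the full dataset and splitting afterwards leaks test-set information
# into the training pipeline -- a common "improper splitting" mistake.
train, test = X[:80], X[80:]

mu = train.mean(axis=0)    # fit scaling parameters on the training split only
sigma = train.std(axis=0)

train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma  # transform the test split with TRAIN statistics
```

The same rule applies inside cross-validation: each fold's preprocessing should be fit on that fold's training portion only (e.g. by putting the scaler inside the pipeline that is cross-validated).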
The following are not deal-breakers, but if none of these characteristics is present, maybe the paper is not that awesome.
- The provided code runs...
- ...on the provided data...
- ...and produces the results in the paper.
- There are modules and/or functions...
- ...and the non-obvious ones have docstrings.
- There are tests...
- ...and they pass.
- There is some kind of deployment path (e.g. a web server)...
- ...and it works.
- There is a README file.
- Data files use standard formats.
- There is no personally identifiable information in the data.
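As a sketch of the docstring and testing points, here is what the minimum might look like in a submitted repo: a small function with a docstring plus a plain-assert test that actually passes. The function and values are hypothetical, chosen only to fit the subsurface domain.

```python
def vshale_from_gr(gr, gr_clean=20.0, gr_shale=150.0):
    """Estimate shale volume fraction from a gamma-ray reading (API units)
    using the linear gamma-ray index."""
    return (gr - gr_clean) / (gr_shale - gr_clean)

def test_vshale_from_gr():
    # Endpoints of the linear index should map to 0 and 1.
    assert vshale_from_gr(20.0) == 0.0
    assert vshale_from_gr(150.0) == 1.0
    # A mid-range reading should fall between the endpoints.
    assert abs(vshale_from_gr(85.0) - 0.5) < 1e-12

test_vshale_from_gr()
```

A test runner such as pytest would discover `test_vshale_from_gr` automatically; even a bare script like this is better than no tests at all.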
Please feel free to add other notes or review comments in the sections below.