To avoid dependency issues, each of the baseline methods has its own separate environment (Docker, conda, or pipenv).
In this repository, each of the baseline methods is independent of the evaluation process.
To run the existing baselines we chose, follow the instructions in each directory.
Once you finish training and obtain the estimated equations, follow the evaluation instructions here to evaluate your model.
To add your own method, you will write a script to (a minimal sketch follows the list below):
1. load tabular datasets such as our SRSD datasets
2. train your model on each of the datasets
3. choose the best model per dataset (e.g., based on regression errors on a validation dataset after hyperparameter tuning)
4. dump the estimated symbolic expression (equation)
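As a rough illustration of these steps, here is a minimal sketch using gplearn as the example model. The file paths, dataset layout (whitespace-delimited text files with the target in the last column), hyperparameter sweep, and dump format are all assumptions, not the repository's definitive conventions; check the dataset description and evaluation instructions for the exact formats.

```python
# Minimal sketch of Steps 1-4 with gplearn as the example model.
# Paths, dataset layout, and dump format below are assumptions.
import pickle
import numpy as np
from gplearn.genetic import SymbolicRegressor


def run_one_dataset(train_path, val_path, out_path):
    # Step 1: load the tabular train/validation splits
    train, val = np.loadtxt(train_path), np.loadtxt(val_path)
    x_train, y_train = train[:, :-1], train[:, -1]
    x_val, y_val = val[:, :-1], val[:, -1]

    # Steps 2-3: train candidate models and keep the one with the lowest
    # validation error (a toy sweep over population_size for illustration)
    best_model, best_err = None, float('inf')
    for population_size in (500, 1000):
        model = SymbolicRegressor(population_size=population_size, random_state=42)
        model.fit(x_train, y_train)
        err = np.mean((model.predict(x_val) - y_val) ** 2)
        if err < best_err:
            best_model, best_err = model, err

    # Step 4: dump the estimated expression (here as a raw string; see the
    # later sketch for converting it into a sympy expression)
    with open(out_path, 'wb') as fp:
        pickle.dump(str(best_model._program), fp)


# Hypothetical file names for a single dataset
run_one_dataset('train/dataset_a.txt', 'val/dataset_a.txt', 'est_eq/dataset_a.pkl')
```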
As a starting point, we suggest that you make a copy of the gplearn/ folder and edit the project to work with your own model.
gplearn/gp_runner.py is a minimal script that executes Steps 1-4 above for the gplearn baseline.
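If your model produces an expression string rather than a sympy object (gplearn programs print as, e.g., `add(X0, mul(X1, X1))`), you may need a conversion step before dumping. The snippet below is a hypothetical sketch of such a conversion, assuming the evaluator consumes pickled sympy expressions (check the evaluation instructions for the exact expected format); the operator mapping is partial and replaces gplearn's protected operators with their plain counterparts.

```python
# Hypothetical conversion of a gplearn-style expression string into a sympy
# expression before dumping. The expected dump format is an assumption;
# protected operators (div, log, sqrt) are approximated by plain sympy ones.
import pickle
import sympy

GPLEARN_TO_SYMPY = {
    'add': sympy.Add,
    'mul': sympy.Mul,
    'sub': lambda a, b: a - b,
    'div': lambda a, b: a / b,
    'sqrt': sympy.sqrt,
    'log': sympy.log,
    'sin': sympy.sin,
    'cos': sympy.cos,
}


def to_sympy(program_str):
    # Parse the string, mapping gplearn function names to sympy operations;
    # unknown names such as X0, X1 become sympy symbols automatically.
    return sympy.sympify(program_str, locals=GPLEARN_TO_SYMPY)


expr = to_sympy('add(X0, mul(X1, X1))')   # -> X0 + X1**2
with open('est_eq/dataset_a.pkl', 'wb') as fp:
    pickle.dump(expr, fp)
```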
Once you have estimated equations for the datasets, follow the evaluation instructions here to evaluate your model against the true equation and/or the test dataset, using the estimated equation obtained at Step 4.
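For intuition only, the sketch below shows one way an estimated equation could be scored on a test split. It assumes a pickled sympy expression whose variables are named x0, x1, ... in column order and a whitespace-delimited test file with the target in the last column; the repository's own evaluation scripts remain the authoritative reference.

```python
# Minimal sketch: R^2 of an estimated equation on a test split. The file
# layout and the x0, x1, ... variable-naming convention are assumptions.
import pickle
import numpy as np
import sympy


def test_r2(est_eq_path, test_path):
    with open(est_eq_path, 'rb') as fp:
        est_expr = pickle.load(fp)

    test = np.loadtxt(test_path)
    x_test, y_test = test[:, :-1], test[:, -1]

    # Turn the sympy expression into a numpy-callable function of the columns
    variables = sympy.symbols([f'x{i}' for i in range(x_test.shape[1])])
    func = sympy.lambdify(variables, est_expr, modules='numpy')
    y_pred = func(*[x_test[:, i] for i in range(x_test.shape[1])])

    ss_res = np.sum((y_test - y_pred) ** 2)
    ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
    return 1.0 - ss_res / ss_tot
```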