In this project I use machine learning to build models that predict deforestation from the data in [1] and [2]. I also apply several methods for estimating how individual variables contribute to the outcome, including the cutting-edge Shapley value analysis. The goal is not to reproduce the results of those papers or to perform a meticulous statistical analysis, but rather to see how modern data science tools can supplement the statistical approaches of environmental science.
See the accompanying blog post: https://patryk-kubiczek.github.io/posts/pacific-deforestation/.
[1] Rolett, B. & Diamond, J. Environmental predictors of pre-European deforestation on Pacific islands. Nature 431, 443 (2004).
[2] Atkinson, Q. D., Coomber, T., Passmore, S., Greenhill, S. J. & Kushnick, G. Cultural and Environmental Predictors of Pre-European Deforestation on Pacific Islands. PLOS ONE 11, e0156340 (2016).
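To make the idea of Shapley values mentioned above more concrete, here is a minimal from-scratch sketch in plain Python. It is not the code used in the notebooks; the toy model, its feature names (`rainfall`, `elevation`, `area`) and the all-zero baseline are hypothetical and chosen only for illustration. The sketch enumerates every ordering of the features, so it is practical only for a handful of them.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction, by enumerating feature orderings.

    Features that have not been "switched on" yet keep their baseline value.
    """
    n = len(instance)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            current[i] = instance[i]
            value = predict(current)
            contributions[i] += value - previous
            previous = value
    # Average the marginal contributions over all orderings
    n_orders = factorial(n)
    return [c / n_orders for c in contributions]


# Hypothetical toy model (not from the project): linear terms plus one interaction
def toy_model(x):
    rainfall, elevation, area = x
    return 2.0 * rainfall + 1.0 * elevation + 0.5 * rainfall * area


instance = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference point
phi = shapley_values(toy_model, instance, baseline)
print(phi)  # [2.5, 3.0, 0.5]
```

The contributions add up to the difference between the prediction for `instance` and the prediction for `baseline`, and the interaction term is split equally between the two features involved.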
See the notebook analysis.ipynb in nbviewer.
- Install the required Python modules from env.yml. For example, if you use Anaconda:
conda env create --name pacific-deforestation -f env.yml
conda activate pacific-deforestation
- Run the notebook merge-data.ipynb, which calls scripts for downloading and cleaning the data and eventually creates the final dataset.
- Run the notebook analysis.ipynb. Execution of some cells may take a while.
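If you prefer to run both notebooks without opening them, the two steps above could be scripted roughly as follows. This is only a sketch: it assumes that `jupyter` and `nbconvert` are available in the activated environment (which the source does not state) and that the script is started from the repository root. The helper names `build_command` and `run_all` are hypothetical.

```python
import subprocess

# Notebook names as given in the steps above; the order matters,
# because the analysis needs the dataset created by the first notebook.
NOTEBOOKS = ["merge-data.ipynb", "analysis.ipynb"]


def build_command(notebook):
    """Return an nbconvert command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        # Disable the per-cell timeout, since some cells may take a while
        "--ExecutePreprocessor.timeout=-1",
        notebook,
    ]


def run_all():
    """Execute the notebooks one after another, stopping at the first failure."""
    for notebook in NOTEBOOKS:
        subprocess.run(build_command(notebook), check=True)
```

Calling `run_all()` overwrites each notebook with its executed version, so the outputs can be inspected afterwards in the usual way.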
I acknowledge the use of the public datasets supplementing papers [1] and [2], as well as a map from Natural Earth.
Copyright (C) 2019 Patryk Kubiczek