Explore the data. Read an overview of the project. Read a paper detailing the work.
There are three core domains in this repo: Model Training, Site Detection, and Metadata Generation and Site Monitoring. See the GPW Pipeline Diagram for how the pipeline components relate. The pipeline is run through a series of notebooks.
The code is known to run on Python >= 3.7.
$ python -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt
Imports are given relative to the repo base directory, which therefore must be on your PYTHONPATH. Add the following line either to your .bash_profile or to the end of env/bin/activate:

export PYTHONPATH=/path/to/plastics:$PYTHONPATH
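To confirm the variable took effect, a quick check from any Python session (a minimal sketch; /path/to/plastics stands in for your actual checkout location):

```python
import os
import sys

# Repo-relative imports only resolve if the repo base directory is visible
# to Python, either via sys.path or the PYTHONPATH environment variable.
repo_root = "/path/to/plastics"  # replace with your checkout location
visible = repo_root in sys.path or repo_root in os.environ.get("PYTHONPATH", "").split(os.pathsep)
print("repo root on PYTHONPATH:", visible)
```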
The bulk-processing pipeline runs on Descartes Labs. Authorization on your local machine is handled via the command-line helper:
$ descarteslabs auth login
Follow the link and enter your email and password. If needed, see further instructions from Descartes Labs.
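Once logged in, the Python client should pick up the cached credentials automatically. A minimal sketch of verifying this, assuming the v1 descarteslabs client (import paths differ across client versions):

```python
# Check that the token cached by `descarteslabs auth login` is readable.
# Assumes the v1 descarteslabs Python client; newer versions expose Auth
# under a different module path.
from descarteslabs.client.auth import Auth

auth = Auth()  # reads the credentials saved by the CLI login
print("token acquired:", bool(auth.token))
```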
- Create pixel and patch datasets.
create_spectrogram_dataset.ipynb
- Train pixel classifier.
train_spectrogram_classifier.ipynb
- Train weakly-labeled patch classifier.
train_patch_classifier.ipynb
The outputs of model training are a temporal pixel classifier and a patch classifier. These are stored in the models directory.
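As an illustration of how a downstream step might load them (a sketch only: it assumes Keras-format files, and the filenames below are hypothetical placeholders, not the repo's actual artifact names):

```python
# Illustrative: load the trained classifiers from the models directory.
# Assumes Keras-format model files; both filenames are hypothetical.
from tensorflow.keras.models import load_model

pixel_model = load_model("models/spectrogram_classifier.h5")  # hypothetical filename
patch_model = load_model("models/patch_classifier.h5")        # hypothetical filename
```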
- Generate population-limited area.
generate_populated_dltiles.ipynb
- Deploy pixel + patch classifier inference on Descartes.
descartes_spectrogram_run_withpop.ipynb
- Detect candidates on Descartes.
descartes_candidate_detect.ipynb
- Filter pixel candidates by patch classifier intersection.
query_patch_classifier.ipynb
The end product of this site detection flow is a set of intersection candidate points stored in the candidate_sites/{model_version} directory. These sites are then manually validated and saved in the sampling_locations directory.
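To inspect a batch of candidates locally, something like the following works (a sketch assuming the candidate points are written as GeoJSON; the model version and filename are hypothetical placeholders):

```python
# Illustrative: load and inspect candidate points for one model version.
# Assumes GeoJSON output; the path below is a hypothetical placeholder.
import geopandas as gpd

candidates = gpd.read_file("candidate_sites/0.0.1/candidates.geojson")
print(len(candidates), "candidate sites")
print(candidates.geometry.head())
```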
- Generate metadata for confirmed sites.
generate_metadata.ipynb
- Generate contours for confirmed sites on Descartes.
descartes_contour_run.ipynb
These outputs are pushed directly to the API and are not stored locally.
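The push amounts to an authenticated POST of GeoJSON features (a sketch only: the endpoint, auth scheme, and payload shape are hypothetical stand-ins, not the project's actual API):

```python
# Illustrative: push one confirmed site's metadata to a web API.
# The URL, token, and payload shape are all hypothetical placeholders.
import requests

feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [106.8, -6.2]},
    "properties": {"name": "example-site"},
}

resp = requests.post(
    "https://api.example.com/sites",              # hypothetical endpoint
    json=feature,
    headers={"Authorization": "Bearer <token>"},  # hypothetical auth scheme
    timeout=30,
)
resp.raise_for_status()
```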