The goal of this repository is to develop models that can generate a global reef map for applied uses. We use deep learning and creative data and modeling choices to mitigate several limitations: non-representative training data, only three feature bands, no NIR or depth data, and a single image per location, among others. The code base has been built up iteratively in response to new information and changing requirements, the need to spin up and apply models efficiently, and the need to account for jobs at varying levels of progress. For example: many models can be trained, validated, and summarized with a single command; jobs can be started and restarted without losing progress or interfering with concurrent jobs; and global applications to terabytes of data can be completed quickly by throwing more GPUs at the task.
The data_acquistion module downloads training and calval data from various locations, and the data_cleaning module formats and cleans the data for use in downstream models. The nature of the data and models and the project requirements have changed over time, so the scripts are relatively specific, each usually performing only one or two tasks; the data pipeline would be difficult to modify if the scripts were less modular. Because of the amount of data involved, scripts often need to process incoming data without reprocessing existing files, or to be parallelized for remote computing resources, as in the sketch below.
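As one illustration of that pattern, a cleaning script might look roughly like the following; the paths, helper names, and use of multiprocessing are assumptions for the sketch rather than the actual scripts:

```python
import multiprocessing
from pathlib import Path

RAW_DIR = Path("data/raw")      # hypothetical input location
CLEAN_DIR = Path("data/clean")  # hypothetical output location


def clean_one(raw_path: Path) -> None:
    """Clean a single raw file, skipping it if the output already exists."""
    out_path = CLEAN_DIR / raw_path.name
    if out_path.exists():
        return  # avoid reprocessing files from earlier runs
    cleaned = raw_path.read_bytes()  # placeholder for the real cleaning logic
    out_path.write_bytes(cleaned)


if __name__ == "__main__":
    CLEAN_DIR.mkdir(parents=True, exist_ok=True)
    # Parallelize across cores so the same script scales up on remote compute nodes.
    with multiprocessing.Pool() as pool:
        pool.map(clean_one, sorted(RAW_DIR.glob("*.tif")))
```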
The config module generates data and model configs compatible with the bfg-nets package that Phil Brodrick and I have developed. The configs specify data characteristics, such as where the feature and response files are located, how many samples should be generated, and how to scale and format the built data, as well as model characteristics, such as which network architecture to use, how to construct it, and how to train the model (see the sketch below). The model_training module uses these configs to build data and train models, with scripts that handle creating and running jobs on the SLURM system.
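To make the division of responsibilities concrete, the snippet below sketches the kinds of options a config covers; the keys and values are illustrative placeholders, not the actual bfg-nets schema:

```python
# Illustrative only: these keys mirror the kinds of options described above,
# not the exact bfg-nets config format.
config = {
    "data": {
        "feature_files": ["features/site_001.tif"],    # hypothetical paths
        "response_files": ["responses/site_001.tif"],
        "samples_per_site": 1000,    # how many samples to generate
        "scaling": "robust",         # how to scale the built data
        "window_size": 128,          # how to format samples for the network
    },
    "model": {
        "architecture": "unet",      # which network architecture to use
        "n_layers": 8,               # how to construct the architecture
        "batch_size": 32,            # how to train the model
        "learning_rate": 1e-4,
        "max_epochs": 100,
    },
}
```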
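Job creation for training amounts to writing and submitting batch scripts; below is a minimal sketch of that step, with placeholder resource requests and a hypothetical train.py entry point:

```python
import subprocess
from pathlib import Path


def submit_training_job(config_path: str, job_dir: str = "slurm_jobs") -> None:
    """Write a batch script for one model config and hand it to SLURM."""
    Path(job_dir).mkdir(exist_ok=True)
    script = Path(job_dir) / f"{Path(config_path).stem}.sh"
    script.write_text(
        "#!/bin/bash\n"
        "#SBATCH --gres=gpu:1\n"       # placeholder resource requests
        "#SBATCH --time=24:00:00\n"
        f"python train.py --config {config_path}\n"  # hypothetical entry point
    )
    subprocess.run(["sbatch", str(script)], check=True)
```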
The application_calval module uses trained models to generate maps at calval locations and produces statistics and reports that quantify each model's performance. Additional reports can be generated to compare models to one another and to various baseline maps.
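The per-model statistics are standard map-accuracy summaries; here is a minimal sketch of the comparison against calval labels (the array handling and choice of metrics are illustrative):

```python
import numpy as np


def calval_stats(predicted: np.ndarray, reference: np.ndarray) -> dict:
    """Compare a predicted reef map against calval labels (0/1 arrays, NaN where unlabeled)."""
    valid = ~np.isnan(reference)              # ignore pixels without calval labels
    pred = predicted[valid] > 0.5
    ref = reference[valid] > 0.5
    tp = np.sum(pred & ref)
    fp = np.sum(pred & ~ref)
    fn = np.sum(~pred & ref)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": float(np.mean(pred == ref)),
    }
```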
Select models are applied to global imagery to generate global reef maps. Like the other modules, the code is written so that an arbitrary number of jobs can work on the global map concurrently: an image is locked while one job is working on it, images are tracked as having valid reef area or no reef area (i.e., all land or water), and corrupt or missing images are handled gracefully. A helper script queries the GCS buckets to report application progress.
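The image lock can be implemented with conditional object creation in GCS; below is a minimal sketch assuming the google-cloud-storage client and a hypothetical locks/ prefix:

```python
from google.api_core.exceptions import PreconditionFailed
from google.cloud import storage


def try_lock_image(bucket_name: str, image_id: str) -> bool:
    """Atomically claim an image by creating a lock object that must not already exist."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(f"locks/{image_id}.lock")  # hypothetical layout
    try:
        # if_generation_match=0 succeeds only if the object does not exist yet,
        # so two concurrent jobs cannot both claim the same image.
        blob.upload_from_string("locked", if_generation_match=0)
        return True
    except PreconditionFailed:
        return False  # another job already holds the lock
```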