
Overview

The geo-deep-learning project stems from an initiative at NRCan's CCMEO. Its aim is to enable the use of Convolutional Neural Networks (CNNs) with georeferenced datasets.

In geo-deep-learning, the learning process comprises two broad stages, sampling and training, followed by inference, which uses a trained model to make predictions on new, unseen imagery.

Data sampling (or tiling)

The data preparation phase creates chips (or patches) that are used for training, validation or testing. The sampling step requires a CSV file as input, listing the rasters and labels to be used in the subsequent training phase. See the dataset documentation.
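As an illustration only, the snippet below builds and parses a hypothetical CSV of this kind, pairing each raster with its label file and dataset split ("trn", "val" or "tst"). The column layout and file paths are assumptions, not GDL's actual format; refer to the dataset documentation for the real schema.

```python
import csv
import io

# Hypothetical rows: (raster path, label path, dataset split)
rows = [
    ("data/RGB_tiff/image_1.tif", "data/gpkg/labels_1.gpkg", "trn"),
    ("data/RGB_tiff/image_2.tif", "data/gpkg/labels_2.gpkg", "val"),
    ("data/RGB_tiff/image_3.tif", "data/gpkg/labels_3.gpkg", "tst"),
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)

# Reading the CSV back groups rasters by split, as the tiling step would.
splits = {}
for raster, label, split in csv.reader(io.StringIO(buf.getvalue())):
    splits.setdefault(split, []).append((raster, label))
```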

Training, along with validation and testing

The training phase is where the neural network learns from the data prepared in the previous phase in order to make predictions; it is the crux of the learning process.

  • Samples labeled "trn" as per above are used to train the neural network.
  • Samples labeled "val" are used to estimate the validation error (i.e. loss) on a set of sub-images not used for training, after every epoch.
  • At the end of all epochs, the model with the lowest error on validation data is loaded, and samples labeled "tst", if they exist, are used to estimate the accuracy of the model on sub-images unseen during training or validation.
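The model-selection step described above can be sketched as follows. This is an illustrative, framework-free simplification, not GDL's actual training loop; the function name and the loss values are invented for the example.

```python
def select_best_checkpoint(val_losses):
    """Return the index of the epoch with the lowest validation loss.

    In a real training loop, the checkpoint saved at that epoch would then
    be reloaded before evaluating on the "tst" samples.
    """
    return min(range(len(val_losses)), key=lambda epoch: val_losses[epoch])

# Hypothetical loss measured on "val" samples after each epoch.
val_losses = [0.91, 0.54, 0.47, 0.52]
best_epoch = select_best_checkpoint(val_losses)
```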

Inference

The inference phase uses a trained model to predict on new input data. The final step in the process assigns every pixel in the original image the value of its most probable class.
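That final per-pixel assignment amounts to an argmax over the per-class probability maps. The pure-Python sketch below shows the idea on toy data; it is illustrative only (a real pipeline would do this with array operations on the model's output tensor), and the function name is invented here.

```python
def argmax_classes(prob_maps):
    """Collapse per-class probability maps into a class-index map.

    prob_maps: a list of 2-D lists (one per class), all the same size.
    Returns a 2-D list where each pixel holds the most probable class index.
    """
    n_classes = len(prob_maps)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [prob_maps[c][y][x] for c in range(n_classes)]
            row.append(max(range(n_classes), key=lambda c: scores[c]))
        out.append(row)
    return out

# Toy 1x2 image with two classes: pixel 0 favors class 0, pixel 1 class 1.
class_map = argmax_classes([[[0.9, 0.1]], [[0.1, 0.9]]])
```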

Requirements

This project comprises a set of commands to be run at a shell command prompt. Examples used here are for a bash shell in an Ubuntu GNU/Linux environment.

The system can be used on your workstation or cluster.

Installation

To execute scripts in this project, first create and activate your python environment with the following commands:

```shell
conda env create -f environment.yml
conda activate geo_deep_env
```

Tested on Ubuntu 20.04 and Windows 10 using miniconda.

Running GDL

This is an example of how to run GDL with Hydra in simple steps, using the Massachusetts buildings dataset in the tests/data/ folder, for segmentation of buildings:

  1. Clone this github repo.

```shell
git clone https://github.com/NRCan/geo-deep-learning.git
cd geo-deep-learning
```

  2. Run the wanted script (for segmentation).

```shell
# Creating the hdf5 from the raw data
python GDL.py mode=tiling
# Training the neural network
python GDL.py mode=train
# Inference on the data
python GDL.py mode=inference
```

This example runs with a default configuration ./config/gdl_config_template.yaml. For further examples on configuration options see the configuration documentation.

If you want to introduce a new task such as object detection, you only need to add the code in the main folder and name it, for example, object_detection_sampling.py. The principle is to name the script {task}_{mode}.py, and GDL.py will handle the rest. To run it, add a new parameter on the command line (python GDL.py mode=sampling task=object_detection) or change the parameter inside ./config/gdl_config_template.yaml.
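The {task}_{mode}.py convention can be sketched as a simple filename mapping. This is a hedged illustration of the naming scheme described above, not GDL.py's actual dispatch code; the function name is invented for the example.

```python
def script_for(task, mode):
    """Map a (task, mode) pair to the script filename GDL.py would look for."""
    return f"{task}_{mode}.py"

# With mode=sampling and task=object_detection on the command line,
# the expected script would be:
expected = script_for("object_detection", "sampling")
```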

Folder Structure

We suggest the following high level structure to organize the images and the code.

```
├── {dataset_name}
│   ├── data
│   │   ├── RGB_tiff
│   │   │   └── {3 band tiff images}
│   │   ├── RGBN_tiff
│   │   │   └── {4 band tiff images}
│   │   └── gpkg
│   │       └── {GeoPackages}
│   └── images.csv
├── geo-deep-learning
│   └── {scripts as cloned from github}
```

Don't forget to change the path of the dataset in the config yaml.

Note: For more information on a subject, go to the specific directory; a README.md is provided there with all the information and explanations related to the code.