
Reference implementation for the climate segmentation benchmark, based on the Exascale Deep Learning for Climate Analytics work


sparticlesteve/climate-seg-benchmark


Deep Learning Climate Segmentation Benchmark

Reference implementation for the climate segmentation benchmark, based on the Exascale Deep Learning for Climate Analytics codebase (https://github.com/azrael417/ClimDeepLearn) and the paper (https://arxiv.org/abs/1810.01993).

Dataset

The dataset for this benchmark comes from CAM5 [1] simulations and is hosted at NERSC. The samples are stored in HDF5 files, with input images of shape (768, 1152, 16) and pixel-level labels of shape (768, 1152). The labels have three target classes (background, atmospheric river, tropical cyclone) and were produced with TECA [2].
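As a rough sizing aid, the shapes above determine the uncompressed size of one sample. The sketch below assumes the input channels are stored as 4-byte floats (float32); that dtype is an assumption rather than something stated here, and any HDF5 compression would change the on-disk size. The function names are hypothetical.

```bash
# Hypothetical helper: uncompressed size in bytes of one input image of shape
# (768, 1152, 16). bytes_per_value defaults to 4, ASSUMING float32 storage.
sample_input_bytes() {
  local height=768 width=1152 channels=16
  local bytes_per_value=${1:-4}
  echo $(( height * width * channels * bytes_per_value ))
}

# Hypothetical helper: number of label values per sample, shape (768, 1152),
# i.e. one class index per pixel.
sample_label_values() {
  echo $(( 768 * 1152 ))
}
```

Under the float32 assumption, `sample_input_bytes` prints 56623104 (exactly 54 MiB) per input image; `sample_input_bytes 2` gives the size for a 16-bit type.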

The currently recommended way to get the data is via Globus, using the following endpoint:

https://app.globus.org/file-manager?origin_id=0b226e2c-4de0-11ea-971a-021304b0cca7&origin_path=%2F

The dataset folder contains a README with a technical description of the dataset and an All-Hist folder containing all of the data files.
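For scripted transfers, the same endpoint can also be driven from the Globus CLI (the globus-cli package). This is a sketch that assumes you have already run `globus login`, that you have your own destination endpoint ID, and that you have located the dataset folder by listing the source endpoint with `globus ls 0b226e2c-4de0-11ea-971a-021304b0cca7:/`. The function name, variable name, and argument order are hypothetical.

```bash
# Source endpoint ID taken from the Globus URL above.
CLIMSEG_SRC_EP=0b226e2c-4de0-11ea-971a-021304b0cca7

# Hypothetical helper: recursively copy the All-Hist folder to your endpoint.
#   $1  destination endpoint ID (your own; not provided by this README)
#   $2  path of All-Hist on the source endpoint (find it with `globus ls`)
#   $3  destination path on your endpoint
transfer_all_hist() {
  local dst_ep=$1 src_path=$2 dst_path=$3
  globus transfer --recursive --label "climseg-All-Hist" \
    "$CLIMSEG_SRC_EP:$src_path" "$dst_ep:$dst_path"
}
```

`globus transfer` prints a task ID, which can then be checked with `globus task show`.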

Unfortunately, the dataset is not yet split into train/validation/test sets, and there is no recommended procedure for doing the split yourself. For now, you can do a uniform split using a script similar to this one:

https://gist.github.com/sparticlesteve/8a3e81a31e89fd1cccc81a3fae3fcf2d
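A uniform random split can also be sketched with standard tools. This is not the linked gist, which may work differently; it assumes GNU coreutils (`shuf`, `realpath`), bash 4 or newer, data files ending in `.h5` directly inside All-Hist, and an 80/10/10 split. The function name, output layout, and percentages are hypothetical. It creates symlinks rather than copies, so the original files stay in place.

```bash
# Hypothetical helper: uniformly split the All-Hist HDF5 files into
# train/validation/test directories of symlinks.
#   split_dataset SRC_DIR OUT_DIR [TRAIN_PCT] [VAL_PCT] [SEED]
# ASSUMES GNU coreutils, bash >= 4, and "*.h5" data files.
split_dataset() {
  local src_dir=$1 out_dir=$2
  local train_pct=${3:-80} val_pct=${4:-10} seed=${5:-0}  # assumed defaults

  mkdir -p "$out_dir/train" "$out_dir/validation" "$out_dir/test" || return 1

  # Sort first so the shuffle depends only on the seed, not on directory
  # order; a fixed --random-source makes the shuffle reproducible.
  local -a files
  mapfile -t files < <(find "$src_dir" -maxdepth 1 -name '*.h5' | sort \
                       | shuf --random-source=<(yes "$seed"))

  # Integer division rounds train/validation down; the remainder is test.
  local n=${#files[@]}
  local n_train=$(( n * train_pct / 100 ))
  local n_val=$(( n * val_pct / 100 ))
  local i split
  for (( i = 0; i < n; i++ )); do
    if (( i < n_train )); then split=train
    elif (( i < n_train + n_val )); then split=validation
    else split=test
    fi
    ln -s "$(realpath "${files[i]}")" "$out_dir/$split/" || return 1
  done
  echo "train=$n_train validation=$n_val test=$(( n - n_train - n_val ))"
}
```

For example, `split_dataset /path/to/All-Hist /path/to/climseg-split` prints the per-split counts; rerunning with the same seed reproduces the same assignment.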

Previous dataset for ECP Annual Meeting 2019

This is a smaller dataset (~200 GB total) for getting started. It is hosted on Globus:

https://app.globus.org/file-manager?origin_id=bf7316d8-e918-11e9-9bfc-0a19784404f4&origin_path=%2F

and is also available over HTTPS:

https://portal.nersc.gov/project/dasrepo/deepcam/climseg-data-small/
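For the HTTPS copy, a recursive `wget` is one way to mirror the directory. This sketch assumes the URL serves a browsable directory index; the function name is hypothetical, and the options used are standard GNU Wget flags.

```bash
# Hypothetical helper: mirror the small ECP 2019 dataset over HTTPS.
# --no-parent keeps wget from climbing above climseg-data-small/.
# --no-host-directories plus --cut-dirs=3 drop the host name and
# "project/dasrepo/deepcam", so files land in "$dest/climseg-data-small/".
# --reject discards the generated directory-index pages after parsing them.
fetch_small_dataset() {
  local dest=${1:-.}
  wget --recursive --no-parent --no-host-directories --cut-dirs=3 \
       --reject 'index.html*' --directory-prefix="$dest" \
       https://portal.nersc.gov/project/dasrepo/deepcam/climseg-data-small/
}
```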

How to run the benchmark

Submission scripts are in the run_scripts directory.

Running at NERSC

To submit to the Cori KNL system, run:

# This example runs on 64 nodes.
cd run_scripts
sbatch -N 64 train_cori.sh

To submit to the Cori GPU system, run:

# This example runs on 4 nodes, with 8 ranks per node (1 per GPU).
module purge
module load esslurm
cd run_scripts
sbatch -N 4 train_corigpu.sh

References

  1. Wehner, M. F., Reed, K. A., Li, F., Bacmeister, J., Chen, C.-T., Paciorek, C., Gleckler, P. J., Sperber, K. R., Collins, W. D., Gettelman, A., et al.: The effect of horizontal resolution on simulation quality in the Community Atmospheric Model, CAM5.1, Journal of Advances in Modeling Earth Systems, 6, 980-997, 2014.
  2. Prabhat, Byna, S., Vishwanath, V., Dart, E., Wehner, M., Collins, W. D., et al.: TECA: Petascale pattern recognition for climate science, in: International Conference on Computer Analysis of Images and Patterns, pp. 426-436, Springer, 2015.
