Elastic Deep Learning using PaddlePaddle and Kubernetes

m3ngyang/edl

PaddlePaddle EDL: Elastic Deep Learning

While many hardware and software manufacturers are working on improving the running time of deep learning jobs, EDL optimizes

  1. the global utilization of the cluster, and
  2. the waiting time of job submitters.

For more about the EDL project, please refer to this invited blog post on the official Kubernetes blog.

EDL includes two parts:

  1. a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and

  2. changes that make PaddlePaddle a fault-tolerant deep learning framework.

This directory contains the Kubernetes controller. For more information about fault tolerance, please refer to the design.

We deployed EDL on a real Kubernetes cluster, dlnel.com, which is open to graduate students of Tsinghua University. The performance test report for EDL on this cluster is here.

Build

glide install --strip-vendor
go build -o path/to/output github.com/paddlepaddle/edl/cmd/edl
