simtony/tf_multigpu

Multi-gpu/distributed training script in Tensorflow 1.x.
Introduction

This repo contains several multi-GPU/distributed training implementations in TensorFlow 1.x.

They serve as good starter training scripts:

  • allreduce_local_sync.py: synchronous updates with allreduce. Recommended; see the sketch after this list for the general synchronized-update pattern.
  • ps_local_sync.py: synchronous updates with a parameter server. This is the approach recommended by the TensorFlow documentation, though it is the least efficient.
  • ps_distribute.py: synchronous/asynchronous updates with a parameter server, implemented for distributed (multi-machine) training.
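
All of them build on the same synchronized data-parallel idea: each GPU computes gradients on its own slice of data, the gradients are combined across GPUs, and one update is applied per step. The sketch below (not the repo's actual code; the toy model, shapes, and names are illustrative) shows that pattern with the gradients averaged on a single device, which is closest to the ps_local_sync flavor; the allreduce variant instead combines the per-GPU gradients with a collective all-reduce (e.g. via NCCL).

# Minimal sketch of synchronized multi-GPU training in TF 1.x (illustrative, not the repo's code).
import tensorflow as tf

NUM_GPUS = 2


def tower_loss(x, y):
    # Toy regression model standing in for the repo's fake-data model.
    hidden = tf.layers.dense(x, 64, activation=tf.nn.relu, name="hidden")
    pred = tf.layers.dense(hidden, 1, name="out")
    return tf.losses.mean_squared_error(labels=y, predictions=pred)


optimizer = tf.train.GradientDescentOptimizer(0.1)
tower_grads = []
for i in range(NUM_GPUS):
    with tf.device("/gpu:%d" % i), tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        x = tf.random_normal([32, 10])                     # fake data, as in the repo
        y = tf.reduce_sum(x, axis=1, keepdims=True)
        tower_grads.append(optimizer.compute_gradients(tower_loss(x, y)))

# Average each variable's gradient across towers, then apply a single update.
averaged = []
for grads_and_vars in zip(*tower_grads):
    grads = tf.stack([g for g, _ in grads_and_vars])
    averaged.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
train_op = optimizer.apply_gradients(averaged)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)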

There is a complementary article written in Chinese: https://zhuanlan.zhihu.com/p/50116885

Usage

The scripts do not consume any real data: they generate fake data and fit it. They should be run with TensorFlow >= 1.12. For TensorFlow 2.0, tf.distribute.Strategy is recommended instead. Make sure you have multiple GPUs :)

Simply run

python allreduce_local_sync.py --gpus 0,1 --max_step 10000
python ps_local_sync.py --gpus 0,1 --max_step 10000

For ps_distribute.py, make sure you have 2 GPUs, and run the following commands in order:

# The cluster configuration is specified in the code: in this case, 1 ps and 2 workers
# First start ps:
$ python ps_distribute.py --job ps --index 0

# Then start the workers:
$ python ps_distribute.py --job worker --index 0 --gpu 0 --max_step 10000
$ python ps_distribute.py --job worker --index 1 --gpu 1 --max_step 10000
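
Under the hood, the ps/worker roles map onto TF 1.x's between-graph replication APIs. The following is a minimal sketch of that setup (not the repo's actual code; the addresses, toy model, and flag handling are illustrative), assuming 1 ps and 2 workers as above:

# Minimal sketch of the between-graph parameter-server setup (illustrative, not the repo's code).
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
})

# In the repo these would come from the --job and --index flags.
job_name, task_index = "worker", 0
server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    server.join()  # a ps task only hosts variables and serves requests
else:
    # replica_device_setter pins variables to the ps and ops to this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index, cluster=cluster)):
        x = tf.random_normal([32, 10])                     # fake data
        y = tf.reduce_sum(x, axis=1, keepdims=True)
        pred = tf.layers.dense(x, 1)
        loss = tf.losses.mean_squared_error(labels=y, predictions=pred)
        global_step = tf.train.get_or_create_global_step()
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(
            loss, global_step=global_step)

    # MonitoredTrainingSession handles initialization and worker coordination.
    hooks = [tf.train.StopAtStepHook(last_step=10000)]     # mirrors --max_step
    with tf.train.MonitoredTrainingSession(master=server.target,
                                           is_chief=(task_index == 0),
                                           hooks=hooks) as sess:
        while not sess.should_stop():
            sess.run(train_op)

For the synchronous variant, the optimizer is typically wrapped in tf.train.SyncReplicasOptimizer so that gradients from all workers are aggregated before each update; the asynchronous variant lets each worker apply its gradients independently.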

For benchmarking different communication/computation ratios, simply adjust the complexity of the model.
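
For example, a couple of hypothetical flags (not flags the scripts actually expose) could scale the toy model up or down:

# Hypothetical knobs for benchmarking (illustrative only).
import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument("--hidden_size", type=int, default=64)  # wider layers: more parameters to synchronize
parser.add_argument("--num_layers", type=int, default=2)    # deeper model: more computation per step
args = parser.parse_args()


def build_model(x):
    h = x
    for i in range(args.num_layers):
        h = tf.layers.dense(h, args.hidden_size, activation=tf.nn.relu,
                            name="hidden_%d" % i)
    return tf.layers.dense(h, 1, name="out")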
