mesh-transformer-mpi

This repo is to track progress of building Tensorflow with MPI support.

The goal is to ensure Tensorflow maps network communication operations to MPI calls, such that the ops used by mesh-tensorflow get mapped to MPI calls as well.

./common-problems.txt: Common problems I've hit while trying to against MPI. Directions to changing tensorflow/configure.py ./tests: Tests for making sure calls are actually using mpi ./runscripts: Launch scripts for Mesh-Tensorflow

Step 1) Setup Configure

-> git clone tensorflow
-> git checkout r1.13
-> If using Cray-MPICH, cp configure.py tensorflow/
-> If using OpenMPI, use default configure.py

Step 2) Build it

-> Edit the paths in buildit.sh to point to the correct MPI dirs and bazel temp dirs
-> run buildit.sh like: ./buildit.sh
-> Answer the questions about the path to Python
-> Answer Yes (y) to the question regarding MPI build support, no additional input should be required for that item if you setup the path correctly

Step 3) Fix common issues are having undeclared dependencies for MPI headers -> Most can be fixed by editing the bazel build files: -> tensorflow/contrib/mpi/BUILD -> tensorflow/contrib/mpi_collectives/BUILD -> Declare dependencies where needed. This has only been necessary when headers outside of mpi.h are required (e.g. MPICH)

Step 4) Once the build completes, do: -> pip install mesh-tensorflow --user -> cd /tests -> Edit runit.sh to point to the correct directories (requires srun / SLURM) -> If you build on Cray XC, the perftools-lite module should be loaded -> Once the test run completes, perftools should produce a profile which shows MPI calls map to tensorflow ops

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

mesh-transformer-mpi

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
env		env
mesh		mesh
README.md		README.md
buildit.sh		buildit.sh
common-problems.txt		common-problems.txt
configure.py		configure.py
runit.sh		runit.sh

jbalma/mesh-transformer-mpi

Folders and files

Latest commit

History

Repository files navigation

mesh-transformer-mpi

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages