-
Notifications
You must be signed in to change notification settings - Fork 15
Conversation
global_step = tf.contrib.framework.get_or_create_global_step() | ||
opt = tf.train.GradientDescentOptimizer(0.5) | ||
# Add MPI Distributed Optimizer | ||
#import pdb; pdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this pub
line can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted
parser.add_argument('--log_dir',type=str, default='/tmp/tensorflow/mnist/logs/mnist_with_summaries', | ||
help='Summaries log directory') | ||
FLAGS, unparsed = parser.parse_known_args() | ||
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add the command line to run this as a comment in the end of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an example run command for distributed runs
src/ngraph_encapsulate_op.cc
Outdated
@@ -39,6 +39,10 @@ | |||
|
|||
#include "ngraph/runtime/interpreter/int_backend.hpp" | |||
|
|||
#ifdef NGRAPH_DISTRIBUTED | |||
#include <mpi.h>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>
. Is this a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
src/ngraph_rewrite_pass.cc
Outdated
@@ -29,6 +29,10 @@ | |||
|
|||
#include <iomanip> | |||
|
|||
#ifdef NGRAPH_DISTRIBUTED | |||
#include <mpi.h>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>
is this a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for catching this!
help='Summaries log directory') | ||
FLAGS, unparsed = parser.parse_known_args() | ||
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed) | ||
# run comand for this distributed script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# run comand for this distributed script | |
# run command for this distributed script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. Fixed it.
…aSystems/ngraph-tf into langjian/distributed_json
build_ngtf.py
Outdated
@@ -512,6 +512,7 @@ def main(): | |||
"-DNGRAPH_TUNE_ARCH=" + target_arch, | |||
"-DNGRAPH_ARTIFACTS_DIR=" + artifacts_location, | |||
"-DUNIT_TEST_ENABLE=ON", | |||
"-DNGRAPH_DISTRIBUTED_ENABLE=TRUE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default should be false i.e., for regular builds no need to enable distributed. Need to add a command line option to the build_ngtf.py (e.g., --enable_distributed
) which when specified will add this flag. See the --debug_build as an example of how to extend the cmake options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the option of --distributed_build
for distributed build
See extensive documentation at | ||
https://www.tensorflow.org/get_started/mnist/beginners | ||
""" | ||
from __future__ import absolute_import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide a summary of modifications made to this file and a reference to the source of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Listed the changes I made from the source file and added the link to the reference source of this file
src/ngraph_encapsulate_op.cc
Outdated
@@ -36,6 +36,10 @@ | |||
|
|||
#include "ngraph/runtime/interpreter/int_backend.hpp" | |||
|
|||
#ifdef NGRAPH_DISTRIBUTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use the if defined(NGRAPH_DISTRIBUTED)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to #if defined NGRAPH_DISTRIBUTED
src/ngraph_rewrite_pass.cc
Outdated
@@ -102,13 +106,33 @@ class NGraphRewritePass : public GraphOptimizationPass { | |||
static std::string GraphFilenamePrefix(std::string kind, int idx) { | |||
std::stringstream ss; | |||
ss << kind << "_" << std::setfill('0') << std::setw(4) << idx; | |||
#ifdef NGRAPH_DISTRIBUTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a function that can be used instead of the code repetition below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the distributed class defined in nGraph core
…aSystems/ngraph-tf into langjian/distributed_json
CMakeLists.txt
Outdated
include_directories(SYSTEM ${MPI_C_INCLUDE_PATH} ${MPI_CXX_INCLUDE_PATH}) | ||
link_directories(${MPI_C_LIBRARIES} ${MPI_CXX_LIBRARIES}) | ||
endif() | ||
#if(NGRAPH_DISTRIBUTED_ENABLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this temporary? If not please remove the commented out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed back to include the MPI build
…aSystems/ngraph-tf into langjian/distributed_json
TESTNOW |
No description provided.