Skip to content
This repository was archived by the owner on Jan 3, 2023. It is now read-only.

Langjian/distributed json #366

Merged
merged 41 commits into from
Jan 18, 2019
Merged

Langjian/distributed json #366

merged 41 commits into from
Jan 18, 2019

Conversation

jianyinglang
Copy link
Contributor

No description provided.

@jianyinglang jianyinglang changed the title Langjian/distributed json --- WIP [WIP] Langjian/distributed json Dec 6, 2018
@jianyinglang jianyinglang removed the WIP label Dec 21, 2018
@jianyinglang jianyinglang changed the title [WIP] Langjian/distributed json Langjian/distributed json Dec 21, 2018
global_step = tf.contrib.framework.get_or_create_global_step()
opt = tf.train.GradientDescentOptimizer(0.5)
# Add MPI Distributed Optimizer
#import pdb; pdb.set_trace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pub line can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

parser.add_argument('--log_dir',type=str, default='/tmp/tensorflow/mnist/logs/mnist_with_summaries',
help='Summaries log directory')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add the command line to run this as a comment in the end of the file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an example run command for distributed runs

@@ -39,6 +39,10 @@

#include "ngraph/runtime/interpreter/int_backend.hpp"

#ifdef NGRAPH_DISTRIBUTED
#include <mpi.h>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>. Is this a typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@@ -29,6 +29,10 @@

#include <iomanip>

#ifdef NGRAPH_DISTRIBUTED
#include <mpi.h>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>> is this a typo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for catching this!

help='Summaries log directory')
FLAGS, unparsed = parser.parse_known_args()
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
# run comand for this distributed script
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# run comand for this distributed script
# run command for this distributed script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Fixed it.

@sayantan-nervana sayantan-nervana added the Fully Reviewed All reviewers have approved label Jan 4, 2019
build_ngtf.py Outdated
@@ -512,6 +512,7 @@ def main():
"-DNGRAPH_TUNE_ARCH=" + target_arch,
"-DNGRAPH_ARTIFACTS_DIR=" + artifacts_location,
"-DUNIT_TEST_ENABLE=ON",
"-DNGRAPH_DISTRIBUTED_ENABLE=TRUE",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default should be false i.e., for regular builds no need to enable distributed. Need to add a command line option to the build_ngtf.py (e.g., --enable_distributed) which when specified will add this flag. See the --debug_build as an example of how to extend the cmake options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the option of --distributed_build for distributed build

See extensive documentation at
https://www.tensorflow.org/get_started/mnist/beginners
"""
from __future__ import absolute_import
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a summary of modifications made to this file and a reference to the source of this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listed the changes I made from the source file and added the link to the reference source of this file

@@ -36,6 +36,10 @@

#include "ngraph/runtime/interpreter/int_backend.hpp"

#ifdef NGRAPH_DISTRIBUTED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use the if defined(NGRAPH_DISTRIBUTED)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to #if defined NGRAPH_DISTRIBUTED

@@ -102,13 +106,33 @@ class NGraphRewritePass : public GraphOptimizationPass {
static std::string GraphFilenamePrefix(std::string kind, int idx) {
std::stringstream ss;
ss << kind << "_" << std::setfill('0') << std::setw(4) << idx;
#ifdef NGRAPH_DISTRIBUTED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider a function that can be used instead of the code repetition below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the distributed class defined in nGraph core

CMakeLists.txt Outdated
include_directories(SYSTEM ${MPI_C_INCLUDE_PATH} ${MPI_CXX_INCLUDE_PATH})
link_directories(${MPI_C_LIBRARIES} ${MPI_CXX_LIBRARIES})
endif()
#if(NGRAPH_DISTRIBUTED_ENABLE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this temporary? If not please remove the commented out code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed back to include the MPI build

@jianyinglang
Copy link
Contributor Author

TESTNOW

@avijit-nervana avijit-nervana merged commit adcd84f into master Jan 18, 2019
@avijit-nervana avijit-nervana deleted the langjian/distributed_json branch January 18, 2019 16:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Fully Reviewed All reviewers have approved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants