Clarify distributed key-value store usage #450
Open
sethah wants to merge 2 commits into zackchase:master from sethah:dist_kv
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -167,8 +167,28 @@ | |||||
"\n", | ||||||
"Not only can we synchronize data within a machine, with the key-value store we can facilitate inter-machine communication. To use it, one can create a distributed kvstore by using the following command: (Note: distributed key-value store requires `MXNet` to be compiled with the flag `USE_DIST_KVSTORE=1`, e.g. `make USE_DIST_KVSTORE=1`.)\n", | ||||||
"\n", | ||||||
"In the distributed setting, `MXNet` launches three kinds of processes (each time, running `python myprog.py` will create a process). One is a *worker*, which runs the user program, such as the code in the previous section. The other two are the *server*, which maintains the data pushed into the store, and the *scheduler*, which monitors the aliveness of each node.\n", | ||||||
"\n", | ||||||
"To use the distributed key-value store, we must first start a scheduler process and at least one server process. When the MXNet library is imported in a process, it checks what the process's role is through the `DMLC_ROLE` environment variable. Starting a server or scheduler is as simple as importing MXNet with the appropriate environment variables set.\n", | ||||||
"\n", | ||||||
"```python\n", | ||||||
"# start scheduler\n", | ||||||
"scheduler_env = os.environ.copy()\n", | ||||||
"scheduler_env.update({\"DMLC_ROLE\": \"scheduler\",\"DMLC_PS_ROOT_PORT\": \"9090\",\"DMLC_PS_ROOT_URI\": \"<scheduler-ip>\",\"DMLC_NUM_SERVER\": \"1\",\"DMLC_NUM_WORKER\": \"2\",\"PS_VERBOSE\": \"2\"})\n", | ||||||
"subprocess.Popen('python -c \"import mxnet\"', shell=True, env=scheduler_env)\n", | ||||||
"\n", | ||||||
"# start server\n", | ||||||
"server_env = os.environ.copy()\n", | ||||||
"server_env.update({\"DMLC_ROLE\": \"server\",\"DMLC_PS_ROOT_PORT\": \"9090\",\"DMLC_PS_ROOT_URI\": \"<scheduler-ip>\",\"DMLC_NUM_SERVER\": \"1\",\"DMLC_NUM_WORKER\": \"2\",\"PS_VERBOSE\": \"2\"})\n", | ||||||
"subprocess.Popen('python -c \"import mxnet\"', shell=True, env=server_env)\n", | ||||||
"```\n", | ||||||
"\n", | ||||||
"To use a distributed key-value store from a worker process, just create the store as follows:\n", | ||||||
"\n", | ||||||
"```python\n", | ||||||
"store = kv.create('dist')\n", | ||||||
"# setting the optimizer instructs the server on how to update weights that are pushed to it\n", | ||||||
"store.set_optimizer(mxnet.optimizer.SGD())\n", | ||||||
"```\n", | ||||||
"\n", | ||||||
"Now if we run the code from the previous section on two machines at the same time, then the store will aggregate the two ndarrays pushed from each machine, and after that, the pulled results will be: \n", | ||||||
|
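Reviewer note: the scheduler and server snippets in the hunk above repeat the same environment dictionary and differ only in `DMLC_ROLE`. A minimal sketch of how that setup could be factored is shown here. The helper names `dmlc_env` and `launch_role` are hypothetical and not part of MXNet; the environment variable names are the ones used in the hunk, and the default port and process counts simply mirror its values.

```python
import os
import subprocess
import sys


def dmlc_env(role, root_uri, root_port=9090, num_server=1, num_worker=2):
    """Return a copy of the current environment with the DMLC_* variables set.

    Hypothetical helper: only the variable names come from the notebook text.
    """
    if role not in ("scheduler", "server", "worker"):
        raise ValueError(f"unknown role: {role!r}")
    env = os.environ.copy()
    env.update({
        "DMLC_ROLE": role,
        "DMLC_PS_ROOT_URI": root_uri,
        "DMLC_PS_ROOT_PORT": str(root_port),
        "DMLC_NUM_SERVER": str(num_server),
        "DMLC_NUM_WORKER": str(num_worker),
    })
    return env


def launch_role(role, root_uri, command=None, **kwargs):
    """Start one process for `role`; by default it only imports mxnet."""
    if command is None:
        # Assumes mxnet is installed for this interpreter.
        command = [sys.executable, "-c", "import mxnet"]
    return subprocess.Popen(command, env=dmlc_env(role, root_uri, **kwargs))
```

With this in place, `launch_role("scheduler", "<scheduler-ip>")` and `launch_role("server", "<scheduler-ip>")` would stand in for the two blocks in the hunk. Passing the command as an argument list also avoids the nested quoting that `shell=True` requires.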
@@ -178,9 +198,7 @@ | |||||
" [ 6. 6. 6.]]\n", | ||||||
"```\n", | ||||||
"\n", | ||||||
"In the distributed setting, `MXNet` launches three kinds of processes (each time, running `python myprog.py` will create a process). One is a *worker*, which runs the user program, such as the code in the previous section. The other two are the *server*, which maintains the data pushed into the store, and the *scheduler*, which monitors the aliveness of each node.\n", | ||||||
"\n", | ||||||
"It's up to users which machines to run these processes on. But to simplify the process placement and launching, MXNet provides a tool located at [tools/launch.py](https://github.com/dmlc/mxnet/blob/master/tools/launch.py). \n", | ||||||
"It's up to users which machines to run the worker, scheduler, and server processes on. But to simplify the process placement and launching, MXNet provides a tool located at [tools/launch.py](https://github.com/dmlc/mxnet/blob/master/tools/launch.py). \n", | ||||||
"\n", | ||||||
"Assume there are two machines, A and B. They are ssh-able, and their IPs are saved in a file named `hostfile`. Then we can start one worker in each machine through: \n", | ||||||
"\n", | ||||||
|
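Reviewer note: the notebook text says the store aggregates the ndarrays pushed from each machine before they are pulled. The toy class below illustrates that idea in plain Python. It is an in-process stand-in, not MXNet's API: `ToyKVStore` is a hypothetical name, it assumes aggregation means an element-wise sum, and it assumes the stored value is simply replaced by that sum (that is, no optimizer has been set on the store).

```python
class ToyKVStore:
    """In-process stand-in for a key-value store that sums pushed values."""

    def __init__(self):
        self._data = {}

    def init(self, key, value):
        # Register a key with its initial value.
        self._data[key] = list(value)

    def push(self, key, values):
        # `values` holds one list per worker; sum them element-wise
        # and store the result under `key`.
        self._data[key] = [sum(column) for column in zip(*values)]

    def pull(self, key):
        # Return a copy so callers cannot mutate the stored value.
        return list(self._data[key])


store = ToyKVStore()
store.init("w", [0.0, 0.0, 0.0])
store.push("w", [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])  # one vector per worker
print(store.pull("w"))  # [3.0, 3.0, 3.0]
```

In the real distributed store, the pushes arrive from separate worker processes and the server process holds the data. According to the comment in the diff, calling `set_optimizer` changes how the server updates the stored weights when values are pushed, which this toy does not model.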