How to set up an IPython cluster

This is a wiki page for the PyBroMo software

This is a quick how-to on setting up an IPython cluster. For more info, see the official IPython docs: Using IPython for parallel computing.

Before starting, you need to install IPython. The easiest way is to get it through a scientific Python distribution such as Anaconda.

Parallel computing on a single machine

Method 1

Launch the notebook server and, from the Clusters tab, start 4 engines.

Method 2

Open a terminal (cmd.exe) and type:

ipcluster start -n 4
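
With either method, you can quickly check that the engines are up from any IPython session on the same machine. Here is a minimal check, assuming the default profile (the one ipcluster start uses when no --profile option is given):

from IPython.parallel import Client

rc = Client()    # connect using the default profile's connection files
print(rc.ids)    # e.g. [0, 1, 2, 3] when 4 engines are running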

Parallel computing on many machines (Windows 7)

Reference: the official IPython docs on parallel computing.

Here we configure 2 machines: a controller host that launches the simulation and a "slave" host that performs the computation. The procedure can be extended to multiple "slave" machines by repeating the same configuration on each of them.

NOTE for Windows: All the commands must be pasted in a cmd.exe terminal.

Set up the controller

Only the first time, we need to create an IPython profile:

ipython profile create --parallel --profile=parallel

This command creates a new set of configuration files in IPYTHONDIR/profile_parallel, where IPYTHONDIR is usually a folder named .ipython in the user's home folder (C:\Users\username\). These files can be customized to change the default behavior if needed.
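
If you are not sure where IPYTHONDIR is on your system, you can ask IPython itself. A small sketch, assuming an IPython version that still ships the parallel machinery used below (where the helper lives in IPython.utils.path):

import os
from IPython.utils.path import get_ipython_dir

profile_dir = os.path.join(get_ipython_dir(), 'profile_parallel')
print(profile_dir)              # typically C:\Users\username\.ipython\profile_parallel
print(os.listdir(profile_dir))  # ipcontroller_config.py, ipengine_config.py, ...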

Now, each time we want to start a parallel computation, we begin by starting the controller:

ipcontroller --profile=parallel --ip=169.232.130.141

where the address is the IP address of the controller machine.
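
If you don't know the machine's IP address, ipconfig will show it. A quick Python alternative (on some network setups this returns 127.0.0.1, in which case use the address reported by ipconfig):

import socket

print(socket.gethostbyname(socket.gethostname()))  # the address to pass to --ip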

This command creates a file ipcontroller-engine.json that contains the connection info that the other machines need in order to connect to the controller. The file is located in IPYTHONDIR/profile_parallel/security.

We need to copy ipcontroller-engine.json to the computation machine. To automate this step I like to link the IPython folder into a Dropbox folder so that all the configuration files are automatically copied/updated on the different machines.
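
If you prefer to copy the file by hand, here is a minimal sketch; the destination path is only an example (it matches the Dropbox layout used below) and must be adapted to your setup:

import os
import shutil

src = os.path.expanduser(r'~\.ipython\profile_parallel\security\ipcontroller-engine.json')
dst = r'C:\Data\user\software\Dropbox\ipython\profile_parallel\security\ipcontroller-engine.json'
shutil.copy(src, dst)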

Setup the "slave" machine

Also on the machine that runs the computation, it is useful to create a profile (only the first time), with the same command as before:

ipython profile create --parallel --profile=parallel

A new set of configuration files is created in IPYTHONDIR/profile_parallel.

We can start a computation engine with the ipengine command, specifying the path of the ipcontroller-engine.json file:

ipengine --profile=parallel --file=C:\Data\user\software\Dropbox\ipython\profile_parallel\security\ipcontroller-engine.json

Alternatively, we can write the file name in the configuration file so that we don't need to type it every time. To do so, edit the file ipengine_config.py found in the previously created profile folder (IPYTHONDIR/profile_parallel) and find the line:

#c.IPEngineApp.url_file = u''

remove the leading # and write the ipcontroller-engine.json path; in our example:

c.IPEngineApp.url_file = u'C:\\Data\\user\\software\\Dropbox\\ipython\\profile_parallel\\security\\ipcontroller-engine.json'

(Note the doubled backslashes: in a Python string a single backslash starts an escape sequence such as \u, so Windows paths must be written with \\ or with forward slashes.)

Now to launch an engine simply type:

ipengine --profile=parallel

It is suggested to launch as many engines as the number of cores. To launch a second engine, open a new terminal and type the command again, and so on.
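
As a convenience, the engines can also be started from a small Python script instead of opening one terminal per engine. A sketch, assuming ipengine is on the PATH and the parallel profile exists:

import multiprocessing
import subprocess

n_engines = multiprocessing.cpu_count()   # one engine per core
engines = [subprocess.Popen(['ipengine', '--profile=parallel'])
           for _ in range(n_engines)]     # each Popen starts a separate ipengine process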

To add another machine for computation just repeat the previous steps.

Launching the simulation

Once the cluster is started (either on a single machine or on multiple machines), we are ready to launch a simulation.

On the controller machine, start an IPython QtConsole or an IPython notebook using the parallel profile:

ipython qtconsole --profile=parallel

or

ipython notebook --profile=parallel

Then do:

from IPython.parallel import Client
rc = Client()
rc.ids

The last command should print the list of engine IDs, one ID for each engine that was started.
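
As a quick sanity check, you can run a trivial function on all engines; each engine should report a different process ID:

import os

dview = rc[:]                        # a DirectView on all the engines
print(dview.apply_sync(os.getpid))   # one PID per engine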

Alternatively, if you have a QtConsole or Notebook already started without the parallel profile, you can simply pass to Client the path of the file that contains the client connection information. This file is ipcontroller-client.json (not -engine as before!) and is located in the profile's security folder.
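
For example (the path below is hypothetical and must be adapted to your profile folder):

from IPython.parallel import Client

rc = Client(r'C:\Users\username\.ipython\profile_parallel\security\ipcontroller-client.json')
rc.ids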

NOTE: This trick is used by the PyBroMo notebooks so you don't need to restart the notebook server after you launch the cluster.