DockerMPI

Running MPI on Docker

radiasoft/salt-conf sets up a network of nodes running Salt minions. The system is complicated and still evolving, but it works, and this document explains how.

Objective

Our goal is to create an MPI cluster from user-specified Docker images. Users log in via GitHub and set up their jobs via JupyterHub. The container started by JupyterHub should be identical to the one started by our MPI cluster.

  • JupyterHub users do not have Unix user ids
  • Containers must be started as a non-root guest user (--user)
  • JupyterHub user's data directory is mounted in container
  • Data is shared via NFS or other cluster file system
  • sshd runs in container for MPI as guest user
  • Results have to be reported back to GitHub user
  • MPI network must be visible (--net=host)
  • Queue manager must be compatible with the above
  • JupyterHub user is not allowed to interact with Docker directly
  • Docker images are user selectable (from an approved list)
  • Jupyter and MPI containers started from same image
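
Taken together, these constraints suggest a container invocation along the following lines. This is a hypothetical sketch only; the image name, uid/gid, and paths are illustrative assumptions, not the actual configuration:

```sh
# Sketch only: image name, uid/gid, and paths are assumptions.
# --user: the container runs as a non-root guest user.
# --net=host: MPI's ports are reachable without per-port mapping.
# --volume: the JupyterHub user's NFS-backed data directory is mounted in.
docker run --detach \
    --user=1000:1000 \
    --net=host \
    --volume=/var/nfs/jupyter/alice:/home/guest/jupyter \
    approved/user-image \
    /usr/sbin/sshd -D -e -f /home/guest/.ssh/sshd_config
```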

Research

None of the standard queue managers (SLURM, SGE, Torque, etc.) supports Docker. MPI itself is difficult to configure: it requires that ssh keys be distributed in advance or that passwordless login be enabled.
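
For example, a passwordless ssh setup across the cluster might look like the following sketch, assuming a shared NFS home directory; the host pattern and paths are assumptions:

```sh
# Generate a key pair with no passphrase and authorize it on every node
# (here the shared NFS home directory makes authorized_keys visible cluster-wide).
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Skip interactive host-key prompts for in-cluster addresses.
printf 'Host 10.*\n    StrictHostKeyChecking no\n' >> ~/.ssh/config
```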

NERSC Shifter

"Contain This, Unleashing Docker for HPC" describes an interesting system, but it requires the end user to have Docker access, e.g., "the user logs into the Shifter-enabled computational resource and issues a command like docker pull X where X represents the tagged container revision."

Once you give end users Docker access, they effectively have root access to your machines. This is unacceptable.

Technical Details

MPI fails badly

MPI fails in all kinds of ways. For example, if a single process dies hard, e.g. with a SEGV (common in C++ code), the entire MPI job can hang waiting for a response. There do not appear to be retries on ssh connections.
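
Since a hung job may never return on its own, one defensive measure is to wrap mpirun in a hard wall-clock limit. A sketch; the 30-minute limit, hostfile path, and program name are placeholders:

```sh
# Kill the whole job if it runs past 30 minutes (an arbitrary example limit).
timeout --signal=KILL 30m \
    mpirun --hostfile /home/guest/hosts -np 16 ./my_sim \
    || echo "MPI job failed, crashed, or hung past the time limit" >&2
```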

MPI opens many ports, so in general you have to turn off the firewall, which in turn means the cluster has to run on a private network (VPC).
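
If turning off the firewall entirely is not an option, Open MPI can at least be pinned to a fixed TCP port range via its MCA parameters, so only that range needs to be opened. The range below is an example, not a recommendation:

```sh
# Confine Open MPI's TCP transport to ports 10000-10999 and open only those.
mpirun --mca btl_tcp_port_min_v4 10000 \
       --mca btl_tcp_port_range_v4 1000 \
       --hostfile /home/guest/hosts -np 16 ./my_sim
```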
