dockerComp

GOAL

To set up a basic prototype for distributed computing with Docker. If time permits, add a complex computing task.

INTRODUCTION

For the purpose of Distributed (Scientific) Computing, scientists across the world have mostly relied on pre-configured VM images to let volunteering clients contribute to micro-processing tasks, processing raw data received in chunks over the network.

But since the introduction of Docker, life has changed and so have the performance benchmarks. We propose a system that uses the benefits of Docker to hopefully perform far better than the milestones currently achieved through VMs. VMs have a huge startup overhead compared to Docker containers. Moreover, we don't even need to explain the difference between running more than one VM on a host OS and running multiple Docker containers on the same machine! See the point? :)

  • YouTube video explaining this project:

    Here's a video for DockerComp

    Here's a current screenshot of installation and first run

    Here's a current screenshot of client logs when server goes down: constant polling

  • Click here for a screencast of running just one script on the client side.

INSTALLATION

  • Server side (src/server/):

    • Make sure that the server under 'src/server/' is up and running, either locally (for testing) or deployed elsewhere; in the latter case, provide its hostname/IP and port through the environment variables $DC_HOST and $DC_PORT.

      (refer to the next major point, 'Client side', for this script)

    • Running the server requires MongoDB. Refer to the install_mongo guide, then set the environment variables U_DB, U_USER and U_PASS to the database name, user and password chosen while setting up MongoDB.

    • Ensure that you've installed deps from dockerComp/src/server/requirements.txt

    • To run the server, open a terminal, go to dockerComp/src/server/ and run $ ./start. This starts the server locally on your machine.

  • Client side (src/client/):

    • Note: the server (i.e., the component responsible for distributing data to clients) has to be deployed somewhere, and its IP has to be provided in your configuration file. You may then distribute the installer.sh script, along with the configuration, to the clients.

    • Download this script and run:

    $ ./installer.sh   [configure your server location for this script via $DC_HOST and $DC_PORT]

    • Once installed, the daemon output is written to $HOME/dockerComp/src/client/scripts/nohup.out and the daemon itself lives at $HOME/dockerComp/src/client/scripts/slave_manager. To kill the daemon, run $ kill -9 $(ps -e | grep slave_manager | awk -F' ' '{print $1}') (see the consolidated sketch after the cleanup note below).

Should you need to remove all traces of dockerComp from your machine, just run the 'cleanup' script included at the root of this project's source.
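
For convenience, here is a rough, consolidated shell sketch of the steps above. The paths, scripts and variable names are taken from this README; the concrete values (IPs, port, database credentials) are placeholders, and the pip command is an assumption about how the server's requirements.txt is meant to be installed.

    ## Server side (on the machine hosting src/server/)
    export DC_HOST=127.0.0.1      # placeholder: hostname/IP the server is reachable at
    export DC_PORT=8080           # placeholder: port the server listens on
    export U_DB=dockercomp        # MongoDB database name chosen during setup
    export U_USER=dc_user         # MongoDB user for that database
    export U_PASS=change_me       # password set while configuring MongoDB

    pip install -r dockerComp/src/server/requirements.txt   # assumes a Python/pip environment
    cd dockerComp/src/server && ./start                      # starts the server locally

    ## Client side (on each volunteer machine)
    export DC_HOST=203.0.113.10   # placeholder: your server's IP/hostname
    export DC_PORT=8080           # placeholder: your server's port
    ./installer.sh                # sets up the client and its background daemon

    tail -f $HOME/dockerComp/src/client/scripts/nohup.out           # watch daemon output
    kill -9 $(ps -e | grep slave_manager | awk -F' ' '{print $1}')  # stop the daemon

    ## Remove all traces of dockerComp (script shipped at the project root)
    ./cleanup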

Cheers! :)

NOTES

  • Demo link to be updated soon.

  • In case you're curious how to run this from the client side:

    • Once the server is up and running, all one has to do is download and run installer.sh.
  • Docker image: $ docker pull arcolife/docker_comp (will be kept updated; a sketch of running the image follows this list)
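
If you'd rather run the pre-built image directly, a run command might look like the sketch below. That the container honors $DC_HOST and $DC_PORT (and how its entrypoint behaves) is an assumption on my part, not something documented here, so verify against the image before relying on it.

    docker pull arcolife/docker_comp
    # Assumption: the container reads the server location from these variables.
    docker run -d --name docker_comp \
      -e DC_HOST=203.0.113.10 \
      -e DC_PORT=8080 \
      arcolife/docker_comp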

FAQ

Refer to the Wiki: click here!

References:

 - http://www.rightscale.com/blog/sites/default/files/docker-containers-vms.png 
 - http://en.wikipedia.org/wiki/Docker_%28software%29#cite_ref-3

So, just to give you some context for this whole project, take a look at CernVM. It is a really awesome project, developed to help collect CERN's LHC data and perform data analysis on volunteers' computers or even on commercial clouds. Just imagine if the whole process of using a VM were dockerized!

FEATURES

  • Can be used for:

    • Image Processing
    • General Data Analysis
    • Scientific Computing
    • Crowdsourcing projects

FUTURE GOALS

  • Make this a pluggable, dockerized distributed-computing tool, where you just have to include a computation task (say, map-reduce) and make it send data to clients. The app should be able to handle the rest.

  • Benchmark results and compare with existing methodologies.

TESTS

  • From client side:

    • Although the default connection-establishment test is included with the install scripts, you can run it manually: $ src/client/scripts/test_server_conn (make sure the env vars DC_HOST and DC_PORT are set; an illustrative connectivity check is sketched after this list)
  • From server side:

    • TBD
  • Workloads:

    • Currently a simple task. TBD.
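
For reference, the sketch below is an illustrative, minimal connectivity check and not the actual test_server_conn script; it merely assumes the test boils down to reaching the server at $DC_HOST:$DC_PORT.

    # Illustrative only -- not the real test_server_conn.
    : "${DC_HOST:?set DC_HOST first}"
    : "${DC_PORT:?set DC_PORT first}"
    if nc -z -w 5 "$DC_HOST" "$DC_PORT"; then
        echo "server reachable at $DC_HOST:$DC_PORT"
    else
        echo "could not reach $DC_HOST:$DC_PORT" >&2
        exit 1
    fi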

WORKFLOW

  1. Server

    • Dashboard to Manage:

      • No. of Clients (and # of containers per client)
      • Resources allocated to the containers
    • Master app that manages data sent to each client and checks for integrity.

  2. Client

    • Installation of Docker
    • Starting Containers
    • Installation of Application inside the Container
    • Connection Establishment with the Server.
    • Scripts for the computation
    • Error Reporting (a hypothetical sketch of the client's polling loop follows this list)
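
To make the client workflow concrete, here is a minimal, hypothetical sketch of the "constant polling" behaviour mentioned earlier. The /task and /result endpoints, the payloads and the compute_task script name are invented for illustration; the real protocol lives in the client scripts.

    #!/usr/bin/env bash
    # Hypothetical client loop; endpoints and script names are illustrative only.
    SERVER="http://${DC_HOST}:${DC_PORT}"

    while true; do
        # Ask the server for a chunk of work.
        if task=$(curl -sf "${SERVER}/task"); then
            # Run the computation script on that chunk.
            result=$(printf '%s' "$task" | ./compute_task)
            # Report the result back; the server verifies integrity on its side.
            curl -sf -X POST --data-binary "$result" "${SERVER}/result" \
                || echo "result upload failed" >&2
        else
            # Server unreachable: keep polling, as seen in the client logs.
            echo "server down, retrying in 30 seconds" >&2
            sleep 30
        fi
    done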

REFERENCES

  1. https://github.com/cernvm
  2. http://en.wikipedia.org/wiki/List_of_distributed_computing_projects
  3. http://www.rightscale.com/blog/sites/default/files/docker-containers-vms.png
  4. http://www.psc.edu/science/
  5. http://pybossa.com/
  6. https://okfn.org/press/releases/crowdcrafting-putting-citizens-control-citizen-science/
  7. http://www.mediaagility.com/2014/docker-the-next-big-thing-on-cloud/
  8. http://cernvm.cern.ch/portal/
  9. http://www.nature.com/news/software-simplified-1.22059