Code repo for Pytheas (formerly DDN), a control platform for enabling data-driven control of network applications.

# nsdi2017-ddn/ddn


This directory archives the Pytheas implementation prior to the NSDI submission.

The newest version of the Pytheas code is available at https://github.com/nsdi2017-ddn/pytheas

## Table of Contents

  1. Environment
  2. Front Server
    1. Web Server
    2. Group Manager
  3. Kafka
  4. Spark
    1. Decision Maker
    2. Communicator
  5. Benchmark
    1. Response Time
      1. Python Script
      2. Apache Benchmark
    2. Python Benchmark
      1. Standalone Benchmark
      2. Distributed Benchmark
    3. Kafka Benchmark
  6. Trace
    1. Algorithm Comparison
    2. Fault Tolerance
    3. One Host Experiment
  7. Abandoned

## Environment

System: Ubuntu 15.10

Java compiler tools (Maven) installation:

$ sudo apt-get update
$ sudo apt-get install -y default-jdk maven

## Front Server

Contains the programs that need to be deployed on each front-end server host.

### Web Server

Auto-deployment script (for Apache httpd and php programs):

../front_server $ sudo ./frontserver_deploy.sh

### Group Manager

Compile with Maven:

../GroupManager $ mvn package

Run:

../GroupManager $ java -cp target/GroupManager-1.0-SNAPSHOT.jar frontend.GroupManager <cluster_ID> <kafka_server> <config_file>

<cluster_ID> is the ID of the current cluster

<kafka_server> is a comma-separated list of the Kafka servers' IP addresses

<config_file> contains the labels of the update info and the reduced labels

## Kafka

Deploy on one or more hosts in each cluster to manage the communication between the functional modules.

Kafka deployment:

../kafka $ sudo ./kafka_deploy.sh <host_list> <host_number>

<host_list> is a comma-separated list of the IP addresses of all Kafka servers

<host_number> is the sequence number of the current host in <host_list>
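The deploy script presumably uses `<host_number>` to index into `<host_list>`; a small Python sketch of that selection (the function name and the 1-based indexing are assumptions, not taken from the script):

```python
def current_host_ip(host_list, host_number):
    """Pick the current host's IP out of a comma-separated host list.

    host_list: e.g. "10.0.0.1,10.0.0.2,10.0.0.3" (illustrative addresses)
    host_number: 1-based sequence number of the current host
    """
    hosts = host_list.split(",")
    return hosts[host_number - 1]
```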

Run:

$ cd /usr/share/kafka
$ sudo bin/zookeeper-server-start.sh config/zookeeper.properties &
$ sudo bin/kafka-server-start.sh config/server.properties

Note: if running Kafka on more than one host, start the Kafka server (the third command) only after ZooKeeper (the second command) has been started on every host.


## Spark

Contains the decision-making module and the communication module; each uses Spark and can run on one or more hosts.

Spark deployment:

../spark $ sudo ./spark_deploy.sh

### Decision Maker

Makes a decision for each group.

Compile with Maven and submit to Spark.

### Communicator

Communicates with the backend cluster and other frontend clusters.

Like the DecisionMaker, compile with Maven and submit to Spark.

References:

Run Spark on Multi-hosts

Spark Submitting Applications


## Benchmark

Small scripts and programs to test the scalability of the frontend cluster.

### Response Time

Tests the response time of requests.

#### Python Script

A simple Python program that performs an HTTP POST request 1000 times and plots the CDF of the response times:

$ ./post_time.py
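The script itself is not reproduced here; a minimal sketch of the measurement and CDF logic such a script needs (the target URL and payload would be supplied by the real script):

```python
import time
import urllib.request

def measure_post(url, payload=b"", n=1000):
    """Send n HTTP POST requests and return the per-request latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.time()
        urllib.request.urlopen(url, data=payload).read()  # data= makes this a POST
        latencies.append(time.time() - start)
    return latencies

def cdf_points(latencies):
    """Turn raw latencies into (latency, cumulative fraction) pairs for a CDF plot."""
    ordered = sorted(latencies)
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]
```

Plotting the returned pairs, e.g. with matplotlib, yields the CDF described above.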

#### Apache Benchmark

A shell script that uses Apache Benchmark to test the response time of the frontend server:

$ ./responseTime.ssh

### Python Benchmark

#### Standalone Benchmark

A standalone benchmark that performs HTTP POST requests. The test duration and the requests per second (RPS) can be controlled:

$ ./benchmark.py
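How a fixed test duration and target RPS turn into a send schedule can be sketched as follows (function and parameter names are illustrative, not taken from benchmark.py):

```python
def request_schedule(duration_s, rps):
    """Send-time offsets (seconds from test start) for a fixed-rate POST test."""
    total = int(duration_s * rps)           # total requests over the whole test
    return [i / rps for i in range(total)]  # evenly spaced send times
```

A sender would sleep until each offset before issuing the next POST, so the achieved RPS can fall below the target when individual requests are slow.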

#### Distributed Benchmark

A distributed benchmark that performs HTTP POST requests.

Run the slave program on all the hosts that will perform the benchmark, then run the master program on one host to start the test. When the test finishes, the master program generates three figures: response time, successful RPS, and the CDF of response time.

Run a slave:

$ ./dbenchmark_slave <url>

<url>: the destination URL the slave program will send requests to

Run the master:

$ ./dbenchmark_master <Time> <RPS>

<Time>: the time the test will last

<RPS>: requests per second. This parameter is only positively correlated with the real RPS; the real RPS is shown in the result figure.

Note: the host that runs the master program needs matplotlib installed:

sudo apt-get install -y python-matplotlib
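The master's "Successful RPS" figure presumably buckets the completion timestamps reported by the slaves into per-second counts; a sketch of that aggregation (the data format is an assumption):

```python
from collections import Counter

def successful_rps(completion_times):
    """Count successful requests per whole second of the test.

    completion_times: seconds-from-start floats, one per successful request.
    Returns sorted (second, count) pairs, ready to plot.
    """
    counts = Counter(int(t) for t in completion_times)
    return sorted(counts.items())
```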

### Kafka Benchmark

Specially designed to test the throughput of Kafka and Spark Streaming. It relies on a special message format.

Compile:

../KafkaBenchmark $ mvn package

Run:

Send messages to Kafka:

java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader <kafka_server> <mps>

<kafka_server>: the hostname of the Kafka server

<mps>: messages per second

Note: by default, all messages are sent to the topic internal_groups

Receive messages from Kafka:

java -cp target/KafkaBenchmark-1.0-SNAPSHOT.jar mybenchmark.MsgReader <kafka_server> <topic>

<kafka_server>: the hostname of the Kafka server

<topic>: the Kafka topic this reader will consume

## Trace

Scripts to test system or algorithm performance using traces.

trace_sort.sh : sorts the trace by timestamp
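A Python equivalent of what trace_sort.sh does, assuming the timestamp is the first whitespace-separated field of each trace line:

```python
def sort_trace(lines):
    """Sort trace lines by their leading timestamp field, numerically."""
    return sorted(lines, key=lambda line: float(line.split()[0]))
```

The numeric key matters: a plain lexicographic sort would order "30" before "4".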

### Algorithm Comparison

Main scripts for the algorithm comparison.

auto_plot.sh : plots the algorithm comparison results

combine.py : processes raw data

cost.conf : Gnuplot script for the plot

pull*.sh : pulls the test results from the cluster to localhost

trace_parser.py : parses the trace and simulates the player

### Fault Tolerance

Main scripts for the fault tolerance experiment.

ft.conf : Gnuplot script

sort : processes raw data

trace_parser_multi.py : parses the trace and simulates multiple players

### One Host Experiment

For the real-world trace benchmark, deploy all modules of a frontend cluster on one host. This is more efficient for comparing multiple algorithms.

autoscp.sh : uploads the files used

onehost_deploy : deployment script

start_tmux : runs the necessary programs in tmux


## Abandoned

Abandoned module code, including the load balancer (HAProxy) and the proxy server.
