Commit

example
KevinyWu committed Mar 23, 2024
1 parent 2f36744 commit 0481690
Showing 54 changed files with 2,042 additions and 0 deletions.
163 changes: 163 additions & 0 deletions example/.gitignore
@@ -0,0 +1,163 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

dump.rdb
MPCS-FaaS-Tests/
Binary file added example/Project.pdf
35 changes: 35 additions & 0 deletions example/README.md
@@ -0,0 +1,35 @@
# Function as a Service (FaaS) Platform

Team: Kevin Wu and Francisco Mendes

## Running the FaaS Platform

1. Navigate to `src/`
2. Terminal 1 - start Redis: `redis-server`
3. Terminal 2 - start the API: `uvicorn main:app --reload`
4. Terminal 3 - start the task dispatcher
1. Local: `python3 task_dispatcher.py -m local -p 8888 -w 2`
2. Pull: `python3 task_dispatcher.py -m pull -p 8888`
3. Push: `python3 task_dispatcher.py -m push -p 8888`
5. Terminal 4 - start the workers (for push/pull only)
1. Pull: `python3 pull_worker.py 2 tcp://127.0.0.1:8888`
2. Push: `python3 push_worker.py 2 tcp://127.0.0.1:8888`
6. Terminal 5 - run the client: `python3 client.py -p 8000`
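
Once the platform is up, the client talks to it over HTTP with serialized payloads. The exact wire format lives in `src/tests/serialize.py`; the sketch below is a stdlib stand-in (an assumption — the real module may use `dill` rather than `pickle`) showing the shape of the request bodies, with endpoint and field names taken from `latency.py`:

```python
import base64
import pickle

def serialize(obj) -> str:
    # pickle, then base64, so the payload can travel inside a JSON body
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def deserialize(blob: str):
    return pickle.loads(base64.b64decode(blob))

# Illustrative function to register
def no_op():
    return None

# Body for POST /register_function (field names from latency.py)
register_body = {"name": "no_op", "payload": serialize(no_op)}

# Body payload for POST /execute_function: a serialized (args, kwargs) pair
execute_body_payload = serialize(((), {}))
```

A worker on the other end would `deserialize` the function and arguments, call the function, and `serialize` the result back.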

## Running Tests with pytest (on Mac)

1. `chmod +x run_tests.sh`
2. Run tests
1. Local worker: `./run_tests.sh local`
2. Pull worker: `./run_tests.sh pull`
3. Push worker: `./run_tests.sh push`
3. The tests will run in one of the new terminal windows. Check the output for the results.

## Running Performance Tests (on Mac)

1. `chmod +x run_performance.sh`
2. Run tests (`w` is the number of workers)
1. Local worker: `./run_performance.sh local w`
2. Pull worker: `./run_performance.sh pull w`
3. Push worker: `./run_performance.sh push w`
3. Results are saved in `src/results/` and plots are made by `src/tests/performance.ipynb` and saved in `figures/`
Binary file added example/figures/average_time_plot_1.png
Binary file added example/figures/average_time_plot_3.png
Binary file added example/figures/final-latency.png
Binary file added example/figures/pull1.png
Binary file added example/figures/pull2.png
Binary file added example/figures/push1.png
Binary file added example/figures/push2.png
115 changes: 115 additions & 0 deletions example/latency.py
@@ -0,0 +1,115 @@
import subprocess
import time

import requests
from matplotlib import pyplot as plt

from src.tests.serialize import serialize

# Modify these parameters based on your experiment
BASE_URL = "http://127.0.0.1:8000/" # Replace with the actual URL of your MPCSFaaS service
# NUM_WORKERS = [1, 2, 4,8] # Vary the number of workers

def launch_task_dispatcher(mode, port=4321, num_workers=1):
    # Launch the task dispatcher; local mode manages its own worker pool,
    # so the port is only used by the remote (pull/push) modes.
    if mode == "local":
        return subprocess.Popen(
            ["python3", "src/task_dispatcher.py", "-m", mode, "-w", str(num_workers)],
            stdout=subprocess.PIPE, stdin=subprocess.PIPE, close_fds=True)
    return subprocess.Popen(
        ["python3", "src/task_dispatcher.py", "-m", mode, "-p", str(port)],
        stdout=subprocess.PIPE, stdin=subprocess.PIPE, close_fds=True)

def launch_worker(mode, port=4321, num_workers=1):
    # Launch pull/push workers; local mode needs no separate worker process.
    url = f"tcp://127.0.0.1:{port}"
    if mode == "local":
        print("Local mode: no separate worker process.")
        return None
    script = "src/pull_worker.py" if mode == "pull" else "src/push_worker.py"
    return subprocess.Popen(
        ["python3", script, str(num_workers), url],
        stdout=subprocess.PIPE, stdin=subprocess.PIPE, close_fds=True)

def noOp():
return

def register_function(name, func):
    # POST the serialized function to the FaaS service and return its id
    body = {
        'name': name,
        'payload': serialize(func)
    }
    response = requests.post(f'{BASE_URL}register_function', json=body)
    assert response.status_code == 201
    assert 'function_id' in response.json()
    return response.json()['function_id']

def run_test(func_id, num_tasks):
    execute_function = {
        'function_id': func_id,
        'payload': serialize(((), {}))
    }
    tasks = []
    start_time = time.time()
    for _ in range(num_tasks):
        response = requests.post(f'{BASE_URL}execute_function', json=execute_function)
        assert response.status_code == 201
        assert 'task_id' in response.json()
        tasks.append(response.json()['task_id'])

    # Poll until every task reports COMPLETED. Recount from zero on each
    # sweep: incrementing a running total across sweeps would double-count
    # tasks that remain COMPLETED, and the loop must end once num_done
    # reaches (not exceeds) num_tasks.
    num_done = 0
    while num_done < num_tasks:
        num_done = 0
        for task_id in tasks:
            response = requests.get(f'{BASE_URL}status/{task_id}')
            assert response.status_code == 200
            assert response.json()['task_id'] == task_id
            if response.json()['status'] == "COMPLETED":
                num_done += 1

    return time.time() - start_time

if __name__ == "__main__":
    import random  # local import kept from the original script

    noOp_id = register_function("noOp", noOp)

    times = {"local": [], "pull": [], "push": []}

    for mode in ['local', 'pull', 'push']:
        # Pick a random unprivileged port to avoid collisions across runs
        port = random.randint(1024, 9999)
        task_p = launch_task_dispatcher(mode, num_workers=1, port=port)
        worker_p = None
        if mode in ("pull", "push"):
            worker_p = launch_worker(mode, num_workers=1, port=port)
        for _ in range(10):
            times[mode].append(run_test(noOp_id, 10))
        if worker_p is not None:
            worker_p.kill()
        task_p.kill()

    avg_times = {mode: sum(ts) / len(ts) for mode, ts in times.items()}

    plt.bar(['Local', 'Pull', 'Push'],
            [avg_times["local"], avg_times["pull"], avg_times["push"]])
    plt.title("Latency by Mode")
    plt.ylabel("Latency (s)")
    plt.savefig("latency.png")
    print(f"Local latency: {avg_times['local']}")
    print(f"Pull latency: {avg_times['pull']}")
    print(f"Push latency: {avg_times['push']}")



81 changes: 81 additions & 0 deletions example/performance_report.md
@@ -0,0 +1,81 @@

# Performance Report

The performance test of our system focused on two main components: latency and throughput. For each component, one specific function was registered and tested.


## Methodology

**Latency**

In the `src/` folder, launch the Redis server (`redis-server`) and the FaaS service (`uvicorn main:app --reload`).

In a separate terminal, run the latency test (`python3 latency.py`). The test registers a no-op function (`noOp`) with the service; the function should terminate immediately after being sent to a worker. For each worker mode (worker count 1), the script then submits batches of requests to execute this function and measures how long the service takes to complete them, repeating the measurement 10 times per mode for 100 executions in total. It prints the average time for each mode and saves a bar plot of the latencies to `latency.png`.

**Throughput**

We employed a weak-scaling test to assess the throughput of the service. For each mode and each worker count (1, 2, 4, 8), the test places a constant number of requests per worker, so the total number of tasks grows linearly with worker count. The function executed accepts an integer argument, doubles it, sleeps for 3 seconds, and returns the result. Each task receives a unique argument to ensure that distinct tasks are being completed. We measured the total time for all tasks to complete once they were sent by the client.
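
The weak-scaling setup above can be sketched as follows (function and variable names are illustrative, not the actual test code):

```python
import time

def double_after_sleep(x: int) -> int:
    # The registered throughput-test function: double the argument,
    # sleep for 3 seconds, then return the result.
    time.sleep(3)
    return 2 * x

def task_arguments(worker_count: int, tasks_per_worker: int) -> list[int]:
    # Weak scaling: the total task count grows linearly with the number
    # of workers; unique arguments keep every task distinct.
    return list(range(worker_count * tasks_per_worker))
```

For example, `task_arguments(8, 3)` yields 24 distinct arguments, so doubling the worker count doubles the total work while keeping the per-worker load constant.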

To run, launch `./run_performance.sh [mode] [num workers]`. Results are saved in `src/results/[mode]_[num_workers].csv`. The script computes results for scale rates of 1 task per worker and 3 tasks per worker, running 5 trials of each. It was run for all 3 modes with 1, 3, 5, and 7 workers. After each run, all launched terminal windows must be closed to shut the service down before running again.

To generate graphs from the computed results, run `results/performance.ipynb`.

## Results
**Latency**

![Image](./figures/final-latency.png)

*Plot of latency vs. Mode*

As the bar plot shows, the latency of the local mode was the lowest of all the worker modes. The latencies of the pull and push modes were comparable, with the pull mode slightly faster than the push mode.

**Throughput**

Runtimes can be found in the results folder under `[mode]_[num workers].csv`. The left column gives runtimes for 1 task per worker, and the right column gives runtimes for 3 tasks per worker.


![Image](./figures/average_time_plot_1.png)

*Plot of runtime vs. Worker Count for each Mode, 1 task per Worker*

![Image](./figures/average_time_plot_3.png)

*Plot of runtime vs. Worker Count for each Mode, 3 tasks per Worker*

As demonstrated in the plot above, the throughput of the service differed significantly with both the worker count and the mode of worker.

Both the push and pull modes were slower than the local mode, with push mode being the slowest by far of all modes. At all worker counts, this hierarchy of local as fastest and push as slowest was observed.

Throughput decreased (completion time increased) as the number of workers increased for all modes. The decrease was strongest for the two remote worker modes and weakest for the local mode.

With the greater number of tasks per worker (3), runtime also increased more steeply with worker count (a steeper slope). This held for all modes.


## Discussion
**Latency**

We believe the pull mode had lower latency than the push mode due to the blocking nature of the ZMQ REQ/REP pattern. After sending a task to a worker, the dispatcher blocks until the worker returns its result. Since the latency test used only one worker, no other worker could block the dispatcher from receiving the result from the correct pull worker. With additional workers, latency would likely increase, as idle workers polling the dispatcher for tasks would block the working worker from returning its result.

The local mode had the lowest latency as no ZMQ connections were made. Instead, tasks were processed and returned to the Redis server locally, reducing overhead.

The push mode's greater latency was likely a result of the manner in which the dispatcher handled tasks being sent. In the push mode, the dispatcher awaits a message until a specified timeout, then attempts to fetch a task from the queue.
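
This receive-then-fetch loop can be sketched with a stdlib queue standing in for the ZMQ socket (a simulation of the pattern described above, not our actual dispatcher code): each empty timeout adds up to `timeout` seconds of delay before a queued task moves.

```python
import queue

def dispatcher_step(messages: queue.Queue, tasks: queue.Queue, timeout: float = 0.1):
    # Wait up to `timeout` for a worker message; only after that does the
    # dispatcher try to fetch a task from the queue and dispatch it.
    try:
        msg = messages.get(timeout=timeout)
        return ("handled_message", msg)
    except queue.Empty:
        pass
    try:
        task = tasks.get_nowait()
        return ("dispatched", task)
    except queue.Empty:
        return ("idle", None)
```

A task sitting in `tasks` while `messages` is empty is dispatched only after the full timeout elapses, which is the latency penalty described above.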

Several approaches could be taken in a future implementation to reduce the latency of the pull and push worker modes. As mentioned in the technical report, the blocking ZMQ REQ/REP pattern constrained our implementation of both modes. For the pull mode, workers must poll the dispatcher for tasks; because REQ/REP is blocking, multiple workers polling the dispatcher can prevent a worker with a finished result from delivering it. This may account for some of the latency in the system.
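
The head-of-line blocking described above can be illustrated without ZMQ: model the dispatcher's REP socket as a resource that services one request at a time, so a finished worker's result delivery queues behind in-flight polls (a stdlib simulation of the effect, not our implementation):

```python
import threading
import time

rep_socket = threading.Lock()  # stands in for the one-request-at-a-time REP socket
first_poll_started = threading.Event()
handled = []

def poll_for_task(worker_id: int) -> None:
    # An idle worker asking the dispatcher for work holds the "socket"
    # for the duration of the exchange (simulated by the sleep).
    with rep_socket:
        first_poll_started.set()
        time.sleep(0.05)
        handled.append(("poll", worker_id))

def deliver_result(worker_id: int) -> float:
    # A worker with a finished result must also wait for the socket;
    # returns how long the delivery was blocked.
    start = time.time()
    with rep_socket:
        handled.append(("result", worker_id))
    return time.time() - start

pollers = [threading.Thread(target=poll_for_task, args=(i,)) for i in range(4)]
for t in pollers:
    t.start()
first_poll_started.wait()        # ensure a poll already holds the socket
blocked_for = deliver_result(99)
for t in pollers:
    t.join()
```

`blocked_for` is well above zero even though delivering a result is itself instantaneous; with more pollers the wait grows, matching the latency trend we observed.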

**Throughput**

The local mode, as the baseline model, was once again the fastest of all modes. This was expected, as the lack of ZMQ connections greatly reduced overhead. In addition, there was no blocking mechanism in the local mode: tasks would be dispatched to the pool as soon as they were received, and there were no additional heartbeat or registration messages that would block the processing and returning of tasks.

The throughput of the service decreased (runtime increased) with worker count for all modes. Since we tested throughput through weak scaling, the total number of tasks increased linearly with the number of workers, and each worker was expected to complete the same number of tasks regardless of worker count. A greater worker count (and thus a greater number of tasks) increased the overhead of placing tasks in the queue and dispatching them to available workers.

The throughput decreased at a quicker rate relative to worker count when the number of tasks per worker was greater. This is likely due to more total tasks being assigned for the workers to complete, thus increasing the total overhead since more tasks had to be fetched and dispatched.

For both remote modes, a greater number of workers meant more heartbeats being sent. The dispatcher had to manage more of these messages, increasing overhead, since receiving one message blocks others from being received.

For the pull mode, a greater number of workers led to more pull workers polling the dispatcher for tasks.

For the push mode, finding an available worker took longer as the number of workers increased, since we cycled through the worker registry to find a free worker. This partially contributed to the greater overhead in push mode.



