Commit

example
KevinyWu committed Mar 23, 2024
1 parent 2f36744 commit 0481690
Showing 54 changed files with 2,042 additions and 0 deletions.
163 changes: 163 additions & 0 deletions example/.gitignore
@@ -0,0 +1,163 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

dump.rdb
MPCS-FaaS-Tests/
Binary file added example/Project.pdf
35 changes: 35 additions & 0 deletions example/README.md
@@ -0,0 +1,35 @@
# Function as a Service (FaaS) Platform

Team: Kevin Wu and Francisco Mendes

## Running the FaaS Platform

1. Navigate to `src/`
2. Terminal 1 - start Redis: `redis-server`
3. Terminal 2 - start the API: `uvicorn main:app --reload`
4. Terminal 3 - start the task dispatcher
1. Local: `python3 task_dispatcher.py -m local -p 8888 -w 2`
2. Pull: `python3 task_dispatcher.py -m pull -p 8888`
3. Push: `python3 task_dispatcher.py -m push -p 8888`
5. Terminal 4 - start the workers (for push/pull only)
1. Pull: `python3 pull_worker.py 2 tcp://127.0.0.1:8888`
2. Push: `python3 push_worker.py 2 tcp://127.0.0.1:8888`
6. Terminal 5 - run the client: `python3 client.py -p 8000`
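
Once the platform is up, the client talks to it over HTTP with serialized payloads. The exact wire format lives in `src/tests/serialize.py`; the sketch below is a stdlib stand-in (an assumption — the real module may use `dill` rather than `pickle`) showing the shape of the request bodies, with endpoint and field names taken from `latency.py`:

```python
import base64
import pickle

def serialize(obj) -> str:
    # pickle, then base64, so the payload can travel inside a JSON body
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def deserialize(blob: str):
    return pickle.loads(base64.b64decode(blob))

# Illustrative function to register
def no_op():
    return None

# Body for POST /register_function (field names from latency.py)
register_body = {"name": "no_op", "payload": serialize(no_op)}

# Body payload for POST /execute_function: a serialized (args, kwargs) pair
execute_body_payload = serialize(((), {}))
```

A worker on the other end would `deserialize` the function and arguments, call the function, and `serialize` the result back.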

## Running Tests with pytest (on Mac)

1. `chmod +x run_tests.sh`
2. Run tests
1. Local worker: `./run_tests.sh local`
2. Pull worker: `./run_tests.sh pull`
3. Push worker: `./run_tests.sh push`
3. The tests will run in one of the new terminal windows. Check the output for the results.

## Running Performance Tests (on Mac)

1. `chmod +x run_performance.sh`
2. Run tests (`w` is the number of workers)
1. Local worker: `./run_performance.sh local w`
2. Pull worker: `./run_performance.sh pull w`
3. Push worker: `./run_performance.sh push w`
3. Results are saved in `src/results/` and plots are made by `src/tests/performance.ipynb` and saved in `figures/`
Binary file added example/figures/average_time_plot_1.png
Binary file added example/figures/average_time_plot_3.png
Binary file added example/figures/final-latency.png
Binary file added example/figures/pull1.png
Binary file added example/figures/pull2.png
Binary file added example/figures/push1.png
Binary file added example/figures/push2.png
115 changes: 115 additions & 0 deletions example/latency.py
@@ -0,0 +1,115 @@
import subprocess
import time

import requests
from matplotlib import pyplot as plt

from src.tests.serialize import serialize

# Modify these parameters based on your experiment
BASE_URL = "http://127.0.0.1:8000/" # Replace with the actual URL of your MPCSFaaS service
# NUM_WORKERS = [1, 2, 4,8] # Vary the number of workers

def launch_task_dispatcher(mode, port=4321, num_workers=1):
    # Launch the task dispatcher; local mode manages its own worker pool,
    # so the port is only used by the remote (pull/push) modes.
    if mode == "local":
        return subprocess.Popen(
            ["python3", "src/task_dispatcher.py", "-m", mode, "-w", str(num_workers)],
            stdout=subprocess.PIPE, stdin=subprocess.PIPE, close_fds=True)
    return subprocess.Popen(
        ["python3", "src/task_dispatcher.py", "-m", mode, "-p", str(port)],
        stdout=subprocess.PIPE, stdin=subprocess.PIPE, close_fds=True)

def launch_worker(mode, port=4321, num_workers=1):
    # Launch pull/push workers; local mode needs no separate worker process.
    url = f"tcp://127.0.0.1:{port}"
    if mode == "local":
        print("Local mode: no separate worker process.")
        return None
    script = "src/pull_worker.py" if mode == "pull" else "src/push_worker.py"
    return subprocess.Popen(
        ["python3", script, str(num_workers), url],
        stdout=subprocess.PIPE, stdin=subprocess.PIPE, close_fds=True)

def noOp():
return

def register_function(name, func):
    # POST the serialized function to the FaaS service and return its id
    body = {
        'name': name,
        'payload': serialize(func)
    }
    response = requests.post(f'{BASE_URL}register_function', json=body)
    assert response.status_code == 201
    assert 'function_id' in response.json()
    return response.json()['function_id']

def run_test(func_id, num_tasks):
    execute_function = {
        'function_id': func_id,
        'payload': serialize(((), {}))
    }
    tasks = []
    start_time = time.time()
    for _ in range(num_tasks):
        response = requests.post(f'{BASE_URL}execute_function', json=execute_function)
        assert response.status_code == 201
        assert 'task_id' in response.json()
        tasks.append(response.json()['task_id'])

    # Poll until every task reports COMPLETED. Recount from zero on each
    # sweep: incrementing a running total across sweeps would double-count
    # tasks that remain COMPLETED, and the loop must end once num_done
    # reaches (not exceeds) num_tasks.
    num_done = 0
    while num_done < num_tasks:
        num_done = 0
        for task_id in tasks:
            response = requests.get(f'{BASE_URL}status/{task_id}')
            assert response.status_code == 200
            assert response.json()['task_id'] == task_id
            if response.json()['status'] == "COMPLETED":
                num_done += 1

    return time.time() - start_time

if __name__ == "__main__":
    import random  # local import kept from the original script

    noOp_id = register_function("noOp", noOp)

    times = {"local": [], "pull": [], "push": []}

    for mode in ['local', 'pull', 'push']:
        # Pick a random unprivileged port to avoid collisions across runs
        port = random.randint(1024, 9999)
        task_p = launch_task_dispatcher(mode, num_workers=1, port=port)
        worker_p = None
        if mode in ("pull", "push"):
            worker_p = launch_worker(mode, num_workers=1, port=port)
        for _ in range(10):
            times[mode].append(run_test(noOp_id, 10))
        if worker_p is not None:
            worker_p.kill()
        task_p.kill()

    avg_times = {mode: sum(ts) / len(ts) for mode, ts in times.items()}

    plt.bar(['Local', 'Pull', 'Push'],
            [avg_times["local"], avg_times["pull"], avg_times["push"]])
    plt.title("Latency by Mode")
    plt.ylabel("Latency (s)")
    plt.savefig("latency.png")
    print(f"Local latency: {avg_times['local']}")
    print(f"Pull latency: {avg_times['pull']}")
    print(f"Push latency: {avg_times['push']}")



81 changes: 81 additions & 0 deletions example/performance_report.md
@@ -0,0 +1,81 @@

# Performance Report

The performance test of our system focused on two main components: latency and throughput. For each component, one specific function was registered and tested.


## Methodology

**Latency**

In the `src/` folder, launch the Redis server (`redis-server`) and the FaaS service (`uvicorn main:app --reload`).

In a separate terminal, run the latency test (`python3 latency.py`). The test registers a no-op function (`noOp`) with the service; the function should terminate immediately after being sent to a worker. For each worker mode (worker count 1), the script then submits batches of requests to execute this function and measures how long the service takes to complete them, repeating the measurement 10 times per mode for 100 executions in total. It prints the average time for each mode and saves a bar plot of the latencies to `latency.png`.

**Throughput**

We employed a weak-scaling test to assess the throughput of the service. For each mode and each worker count (1, 2, 4, 8), the test places a constant number of requests per worker, so the total number of tasks grows linearly with worker count. The function executed accepts an integer argument, doubles it, sleeps for 3 seconds, and returns the result. Each task receives a unique argument to ensure that distinct tasks are being completed. We measured the total time for all tasks to complete once they were sent by the client.
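
The weak-scaling setup above can be sketched as follows (function and variable names are illustrative, not the actual test code):

```python
import time

def double_after_sleep(x: int) -> int:
    # The registered throughput-test function: double the argument,
    # sleep for 3 seconds, then return the result.
    time.sleep(3)
    return 2 * x

def task_arguments(worker_count: int, tasks_per_worker: int) -> list[int]:
    # Weak scaling: the total task count grows linearly with the number
    # of workers; unique arguments keep every task distinct.
    return list(range(worker_count * tasks_per_worker))
```

For example, `task_arguments(8, 3)` yields 24 distinct arguments, so doubling the worker count doubles the total work while keeping the per-worker load constant.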

To run, launch `./run_performance.sh [mode] [num workers]`. Results are saved in `src/results/[mode]_[num_workers].csv`. The script computes results for scale rates of 1 task per worker and 3 tasks per worker, running 5 trials of each. It was run for all 3 modes with 1, 3, 5, and 7 workers. After each run, all launched terminal windows must be closed to shut the service down before running again.

To generate graphs from the computed results, run `results/performance.ipynb`.

## Results
**Latency**

![Image](./figures/final-latency.png)

*Plot of latency vs. Mode*

As the bar plot shows, the latency of the local mode was the lowest of all the worker modes. The latencies of the pull and push modes were comparable, with the pull mode slightly faster than the push mode.

**Throughput**

Runtimes can be found in the results folder under `[mode]_[num workers].csv`. The left column gives runtimes for 1 task per worker, and the right column gives runtimes for 3 tasks per worker.


![Image](./figures/average_time_plot_1.png)

*Plot of runtime vs. Worker Count for each Mode, 1 task per Worker*

![Image](./figures/average_time_plot_3.png)

*Plot of runtime vs. Worker Count for each Mode, 3 tasks per Worker*

As demonstrated in the plot above, the throughput of the service differed significantly with both the worker count and the mode of worker.

Both the push and pull modes were slower than the local mode, with push mode being the slowest by far of all modes. At all worker counts, this hierarchy of local as fastest and push as slowest was observed.

Throughput decreased (completion time increased) as the number of workers increased for all modes. The decrease was strongest for the two remote worker modes and weakest for the local mode.

With the greater number of tasks per worker (3), runtime also increased more steeply with worker count (a steeper slope). This held for all modes.


## Discussion
**Latency**

We believe the pull mode had lower latency than the push mode due to the blocking nature of the ZMQ REQ/REP pattern. After sending a task to a worker, the dispatcher blocks until the worker returns its result. Since the latency test used only one worker, no other worker could block the dispatcher from receiving the result from the correct pull worker. With additional workers, latency would likely increase, as idle workers polling the dispatcher for tasks would block the working worker from returning its result.

The local mode had the lowest latency as no ZMQ connections were made. Instead, tasks were processed and returned to the Redis server locally, reducing overhead.

The push mode's greater latency was likely a result of the manner in which the dispatcher handled tasks being sent. In the push mode, the dispatcher awaits a message until a specified timeout, then attempts to fetch a task from the queue.
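
This receive-then-fetch loop can be sketched with a stdlib queue standing in for the ZMQ socket (a simulation of the pattern described above, not our actual dispatcher code): each empty timeout adds up to `timeout` seconds of delay before a queued task moves.

```python
import queue

def dispatcher_step(messages: queue.Queue, tasks: queue.Queue, timeout: float = 0.1):
    # Wait up to `timeout` for a worker message; only after that does the
    # dispatcher try to fetch a task from the queue and dispatch it.
    try:
        msg = messages.get(timeout=timeout)
        return ("handled_message", msg)
    except queue.Empty:
        pass
    try:
        task = tasks.get_nowait()
        return ("dispatched", task)
    except queue.Empty:
        return ("idle", None)
```

A task sitting in `tasks` while `messages` is empty is dispatched only after the full timeout elapses, which is the latency penalty described above.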

Several approaches could be taken in a future implementation to reduce the latency of the pull and push worker modes. As mentioned in the technical report, the blocking ZMQ REQ/REP pattern constrained our implementation of both modes. For the pull mode, workers must poll the dispatcher for tasks; because REQ/REP is blocking, multiple workers polling the dispatcher can prevent a worker with a finished result from delivering it. This may account for some of the latency in the system.
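
The head-of-line blocking described above can be illustrated without ZMQ: model the dispatcher's REP socket as a resource that services one request at a time, so a finished worker's result delivery queues behind in-flight polls (a stdlib simulation of the effect, not our implementation):

```python
import threading
import time

rep_socket = threading.Lock()  # stands in for the one-request-at-a-time REP socket
first_poll_started = threading.Event()
handled = []

def poll_for_task(worker_id: int) -> None:
    # An idle worker asking the dispatcher for work holds the "socket"
    # for the duration of the exchange (simulated by the sleep).
    with rep_socket:
        first_poll_started.set()
        time.sleep(0.05)
        handled.append(("poll", worker_id))

def deliver_result(worker_id: int) -> float:
    # A worker with a finished result must also wait for the socket;
    # returns how long the delivery was blocked.
    start = time.time()
    with rep_socket:
        handled.append(("result", worker_id))
    return time.time() - start

pollers = [threading.Thread(target=poll_for_task, args=(i,)) for i in range(4)]
for t in pollers:
    t.start()
first_poll_started.wait()        # ensure a poll already holds the socket
blocked_for = deliver_result(99)
for t in pollers:
    t.join()
```

`blocked_for` is well above zero even though delivering a result is itself instantaneous; with more pollers the wait grows, matching the latency trend we observed.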

**Throughput**

The local mode, as the baseline model, was once again the fastest of all modes. This was expected, as the lack of ZMQ connections greatly reduced overhead. In addition, there was no blocking mechanism in the local mode: tasks would be dispatched to the pool as soon as they were received, and there were no additional heartbeat or registration messages that would block the processing and returning of tasks.

The throughput of the service decreased (runtime increased) with worker count for all modes. Since we tested throughput through weak scaling, the total number of tasks increased linearly with the number of workers, and each worker was expected to complete the same number of tasks regardless of worker count. A greater worker count (and thus a greater number of tasks) increased the overhead of placing tasks in the queue and dispatching them to available workers.

The throughput decreased at a quicker rate relative to worker count when the number of tasks per worker was greater. This is likely due to more total tasks being assigned for the workers to complete, thus increasing the total overhead since more tasks had to be fetched and dispatched.

For both remote modes, a greater number of workers meant more heartbeats being sent. The dispatcher had to manage more of these messages, increasing overhead, since receiving one message blocks others from being received.

For the pull mode, a greater number of workers led to more pull workers polling the dispatcher for tasks.

For the push mode, finding an available worker took longer as the number of workers increased, since we cycled through the worker registry to find a free worker. This partially contributed to the greater overhead in push mode.



