initial commit of the RabbitMQ GLS notebook
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the first Jupyter notebook I looked at different ways to calculate pi; in this notebook I am going to calculate pi with a pair of RabbitMQ queues. The approach I'm going to take is to write a producer that publishes ranges of values to a job queue. The worker nodes will consume from the job queue, do all of the calculations for their range, and then publish their result to a separate results queue."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Connecting to and Configuring the RabbitMQ Queues**\n",
"\n",
"This is straight from the RabbitMQ Python hello world; my RabbitMQ server is running in a Docker container on my desktop."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<METHOD(['channel_number=1', 'frame_type=1', \"method=<Queue.DeclareOk(['consumer_count=0', 'message_count=0', 'queue=GLS_results'])>\"])>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pika\n",
"connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))\n",
"channel = connection.channel()\n",
"\n",
"# one durable queue for the work ranges and one for the results\n",
"channel.queue_declare(queue='GLS_ranges', durable=True)\n",
"channel.queue_declare(queue='GLS_results', durable=True)\n",
"# deliver only one unacknowledged message at a time to each consumer\n",
"channel.basic_qos(prefetch_count=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Creating a Producer Function**\n",
"\n",
"In this function we publish a series of dictionaries (serialized as JSON documents) to our message queue using the `basic_publish` method of our `channel` object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"def work_producer(endNumber, blockStepping=1000000):\n",
"    for i in range(1, endNumber, blockStepping):\n",
"        d = {\"start\": i, \"end\": i + blockStepping}\n",
"        # serialize with json.dumps so the worker can json.loads the body\n",
"        channel.basic_publish(exchange='',\n",
"                              routing_key='GLS_ranges',\n",
"                              body=json.dumps(d))"
]
},
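{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration (not part of the pipeline itself), queueing the first ten million values in one-million-value blocks would look like this; the end number here is just an example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# example only: publishes ten 1,000,000-value ranges onto GLS_ranges\n",
"work_producer(10000000)"
]
},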
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Creating a Worker Function**\n",
"\n",
"The worker function needs to do more than the producer: it needs to read from the job queue, figure out whether the starting value should be added or subtracted, do the rest of the calculations, and then report a change value back to the `GLS_results` queue."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Reading from the queue*\n",
"\n",
"I'm going to lift from RabbitMQ's Hello World again for the consumer, since we only have a basic message to receive."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'callback' is not defined",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-4-38ccd7ad6e98>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m channel.basic_consume(queue='GLS_ranges',\n\u001b[0;32m 2\u001b[0m \u001b[0mauto_ack\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mTrue\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m on_message_callback=callback)\n\u001b[0m",
"\u001b[1;31mNameError\u001b[0m: name 'callback' is not defined"
]
}
],
"source": [
"import json\n",
"\n",
"def callback(ch, method, properties, body):\n",
"    # decode the JSON body and hand the range off to the calculator\n",
"    v = json.loads(body)\n",
"    calculations(v['start'], v['end'])\n",
"\n",
"channel.basic_consume(queue='GLS_ranges',\n",
"                      auto_ack=True,\n",
"                      on_message_callback=callback)"
]
},
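{
"cell_type": "markdown",
"metadata": {},
"source": [
"Registering the callback with `basic_consume` only tells pika what to do with each message; nothing is actually processed until the channel starts consuming. A minimal sketch of how a worker would kick that off; since the call blocks, it belongs in the standalone script rather than in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# blocks this process and dispatches each GLS_ranges message to callback()\n",
"# (run in the worker script, not the notebook, since it never returns)\n",
"channel.start_consuming()"
]
},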
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Doing the calculations*\n",
"\n",
"We need a function to do the calculations, but first that function needs to know whether to start off with addition or subtraction. To figure that out, we need to look at how the denominators in the Gregory-Leibniz series behave: the values we work with are all odd numbers, and we add two at each step of the calculation. That means adding or subtracting 1 from our current number always gives an even number, and using the modulo operator `%` we can check what remainder that even number leaves when divided by 4.\n",
"\n",
"* If we add 1 to our number and the modulo operation returns `0`, OR subtract 1 and it returns `2`, we need to subtract the current term.\n",
"* If we add 1 to our number and the modulo operation returns `2`, OR subtract 1 and it returns `0`, we need to add the current term.\n",
"\n",
"Since we are doing this in a queued, multiprocessing-style way, we will also use the `decimal` library, because a plain built-in `float` will not hold enough precision; we set the Decimal context precision to 2^16 (`65536`). On my desktop the decimal library lets me set the `prec` value as high as 2^59. A quick check of the rule on the first few odd denominators is shown below."
]
},
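{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the rule above (not part of the original pipeline, just a sanity check): for the series 4/1 - 4/3 + 4/5 - 4/7 + ..., the terms we add have `(n + 1) % 4 == 2` and the terms we subtract have `(n + 1) % 4 == 0`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sanity check of the parity rule on the first few odd denominators\n",
"for n in [1, 3, 5, 7, 9]:\n",
"    sign = '+' if (n + 1) % 4 == 2 else '-'\n",
"    print(f\"{sign} 4/{n}\")\n",
"# expected output: + 4/1, - 4/3, + 4/5, - 4/7, + 4/9"
]
},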
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from decimal import getcontext, Decimal\n",
"\n",
"getcontext().prec = 2**16\n",
"\n",
"def calculations(startingN, endingN):\n",
"    startingN = int(startingN)\n",
"    endingN = int(endingN)\n",
"    range_value = Decimal(0)\n",
"    # decide whether the first term is added or subtracted (see the rule above)\n",
"    if (startingN + 1) % 4 == 0:\n",
"        current_operator = \"-\"\n",
"    elif (startingN + 1) % 4 == 2:\n",
"        current_operator = \"+\"\n",
"    # step through the odd denominators, alternating the sign each time\n",
"    for i in range(startingN, endingN, 2):\n",
"        current_calculation = Decimal(4) / Decimal(i)\n",
"        if current_operator == \"+\":\n",
"            range_value += current_calculation\n",
"            current_operator = \"-\"\n",
"        elif current_operator == \"-\":\n",
"            range_value -= current_calculation\n",
"            current_operator = \"+\"\n",
"    publish_range(range_value)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def publish_range(publish_value):\n",
"    # push this range's partial sum onto the results queue as a plain string\n",
"    channel.basic_publish(exchange='',\n",
"                          routing_key='GLS_results',\n",
"                          body=str(publish_value))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Creating an Accumulator**\n",
"\n",
"This was a hard function to write. I went with a file-juggling approach because I couldn't figure out how to keep a running pi-calculation variable accessible from inside the callback. I also write each decimal.Decimal value received from the RabbitMQ queue out to its own file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import uuid\n",
"\n",
"def callback_accumlation(ch, method, properties, body):\n",
"    # seed the running-total file the first time a result arrives\n",
"    if not os.path.exists(\"calculation.txt\"):\n",
"        with open(\"calculation.txt\", \"w\") as f:\n",
"            f.write(str(Decimal(0)))\n",
"    # read the running total, add this range's partial sum, write it back\n",
"    with open(\"calculation.txt\", \"r\") as pi_file:\n",
"        pi_calc = Decimal(pi_file.read())\n",
"    pi_calc += Decimal(body.decode(\"utf-8\"))\n",
"    with open(\"calculation.txt\", \"w\") as pi_file:\n",
"        pi_file.write(str(pi_calc))\n",
"    # also keep a copy of the raw value that came off the queue\n",
"    with open(f\"returndata.{uuid.uuid4()}.txt\", \"w\") as f:\n",
"        f.write(body.decode(\"utf-8\"))"
]
},
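{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook doesn't show how the accumulator gets attached to the results queue. A minimal sketch of that wiring, assuming the same `channel` object and the `callback_accumlation` function above (the full `rabbitmq-GLS.py` may do this slightly differently):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# subscribe the accumulator to the results queue and start the blocking consume loop\n",
"channel.basic_consume(queue='GLS_results',\n",
"                      auto_ack=True,\n",
"                      on_message_callback=callback_accumlation)\n",
"channel.start_consuming()"
]
},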
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Adding Command Line Arguments**\n",
"\n",
"I want to produce a single Python file and call it from the command line, so I'm going to use the `argparse` library to add the command line arguments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import argparse\n",
"\n",
"def arguments():\n",
"    parser = argparse.ArgumentParser()\n",
"    parser.add_argument(\"-e\", \"--endinteger\", type=int,\n",
"                        help=\"The integer you wish the program to stop at; pick a big number (more than a billion)\")\n",
"    parser.add_argument(\"-s\", \"--stepsize\", default=1000000, type=int,\n",
"                        help=\"The number of values to calculate in a single block; computers are fast, pick a reasonably big number (like 1 million)\")\n",
"    parser.add_argument(\"-p\", \"--processnumber\", type=int,\n",
"                        help=\"Which role this process will run: 0 = producer, >= 1 = calculator\")\n",
"    args = parser.parse_args()\n",
"    return args"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Bringing Everything Together**\n",
"\n",
"I have included the full `rabbitmq-GLS.py` file in this GitHub repository. Using a second script you can easily spawn the worker processes, the accumulator, and the producer. This second script uses the multiprocessing library's `cpu_count()` to get the number of CPU cores, and uses `subprocess.Popen()` to spawn the processes. A sketch of how the `-p` role dispatch inside `rabbitmq-GLS.py` might look follows the spawner below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess, time, multiprocessing\n",
"\n",
"cpu_count = multiprocessing.cpu_count()\n",
"script_path = \"rabbitmq-GLS.py\"\n",
"end_number = \"5000000000\"\n",
"\n",
"# spawn cpu_count - 1 processes with -p 2 (the calculators)\n",
"for i in range(cpu_count - 1):\n",
"    subprocess.Popen([\"python\", script_path, \"-e\", end_number, \"-p\", \"2\"])\n",
"subprocess.Popen([\"python\", script_path, \"-e\", end_number, \"-p\", \"1\"])\n",
"# give the consumers a moment to connect before the producer starts queueing work\n",
"time.sleep(2)\n",
"subprocess.Popen([\"python\", script_path, \"-e\", end_number, \"-p\", \"0\"])"
]
},
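{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `-p` dispatch lives inside `rabbitmq-GLS.py`, which isn't reproduced in this notebook. Below is a rough sketch of how that entry point could tie the functions above together; this is an assumption about the script's structure, not a copy of it (in particular, mapping `-p 1` to the accumulator is inferred from the spawner script and the prose above)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical entry point for rabbitmq-GLS.py tying the pieces together;\n",
"# the role numbering is an assumption based on how the spawner script calls it\n",
"if __name__ == \"__main__\":\n",
"    args = arguments()\n",
"    if args.processnumber == 0:\n",
"        # producer: fill the GLS_ranges queue with work\n",
"        work_producer(args.endinteger, args.stepsize)\n",
"    elif args.processnumber == 1:\n",
"        # accumulator: fold results from GLS_results into calculation.txt\n",
"        channel.basic_consume(queue='GLS_results',\n",
"                              auto_ack=True,\n",
"                              on_message_callback=callback_accumlation)\n",
"        channel.start_consuming()\n",
"    else:\n",
"        # calculator/worker: pull ranges from GLS_ranges and publish partial sums\n",
"        channel.basic_consume(queue='GLS_ranges',\n",
"                              auto_ack=True,\n",
"                              on_message_callback=callback)\n",
"        channel.start_consuming()"
]
}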
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
} |