-
Notifications
You must be signed in to change notification settings - Fork 5
Running a Program
To get started with MAGE, I recommend first learning MAGE's workflow by running one of the example programs. This page walks you through doing just that.
For this tutorial, we'll use the merge_sorted
program (referred to simply as merge
in the OSDI 2021 paper). This program runs using garbled circuits. It accepts a set of records from each party, where each record is 128 bits (16 bytes) long, where the first 32 bits (4 bytes) of the record are the key and the remaining 96 bits (12 bytes) are the value. The garbler provides a list of records sorted by key in ascending order, and the evaluator provides a list of records sorted by key in descending order. The output is a merged list consisting of both parties' records, sorted by key in ascending order.
The user specifies the parameters for planning (e.g., the amount of available memory, number of workers per party, etc.) and for execution (e.g., number of concurrent oblivious transfers, host and port for each worker, etc.) in a single configuration file written in YAML.
Here's a simple configuration file to use for this tutorial. Save this content in a local file called config.yaml
in the repository's root directory:
page_shift: 12
num_pages: 14848
prefetch_buffer_size: 256
prefetch_lookahead: 10000
oblivious_transfer:
max_batch_size: 512
pipeline_depth: 1
num_daemons: 3
parties:
# Evaluator
- workers:
- internal_host: localhost
internal_port: 56000
external_host: localhost
external_port: 54323
storage_path: evaluator_swapfile_1
# Garbler
- workers:
- internal_host: localhost
internal_port: 50000
external_host: localhost
external_port: 54322
storage_path: garbler_swapfile_1
The following mandatory fields configure the planner:
-
The
page_shift
field sets the page size in the MAGE-virtual address space. In this case, the page size is1 << 12
or4096
. The physical size of this page depends on the semantics of the protocol we are targeting. In this case, we are targeting the garbled circuits (HalfGates) protocol For this protocol, each MAGE-virtual address corresponds to a bit of plaintext, and each bit of plaintext corresponds to 16 bytes of ciphertext. Thus, each page will consume(1 << 12) * 16
bytes, or 64 kilobytes, of space at runtime. -
The
num_pages
field defines the number of available pages at runtime. This configuration file targets 1 GiB of available space. This would normally mean16384
pages; however, we use14848
, which is slightly smaller, to account for (1) the prefetch buffer (see below), and (2) the memory used by MAGE's interpreter, which isn't taken into account when planning and therefore doesn't count against the page limit. -
The
prefetch_buffer_size
field is the number of additional pages used for prefetching. -
The
prefetch_lookahead
field describes how many instructions in advance to prefetch.
The following fields configure the HalfGates protocol:
-
The
oblivious_transfer/max_batch_size
field describes the batch size used for oblivious transfer extension (OTe). -
The
oblivious_transfer/num_daemons
field states how many connections (each backed by a daemon thread on each party) to use for oblivious transfer per party. -
The
oblivious_transfer/pipeline_depth
field states how many outstanding OTe batches can happen at a time over each connection.
The parties
field is mandatory. It describes the parties used for computation (two, in this case) and the number of workers (threads) used per party (one per party, in this case). The semantic meaning of each party is up to the protocol implementation. For example, the HalfGates protocol driver for MAGE expects two parties, where the first party (index 0) is the evaluator and the second party (index 1) is the garbler, whereas the CKKS protocol driver only uses the first party (index 0).
For each party, a list of workers is specified. For each worker, the configuration file specifies a file to which swapped out data can be written. This can also be a device (e.g., /dev/sdX
). The internal host/port is the host/port that other workers in the same party can use to contact it; the external host/port is the host/port that workers in other parties can use to contact it. A single party has multiple workers in the case of parallel/distributed execution; in the simple single-threaded case, each party has only one worker and only the external host/port is used. We use localhost
for this tutorial so that we can run the garbler and evaluator on the same machine.
MAGE looks up configuration information at each worker. Configuration information provided globally applies to all workers; configuration information provided under a specific worker applies only to that worker, overriding global fields with the same name (if present). This allows for some flexibility in the configuration file's layout.
Now that we have the configuration file in place, let's run MAGE's planning phase. This phase outputs a memory program that includes directives for moving data between memory and storage as needed.
First, change your working directory to bin/
. If the bin/
directory does not exist, make sure that you've built MAGE, following the instructions on the previous page (hint: run make
). Then, run the following command (sample output shown below):
$ ./planner merge_sorted halfgates ../config.yaml 0 0 256
Created program with 12548 instructions
Annotations Pass: [100%] [#####################################################]
Computed annotations
Replacement Pass: [100%] [#####################################################]
Finished replacement stage: 0 swapouts, 0 swapins
Scheduling Pass: [100%] [######################################################]
Finished scheduling swaps: 0 allocation failures, 0 synchronous swapins
Phase Times (ms): 5 8 3
Once you run this command, you'll see a file named merge_sorted_256_0.memprog
, which contains the output memory program. You'll also see other files ending in .prog
, .ann
, and .repprog
. These are intermediate files written out by the planner; they are no longer useful and can be deleted.
Note the output 0 swapouts, 0 swapins
. That's because a problem size of 256 fits comfortably within 1 GiB of memory, so there's no need to swap anything out to storage.
These are what the arguments to this command mean:
-
The first argument,
merge_sorted
, describes the problem to plan for. To see the list of built-in programs, run./planner
with no arguments. -
The second argument,
halfgates
, describes the protocol to plan for. This may be more general than the protocol itself (it may be shared by multiple protocols), so it's sometimes referred to as a "plugin" used in the planning phase. -
The third argument,
../config.yaml
, is the filepath of the configuration file. -
The fourth argument,
0
, is the party whose execution to plan. For thehalfgates
protocol, both the garbler and the evaluator use the same memory program; otherwise, we'd have to run the planner twice, once for the garbler and once for the evaluator. Recall that the HalfGates protocol implementation treats party0
as the evaluator and party1
as the garbler. -
The fifth argument,
0
is the worker whose execution to plan. In our case, we have only one worker per party (index0
). But in the case of parallel/distributed execution, the planner must be invoked separately for each worker. -
The sixth argument,
256
, is the problem size, which has a different meaning for each program. For themerge_sorted
program, it is the number of input records for each party. So an problem size of256
means that the garbler and evaluator each have 256 input records, and that the final output list has a size of 512 records.
For this problem size, planning happened almost instantaneously; the final line from the example output above shows each phase (placement, replacement, scheduling) took only single-digit milliseconds to run. For larger problem sizes (which we'll try out at the bottom of this document), planning takes longer. For such problems, the progress bars will show the progress of each pass over the memory program as it runs. If you redirect the planner's output to a file (e.g., via the shell's >
operator), then the progress bar is not included in the output. Note that the "Replacement Phase" described in the paper includes the "Annotations Pass" and the "Replacement Pass"; the "Placement Pass" does not have a progress bar.
Before we can execute the memory program that we generated by running the program above, we need to generate files containing each party's input to the computation.
Run ./example_input merge_sorted 256 1
.
The three arguments are the name of the program (merge_sorted
), the problem size (256
), and the number of workers (1
). You should see two output files, merge_sorted_256_0_evaluator.input
and merge_sorted_256_0_garbler.input
; if planning for parallel/distributed execution, you'll see separate files for each worker. The division of input data among each party's workers depends on the specifics of how the program (merge_sorted
) uses multiple workers to work together.
Recall that each party's input is a list of records, each 128 bits (16 bytes) long, with the first 32 bits (4 bytes) as the key. Each input file is a packed binary file containing these bits. You can use hexdump -C
to see the records. For example:
$ hexdump -C merge_sorted_256_0_garbler.input | head -n 10
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 0e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 12 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Each record is 16 bytes, so each line in the output corresponds to a single record. Each key is an integer encoded in two's complement little-endian format. The garbler's input consists of even-numbered keys sorted in increasing order. If you similarly inspect the evaluator's input, you'll see that it consists of odd-numbered keys sorted in decreasing order. The value associated with each record is 0.
Now that we've generated the memory program for the computation and files containing each party's input data, we are finally ready to execute the computation.
Open two separate terminals. In the first terminal, start the evaluator:
./mage halfgates ../config.yaml 0 0 merge_sorted_256
This process will wait for the second process to connect to it. Meanwhile, start the garbler in the second terminal:
./mage halfgates ../config.yaml 1 0 merge_sorted_256
The first argument is the protocol (halfgates
). The second argument is the filepath of the configuration file ../config.yaml
. The third argument is the party index (0
for the evaluator and 1
for the garbler). The fourth argument is the worker index (0
in our case, as we have a single worker per party). In the case of parallel/distributed execution, mage
must be invoked separately for each worker. The fifth argument describes the problem name and problem size, joined by an underscore (merge_sorted_256
).
When both commands are run, both parties will print out output. Here is sample output from the evaluator:
$ ./mage halfgates ../config.yaml 0 0 merge_sorted_256
Memory alloc time: 14 ms
Total init time: 15 ms
Execution: [ 0%] [............................................................]READ-INSTR (ns): ( min = 386080, avg = 386080, max = 386080, count = 1, sum = 386080 )
SWAP-IN (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-OUT (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-BLOCKED (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
GATE-RECV (ns): ( min = 75509, avg = 75509, max = 75509, count = 1, sum = 75509 )
Timer: 69023347 ns
READ-INSTR (ns): ( min = 386080, avg = 386080, max = 386080, count = 1, sum = 386080 )
SWAP-IN (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-OUT (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-BLOCKED (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
GATE-RECV (ns): ( min = 4549, avg = 796217, max = 59150918, count = 78, sum = 62104999 )
Execution: [100%] [############################################################]
READ-INSTR (ns): ( min = 1114, avg = 138045, max = 386080, count = 3, sum = 414136 )
SWAP-BLOCKED (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-OUT (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-IN (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
GATE-RECV (ns): ( min = 4549, avg = 796217, max = 59150918, count = 78, sum = 62104999 )
184 ms
There is a progress bar (nice to have for longer computations), interleaved with timing/performance statistics printed out by the program. the final line contains the total time elapsed for the computation (excluding setup time, such as setting up the swapfile, but including oblivious transfers).
At the program's end, MAGE prints out statistics (see the five lines right before 184 ms
). These five lines show how many times MAGE made system calls for each of the pictured operations, and statistics for how long each such system call took. These statistics are chosen as they can be collected efficiently during execution and can help determine if a particular resource is the bottleneck.
Similar lines printed earlier in the program are triggered not by MAGE's engine, but by the program itself; the program is written to print out the cumulative statistics at key points during execution (e.g., right after oblivious transfers are complete, and right before writing the program's output to a file).
The line starting with Timer
in the middle of the computation shows the time spent between completion of oblivious transfers and writing the program's output to a file. The timer is started and stopped by the running program; not all programs have such well-defined phases (oblivious transfer, compute, and output), and MAGE's engine does not try to determine when phase boundaries occur.
We've looked at the command-line output printed by MAGE, but what about the output of the merge_sorted
program? This is written to the file merge_sorted_256_0.output
. We can use hexdump -C
, once again, to inspect the contents of this file:
$ hexdump -C merge_sorted_256_0.output | head -n 10
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000070 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000090 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
It contains, as expected, records from both parties' inputs (both even keys and odd keys), merged into a single list, sorted by key in increasing order.
In the current HalfGates implementation, all output is written by the garbler. As it is a semi-honest SMPC protocol, the garbler can just give it to the evaluator. In the future, we may add support for the evaluator receiving the output directly.
In the case of parallel/distributed computation, each of the garbler's workers will print out a separate output file, containing the data output by each worker.
While the above exercise got you acquainted to MAGE's workflow, the problem (merge_sorted
with a size of 256
) fits comfortably in the 1 GiB of memory allocated for this task. If you'd like, you can run the same program with a much larger problem size of 1048576
. Note: If you try this, make sure you have at least 10 GiB of free space for the planner to write out intermediate files.