Running a Program

Sam Kumar edited this page Mar 22, 2021 · 1 revision

To get started with MAGE, I recommend first learning MAGE's workflow by running one of the example programs. This page walks you through doing just that.

For this tutorial, we'll use the merge_sorted program (referred to simply as merge in the OSDI 2021 paper). This program runs using garbled circuits. It accepts a set of records from each party, where each record is 128 bits (16 bytes) long, where the first 32 bits (4 bytes) of the record are the key and the remaining 96 bits (12 bytes) are the value. The garbler provides a list of records sorted by key in ascending order, and the evaluator provides a list of records sorted by key in descending order. The output is a merged list consisting of both parties' records, sorted by key in ascending order.
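To make the record layout concrete, here is a short Python sketch that packs and unpacks one such 16-byte record. The `struct` format string is my own illustration of the layout described above, not code from MAGE:

```python
import struct

# One record: a 32-bit little-endian key followed by a 96-bit (12-byte) value.
RECORD_FORMAT = "<i12s"  # illustrative; MAGE itself just reads raw bytes
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 16 bytes

def pack_record(key, value=b"\x00" * 12):
    """Encode a key/value pair as one 16-byte record."""
    return struct.pack(RECORD_FORMAT, key, value)

def unpack_record(raw):
    """Decode one 16-byte record back into (key, value)."""
    key, value = struct.unpack(RECORD_FORMAT, raw)
    return key, value

record = pack_record(42)
assert len(record) == RECORD_SIZE
assert unpack_record(record) == (42, b"\x00" * 12)
```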

Configuration File

The user specifies the parameters for planning (e.g., the amount of available memory, number of workers per party, etc.) and for execution (e.g., number of concurrent oblivious transfers, host and port for each worker, etc.) in a single configuration file written in YAML.

Sample Configuration File

Here's a simple configuration file to use for this tutorial. Save this content in a local file called config.yaml in the repository's root directory:

page_shift: 12
num_pages: 14848
prefetch_buffer_size: 256
prefetch_lookahead: 10000

oblivious_transfer:
    max_batch_size: 512
    pipeline_depth: 1
    num_daemons: 3

parties:
    # Evaluator
    - workers:
        - internal_host: localhost
          internal_port: 56000
          external_host: localhost
          external_port: 54323
          storage_path: evaluator_swapfile_1

    # Garbler
    - workers:
        - internal_host: localhost
          internal_port: 50000
          external_host: localhost
          external_port: 54322
          storage_path: garbler_swapfile_1

Explanation of the Sample Configuration File

The following mandatory fields configure the planner:

  1. The page_shift field sets the page size in the MAGE-virtual address space. In this case, the page size is 1 << 12, or 4096. The physical size of this page depends on the semantics of the protocol we are targeting; in this case, we are targeting the garbled circuits (HalfGates) protocol. For this protocol, each MAGE-virtual address corresponds to a bit of plaintext, and each bit of plaintext corresponds to 16 bytes of ciphertext. Thus, each page will consume (1 << 12) * 16 bytes, or 64 kilobytes, of space at runtime.

  2. The num_pages field defines the number of available pages at runtime. This configuration file targets 1 GiB of available space. This would normally mean 16384 pages; however, we use 14848, which is slightly smaller, to account for (1) the prefetch buffer (see below), and (2) the memory used by MAGE's interpreter, which isn't taken into account when planning and therefore doesn't count against the page limit.

  3. The prefetch_buffer_size field is the number of additional pages used for prefetching.

  4. The prefetch_lookahead field describes how many instructions in advance to prefetch.
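The arithmetic behind these settings can be checked with a short Python sketch. The 16-bytes-per-plaintext-bit expansion is specific to the HalfGates protocol, as described above:

```python
# Values from the sample configuration file.
page_shift = 12
num_pages = 14848
prefetch_buffer_size = 256

page_size_bits = 1 << page_shift                  # 4096 MAGE-virtual addresses per page
bytes_per_bit = 16                                # HalfGates: 16 ciphertext bytes per plaintext bit
page_size_bytes = page_size_bits * bytes_per_bit  # 65536 bytes = 64 KiB per page

# Total memory consumed by data pages plus the prefetch buffer.
total = (num_pages + prefetch_buffer_size) * page_size_bytes
print(total / (1 << 30))  # 0.921875 GiB, leaving headroom under 1 GiB for the interpreter
```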

The following fields configure the HalfGates protocol:

  1. The oblivious_transfer/max_batch_size field describes the batch size used for oblivious transfer extension (OTe).

  2. The oblivious_transfer/num_daemons field states how many connections (each backed by a daemon thread on each party) to use for oblivious transfer per party.

  3. The oblivious_transfer/pipeline_depth field states how many outstanding OTe batches can happen at a time over each connection.
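Taken together, these three fields bound how many oblivious transfers can be in flight at once. The arithmetic below is my reading of the fields above, not a formula taken from the MAGE source:

```python
# Values from the sample configuration file.
max_batch_size = 512   # OTs per OTe batch
pipeline_depth = 1     # outstanding batches per connection
num_daemons = 3        # connections (daemon threads) per party

# Upper bound on oblivious transfers outstanding at any moment.
max_in_flight = max_batch_size * pipeline_depth * num_daemons
print(max_in_flight)  # 1536
```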

The parties field is mandatory. It describes the parties used for computation (two, in this case) and the number of workers (threads) used per party (one per party, in this case). The semantic meaning of each party is up to the protocol implementation. For example, the HalfGates protocol driver for MAGE expects two parties, where the first party (index 0) is the evaluator and the second party (index 1) is the garbler, whereas the CKKS protocol driver only uses the first party (index 0).

For each party, a list of workers is specified. For each worker, the configuration file specifies a file to which swapped out data can be written. This can also be a device (e.g., /dev/sdX). The internal host/port is the host/port that other workers in the same party can use to contact it; the external host/port is the host/port that workers in other parties can use to contact it. A single party has multiple workers in the case of parallel/distributed execution; in the simple single-threaded case, each party has only one worker and only the external host/port is used. We use localhost for this tutorial so that we can run the garbler and evaluator on the same machine.

MAGE looks up configuration information at each worker. Configuration information provided globally applies to all workers; configuration information provided under a specific worker applies only to that worker, overriding global fields with the same name (if present). This allows for some flexibility in the configuration file's layout.
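For example, a field set globally can be overridden for one worker by repeating it under that worker's entry. The fragment below is a hypothetical illustration of this layering (whether any particular field supports per-worker overrides depends on MAGE's configuration loader):

```yaml
# Global setting: applies to every worker unless overridden.
prefetch_lookahead: 10000

parties:
    - workers:
        - internal_host: localhost
          internal_port: 56000
          external_host: localhost
          external_port: 54323
          storage_path: evaluator_swapfile_1
          # Per-worker override: this worker alone uses a deeper lookahead.
          prefetch_lookahead: 20000
```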

Planning Phase

Now that we have the configuration file in place, let's run MAGE's planning phase. This phase outputs a memory program that includes directives for moving data between memory and storage as needed.

Running the Planning Phase

First, change your working directory to bin/. If the bin/ directory does not exist, make sure that you've built MAGE, following the instructions on the previous page (hint: run make). Then, run the following command (sample output shown below):

$ ./planner merge_sorted halfgates ../config.yaml 0 0 256
Created program with 12548 instructions
Annotations Pass: [100%] [#####################################################]
Computed annotations
Replacement Pass: [100%] [#####################################################]
Finished replacement stage: 0 swapouts, 0 swapins
Scheduling Pass: [100%] [######################################################]
Finished scheduling swaps: 0 allocation failures, 0 synchronous swapins

Phase Times (ms): 5 8 3

Once you run this command, you'll see a file named merge_sorted_256_0.memprog, which contains the output memory program. You'll also see other files ending in .prog, .ann, and .repprog. These are intermediate files written out by the planner; they are no longer useful and can be deleted.

Note the output 0 swapouts, 0 swapins. That's because a problem size of 256 fits comfortably within 1 GiB of memory, so there's no need to swap anything out to storage.

Explanation of the ./planner Command

These are what the arguments to this command mean:

  1. The first argument, merge_sorted, describes the problem to plan for. To see the list of built-in programs, run ./planner with no arguments.

  2. The second argument, halfgates, describes the protocol to plan for. This may be more general than the protocol itself (it may be shared by multiple protocols), so it's sometimes referred to as a "plugin" used in the planning phase.

  3. The third argument, ../config.yaml, is the filepath of the configuration file.

  4. The fourth argument, 0, is the party whose execution to plan. For the halfgates protocol, both the garbler and the evaluator use the same memory program; otherwise, we'd have to run the planner twice, once for the garbler and once for the evaluator. Recall that the HalfGates protocol implementation treats party 0 as the evaluator and party 1 as the garbler.

  5. The fifth argument, 0, is the worker whose execution to plan. In our case, we have only one worker per party (index 0). But in the case of parallel/distributed execution, the planner must be invoked separately for each worker.

  6. The sixth argument, 256, is the problem size, which has a different meaning for each program. For the merge_sorted program, it is the number of input records for each party. So a problem size of 256 means that the garbler and evaluator each have 256 input records, and that the final output list has a size of 512 records.

For this problem size, planning happened almost instantaneously; the final line of the example output above shows that each phase (placement, replacement, scheduling) took only single-digit milliseconds. For larger problem sizes (which we'll try out at the bottom of this document), planning takes longer. For such problems, the progress bars will show the progress of each pass over the memory program as it runs. If you redirect the planner's output to a file (e.g., via the shell's > operator), then the progress bar is not included in the output. Note that the "Replacement Phase" described in the paper includes the "Annotations Pass" and the "Replacement Pass"; the "Placement Pass" does not have a progress bar.

Input Generation

Before we can execute the memory program that we generated by running the program above, we need to generate files containing each party's input to the computation.

Generating the Input

Run ./example_input merge_sorted 256 1.

The three arguments are the name of the program (merge_sorted), the problem size (256), and the number of workers (1). You should see two output files, merge_sorted_256_0_evaluator.input and merge_sorted_256_0_garbler.input; if planning for parallel/distributed execution, you'll see a separate file for each worker. How the input data is divided among each party's workers depends on how the program (merge_sorted) coordinates multiple workers.

Understanding the Input

Recall that each party's input is a list of records, each 128 bits (16 bytes) long, with the first 32 bits (4 bytes) as the key. Each input file is a packed binary file containing these bits. You can use hexdump -C to see the records. For example:

$ hexdump -C merge_sorted_256_0_garbler.input | head -n 10
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  04 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  06 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  0a 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  0c 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  0e 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  10 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  12 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Each record is 16 bytes, so each line in the output corresponds to a single record. Each key is an integer encoded in two's complement little-endian format. The garbler's input consists of even-numbered keys sorted in increasing order. If you similarly inspect the evaluator's input, you'll see that it consists of odd-numbered keys sorted in decreasing order. The value associated with each record is 0.
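The pattern in the hexdump can be reproduced and checked with a short Python sketch. This mirrors the layout described above; it is an illustration, not the actual example_input implementation:

```python
import struct

def make_garbler_input(problem_size):
    """Even keys in increasing order, with 12 zero-valued pad bytes per record."""
    return b"".join(struct.pack("<i12x", 2 * i) for i in range(problem_size))

data = make_garbler_input(256)
assert len(data) == 256 * 16  # 16 bytes per record

# Decode the keys back out and confirm the ordering the tutorial describes.
keys = [struct.unpack_from("<i", data, off)[0] for off in range(0, len(data), 16)]
assert keys == sorted(keys)               # ascending order
assert all(k % 2 == 0 for k in keys)      # garbler holds the even keys
```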

Execution Phase

Now that we've generated the memory program for the computation and files containing each party's input data, we are finally ready to execute the computation.

Executing the Program

Open two separate terminals. In the first terminal, start the evaluator:

./mage halfgates ../config.yaml 0 0 merge_sorted_256

This process will wait for the second process to connect to it. Meanwhile, start the garbler in the second terminal:

./mage halfgates ../config.yaml 1 0 merge_sorted_256

The first argument is the protocol (halfgates). The second argument is the filepath of the configuration file ../config.yaml. The third argument is the party index (0 for the evaluator and 1 for the garbler). The fourth argument is the worker index (0 in our case, as we have a single worker per party). In the case of parallel/distributed execution, mage must be invoked separately for each worker. The fifth argument describes the problem name and problem size, joined by an underscore (merge_sorted_256).

Inspecting the Command-Line Output

When both commands are run, both parties will print out output. Here is sample output from the evaluator:

$ ./mage halfgates ../config.yaml 0 0 merge_sorted_256
Memory alloc time: 14 ms
Total init time: 15 ms
Execution: [  0%] [............................................................]READ-INSTR (ns): ( min = 386080, avg = 386080, max = 386080, count = 1, sum = 386080 )
SWAP-IN (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-OUT (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-BLOCKED (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
GATE-RECV (ns): ( min = 75509, avg = 75509, max = 75509, count = 1, sum = 75509 )
Timer: 69023347 ns
READ-INSTR (ns): ( min = 386080, avg = 386080, max = 386080, count = 1, sum = 386080 )
SWAP-IN (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-OUT (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-BLOCKED (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
GATE-RECV (ns): ( min = 4549, avg = 796217, max = 59150918, count = 78, sum = 62104999 )
Execution: [100%] [############################################################]
READ-INSTR (ns): ( min = 1114, avg = 138045, max = 386080, count = 3, sum = 414136 )
SWAP-BLOCKED (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-OUT (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
SWAP-IN (ns): ( min = 0, avg = 0, max = 0, count = 0, sum = 0 )
GATE-RECV (ns): ( min = 4549, avg = 796217, max = 59150918, count = 78, sum = 62104999 )
184 ms

There is a progress bar (nice to have for longer computations), interleaved with timing/performance statistics printed out by the program. The final line contains the total time elapsed for the computation (excluding setup time, such as setting up the swapfile, but including oblivious transfers).

At the program's end, MAGE prints out statistics (see the five lines right before 184 ms). These five lines show how many times MAGE made system calls for each of the listed operations, along with statistics for how long each such system call took. These statistics were chosen because they can be collected efficiently during execution and can help determine whether a particular resource is the bottleneck.

Similar lines printed earlier in the program are triggered not by MAGE's engine, but by the program itself; the program is written to print out the cumulative statistics at key points during execution (e.g., right after oblivious transfers are complete, and right before writing the program's output to a file).

The line starting with Timer in the middle of the computation shows the time spent between completion of oblivious transfers and writing the program's output to a file. The timer is started and stopped by the running program; not all programs have such well-defined phases (oblivious transfer, compute, and output), and MAGE's engine does not try to determine when phase boundaries occur.

Inspecting the Program Output

We've looked at the command-line output printed by MAGE, but what about the output of the merge_sorted program? This is written to the file merge_sorted_256_0.output. We can use hexdump -C, once again, to inspect the contents of this file:

$ hexdump -C merge_sorted_256_0.output | head -n 10
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000010  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  04 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  05 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  06 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  07 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  08 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  09 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

It contains, as expected, records from both parties' inputs (both even keys and odd keys), merged into a single list, sorted by key in increasing order.
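The merge property itself can be checked in a few lines of Python. The sketch below works on synthetic stand-ins shaped like the tutorial's inputs; reading the real merge_sorted_256_0.output file instead is a one-line change:

```python
import struct

# Synthetic stand-ins shaped like the tutorial's inputs: the garbler holds
# even keys in ascending order, the evaluator odd keys in descending order.
garbler = [2 * i for i in range(256)]
evaluator = [2 * i + 1 for i in reversed(range(256))]

# The expected output: both parties' records merged, ascending by key,
# packed as 16-byte records (32-bit key plus 12 zero-valued pad bytes).
merged = sorted(garbler + evaluator)
packed = b"".join(struct.pack("<i12x", k) for k in merged)

keys = [struct.unpack_from("<i", packed, off)[0] for off in range(0, len(packed), 16)]
assert keys == list(range(512))   # 0, 1, 2, ... 511, as in the hexdump
assert len(packed) == 512 * 16
```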

In the current HalfGates implementation, all output is written by the garbler. Because HalfGates is a semi-honest SMPC protocol, the garbler can simply send the output to the evaluator. In the future, we may add support for the evaluator receiving the output directly.

In the case of parallel/distributed computation, each of the garbler's workers will write a separate output file containing the data output by that worker.

Larger Problem Sizes

While the above exercise got you acquainted with MAGE's workflow, the problem (merge_sorted with a size of 256) fits comfortably in the 1 GiB of memory allocated for this task. If you'd like, you can run the same program with a much larger problem size of 1048576. Note: if you try this, make sure you have at least 10 GiB of free space for the planner to write out intermediate files.