-
Notifications
You must be signed in to change notification settings - Fork 1
Inferring the Asymptotic Complexity of Neural Networks with Merlin
Inferring the asymptotic complexity of neural networks is not that easy: neural networks are implemented as the combination of multiple "kernels". Each kernel is like a nest of loops that traverses multidimensional arrays (matrices, mostly). Finding the asymptotic complexity of all these kernels, and the algorithm that emerges out of their combination can be a rather tricky problem.
But we can use Merlin to generate very precise cost models for neural networks. In this context, a "cost model" is a function that relates program inputs with, for instance, the number of operations that an algorithm runs for these inputs. Merlin is a tool that infers cost models for programs based on polynomial interpolation. There is a written tutorial about how Merlin works here. Notice that Merlin currently instruments individual functions. Applying it onto a large program made of many functions still requires some manual interventions, which we explain at the end of this article. Oh, and if you want to know more about Merlin, read its paper.
In this article, we show how to use Merlin to build polynomials that relate inputs and number of operations for a neural network. We shall use the genann implementation of a neural network. This software is implemented in ANSI C with no additional dependencies. It allows the user to define the number of layers and neurons per layer in the network. This flexibility is very useful for our experiment, but we’ll get to that later.
Since Merlin can only analyze one function at a time, we should start by defining
which functions from the neural network are important for our analysis. The
genann
library has many functions, but there are only 2 that are actually
important for complexity analysis. First, there is
genann_run
,
which basically applies the forward step
of the network with its current weights. Second, there is
genann_train
,
which applies backpropagation
and updates the network’s weights.
Although we now know which functions will be analyzed, beware that Merlin works per counter. A counter is just an unsigned integer value that records the number of iterations of a single loop in a program. So, if we have 3 loops in a program, we’ll have 3 counters. Each counter has its own cost model. This is useful because a program might have multiple paths that influence its complexity. These paths might even be exclusive: treading one of them, you forgo the other. Thus, it's better to know how each input influences each program point. That's more precise than estimating how each input influences the whole program.
To insert counters in a program, we can use Merlin’s instrumentation module. In order to do that, we can use the following command:
./setup.sh && ./run.sh genann.c genann_instrumented.c genann_run false false
The setup.sh
script is needed to build Merlin’s instrumentation module,
whereas the run.sh
script will apply instrumentation to a given target function
in a given C/C++ file. Mind it that this is a simplified version of the actual
sequence of commands, as we will explain at the end of this article.
After that, we’ll get a new version of the program (genann_instrumented.c
) that
has counters for each loop in the analyzed function (genann_run
). Merlin will
also indicate which of the function’s inputs influence each counter, which will
be needed later for applying polynomial interpolation.
Once we have properly inserted counters in this source file using Merlin, we have to run the instrumented program with a few different inputs. The instrumented program will output pairs formed by the value of inputs and the value of counters which will let us do polynomial interpolation to derive a cost model for the counter.
However, before we move to polynomial interpolation, there’s something important
to address. The computational complexity of a neural network is actually
controlled by many variables, such as the number of layers, the size of individual
layers, the number of inputs, etc. Fortunately, Merlin can identify which parameters
control the complexity of a given function. For instance, by inspecting the instrumented
version of genann_run
, we see that four variables might have an effect on the
running time of that program:
double const *genann_run(genann const *ann, int hidden_layers, int outputs,
int hidden, int nn_inputs, double const *inputs) {
// …
int temphidden = hidden;
int temphidden_layers = hidden_layers;
int tempoutputs = outputs;
int tempnn_inputs = nn_inputs;
// …
}
We can choose any of them, fix the others, and then see how it influences the
running time of that implementation. This is where genann
’s flexibility will
come in handy! We are free to define the values for each of these parameters and
therefore run a set of different experiments with each of them.
For instance, let's choose hidden_layers
, which defines the number of hidden
layers in the network. In this case, we can fix all other parameters and run
the neural network with hidden_layers
= 1, 2, 3, 4, 5 and 6. If we do it, then
we get pairs of inputs and counters. Below we show some examples for one the
counters in genann_run
:
1 0
2 1
3 2
4 3
5 4
6 5
These pairs of input values and counters let us do interpolation using Newton's method. Interpolation will give us a polynomial that relates inputs with the expected number of iterations for a given loop.
Using Merlin’s interpolation module requires a few steps, but it isn’t too
complicated. First of all, you must concatenate the instrumentation output
for each desired input in a single file. After you have this file,
say instrumentation_output.txt
, you can use Merlin’s produceInput.py
script that
will transform these outputs to the appropriate input format of Merlin’s
interpolator. The following command is a simplified version of what
you must actually do:
python3 produceInput.py instrumentation_output.txt interpolation_input.txt
With the interpolation input in hand, you can build and run Merlin’s interpolator, like this:
make && ./bin/interpolator < interpolation_input.txt
The interpolator will then print a cost model for each of the counters that were reported in the instrumentation based on the pairs of inputs and counter values. For instance, the set of pairs that were previously shown generate the following output:
At line 273 : hidden_layers
x: hidden_layers
Expected Nesting Depth: 1
F(x) = x - 1
This polynomial indicates a linear relation between the variable hidden_layers
and the execution of the loop at line 273. More yet, it shows that if hidden_layers
has value N, then the loop at line 273 will run N-1 times.
The polynomial cost model for the previous counter was controlled by a single variable. However, there are also counters that are influenced by more than one input. For instance, if the counter is within a nested loop, its complexity will be controlled both by the inner loop but also by the outer loop.
Thankfully, Merlin can also identify polynomials with up to three variables. However, when dealing with multivariate polynomials it’s not possible to use Netwon’s Divided Differences method. In order to handle this, Merlin uses the Least Squares method, which is less efficient than Newton’s Method but is good enough for this task.
Multivariable interpolation with Merlin is exactly the same as single-variable
interpolation; we relate the counter’s values with a set of input values.
For instance, if there is a counter in genann_run
that is influenced by the
number of hidden layers (hidden_layers
) and by the number of neurons per layer
(hidden
), we could run an experiment where we vary both these parameters and
fix all others.
As an example, we can choose the values 1, 2, 3, 4, 5 and 6 for hidden_layers
and 1, 2, 3, 5, 8 and 13 for hidden
. Below, we show an example of the results
of this experiment for one of the counters in genann_run
.
2 1 0
1 2 1
2 2 4
2 3 8
3 2 9
2 4 12
5 2 25
2 5 16
8 2 64
2 6 20
13 2 169
If we use the same process as before to run Merlin’s interpolation, we would get the following cost model:
At line 276 : hidden hidden_layers
x: hidden
y: hidden_layers
Expected Nesting Depth: 3
F(x, y) = -1*xx + 1*xxy
This cost model correctly reflects the behavior of the innermost counter in the gennan_run
function. This behavior can be summarized by the figure below:
Merlin has been conceived to analyze individual functions. Applying it to build a cost model for a complex program such as neural network requires some interventions. Below we describe issues that we have faced, and how we have overcome them to write this article.
- There are two functions that we analyzed in the neural network, the training
function (
genann_train
) and the run function (genann_run
). Because Merlin’s instrumentation adds prints to the instrumented function, we couldn’t analyze both of them at the same time. Otherwise, their instrumentation data would collide. Therefore, we had to use two different instrumented files for our analyses, which made the experiments a bit more involved: we first instrumentedgenann_train
, collected results for it, and then instrumentedgenann_run
, and collected new results. - Originally, the neural network’s functions use a struct called
genann
to carry information about the network, such as the number of hidden layers, the number of inputs, the number of neurons per layer etc. The fields from this struct control the complexity of each loop in the analyzed functions. However, because of how Merlin’s static analyses were implemented, they can’t identify individual members from structs that influence the complexity of loops. Therefore, we had to manually change both functions to receive these members as individual parameters instead of receiving only a pointer to the structure. - Both
genann_train
andgenann_run
are called many times during the network’s training cycle. Therefore the same instrumentation output was repeatedly reported many times and, while running the experiments, the total output was gigantic. To avoid that and make the experiments feasible, we had to manually change both functions to only print the instrumentation data when the functions were called for the first time. - The
genann
library is quite simple, but it depends on its own header file to compile. Merlin’s run script wasn’t originally implemented to handle external dependencies. Thus, for it to properly work we had to manually change the script and add this header file to the C compiler’s include path. - Because of the previous problem, it was very hard to run the analysis of the
neural network within Merlin’s directory, which is how it was originally conceived.
We were only able to make it work by placing Merlin’s run script and shared library
file at
genann
’s directory. - As a consequence of the last shortcoming, we couldn’t directly use any of the scripts that automate the process of integrating instrumentation and interpolation. Therefore, most part of this integration was done manually.