The learning module can be used to automatically create a new test from an existing binary. It requires:
- A binary implementing the targeted function
- At least one call to this function with working arguments
- The targeted function prototype, including involved structures
With this information, the module runs an instrumented version of the function and collects its semantic side effects.
These side effects are then abstracted into an architecture-agnostic form, close to the C language.
During the recognition phase, this form is derived according to the expected memory layout: structure padding, int size, etc.
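As a rough illustration (plain Python, not Sibyl's internal representation), an abstract side effect such as "write 0x10 through a uint32_t pointer" only becomes a concrete byte sequence once the type sizes of the expected layout are known:

import struct

def concretize(value, size, little_endian=True):
    """Turn an abstract integer side effect into target-specific bytes."""
    fmt = {2: "H", 4: "I", 8: "Q"}[size]
    return struct.pack(("<" if little_endian else ">") + fmt, value)

# With a 4-byte unsigned int (the usual case):
concretize(0x10, 4)   # '\x10\x00\x00\x00'
# The same abstract effect, derived for an 8-byte type:
concretize(0x10, 8)   # '\x10\x00\x00\x00\x00\x00\x00\x00'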
If there are multiple calls to the function, Sibyl applies a pruning policy to keep only the relevant ones, according to the associated configuration.
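The exact policy is configurable; conceptually, redundant snapshots bring nothing to the final test. A naive sketch of such a filter (hypothetical, not Sibyl's actual code) could simply deduplicate calls by their observed arguments and result:

def prune(snapshots):
    """Keep one representative call per distinct (arguments, result) pair."""
    seen = set()
    kept = []
    for snap in snapshots:
        key = (tuple(snap["args"]), snap["result"])
        if key not in seen:
            seen.add(key)
            kept.append(snap)
    return kept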
Let's use SoftFp (2016/12/20 release), a software floating-point library, as an example.
We will target a few arithmetic functions, and use the regression test softfptest to retrieve calls to these functions.
softfptest is slightly modified before being used: only a few runs of the main loop are needed to obtain a fair amount of calls to the targeted functions, which makes the learning process faster.
Function prototypes are also needed:
typedef unsigned long int uint64_t;
typedef unsigned int uint32_t;
typedef uint64_t sfloat64;
typedef enum {
    RM_RNE,
    RM_RTZ,
    RM_RDN,
    RM_RUP,
    RM_RMM,
} RoundingModeEnum;
sfloat64 add_sf64(sfloat64 a, sfloat64 b, RoundingModeEnum rm, uint32_t *pfflags);
sfloat64 mul_sf64(sfloat64 a, sfloat64 b, RoundingModeEnum rm, uint32_t *pfflags);
sfloat64 div_sf64(sfloat64 a, sfloat64 b, RoundingModeEnum rm, uint32_t *pfflags);
sfloat64 sqrt_sf64(sfloat64 a, RoundingModeEnum rm, uint32_t *pfflags);
sfloat64 fma_sf64(sfloat64 a, sfloat64 b, sfloat64 c, RoundingModeEnum rm, uint32_t *pfflags);
sfloat64 min_sf64(sfloat64 a, sfloat64 b, uint32_t *pfflags);
sfloat64 max_sf64(sfloat64 a, sfloat64 b, uint32_t *pfflags);
Also, the PIN tracer has to be compiled:
$ cd ext/pin_tracer
$ PIN_ROOT=/opt/... make
...
The configuration must also be set accordingly (see the associated documentation for more details).
The target action is learn.
In this example, the tracer used is PIN, for performance reasons and because the target binary is available on a supported architecture. In other cases, the Miasm tracer is still available.
A lot of calls return zero (due to the architecture of the regression test). To ignore them (they are mostly irrelevant and pollute the resulting tests), --avoid-null (-z) is used.
The result is dumped in a Python file: -o float_{NAME}.py.
The complete command line is:
$ sibyl learn -v -z {FUNC_NAME} softfptest soft.h -o float_{NAME}.py
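As each function needs its own learn invocation, the seven runs can be driven from a small script; this is only a convenience sketch built on the command above (file and function names follow the naming used in the rest of this section):

import subprocess

for name in ["add", "mul", "div", "sqrt", "fma", "min", "max"]:
    func = "%s_sf64" % name
    subprocess.check_call(["sibyl", "learn", "-v", "-z", func,
                           "softfptest", "soft.h",
                           "-o", "float_%s.py" % func])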
One may notice that a few warnings are displayed:
WARNING: argument pfflags not used?!
Indeed, Sibyl has detected that the pfflags argument does not seem to be used in any of the kept calls. This could indicate a lack of call examples, an overly restrictive implementation, or a useless argument.
The resulting test looks like:
from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE
from sibyl.test.test import TestHeader, TestSetTest


class Testmax_sf64(TestHeader):
    '''This is an auto-generated class, using the Sibyl learn module'''

    func = "max_sf64"
    header = '''
typedef unsigned long int uint64_t;
...
'''

    def init1(self):
        self._add_arg(0, 0) # arg0_a
        self._add_arg(1, 9218868437227405312) # arg1_b

    def check1(self):
        return all((
            # Check output value
            self._get_result() == 0x7ff0000000000000,
        ))

    ...

    tests = TestSetTest(init1, check1) & TestSetTest(init2, check2) & TestSetTest(init3, check3) & TestSetTest(init4, check4) & TestSetTest(init5, check5)


TESTS = [Testmax_sf64]
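The learned constants come straight from the observed calls: 9218868437227405312 is 0x7ff0000000000000, the IEEE 754 encoding of +infinity, so this test merely checks that max_sf64(0.0, +inf) returns +inf. A quick stand-alone check (plain Python, independent of Sibyl):

import struct

assert 9218868437227405312 == 0x7ff0000000000000
# Reinterpreting the 64-bit pattern as a double gives +infinity
assert struct.unpack("<d", struct.pack("<Q", 0x7ff0000000000000))[0] == float("inf")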
When type understanding is needed, the tests are a bit more complicated (here from mul_sf64):
    def init2(self):
        # arg3_pfflags
        base0_ptr_size = self.field_addr("arg3_pfflags", "*(arg3_pfflags)") + self.sizeof("*(arg3_pfflags)")
        base0_ptr = self._alloc_mem(base0_ptr_size, read=True, write=True)
        self._add_arg(0, 0) # arg0_a
        self._add_arg(1, 9218868437227405312) # arg1_b
        self._add_arg(2, 0) # arg2_rm
        self._add_arg(3, base0_ptr) # arg3_pfflags
        # *(arg3_pfflags) = 0x0
        self._write_mem(base0_ptr, self.pack(0x0, self.sizeof("*(arg3_pfflags)")))
        self.base0_ptr = base0_ptr

    def check2(self):
        return all((
            # Check output value
            self._get_result() == 0x7ff8000000000000,
            # *(arg3_pfflags) == 0x10
            self._ensure_mem(self.base0_ptr, self.pack(0x10, self.sizeof("*(arg3_pfflags)"))),
        ))
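This learned call corresponds to mul_sf64(0.0, +inf, RM_RNE, &fflags): an invalid operation, hence the quiet NaN result (0x7ff8000000000000) and a non-zero value written through pfflags (0x10, presumably the invalid-operation flag). The arithmetic part can be cross-checked in plain Python:

import math

# 0.0 * +inf is an IEEE 754 invalid operation and yields a NaN
assert math.isnan(0.0 * float("inf"))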
Outputs are directly usable as Sibyl tests. To regroup them in a common test set, one can create a Python script merging the TESTS lists from the different scripts, as:
out = []
for f in ["add", "mul", "div", "sqrt", "fma", "min", "max"]:
    execfile("float_%s_sf64.py" % f)
    out += TESTS
TESTS = out
To inform Sibyl about this new test set, a line is added in the configuration (see the associated documentation for more detail):
[tests]
sfloat = /path/to/float.py
The tests are now detected by Sibyl, as shown by this command:
$ sibyl config
...
sfloat (7)
add_sf64, mul_sf64, div_sf64, sqrt_sf64, fma_sf64, min_sf64, max_sf64
At this stage, they are replayable on new binaries, for instance on an obfuscated version of softfptest:
$ sibyl func softfptest.obfu | sibyl find -v -t sfloat -b ABI_AMD64_SYSTEMV softfptest.obfu -
Guessed architecture: x86_64
Found 405 addresses
Found 7 test cases
0x004330d0 : max_sf64
0x0042a9a0 : mul_sf64
0x00431e30 : min_sf64
0x0042bc90 : fma_sf64
0x00430c70 : sqrt_sf64
0x00429270 : add_sf64
0x0042ee70 : div_sf64
The learning module has known limitations.
As mentioned in the previous section, a working binary which calls the function with valid arguments is required.
In addition, this binary must be traceable, which may not be the case depending on the architecture.
Sibyl's general limitations also apply to this module; for instance, and for now, there is no support for floating-point arguments, nor for ABI specificities such as structures inlined in arguments.
For now, Sibyl does not track, and therefore does not support, functions relying on an allocator for their semantics (for example, a function allocating a new structure through malloc).