`dataset` is the CRS module that compiles and manages the vulnerable programs that the CRS analyzes.
The supported test suites are the following:
- NIST's Juliet;
- NIST's C Test Suite;
- A toy dataset.
The executables are built for:
- ELF format
- x86 architecture
For each test suite that needs to be built, the module performs the following steps:
- Getting the available sources into the test suite's folder
- Preprocessing the sources to include all the required sources and headers
- Writing the preprocessed sources into the `sources` folder at the root of the repository
- Creating a new entry in the CSV files of the dataset, namely `vulnerables.csv`
- Filtering the sources based on the wanted CWEs
- Compiling the preprocessed sources with compile and link flags gathered from multiple sources (the module's own and user-provided ones)
- Writing the executables into the `executables` folder at the root of the repository.
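The filtering and compilation steps above can be pictured with a short sketch. It is not the module's actual code: `SourceFile`, `filter_by_cwes`, `build_gcc_command`, the `.c` extension, and the example flags are hypothetical names and values chosen for illustration. Only the `sources` and `executables` folder names and the `.elf` suffix come from this README.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Sequence


@dataclass
class SourceFile:
    """Hypothetical record for one preprocessed source file."""
    identifier: str
    cwes: List[str] = field(default_factory=list)


def filter_by_cwes(sources: Sequence[SourceFile], wanted_cwes: Sequence[str]) -> List[SourceFile]:
    """Keep the sources tagged with at least one of the wanted CWEs."""
    wanted = set(wanted_cwes)
    return [s for s in sources if wanted.intersection(s.cwes)]


def build_gcc_command(source: SourceFile, module_flags: Sequence[str], user_flags: Sequence[str]) -> List[str]:
    """Assemble a gcc argv: module flags first, then user-provided flags."""
    src = PurePosixPath("sources") / f"{source.identifier}.c"        # assumed extension
    out = PurePosixPath("executables") / f"{source.identifier}.elf"
    return ["gcc", *module_flags, *user_flags, str(src), "-o", str(out)]


if __name__ == "__main__":
    sources = [
        SourceFile("toy_test_suite_0", ["CWE-121"]),
        SourceFile("toy_test_suite_1"),
    ]
    for selected in filter_by_cwes(sources, ["CWE-121"]):
        # -O0 and -g are example flags only, not the module's real defaults.
        print(" ".join(build_gcc_command(selected, ["-O0"], ["-g"])))
```

Placing the user-provided flags after the module's flags lets a user override a default, since gcc generally honours the last occurrence of conflicting options. Whether the module orders them this way is an assumption.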
All `gcc` operations are performed inside a 32-bit Ubuntu 18.04 container.
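As a rough illustration of what running `gcc` inside the container involves, the sketch below assembles a `docker run` command line. The module may well talk to the Docker API directly instead. The helper name `containerized_gcc_command` and the `/work` mount point are assumptions, and the image tag is the one used by the `docker build` command in this README.

```python
from pathlib import Path
from typing import List, Sequence

IMAGE = "ubuntu_32bit_compilator"  # tag taken from the docker build command in this README


def containerized_gcc_command(repo_root: str, gcc_args: Sequence[str]) -> List[str]:
    """Return the argv that runs gcc in the container with the repository mounted."""
    root = Path(repo_root).resolve()
    return [
        "docker", "run", "--rm",
        "--volume", f"{root}:/work",  # /work is an assumed mount point
        "--workdir", "/work",
        IMAGE,
        "gcc", *gcc_args,
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run to execute it.
    print(" ".join(containerized_gcc_command(".", ["--version"])))
```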
- Download the repository into `/opencrs/dataset`. If you want to use another path, modify the corresponding configuration parameter.
- Ensure that the repository's submodules (which are the test suites) are downloaded too. If you clone the repository, use the `--recurse-submodules` flag to download them as well.
- Install the required Python 3 packages via `poetry install --no-dev`.
- Build the Docker image: `docker build --tag ubuntu_32bit_compilator -f docker/Dockerfile.ubuntu_32bit_compilator .`
- Ensure the Docker API is accessible by:
  - Running the module as `root`; or
  - Changing the Docker socket permissions (insecure approach) via `chmod 777 /var/run/docker.sock`.
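Before building, it can help to check whether the current user can reach the Docker socket at all. The following is a minimal sketch, assuming the default socket path mentioned above; `docker_socket_accessible` is a hypothetical helper, not part of the module.

```python
import os


def docker_socket_accessible(path: str = "/var/run/docker.sock") -> bool:
    """Return True if the path exists and the current user can read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    if docker_socket_accessible():
        print("Docker socket is accessible.")
    else:
        print("Docker socket is not accessible; run as root or adjust its permissions.")
```

This only checks file permissions on the socket. It does not confirm that the Docker daemon is running.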
$ poetry run dataset build --testsuite TOY_TEST_SUITE
Successfully built 5 executables.
$ poetry run dataset get
The available executables are:
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID               ┃ CWEs                        ┃ Parent Database ┃ Full Path                        ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ toy_test_suite_0 │ Stack-based Buffer Overflow │ toy_test_suite  │ executables/toy_test_suite_0.elf │
│ toy_test_suite_1 │                             │ toy_test_suite  │ executables/toy_test_suite_1.elf │
│ toy_test_suite_2 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_2.elf │
│ toy_test_suite_3 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_3.elf │
│ toy_test_suite_4 │                             │ toy_test_suite  │ executables/toy_test_suite_4.elf │
└──────────────────┴─────────────────────────────┴─────────────────┴──────────────────────────────────┘
$ poetry run dataset
Usage: dataset [OPTIONS] COMMAND [ARGS]...

  Builds and filters datasets of vulnerable programs

Options:
  --help  Show this message and exit.

Commands:
  build  Builds a test suite.
  show   Gets the executables in the whole dataset.
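The module can also be used from Python:
from dataset import Dataset

# Retrieve the executables currently available in the dataset
available_executables = Dataset().get_available_executables()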
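The return type of `get_available_executables()` is not documented here. Assuming each entry exposes the same fields as the columns printed by the CLI (ID, CWEs, parent database, full path), grouping executables by CWE might look like the sketch below. `Executable` is a hypothetical stand-in for whatever the module actually returns, and `group_by_cwe` is an illustrative helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Executable(NamedTuple):
    """Hypothetical stand-in mirroring the columns printed by the CLI."""
    id: str
    cwes: List[str]
    parent_database: str
    full_path: str


def group_by_cwe(executables: Iterable[Executable]) -> Dict[str, List[str]]:
    """Map each CWE name to the IDs of the executables that carry it."""
    groups = defaultdict(list)
    for exe in executables:
        for cwe in exe.cwes:
            groups[cwe].append(exe.id)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        Executable("toy_test_suite_0", ["Stack-based Buffer Overflow"],
                   "toy_test_suite", "executables/toy_test_suite_0.elf"),
        Executable("toy_test_suite_1", [],
                   "toy_test_suite", "executables/toy_test_suite_1.elf"),
        Executable("toy_test_suite_2", ["NULL Pointer Dereference"],
                   "toy_test_suite", "executables/toy_test_suite_2.elf"),
    ]
    for cwe, ids in group_by_cwe(sample).items():
        print(f"{cwe}: {', '.join(ids)}")
```

Executables without any CWE, such as `toy_test_suite_1`, simply do not appear in the result.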