`dataset` 🗂️

Description
- Limitations
How It Works
Setup
Usage
- As a CLI Tool
- As a Python Module

Description

dataset is the CRS module that compiles and manages the vulnerable programs which will be analyzed by the CRS.

The supported test suites are the following:

NIST's Juliet;
NIST's C Test Suite;
A toy dataset.

Limitations

Only the ELF executable format is supported
Only the x86 architecture is supported

How It Works

The module performs the following steps for each test suite that needs to be built:

Getting the available sources into the test suite's folder
Preprocessing the sources to include all the required sources and headers
Writing the preprocessed sources into the sources folder at the root of the repository
Creating a new entry in the dataset's CSV file, namely vulnerables.csv
Filtering the sources based on the wanted CWEs
Compiling the preprocessed sources with compile and link flags gathered from multiple origins (the module's own and the user-provided ones)
Writing the executables into the executables folder at the root of the repository.
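
The filtering and compiling steps can be illustrated with a minimal Python sketch. It is not the module's actual implementation: the Source record, the filter_by_cwes and build_gcc_command helpers, the .c source paths, the CWE identifiers and the example flags are all assumptions made for illustration. Only the sources and executables folder names and the .elf suffix come from this README.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical record describing one preprocessed source file."""
    identifier: str
    path: str
    cwes: list = field(default_factory=list)


def filter_by_cwes(sources, wanted_cwes):
    """Keep only the sources labelled with at least one of the wanted CWEs."""
    wanted = set(wanted_cwes)
    return [source for source in sources if wanted.intersection(source.cwes)]


def build_gcc_command(source, module_flags, user_flags, output_dir="executables"):
    """Assemble a gcc invocation: module flags first, then user-provided ones."""
    output = f"{output_dir}/{source.identifier}.elf"
    return ["gcc", *module_flags, *user_flags, source.path, "-o", output]


# Illustrative data only; the real entries live in vulnerables.csv.
sources = [
    Source("toy_test_suite_0", "sources/toy_test_suite_0.c", ["CWE-121"]),
    Source("toy_test_suite_1", "sources/toy_test_suite_1.c"),
    Source("toy_test_suite_2", "sources/toy_test_suite_2.c", ["CWE-476"]),
]

selected = filter_by_cwes(sources, ["CWE-476"])
commands = [
    build_gcc_command(source, ["-fno-stack-protector"], ["-O0"])
    for source in selected
]
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when a path or a user-provided flag contains spaces.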

All gcc operations are performed inside a 32-bit Ubuntu 18.04 container.
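
As a rough sketch of what running gcc inside the container could look like, the snippet below wraps a gcc command in a docker run invocation. The module itself talks to the Docker API, so the docker command line is used here only as a stand-in. The wrap_in_container helper and the /work mount point are hypothetical; the image tag is the one built in the Setup section.

```python
import shlex

# Image tag taken from the Setup section of this README.
IMAGE = "ubuntu_32bit_compilator"


def wrap_in_container(gcc_command, repo_root, image=IMAGE, mount_point="/work"):
    """Prefix a gcc command so that it runs inside a throwaway container.

    The mount point is a hypothetical choice made for this sketch.
    """
    return [
        "docker", "run", "--rm",
        "--volume", f"{repo_root}:{mount_point}",
        "--workdir", mount_point,
        image,
        *gcc_command,
    ]


command = wrap_in_container(
    ["gcc", "sources/toy_test_suite_0.c", "-o", "executables/toy_test_suite_0.elf"],
    repo_root="/opencrs/dataset",
)

# Print the command instead of executing it, so the sketch has no side effects.
print(shlex.join(command))
```

Mounting the repository root lets the container read from the sources folder and write into the executables folder without copying files in and out.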

Setup

Download the repository in /opencrs/dataset. If you want to use another path, modify the corresponding configuration parameter.
Ensure that the repository's submodules (which are the test suites) are downloaded too. If you clone the repository, pass the flag --recurse-submodules to fetch them in the same step.
Install the required Python 3 packages via poetry install --no-dev.
Build the Docker image: docker build --tag ubuntu_32bit_compilator -f docker/Dockerfile.ubuntu_32bit_compilator .
Ensure the Docker API is accessible by:
- Running the module as root; or
- Changing the Docker socket permissions (insecure approach) via chmod 777 /var/run/docker.sock.
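
Before building a test suite, it may help to check whether the current user can reach the Docker socket at all. The snippet below is a minimal sketch using only the standard library; the can_use_docker_socket helper is hypothetical and is not part of the module.

```python
import os

# Default Docker socket path, as referenced in the Setup steps.
DOCKER_SOCKET = "/var/run/docker.sock"


def can_use_docker_socket(path=DOCKER_SOCKET):
    """Return True if the socket exists and the current user can read and write it."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


if can_use_docker_socket():
    print("The Docker socket is accessible.")
else:
    print("The Docker socket is not accessible; apply one of the options above.")
```

This only checks file permissions on the socket. It does not confirm that the Docker daemon is running.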

Usage

As a CLI Tool

Test Suite Build

➜ poetry run dataset build --testsuite TOY_TEST_SUITE
✅ Successfully built 5 executables.

Executables Listing

➜ poetry run dataset get
✅ The available executables are:

┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ID               ┃ CWEs                        ┃ Parent Database ┃ Full Path                        ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ toy_test_suite_0 │ Stack-based Buffer Overflow │ toy_test_suite  │ executables/toy_test_suite_0.elf │
│ toy_test_suite_1 │                             │ toy_test_suite  │ executables/toy_test_suite_1.elf │
│ toy_test_suite_2 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_2.elf │
│ toy_test_suite_3 │ NULL Pointer Dereference    │ toy_test_suite  │ executables/toy_test_suite_3.elf │
│ toy_test_suite_4 │                             │ toy_test_suite  │ executables/toy_test_suite_4.elf │
└──────────────────┴─────────────────────────────┴─────────────────┴──────────────────────────────────┘

Help

➜ poetry run dataset
Usage: dataset [OPTIONS] COMMAND [ARGS]...

  Builds and filters datasets of vulnerable programs

Options:
  --help  Show this message and exit.

Commands:
  build  Builds a test suite.
  show   Gets the executables in the whole dataset.

As a Python Module

from dataset import Dataset

# Retrieve the executables currently available in the dataset
available_executables = Dataset().get_available_executables()
