This section is very much a work in progress...
Clone the repository and install the development requirements (including the docs and OpenML extras):

```bash
git clone https://github.com/ISG-Siegen/assembled.git
cd assembled
python3 -m venv venv_assembled
source venv_assembled/bin/activate
pip install ".[dev,docs,openml]"
```

(The quotes around `.[dev,docs,openml]` avoid shell globbing errors in zsh.)
We use pytest for unit testing. From the project root (with the environment activated), run:

```bash
python -m pytest tests/
```
Install the pre-commit hook (with the environment activated):

```bash
pre-commit install
```

Afterwards, the hooks run automatically on every commit. Use

```bash
pre-commit run --all-files
```

to run the hooks against all files at once.
Which hooks to use is still under discussion; for example, it is not yet decided whether mypy should be enabled by default.
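As a concrete starting point for that discussion, a hypothetical `.pre-commit-config.yaml` might look like the following; the hooks and pinned revisions are placeholders, not agreed-upon project decisions:

```yaml
# Hypothetical configuration -- hooks and revisions are placeholders,
# not an agreed-upon project standard.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
  # mypy is left out until the default-typing question is settled.
```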
- We do not support repetitions within a metatask itself. We argue that in this case the repetitions live "outside" of the
  metatasks: to benchmark n-repeated k-fold cross-validation, we consider it more appropriate to create n metatasks
  instead of a single metatask containing all repetition data. Whether this remains the best design is open, but it is sufficient for now.
- Including n-repeated cross-validation in a single metatask could be achieved by adding appropriate prefixes to the base models and making the fold_indicator a 2D array (see the sketch below).
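To illustrate the 2D `fold_indicator` idea, here is a minimal sketch; the array shape, the naming, and the prefix scheme are assumptions for illustration, not the current implementation:

```python
import numpy as np

n_samples, n_repetitions, n_folds = 100, 3, 5
rng = np.random.default_rng(42)

# Hypothetical 2D fold indicator: one column per repetition, each entry
# holding the fold index of that sample in that repetition.
fold_indicator = np.column_stack(
    [rng.permutation(np.arange(n_samples) % n_folds) for _ in range(n_repetitions)]
)
print(fold_indicator.shape)  # (100, 3)

# Hypothetical prefix scheme to distinguish the same base model
# across repetitions.
base_model_names = [f"rep{r}.RandomForest" for r in range(n_repetitions)]
```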
From the project root, call:

```bash
python3 -m build
```

Afterwards, upload the result to PyPI via:

```bash
python3 -m twine upload dist/*
```
In the docs directory, with an environment that has the documentation requirements installed, call:

```bash
make clean && make html
```
Since many ensemble techniques never need to touch the training features, we could skip preprocessing the training features entirely in those cases. This, however, would require removing many checks from our models and code (such as no NaNs allowed, numeric-only dtypes, ...).
Since we do not want to do that, it seems more sensible to define a default preprocessor. The default preprocessor transforms categorical features to integers and fills missing values.
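A minimal sketch of such a default preprocessor using scikit-learn follows; the exact encoder, imputation strategies, and column selection are assumptions, not the implemented behavior:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Assumed default behavior: ordinal-encode categorical columns and impute
# missing values; numeric columns are only imputed.
categorical_pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ]
)
numeric_pipeline = Pipeline([("impute", SimpleImputer(strategy="mean"))])

def make_default_preprocessor(categorical_cols, numeric_cols):
    """Build the hypothetical default preprocessor for the given column split."""
    return ColumnTransformer(
        [
            ("categorical", categorical_pipeline, categorical_cols),
            ("numeric", numeric_pipeline, numeric_cols),
        ]
    )
```

The `ColumnTransformer` keeps the two column groups separate, so the categorical encoding never touches the numeric columns.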
- Add better developer documentation
- Refactor/re-work the unit tests to depend on OpenML as little as possible, and add more tests
- Add CI: automatic testing and releases