This section is very much a work in progress...
Clone the repository and install the development requirements (including the docs and OpenML extras):

```bash
git clone https://github.com/ISG-Siegen/assembled.git
cd assembled
python3 -m venv venv_assembled
source venv_assembled/bin/activate
pip install ".[dev,docs,openml]"
```

(The quotes around `.[dev,docs,openml]` avoid shell globbing errors in zsh.)
We use pytest for unit testing. From the project root (with the environment activated), run:

```bash
python -m pytest tests/
```
Install the pre-commit hook (with the environment activated):

```bash
pre-commit install
```

Afterwards, the hooks run automatically on every commit. Use

```bash
pre-commit run --all-files
```

to run the hooks against all files at once.
Which hooks to use is still under discussion; for example, it is not yet decided whether mypy should be enabled by default.
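As a concrete starting point for that discussion, a hypothetical `.pre-commit-config.yaml` might look like the following; the hooks and pinned revisions are placeholders, not agreed-upon project decisions:

```yaml
# Hypothetical configuration -- hooks and revisions are placeholders,
# not an agreed-upon project standard.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
  # mypy is left out until the default-typing question is settled.
```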
- We do not support repetitions within a metatask itself. We argue that in this case the repetitions live "outside" of the
  metatasks: to benchmark n-repeated k-fold cross-validation, we consider it more appropriate to create n metatasks
  instead of a single metatask containing all repetition data. Whether this remains the best design is open, but it is sufficient for now.
- Including n-repeated cross-validation in a single metatask could be achieved by adding appropriate prefixes to the base models and making the fold_indicator a 2D array (see the sketch below).
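To illustrate the 2D `fold_indicator` idea, here is a minimal sketch; the array shape, the naming, and the prefix scheme are assumptions for illustration, not the current implementation:

```python
import numpy as np

n_samples, n_repetitions, n_folds = 100, 3, 5
rng = np.random.default_rng(42)

# Hypothetical 2D fold indicator: one column per repetition, each entry
# holding the fold index of that sample in that repetition.
fold_indicator = np.column_stack(
    [rng.permutation(np.arange(n_samples) % n_folds) for _ in range(n_repetitions)]
)
print(fold_indicator.shape)  # (100, 3)

# Hypothetical prefix scheme to distinguish the same base model
# across repetitions.
base_model_names = [f"rep{r}.RandomForest" for r in range(n_repetitions)]
```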
From the project root, call:

```bash
python3 -m build
```

Afterwards, upload the result to PyPI via:

```bash
python3 -m twine upload dist/*
```
In the docs directory, with an environment that has the documentation requirements installed, call:

```bash
make clean && make html
```
Since many ensemble techniques never need to touch the training features, we could skip preprocessing the training features entirely in those cases. This, however, would require removing many checks from our models and code (such as no NaNs allowed, numeric-only dtypes, ...).
Since we do not want to do that, it seems more sensible to define a default preprocessor. The default preprocessor transforms categorical features to integers and fills missing values.
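A minimal sketch of such a default preprocessor using scikit-learn follows; the exact encoder, imputation strategies, and column selection are assumptions, not the implemented behavior:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Assumed default behavior: ordinal-encode categorical columns and impute
# missing values; numeric columns are only imputed.
categorical_pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ]
)
numeric_pipeline = Pipeline([("impute", SimpleImputer(strategy="mean"))])

def make_default_preprocessor(categorical_cols, numeric_cols):
    """Build the hypothetical default preprocessor for the given column split."""
    return ColumnTransformer(
        [
            ("categorical", categorical_pipeline, categorical_cols),
            ("numeric", numeric_pipeline, numeric_cols),
        ]
    )
```

The `ColumnTransformer` keeps the two column groups separate, so the categorical encoding never touches the numeric columns.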
- Add better developer documentation
- Refactor/re-work the unit tests to depend on OpenML as little as possible, and add more tests
- Add CI: automatic testing and releases