Add ensemble model #72
Here is an updated specification of what the final ensemble should look like. I would suggest not implementing all of this in one go, but in several iterations. There are two main components:

- Controller: decides which models get used
- Consolidator: decides how to aggregate model results into a single prediction

For now, we should skip the Controller and confidence scores. Currently, none of our models have activation conditions, i.e., all of them can always be run. Confidence scores will come into play if new models that produce such scores are added (e.g., LNNs).
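To make the intended split concrete, here is a minimal Python sketch of the two components. All class and method names are hypothetical, not part of the existing codebase:

```python
from abc import ABC, abstractmethod

class Controller(ABC):
    """Decides which models get used for a given input (skipped for now)."""

    @abstractmethod
    def select_models(self, molecule, available_models):
        """Return the subset of available_models to run on this molecule."""

class Consolidator(ABC):
    """Aggregates per-model predictions into one prediction per class."""

    @abstractmethod
    def aggregate(self, predictions, trust_scores):
        """predictions: {model_name: {chebi_class: bool}}
        trust_scores: {model_name: {chebi_class: (ppv, npv)}}
        Returns a single {chebi_class: bool} prediction."""
```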
Problem
Currently, we have a range of different approaches for classifying molecules in ChEBI: ELECTRA-based, GNN-based (https://github.com/ChEB-AI/python-chebai-graph), and algorithmic / logic-based (https://github.com/sfluegel05/chemlog2). All approaches have specific strengths and weaknesses. The goal of an ensemble is to take different methods and aggregate their predictions so that the final result is better than the individual results.

Task
The ensemble should take the following input:

- the predictions of each model for each class
- a "trustworthiness" score for each model (this score is specific to each class, and possibly different for positive and negative predictions)

It should aggregate these values into a single prediction for each class, taking into account both the predictions and the trustworthiness of the models.
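One possible layout for this input, as nested dictionaries. The model names, class identifier, and scores below are purely illustrative:

```python
# Hypothetical input layout; all identifiers and values are illustrative.
predictions = {
    "electra": {"CHEBI:12345": True},
    "gnn":     {"CHEBI:12345": False},
    "chemlog": {"CHEBI:12345": True},
}
trust_scores = {
    # per model and class: (PPV, NPV) estimated on a test set
    "electra": {"CHEBI:12345": (0.70, 0.95)},
    "gnn":     {"CHEBI:12345": (0.80, 0.99)},
    "chemlog": {"CHEBI:12345": (0.60, 0.90)},
}
```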
Example:
Given a ChEBI class, we have received the following predictions:

- Model A: true
- Model B: false
- Model C: true
The simplest approach would be to weight all models equally and return true for this class (with a 2-1 vote). However, we should also take the trustworthiness into account. These values might come from the precision / positive predictive value (PPV; TP / (TP + FP)) and the negative predictive value (NPV; TN / (TN + FN)) of a model on a test set. In other words: if models A and C predict "true" for this class, they are correct in 70% and 60% of cases, respectively (according to their PPVs). If model B predicts "false" for this class, it is correct in 99% of cases (according to its NPV).

An aggregation method would then weigh two predictions with trustworthiness of 0.7 and 0.6 against one with 0.99. Depending on the aggregation method used, it might decide to trust model B.
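To illustrate how the choice of aggregation method changes the outcome, here is a sketch of two variants: a simple weighted vote (which sides with models A and C here) and a log-odds vote (which sides with model B). Function names and the PPV/NPV values not given in the example above are assumptions:

```python
import math

# votes: list of (prediction, ppv, npv) per model.
# A "true" vote is weighted by the model's PPV, a "false" vote by its NPV.
def weighted_vote(votes):
    true_weight = sum(ppv for pred, ppv, _ in votes if pred)
    false_weight = sum(npv for pred, _, npv in votes if not pred)
    return true_weight > false_weight

# Same idea, but trustworthiness enters as log-odds, so a single highly
# reliable "false" vote can outweigh several mediocre "true" votes.
def log_odds_vote(votes):
    def logit(p):
        return math.log(p / (1 - p))
    score = sum(logit(ppv) if pred else -logit(npv)
                for pred, ppv, npv in votes)
    return score > 0

votes = [
    (True, 0.70, 0.95),   # model A (NPV is a made-up placeholder)
    (False, 0.80, 0.99),  # model B (PPV is a made-up placeholder)
    (True, 0.60, 0.90),   # model C (NPV is a made-up placeholder)
]
print(weighted_vote(votes))  # True:  0.7 + 0.6 = 1.3 > 0.99
print(log_odds_vote(votes))  # False: logit(0.7) + logit(0.6) < logit(0.99)
```

Under the simple weighted vote, the two positive votes (0.7 + 0.6 = 1.3) outweigh the single negative vote (0.99). Treating the scores as probabilities and summing log-odds reverses the decision, since log(0.99/0.01) ≈ 4.6 exceeds log(0.7/0.3) + log(0.6/0.4) ≈ 1.25.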
Future work