More benchmarks in docs (#1336)
* Updates in documentation
nicl-nno authored Sep 20, 2024
1 parent 6cfefca commit 5be9119
Showing 11 changed files with 68 additions and 51 deletions.
Binary file removed docs/source/benchmarks/img_benchmarks/fedot_meta.png
Binary file removed docs/source/benchmarks/img_benchmarks/stats.png
111 changes: 64 additions & 47 deletions docs/source/benchmarks/tabular.rst
@@ -2,53 +2,70 @@ Tabular data
------------

Here are overall classification problem results across state-of-the-art AutoML frameworks
using `AMLB <https://github.com/openml/automlbenchmark>`__ test suite:
using self-run tasks from the OpenML test suite (10-fold runs), scored by F1:


.. csv-table::
:header: Dataset, Metric, AutoGluon, FEDOT, H2O, TPOT

adult, auc, 0.91001, 0.91529, **0.93077**, 0.92729
airlines, auc, 0.72491, 0.65378, **0.73039**, 0.69368
albert, auc, **0.73903**, 0.72765, nan, nan
amazon_employee_access, auc, 0.85715, 0.85911, **0.87281**, 0.86625
apsfailure, auc, 0.99062, 0.98999, **0.99252**, 0.99044
australian, auc, **0.93953**, 0.93785, 0.93857, 0.93604
bank-marketing, auc, 0.93126, 0.93245, **0.93860**, 0.93461
blood-transfusion, auc, 0.68959, 0.72444, **0.75949**, 0.74019
christine, auc, 0.80429, 0.80446, **0.81936**, 0.80669
credit-g, auc, **0.79529**, 0.78458, 0.79357, 0.79381
guillermo, auc, **0.89967**, 0.89125, nan, 0.78331
jasmine, auc, 0.88312, 0.88548, 0.88734, **0.89038**
kc1, auc, 0.82226, 0.83857, nan, **0.84481**
kddcup09_appetency, auc, 0.80447, 0.78778, **0.82912**, 0.82556
kr-vs-kp, auc, 0.99886, 0.99925, 0.99972, **0.99976**
miniboone, auc, 0.98217, 0.98102, nan, **0.98346**
nomao, auc, 0.99483, 0.99420, **0.99600**, 0.99538
numerai28_6, auc, 0.51655, 0.52161, **0.53052**, nan
phoneme, auc, 0.96542, 0.96448, 0.96751, **0.97070**
riccardo, auc, **0.99970**, 0.99794, nan, nan
sylvine, auc, 0.98470, 0.98496, 0.98936, **0.99339**
car, neg_logloss, -0.11659, -0.08885, **-0.00347**, -0.64257
cnae-9, neg_logloss, -0.33208, -0.27010, -0.21849, **-0.15369**
connect-4, neg_logloss, -0.50157, -0.47033, **-0.33770**, -0.37349
covertype, neg_logloss, **-0.07140**, -0.14096, -0.26422, nan
dilbert, neg_logloss, -0.14967, -0.24455, **-0.07643**, -0.16839
dionis, neg_logloss, **-2.15760**, nan, nan, nan
fabert, neg_logloss, -0.78781, -0.90152, **-0.77194**, -0.89159
fashion-mnist, neg_logloss, **-0.33257**, -0.38379, -0.38328, -0.53549
helena, neg_logloss, **-2.78497**, -6.34863, -2.98020, -2.98157
jannis, neg_logloss, -0.72838, -0.76192, **-0.69123**, -0.70310
jungle_chess, neg_logloss, -0.43064, -0.27074, -0.23952, **-0.21872**
mfeat-factors, neg_logloss, -0.16118, -0.17412, **-0.09296**, -0.10726
robert, neg_logloss, **-1.68431**, -1.74509, nan, nan
segment, neg_logloss, -0.09419, -0.09643, **-0.05962**, -0.07711
shuttle, neg_logloss, -0.00081, -0.00101, **-0.00036**, nan
vehicle, neg_logloss, -0.51546, -0.42776, **-0.33137**, -0.39150
volkert, neg_logloss, **-0.92007**, -1.04485, -0.97797, nan

The statistical analysis was conducted using the Friedman test.
The results of experiments and analysis confirm that FEDOT results are statistically indistinguishable
from SOTA competitors H2O, AutoGluon and TPOT (see below).

.. image:: img_benchmarks/stats.png
:header: Dataset,FEDOT,AutoGluon,H2O,TPOT

adult,0.874,0.874,0.875,0.874
airlines,0.669,0.669,0.675,0.617
airlinescodrnaadult,0.812,-,0.818,0.809
albert,0.670,0.669,0.697,0.667
amazon_employee_access,0.949,0.947,0.951,0.953
apsfailure,0.994,0.994,0.995,0.995
australian,0.871,0.870,0.865,0.860
bank-marketing,0.910,0.910,0.910,0.899
blood-transfusion,0.747,0.697,0.797,0.746
car,1.000,1.000,0.998,0.998
christine,0.746,0.746,0.748,0.737
click_prediction_small,0.835,0.835,0.777,0.777
cnae-9,0.957,0.954,0.957,0.954
connect-4,0.792,0.788,0.865,0.867
covertype,0.964,0.966,0.976,0.952
credit-g,0.753,0.759,0.766,0.727
dilbert,0.985,0.982,0.996,0.984
fabert,0.688,0.685,0.726,0.534
fashion-mnist,0.885,-,0.734,0.718
guillermo,0.821,-,0.915,0.897
helena,0.332,0.333,-,0.318
higgs,0.731,0.732,0.369,0.336
jannis,0.718,0.718,0.743,0.719
jasmine,0.817,0.821,0.734,0.727
jungle_chess_2pcs_raw_endgame_complete,0.953,0.939,0.817,0.817
kc1,0.866,0.867,0.996,0.947
kddcup09_appetency,0.982,0.982,0.866,0.818
kr-vs-kp,0.995,0.996,0.982,0.962
mfeat-factors,0.980,0.979,0.980,0.980
miniboone,0.948,0.948,0.952,0.949
nomao,0.969,0.970,0.975,0.974
numerai28_6,0.523,0.522,0.522,0.505
phoneme,0.915,0.916,0.916,0.910
riccardo,0.997,-,0.998,0.997
robert,0.405,-,0.559,0.487
segment,0.982,0.982,0.982,0.980
shuttle,1.000,1.000,1.000,1.000
sylvine,0.952,0.951,0.952,0.948
vehicle,0.851,0.849,0.846,0.835
volkert,0.694,0.694,0.758,0.697
Mean F1,0.838,0.837,0.833,0.812


We also evaluated FEDOT on the `AMLB <https://github.com/openml/automlbenchmark>`_ benchmark.
The critical difference plots below compare FEDOT (v.0.7.3) with H2O (3.46.0.4), AutoGluon (v.1.1.0), TPOT (v.0.12.1) and LightAutoML (v.0.3.7.3);
they were produced with the built-in visualization tools of AutoMLBenchmark:

All datasets (ROC AUC and negative log loss):

.. image:: ./img_benchmarks/cd-all-1h8c-constantpredictor.png

Binary classification (ROC AUC):

.. image:: ./img_benchmarks/cd-binary-classification-1h8c-constantpredictor.png

Multiclass classification (negative log loss):

.. image:: ./img_benchmarks/cd-multiclass-classification-1h8c-constantpredictor.png

We can claim that FEDOT's results are statistically better than TPOT's and indistinguishable from those of H2O and AutoGluon.
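
Critical difference plots are built from each framework's average rank across datasets. As a rough illustration only, the sketch below computes average ranks for five complete rows copied from the F1 table above. It is not the AutoMLBenchmark implementation and does not reproduce the plots (which use ROC AUC and log loss over the full suite); the helper names ``average_ranks`` and ``mean_ranks`` are hypothetical.

```python
FRAMEWORKS = ["FEDOT", "AutoGluon", "H2O", "TPOT"]

# Illustrative subset: five complete rows copied from the F1 table above.
F1_SCORES = {
    "adult": [0.874, 0.874, 0.875, 0.874],
    "airlines": [0.669, 0.669, 0.675, 0.617],
    "australian": [0.871, 0.870, 0.865, 0.860],
    "blood-transfusion": [0.747, 0.697, 0.797, 0.746],
    "credit-g": [0.753, 0.759, 0.766, 0.727],
}


def average_ranks(scores):
    """Rank scores in descending order (1 = best); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of scores tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def mean_ranks(table):
    """Average each framework's per-dataset rank over all datasets in `table`."""
    totals = [0.0] * len(FRAMEWORKS)
    for scores in table.values():
        for j, rank in enumerate(average_ranks(scores)):
            totals[j] += rank
    return {name: totals[j] / len(table) for j, name in enumerate(FRAMEWORKS)}


if __name__ == "__main__":
    for name, rank in sorted(mean_ranks(F1_SCORES).items(), key=lambda kv: kv[1]):
        print(f"{name}: mean rank {rank:.2f}")
```

A lower mean rank indicates a better framework on this subset. Whether a gap between two mean ranks is significant still requires a statistical test over the full set of datasets, which this sketch does not perform.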

8 changes: 4 additions & 4 deletions docs/source/faq/abstract.rst
@@ -7,16 +7,16 @@ Abstract
data-driven composite models. It can solve classification, regression,
clustering, and forecasting problems.*

.. topic:: What FEDOT is framework.
.. topic:: Why is FEDOT a framework?

*While the exact difference between 'library' and 'framework' is a bit ambiguous and
context-dependent in many cases, we still consider FEDOT as a framework.*

*The reason is that it can be used not only to solve predefined AutoML tasks
but also to build new derivative solutions.
*As examples:* `FEDOT.NAS`_, `FEDOT.Industrial`_.
As examples:* `FEDOT.NAS`_, `FEDOT.Industrial`_.

.. topic:: Why should I use FEDOT instead of existing state-of-the-art solutions (H2O/TPOT/etc)?
.. topic:: Why should I use FEDOT instead of existing state-of-the-art solutions (LightAutoML/AutoGluon/H2O/etc)?

*In practice, existing AutoML solutions are effective only for a
limited set of problems. During model learning, modern AutoML
@@ -25,7 +25,7 @@ Abstract
set of models (this approach is also referred to as the Combined
Algorithm Selection and Hyperparameters optimization - CASH) since the
overall learning and meta-learning process is extremely expensive. In
the Fedot we have used the composite models concept. We claim,
FEDOT we have used the composite models concept. We claim
that it allows us to solve many actual real-world problems in a more
efficient way. We also aim to outperform the existing solutions
even for well-known benchmarks (e.g. PMLB datasets).*