API refactoring (#67)

* API and core refactoring
aimclub · Apr 19, 2023 · 8f2da84
1 parent beb94bc
commit 8f2da84
Showing 239 changed files with 12,996 additions and 5,090 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,8 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 
+.DS_Store
+
 # IntelliJ project files
 /.idea
 

diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1,2 @@
+include fedot_ind/core/repository/data/*
+include fedot_ind/core/architecture/postprocessing/*
diff --git a/README.rst b/README.rst
@@ -58,7 +58,7 @@
 
 For this purpose we provide four feature generators:
 
-.. image:: /docs/img/all-generators-rus.png
+.. image:: /docs/img/all-generators.png
     :width: 700px
     :align: center
     :alt: All generators RUS
@@ -82,34 +82,44 @@ FEDOT.Industrial provides a high-level API,
 Classification
 ______________
 
-To classify time series, define the experiment configuration as a
-dictionary, then create an instance of the `Industrial` class and call its `run_experiment` method:
+To run a time series classification experiment, initialize an instance of the ``FedotIndustrial`` class
+and pass it a set of keyword arguments:
 
 .. code-block:: python
 
-    from core.api.API import Industrial
+    from core.api.main import FedotIndustrial
 
-    if __name__ == '__main__':
-        config = {'feature_generator': ['spectral', 'wavelet'],
-                  'datasets_list': ['UMD', 'Lightning7'],
-                  'use_cache': True,
-                  'error_correction': False,
-                  'launches': 3,
-                  'timeout': 15}
+    industrial = FedotIndustrial(task='ts_classification',
+                                 dataset='ItalyPowerDemand',
+                                 strategy='statistical',
+                                 use_cache=True,
+                                 timeout=15,
+                                 n_jobs=4,
+                                 window_sizes='auto',
+                                 logging_level=20,
+                                 output_folder=None)
 
-        ExperimentHelper = Industrial()
-        ExperimentHelper.run_experiment(config)
+You can then load the data and run the experiment:
 
+.. code-block:: python
+
+    train_data, test_data, _ = industrial.reader.read(dataset_name='ItalyPowerDemand')
+
+    model = industrial.fit(train_features=train_data[0], train_target=train_data[1])
+    labels = industrial.predict(test_features=test_data[0])
+    metric = industrial.get_metrics(target=test_data[1], metric_names=['f1', 'roc_auc'])
 
 The configuration contains the following parameters:
 
-- ``feature_generator`` - list of feature generators to use in the experiment
-- ``use_cache`` - flag that enables caching
-- ``datasets_list`` - list of datasets to use in the experiment
-- ``launches`` - number of launches for each dataset
-- ``error_correction`` - flag to apply the error correction model in the experiment
-- ``n_ecm_cycles`` - number of cycles for the error correction model
+- ``task`` - type of task to be solved (``ts_classification``)
+- ``dataset`` - name of the dataset for the experiment
+- ``strategy`` - how the task is solved: with a specific feature generator or in ``fedot_preset`` mode
+- ``use_cache`` - flag that enables caching of the extracted features
 - ``timeout`` - maximum amount of time for composing the classification pipeline
+- ``n_jobs`` - number of processes for parallel execution
+- ``window_sizes`` - window sizes for the window-based generators
+- ``logging_level`` - logging level
+- ``output_folder`` - path to the folder where results are saved
 
 Datasets for classification should be stored in the ``data`` directory and
 split into ``train`` and ``test`` sets with the ``.tsv`` extension. Thus, the folder name
@@ -118,44 +128,15 @@ _____________
 the ``Data Loader`` class will try to load the data from the `UCR archive <архива UCR_>`_.
 
 Feature generators that can be specified in the configuration:
-``window_quantile``, ``quantile``, ``spectral_window``, ``spectral``,
-``wavelet``, ``recurrence`` and ``topological``.
+``quantile``, ``wavelet``, ``recurrence`` and ``topological``.
 
 It is also possible to combine several feature generators.
 To do so, in the configuration where their list is specified,
-set the ``feature_generator`` field to the following value:
+set the ``strategy`` field to the following value:
 
 .. code-block:: python
 
-    'ensemble: topological wavelet window_quantile quantile spectral spectral_window'
-
-The experiment results, which include the generated features, predicted classes, metrics and
-pipelines, are stored in the ``results_of_experiments/{feature_generator_name}`` directory.
-The experiment logs are stored in the ``log`` directory.
-
-Error correction model
-+++++++++++++++++++++++++
-
-Using the error correction model is optional. To apply it,
-set the ``error_correction`` flag to ``True``.
-By default the number of cycles is three (``n_ecm_cycles=3``), but when the experiments are configured through
-a ``YAML`` configuration file, this parameter can easily be changed.
-In this case, after each launch of the FEDOT algorithmic kernel the error correction model is trained on
-the resulting error.
-
-.. image:: /docs/img/error_corr_model-rus.png
-    :width: 900px
-    :align: center
-    :alt: Error correction model
-
-The error correction model is based on linear regression and consists of
-three stages: at each subsequent stage the model learns the prediction
-error. This type of ensemble model for error correction depends
-on the number of classes:
-
-- For ``binary classification`` the model is a linear regression
-  trained on the predictions of the correction stages.
-- For ``multiclass classification`` the model is the sum of the previous predictions.
+    'ensemble: topological wavelet quantile'
 
 Feature caching
 +++++++++++++++++++++
@@ -164,7 +145,7 @@ _____________
 If the ``use_cache`` flag in the configuration is set to ``True``,
 then every feature space generated during the experiment
 is cached into the corresponding folder. To do so, a hash is computed from the arguments
-of the ``get_features`` function and the generator attributes. The resulting feature space is then
+of the feature extraction function and the generator attributes. The resulting feature space is then
 written to disk with the ``pickle`` library.
 
 The next time the same feature space is requested, the hash is computed again and
@@ -178,9 +159,10 @@ _____________
 
 The repository includes the following directories:
 
-- The ``core`` folder contains the main classes and scripts
-- The ``cases`` folder contains several usage examples that help you get started with the framework
-- All integration and unit tests are in the ``test`` folder
+- The ``api`` folder contains the main interface classes and scripts
+- The ``core`` folder contains the main algorithms and models
+- The ``examples`` folder contains several usage examples that help you get started with the framework
+- All integration and unit tests are in the ``test`` folder
 - The documentation sources are in the ``docs`` folder
 
 Current research/development and future plans
@@ -228,4 +210,4 @@ _____________
 .. _AutoML фреймворка FEDOT: https://gitlab.actcognitive.org/aimclub/FEDOT
 .. _архива UCR: https://www.cs.ucr.edu/~eamonn/time_series_data/
 .. _main: https://gitlab.actcognitive.org/aimclub/FEDOT-Industrial
-.. _readthedocs: https://fedotindustrial.readthedocs.io/en/latest/
+.. _readthedocs: https://fedotindustrial.readthedocs.io/en/latest/
diff --git a/README_en.rst b/README_en.rst
@@ -90,29 +90,41 @@ then create an instance of the ``Industrial`` class, and call its ``run_experime
 
 .. code-block:: python
 
-    from core.api.API import Industrial
+    from core.api.main import FedotIndustrial
 
-    if __name__ == '__main__':
-        config = {'feature_generator': ['spectral', 'wavelet'],
-                  'datasets_list': ['UMD', 'Lightning7'],
-                  'use_cache': True,
-                  'error_correction': False,
-                  'launches': 3,
-                  'timeout': 15}
+    industrial = FedotIndustrial(task='ts_classification',
+                                 dataset='ItalyPowerDemand',
+                                 strategy='statistical',
+                                 use_cache=True,
+                                 timeout=1,
+                                 n_jobs=2,
+                                 window_sizes='auto',
+                                 logging_level=20,
+                                 output_folder=None)
 
-        ExperimentHelper = Industrial()
-        ExperimentHelper.run_experiment(config)
+You can then load the data and run the experiment:
+
+.. code-block:: python
+
+    train_data, test_data, _ = industrial.reader.read(dataset_name='ItalyPowerDemand')
+
+    model = industrial.fit(train_features=train_data[0], train_target=train_data[1])
+    labels = industrial.predict(test_features=test_data[0])
+    metric = industrial.get_metrics(target=test_data[1], metric_names=['f1', 'roc_auc'])
 
 
 The config contains the following parameters:
 
-- ``feature_generator`` - list of feature generators to use in the experiment
-- ``use_cache`` - whether to use cache or not
-- ``datasets_list`` - list of datasets to use in the experiment
-- ``launches`` - number of launches for each dataset
-- ``error_correction`` - flag to apply the error correction model in the experiment
-- ``n_ecm_cycles`` - number of cycles for the error correction model
-- ``timeout`` - the maximum amount of time for classification pipeline composition
+- ``task`` - type of task to be solved (``ts_classification``)
+- ``dataset`` - name of the dataset for the experiment
+- ``strategy`` - how the task is solved: with a specific feature generator or in ``fedot_preset`` mode
+- ``use_cache`` - flag that enables caching of the extracted features
+- ``timeout`` - maximum amount of time for composing the classification pipeline
+- ``n_jobs`` - number of processes for parallel execution
+- ``window_sizes`` - window sizes for the window-based generators
+- ``logging_level`` - logging level
+- ``output_folder`` - path to the folder where results are saved
+
 
 Datasets for classification should be stored in the ``data`` directory and
 divided into ``train`` and ``test`` sets with ``.tsv`` extension. So the folder name
@@ -121,51 +133,23 @@ to use in the experiment. In case there is no data in the local folder, the ``Da
 class will try to load data from the `UCR archive`_.
 
 Possible feature generators which could be specified in the configuration are
-``window_quantile``, ``quantile``, ``spectral_window``, ``spectral``,
-``wavelet``, ``recurrence`` and ``topological``.
+``quantile``, ``wavelet``, ``recurrence`` and ``topological``.
 
 It is also possible to ensemble several feature generators.
-It could be done by setting the ``feature_generator`` field of the config, where
+It could be done by setting the ``strategy`` field of the config, where
 you need to specify the list of feature generators, to the following value:
 
 .. code-block:: python
 
-    'ensemble: topological wavelet window_quantile quantile spectral spectral_window'
-
-The experiment results which include generated features, predicted classes, metrics and
-pipelines are stored in the ``results_of_experiments/{feature_generator name}`` directory.
-The experiment logs are stored in the ``log`` directory.
-
-Error correction model
-++++++++++++++++++++++
-
-It is up to you to decide whether to use the error correction model or not. To apply it, the ``error_correction``
-flag in the config should be set to ``True``. By default the number of
-cycles ``n_ecm_cycles=3``, but using an advanced technique of experiment managing through a ``YAML`` config file
-you can easily adjust it.
-In this case after each launch of the FEDOT algorithmic kernel the error correction model will be trained on the
-produced error.
-
-.. image:: /docs/img/error_corr_model.png
-    :width: 900px
-    :align: center
-    :alt: Error correction model
+    'ensemble: topological wavelet quantile'
 
-The error correction model is a linear regression model consisting of
-three stages: at every next stage the model learns the error of
-prediction. This type of ensemble model for error correction is dependent
-on a number of classes:
-- For ``binary classification`` the ensemble is also
-linear regression, trained on predictions of correction stages.
-- For ``multiclass classification`` the ensemble is a sum of previous predictions.
 
 Feature caching
 +++++++++++++++
 
 To speed up the experiment, you can cache the features produced by the feature generators.
 If the ``use_cache`` flag in the config is ``True``, then every feature space generated during the experiment is
-cached into the corresponding folder. To do so a hash from the function ``get_features`` arguments and the generator attributes
-is obtained. Then the resulting feature space is dumped via the ``pickle`` library.
+cached into the corresponding folder.
 
 The next time the same feature space is requested, the hash is calculated again and the corresponding
 feature space is loaded from the cache, which is much faster than generating it from scratch.
@@ -181,8 +165,9 @@ branch`_.
 
 The repository includes the following directories:
 
+- The ``api`` folder contains the main interface classes and scripts
 - Package ``core`` contains the main classes and scripts
-- Package ``cases`` includes several how-to-use-cases where you can start to discover how the framework works
+- Package ``examples`` includes several how-to-use-cases where you can start to discover how the framework works
 - All unit and integration tests are in the ``test`` directory
 - The sources of the documentation are in ``docs``
 
@@ -230,4 +215,4 @@ are published.
 .. _AutoML framework FEDOT: https://github.com/aimclub/FEDOT
 .. _UCR archive: https://www.cs.ucr.edu/~eamonn/time_series_data/
 .. _main branch: https://github.com/aimclub/Fedot.Industrial
-.. _readthedocs: https://fedotindustrial.readthedocs.io/en/latest/
+.. _readthedocs: https://fedotindustrial.readthedocs.io/en/latest/
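The feature caching described in the README changes above (a hash computed from the extraction arguments and the generator attributes, with the feature space written to disk via ``pickle``) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: ``cache_key``, ``load_or_compute``, ``toy_features`` and the ``window`` attribute are hypothetical names, and SHA-256 over a ``repr`` of the inputs stands in for whatever hashing scheme the framework actually uses.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_key(func_args: dict, generator_attrs: dict) -> str:
    """Build a deterministic key from extraction arguments and generator attributes."""
    # Sorting by key makes the key independent of dict insertion order.
    payload = repr((sorted(func_args.items()), sorted(generator_attrs.items())))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def load_or_compute(cache_dir: Path, func_args: dict, generator_attrs: dict, compute):
    """Return cached features if present; otherwise compute, pickle and return them."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(func_args, generator_attrs)}.pkl"
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    features = compute(**func_args)
    with path.open("wb") as fh:
        pickle.dump(features, fh)
    return features


calls = []  # records how many times the (hypothetical) extractor really runs


def toy_features(series):
    """Stand-in feature extractor: mean and maximum of the series."""
    calls.append(1)
    return [sum(series) / len(series), max(series)]


cache_root = Path(tempfile.mkdtemp())
first = load_or_compute(cache_root, {"series": (1, 2, 3)}, {"window": 2}, toy_features)
second = load_or_compute(cache_root, {"series": (1, 2, 3)}, {"window": 2}, toy_features)
```

In this sketch the second call reads the pickled result instead of invoking ``toy_features`` again; changing either the arguments or the generator attributes yields a different key and therefore a fresh computation.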
diff --git a/core/__init__.py → benchmark/__init__.py b/core/__init__.py → benchmark/__init__.py
diff --git a/benchmark/abstract_bench.py b/benchmark/abstract_bench.py
@@ -0,0 +1,60 @@
+import logging
+import os
+
+
+class AbstractBenchmark(object):
+    """Abstract class for benchmarks.
+
+    This class defines the interface that all benchmarks must implement.
+    """
+
+    def __init__(self, output_dir, **kwargs):
+        """Initialize the benchmark.
+
+        Args:
+            output_dir: The directory where the benchmark writes its results;
+                it is created if it does not exist.
+            **kwargs: Additional arguments that may be required by the
+                benchmark.
+        """
+        self.output_dir = output_dir
+        self.kwargs = kwargs
+        self.logger = logging.getLogger(self.__class__.__name__)
+        self._create_output_dir()
+
+    @property
+    def _config(self):
+        raise NotImplementedError()
+
+    def _create_output_dir(self):
+        os.makedirs(self.output_dir, exist_ok=True)
+
+    def _create_report(self, results):
+        """Create a report from the results of the benchmark.
+
+        Args:
+            results: The results of the benchmark.
+
+        Returns:
+            A string containing the report.
+        """
+        raise NotImplementedError()
+
+    def run(self):
+        """Run the benchmark and return the results.
+
+        Returns:
+            A dictionary containing the results of the benchmark.
+        """
+        raise NotImplementedError()
+
+    def collect_results(self, output_dir):
+        """Collect the results of the benchmark.
+
+        Args:
+            output_dir: The directory where the benchmark wrote its results.
+
+        Returns:
+            A dictionary containing the results of the benchmark.
+        """
+        raise NotImplementedError()
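The interface defined by ``AbstractBenchmark`` above could be implemented along the following lines. This is a minimal sketch, not part of the commit: ``SortingBenchmark``, the ``sizes`` keyword argument and the ``report.txt`` file name are hypothetical, and the base class is repeated here in condensed form only so that the example runs on its own.

```python
import logging
import os
import tempfile
import time


class AbstractBenchmark(object):
    """Condensed copy of the base class from this diff, so the sketch runs standalone."""

    def __init__(self, output_dir, **kwargs):
        self.output_dir = output_dir
        self.kwargs = kwargs
        self.logger = logging.getLogger(self.__class__.__name__)
        os.makedirs(self.output_dir, exist_ok=True)

    def run(self):
        raise NotImplementedError()

    def collect_results(self, output_dir):
        raise NotImplementedError()


class SortingBenchmark(AbstractBenchmark):
    """Hypothetical benchmark: times sorted() on reversed lists of several sizes."""

    @property
    def _config(self):
        # 'sizes' is an assumed keyword argument, not one defined by the commit.
        return {'sizes': self.kwargs.get('sizes', (100, 1000))}

    def _create_report(self, results):
        lines = [f'{size}: {seconds:.6f} s' for size, seconds in sorted(results.items())]
        return '\n'.join(lines)

    def run(self):
        results = {}
        for size in self._config['sizes']:
            data = list(range(size, 0, -1))
            start = time.perf_counter()
            sorted(data)
            results[size] = time.perf_counter() - start
        # Persist a plain-text report so collect_results can read it back later.
        with open(os.path.join(self.output_dir, 'report.txt'), 'w') as fh:
            fh.write(self._create_report(results))
        return results

    def collect_results(self, output_dir):
        results = {}
        with open(os.path.join(output_dir, 'report.txt')) as fh:
            for line in fh:
                size, seconds = line.split(': ')
                results[int(size)] = float(seconds.split()[0])
        return results


bench = SortingBenchmark(tempfile.mkdtemp(), sizes=(10, 100))
timings = bench.run()
collected = bench.collect_results(bench.output_dir)
```

Here ``run`` returns the timings as a dictionary and also writes them to the output directory, while ``collect_results`` rebuilds the same dictionary from disk, which matches the "return a dictionary" contract stated in the docstrings.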