Atomized model operation #1227

Closed
wants to merge 30 commits into from

Conversation

kasyanovse (Collaborator) commented Dec 15, 2023

Closed because it is no longer relevant.

Linked:

Plan:

  • Pipeline composition
  • Mutations with the new operations
    • Simple mutations
    • Repository
    • Prevent the ordinary mutations from inserting the new models
    • Proper mutations
    • Account for the mutation rules being different inside and outside an atomized model
    • Check that pipelines mutate correctly inside atomized
      • Initial and non-initial nodes are identified correctly
      • Time-series operations are not inserted
  • Crossovers with the new operations
  • Verification of pipelines with the new operations
    • Write new rules
    • Check that the old rules do not conflict with the new pipelines
  • Tests
  • New operations
  • Remove AtomizedModel
  • Resolve the TODOs
  • Benchmark

Investigating why the results are poor

Mutations

H0: Pipelines produced by a given mutation do not differ from the others in their position within the generation
H1: ~H0
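A minimal sketch of how a hypothesis of this kind can be checked, under stated assumptions: each individual's position is its rank by fitness divided by the generation size, and the two groups are compared with a permutation test on the difference of mean positions. The fitness values and the mutation flag below are synthetic, and `rank_quantiles` and `mean_difference` are hypothetical helper names rather than FEDOT API.

```python
import numpy as np
from scipy.stats import permutation_test


def rank_quantiles(fitness_values):
    """Position of each individual in its generation, scaled to [0, 1).

    The best (lowest) fitness gets 0, the worst gets (n - 1) / n.
    """
    fitness_values = np.asarray(fitness_values)
    order = np.argsort(fitness_values)
    ranks = np.empty(len(fitness_values))
    ranks[order] = np.arange(len(fitness_values))
    return ranks / len(fitness_values)


def mean_difference(x, y):
    """Test statistic: difference between the group means."""
    return np.mean(x) - np.mean(y)


# Synthetic generation of 40 individuals (assumption, not real FEDOT history)
rng = np.random.default_rng(0)
fitness = rng.random(40)
# Hypothetical flag: every second individual was produced by the mutation of interest
has_mutation = np.arange(40) % 2 == 0

quantiles = rank_quantiles(fitness)
with_mutation = quantiles[has_mutation]
without_mutation = quantiles[~has_mutation]

result = permutation_test((with_mutation, without_mutation),
                          mean_difference,
                          vectorized=False,
                          n_resamples=1000)
print(f"difference of means: {result.statistic:.2f}, p-value: {result.pvalue:.1%}")
```

Because the permutations are drawn at random, the p-value varies slightly between runs; repeating the test several times and reporting the mean with its range is one way to show that spread.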

Conclusions and results:

  1. Changing the window size is associated with higher fitness within a generation.
  2. Parameter mutations, and window mutations in particular, hurt fitness once the window is selected automatically with WindowSizeSelector from Topological forecaster #1186.
  3. PR add window size selector #1237 was created to bring automatic window selection into FEDOT master.
  4. The new operations seem to make pipelines slightly better. The effect is small but statistically significant (provided the statistical tests were carried out correctly).
Code
from itertools import chain

import numpy as np
import scipy.stats

# fedots is a list of fitted Fedot objects
inds = [gen for gen in chain(*[fedot.history.generations for fedot in fedots]) if gen.generation_num != 0]
inds = [[x for x in gen if x.fitness.values[0] < 10 and x.parent_operator is not None] for gen in inds]
inds = [sorted(gen, key=lambda x: x.fitness.values[0]) for gen in inds]
t1 = dict()
ww, wow = list(), list()
strs = ('atomized', 'atomized_ts_differ')#, 'atomized_ts_to_time')
wstr, wostr = {k: list() for k in strs}, {k: list() for k in strs}
windows = dict()
print('there are', len(list(chain(*inds))), 'individuals')
for gen in inds:
    for _i, ind in enumerate(gen):
        mutation_name = ind.parent_operator.operators[0]
        individual_value = _i / len(gen)

        if 'parameter' in mutation_name:
            old_lagged = tuple([node.parameters['window_size']
                                for node in ind.parent_operator.parent_individuals[0].graph.nodes
                                if node.name == 'lagged' and node.parameters and 'window_size' in node.parameters])
            new_lagged = tuple([node.parameters['window_size']
                                for node in ind.graph.nodes
                                if node.name == 'lagged' and node.parameters and 'window_size' in node.parameters])
            if ind.native_generation not in windows:
                windows[ind.native_generation] = list()
            if new_lagged:
                windows[ind.native_generation].append(np.mean(new_lagged))
            (wow if new_lagged == old_lagged else ww).append(individual_value)

        for key in wstr:
            (wstr if key in str(ind.graph) else wostr)[key].append(individual_value)

        if mutation_name not in t1:
            t1[mutation_name] = list()
        t1[mutation_name].append(individual_value)

stat = lambda x, y: np.mean(x) - np.mean(y)
def test(name, *args):
    pvalues = [scipy.stats.permutation_test(args, stat, n_resamples=1000).pvalue for _ in range(10)]
    return f"{name} | {stat(*args):.2f} | {np.mean(pvalues):.1%} [{np.min(pvalues):.1%}-{np.max(pvalues):.1%}]"


for key in wstr:
    print(test(key.upper(), wstr[key], wostr[key]))
if ww and wow:
    print(test('WINDOW MUTATION', ww, wow))
for mutation in t1:
    s1, s2 = t1[mutation], list(chain(*[t1[x] for x in t1 if x != mutation]))
    if len(s1) > 5 and len(s2) > 5:
        print(test(mutation, s1, s2))
Output

INITIAL OUTPUT WITHOUT AUTOMATIC WINDOW SELECTION

| Mutation | Change in metric quantile within the generation | p-value | Significant difference |
|---|---|---|---|
| WINDOW MUTATION | -0.2 | 0.0% | !!!!!! |
| insert_atomized_operation | -0.2 | 0.0% | !!!!!! |
| single_edge_mutation | 0.2 | 0.0% | !!!!!! |
| single_change_mutation | -0.0 | 30.6% | |
| single_add_mutation | 0.0 | 65.1% | |
| parameter_change_mutation | 0.1 | 0.0% | !!!!!! |
| single_drop_mutation | -0.0 | 77.0% | |

WITH AUTOMATIC WINDOW SELECTION

| Mutation or model type | Change in metric quantile within the generation | p-value, mean [min-max] |
|---|---|---|
| ATOMIZED | -0.01 | 44.6% [41.0%-48.2%] |
| WINDOW MUTATION | 0.12 | 2.6% [1.6%-4.0%] |
| insert_atomized_operation | 0.01 | 59.0% [51.3%-65.1%] |
| parameter_change_mutation | 0.03 | 11.2% [9.6%-12.8%] |
| single_add_mutation | -0.05 | 13.2% [11.0%-16.0%] |
| single_change_mutation | -0.10 | 0.2% [0.2%-0.2%] |
| single_drop_mutation | 0.05 | 9.7% [7.6%-11.4%] |
| single_edge_mutation | 0.14 | 0.2% [0.2%-0.2%] |

WITH AUTOMATIC WINDOW SELECTION AND WITHOUT THE UNHELPFUL MUTATIONS

| Mutation or model type | Change in metric quantile within the generation | p-value, mean [min-max] |
|---|---|---|
| ATOMIZED | -0.04 | 0.2% [0.2%-0.2%] |
| insert_atomized_operation | -0.02 | 4.0% [3.4%-4.4%] |
| single_drop_mutation | 0.05 | 0.3% [0.2%-0.4%] |
| single_add_mutation | -0.03 | 1.6% [1.2%-2.2%] |
| single_change_mutation | 0.01 | 22.0% [18.6%-26.2%] |

Comparing the metrics of atomized and non-atomized pipelines

A single Fedot run is performed on the given data without fixing the seed.
The metrics of all models evaluated during optimization are collected within one generation, and the values are split into two groups: metrics of atomized models and metrics of all other models.

H0: within one generation, the distributions of the atomized and non-atomized metrics have equal expected values
H1: ~H0

Conclusion: H0 is rejected. Within a single large generation, the expected values of the metrics differ between atomized and non-atomized pipelines.

Code listing for the statistical tests
import logging
from random import random
from itertools import chain
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import f_oneway, ttest_ind, alexandergovern
from statistics import mean

from fedot.api.main import Fedot
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
from fedot.core.data.data import InputData
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.data.data_split import train_test_data_setup


RANDOM_SEED = 100
NUM_EXPERIMENTS = 10
ALPHA = 0.05


def get_data(data_length: int = 500, test_length: int = 100) -> InputData:
    # (amplitude, frequency) pairs of the sine components
    harmonics = [(0.1, 0.9), (0.1, 1), (0.1, 1.1), (0.05, 2), (0.05, 5), (1, 0.02)]
    # add five components with random amplitude and frequency
    for _ in range(5):
        harmonics += [(random() * 0.1 + 0.1, random() * 2)]
    time = np.linspace(0, 100, data_length)
    data = np.zeros_like(time)
    for g in harmonics:
        data += g[0] * np.sin(g[1] * 2 * np.pi / time[-1] * 25 * time)
    data += time * 0.1
    data = InputData(idx=np.arange(0, data.shape[0]),
                     features=data,
                     target=data,
                     task=Task(TaskTypesEnum.ts_forecasting,
                               TsForecastingParams(forecast_length=test_length)),
                     data_type=DataTypesEnum.ts)
    return train_test_data_setup(data,
                                 validation_blocks=1,
                                 split_ratio=(data_length - test_length) / ((data_length - test_length) + test_length))


def get_fitted_fedot(train: InputData, test: InputData, random_seed: int = RANDOM_SEED) -> Fedot:
    initial_assumption = None

    fedot = Fedot(problem='ts_forecasting',
                  task_params=TsForecastingParams(forecast_length=test.idx.shape[0]),
                  logging_level=logging.WARNING,
                  timeout=5,
                  pop_size=20,
                  num_of_generations=3,
                  n_jobs=10,
                  with_tuning=False,
                  initial_assumption=initial_assumption,
                  )
    fedot.fit(train)
    return fedot


if __name__ == '__main__':
    train, test = get_data()
    fedot = get_fitted_fedot(train, test)

    for population in fedot.history.generations:
        atomized_metrics, nonatomized_metrics = [], []
        for individual in population:
            if individual.fitness.value < 1:
                if 'atomized' in individual.graph.descriptive_id:
                    atomized_metrics.append(individual.fitness.value)
                else:
                    nonatomized_metrics.append(individual.fitness.value)

        if len(atomized_metrics) and len(nonatomized_metrics):
            _, p_anova = f_oneway(atomized_metrics, nonatomized_metrics)
            _, p_ttest = ttest_ind(atomized_metrics, nonatomized_metrics)
            p_agovern = alexandergovern(atomized_metrics, nonatomized_metrics).pvalue
            print(f'\nAtomized metrics length: {len(atomized_metrics)}')
            print(f'Atomized metrics mean: {mean(atomized_metrics)}\n')
            print(f'Non-Atomized metrics length: {len(nonatomized_metrics)}')
            print(f'Non-Atomized metrics mean: {mean(nonatomized_metrics)}\n')
            print(f'ALEXANDERGOVERN: H0 {p_agovern > ALPHA} (p-value: {p_agovern})')
            print(f'ANOVA: H0 {p_anova > ALPHA} (p-value: {p_anova})')
            print(f'TTEST: H0 {p_ttest > ALPHA} (p-value: {p_ttest})\n')
Results for a single Fedot run, within one generation
Atomized metrics length: 23
Atomized metrics mean: 0.3193077976971991

Non-Atomized metrics length: 40
Non-Atomized metrics mean: 0.42342097047587063

ALEXANDERGOVERN: H0 False (p-value: 0.0009336975906277874)
ANOVA: H0 False (p-value: 0.0035480628380782806)
TTEST: H0 False (p-value: 0.0035480628380782525)
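The ANOVA and t-test p-values above agree to about 14 significant digits. That is expected: with exactly two groups, one-way ANOVA and the pooled-variance t-test (the default `equal_var=True` in `ttest_ind`) are the same test, with F = t². A small sketch on synthetic data illustrates this and also shows Welch's variant (`equal_var=False`), which does not assume equal variances. The group sizes match the output above, but the values themselves are random stand-ins, not FEDOT metrics.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Synthetic stand-ins for the two metric groups (assumption, not real data)
rng = np.random.default_rng(42)
atomized = rng.normal(loc=0.32, scale=0.10, size=23)
nonatomized = rng.normal(loc=0.42, scale=0.15, size=40)

f_stat, p_anova = f_oneway(atomized, nonatomized)
# equal_var=True is the default: pooled-variance Student's t-test
t_stat, p_pooled = ttest_ind(atomized, nonatomized)
# Welch's t-test drops the equal-variance assumption
_, p_welch = ttest_ind(atomized, nonatomized, equal_var=False)

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print(f"ANOVA p = {p_anova:.6f}, pooled t-test p = {p_pooled:.6f}")
print(f"Welch t-test p = {p_welch:.6f}")
```

In practice this means the ANOVA and t-test lines in the listing provide one piece of evidence, not two; the Alexander-Govern test is the one that relaxes the equal-variance assumption.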
Results over several Fedot runs on fixed data. diff is the difference between the nonatomized mean and the atomized mean, in absolute rather than relative terms.

| atomized mean metric | nonatomized mean metric | diff |
|---|---|---|
| 0.40310965163897533 | 0.5258254039784768 | 0.12271575233950144 |
| 0.38941925878670225 | 0.5029692174546525 | 0.11354995866795026 |
| 0.39899222822731417 | 0.4946715308181331 | 0.09567930259081892 |
| 0.3828210686266902 | 0.3825838994677389 | -0.0002371691589512781 |
| 0.40596368379091746 | 0.4952406389814257 | 0.08927695519050821 |
| 0.3794686018324255 | 0.3911405440142768 | 0.011671942181851303 |
| 0.38953820082509855 | 0.49915707765067835 | 0.1096188768255798 |
| 0.38192409680048195 | 0.4793310851871796 | 0.09740698838669765 |
| 0.3824244353971997 | 0.35815678898201725 | -0.024267646415182476 |
| 0.40142854838786285 | 0.5070242861220102 | 0.10559573773414738 |
| 0.3655888254449583 | 0.4032475104193803 | 0.037658684974421985 |
| 0.39817458028926306 | 0.5014504247826739 | 0.10327584449341082 |
| 0.3848995605685277 | 0.44268693656915314 | 0.057787376000625446 |
| 0.4030127105036263 | 0.5233831563043596 | 0.1203704458007333 |
| 0.3926023945342515 | 0.4266703314578744 | 0.03406793692362292 |
| 0.38813247164237114 | 0.49407357595967194 | 0.1059411043173008 |
| 0.3702466107812836 | 0.3582309620735185 | -0.012015648707765114 |
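Since each row pairs an atomized mean with a non-atomized mean from the same run, one possible way to summarize the 17 runs is a paired test on the per-run differences. The sketch below is an illustration and not part of the original analysis; the values are the table entries rounded to four decimals, and the choice of a paired t-test plus a Wilcoxon signed-rank test is an assumption.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# Per-run means from the table, rounded to four decimals
atomized = np.array([0.4031, 0.3894, 0.3990, 0.3828, 0.4060, 0.3795,
                     0.3895, 0.3819, 0.3824, 0.4014, 0.3656, 0.3982,
                     0.3849, 0.4030, 0.3926, 0.3881, 0.3702])
nonatomized = np.array([0.5258, 0.5030, 0.4947, 0.3826, 0.4952, 0.3911,
                        0.4992, 0.4793, 0.3582, 0.5070, 0.4032, 0.5015,
                        0.4427, 0.5234, 0.4267, 0.4941, 0.3582])

# Same sign convention as the diff column: nonatomized minus atomized
diff = nonatomized - atomized

# Paired t-test: is the mean per-run difference zero?
t_result = ttest_rel(nonatomized, atomized)
# Wilcoxon signed-rank test: a rank-based alternative without a normality assumption
w_result = wilcoxon(diff)

print(f"runs: {len(diff)}, runs with negative diff: {(diff < 0).sum()}")
print(f"mean diff: {diff.mean():.4f}")
print(f"paired t-test p-value: {t_result.pvalue:.2e}")
print(f"Wilcoxon signed-rank p-value: {w_result.pvalue:.2e}")
```

Both tests treat the runs as independent repetitions on the same data, so they say nothing about how the difference would behave on other datasets.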

@kasyanovse kasyanovse added the enhancement New feature or request label Dec 15, 2023
@kasyanovse kasyanovse self-assigned this Dec 15, 2023

pep8speaks commented Dec 15, 2023

Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 15:34: F821 undefined name 'Pipeline'
Line 27:92: F821 undefined name 'Pipeline'
Line 33:35: F821 undefined name 'Pipeline'
Line 44:43: F821 undefined name 'Pipeline'
Line 51:36: F821 undefined name 'MetricCallable'
Line 75:76: F821 undefined name 'PipelineNode'

Line 5:1: F401 'fedot.core.operations.evaluation.evaluation_interfaces.SkLearnEvaluationStrategy' imported but unused
Line 6:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.decompose.DecomposerRegImplementation' imported but unused
Line 8:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_filters.IsolationForestRegImplementation' imported but unused
Line 10:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_filters.LinearRegRANSACImplementation' imported but unused
Line 10:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_filters.NonLinearRegRANSACImplementation' imported but unused
Line 12:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_selectors.LinearRegFSImplementation' imported but unused
Line 12:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_selectors.NonLinearRegFSImplementation' imported but unused
Line 22:1: F401 'fedot.core.operations.evaluation.operation_implementations.models.knn.FedotKnnRegImplementation' imported but unused
Line 24:1: F401 'fedot.utilities.random.ImplementationRandomStateHandler' imported but unused

Line 1:1: F401 'typing.Union' imported but unused
Line 1:1: F401 'typing.Any' imported but unused
Line 1:1: F401 'typing.Dict' imported but unused
Line 6:1: F401 'fedot.core.operations.atomized_model.atomized_model.AtomizedModel' imported but unused
Line 9:1: F401 'fedot.core.operations.operation_parameters.OperationParameters' imported but unused
Line 12:1: F401 'fedot.core.pipelines.pipeline_node_factory.PipelineOptNodeFactory' imported but unused
Line 13:1: F401 'fedot.core.pipelines.random_pipeline_factory.RandomPipelineFactory' imported but unused
Line 14:1: F401 'fedot.core.repository.pipeline_operation_repository.PipelineOperationRepository' imported but unused
Line 15:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 15:1: F401 'fedot.core.repository.tasks.Task' imported but unused

Line 14:53: W292 no newline at end of file

Line 10:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 10:1: F401 'fedot.core.repository.tasks.Task' imported but unused
Line 37:21: F841 local variable 'target' is assigned to but never used

Line 3:1: F401 'numpy as np' imported but unused
Line 10:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 10:1: F401 'fedot.core.repository.tasks.Task' imported but unused
Line 56:39: E127 continuation line over-indented for visual indent
Line 57:39: E127 continuation line over-indented for visual indent
Line 58:39: E127 continuation line over-indented for visual indent

Line 1:1: F401 'typing.Union' imported but unused
Line 1:1: F401 'typing.Any' imported but unused
Line 1:1: F401 'typing.Dict' imported but unused
Line 6:1: F401 'fedot.core.operations.atomized_model.atomized_model.AtomizedModel' imported but unused
Line 9:1: F401 'fedot.core.operations.operation_parameters.OperationParameters' imported but unused
Line 12:1: F401 'fedot.core.pipelines.pipeline_node_factory.PipelineOptNodeFactory' imported but unused
Line 13:1: F401 'fedot.core.pipelines.random_pipeline_factory.RandomPipelineFactory' imported but unused
Line 15:1: F401 'fedot.core.repository.pipeline_operation_repository.PipelineOperationRepository' imported but unused
Line 16:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 32:78: E231 missing whitespace after ','

Line 6:1: F401 'fedot.core.repository.operation_types_repository.get_operation_type_from_id' imported but unused

Line 36:30: F541 f-string is missing placeholders

Line 5:1: F401 'typing.Callable' imported but unused
Line 9:1: F401 'fedot.core.pipelines.node.PipelineNode' imported but unused
Line 10:1: F401 'fedot.core.pipelines.pipeline.Pipeline' imported but unused
Line 11:1: F401 'fedot.core.repository.operation_types_repository.OperationTypesRepository' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_edge_mutation' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_add_mutation' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_change_mutation' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_drop_mutation' imported but unused
Line 18:1: F401 'golem.core.optimisers.optimization_parameters.GraphRequirements' imported but unused
Line 19:1: F401 'golem.core.optimisers.optimizer.GraphGenerationParams' imported but unused
Line 20:1: F401 'golem.core.optimisers.genetic.gp_params.GPAlgorithmParameters' imported but unused
Line 33:121: E501 line too long (125 > 120 characters)

Line 5:1: F401 'typing.Union' imported but unused
Line 55:1: E303 too many blank lines (3)
Line 83:5: E129 visually indented line with same indent as next logical line

Line 319:13: F841 local variable 'ex' is assigned to but never used

Line 4:1: F401 'itertools.chain' imported but unused

Line 9:1: F401 'fedot.core.composer.metrics.RMSE' imported but unused

Line 9:121: E501 line too long (130 > 120 characters)
Line 10:121: E501 line too long (132 > 120 characters)
Line 11:121: E501 line too long (130 > 120 characters)

Comment last updated at 2023-12-22 08:59:54 UTC

codecov bot commented Dec 15, 2023

Codecov Report

Attention: 138 lines in your changes are missing coverage. Please review.

Comparison is base (299ffba) 79.47% compared to head (7c2e51a) 78.45%.
Report is 1 commits behind head on master.

❗ Current head 7c2e51a differs from pull request most recent head 713b1d5. Consider uploading reports for the commit 713b1d5 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| ...e/operations/atomized_model/atomized_ts_sampler.py | 0.00% | 51 Missing ⚠️ |
| ...re/operations/atomized_model/atomized_ts_differ.py | 0.00% | 42 Missing ⚠️ |
| ...re/operations/atomized_model/atomized_ts_scaler.py | 0.00% | 39 Missing ⚠️ |
| ...edot/core/optimisers/genetic_operators/mutation.py | 89.74% | 4 Missing ⚠️ |
| ...t/core/operations/atomized_model/atomized_model.py | 0.00% | 1 Missing ⚠️ |
| ...ore/operations/atomized_model/atomized_template.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1227      +/-   ##
==========================================
- Coverage   79.47%   78.45%   -1.02%     
==========================================
  Files         145      149       +4     
  Lines        9928    10109     +181     
==========================================
+ Hits         7890     7931      +41     
- Misses       2038     2178     +140     


Labels
enhancement New feature or request

3 participants