Atomized model operation #1227

kasyanovse · 2023-12-15T08:30:01Z

Closed as no longer relevant.

Linked:

Add node factory to graph GOLEM#255

Plan:

Figuring out why the results are poor

Mutations

H0: Pipelines produced by particular mutations do not differ in their position within a generation
H1: ~H0
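
"Position within a generation" is read here as the rank quantile of an individual after its generation is sorted by fitness, with lower fitness treated as better. A minimal sketch with made-up fitness values; `rank_positions` is a hypothetical helper and not part of FEDOT or GOLEM:

```python
def rank_positions(fitnesses):
    """Map each individual's index to its rank quantile in [0, 1).

    Lower fitness is treated as better, so the best individual gets 0.0.
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i])
    return {idx: rank / len(fitnesses) for rank, idx in enumerate(order)}


# Made-up fitness values for one generation of four pipelines.
generation = [0.42, 0.31, 0.55, 0.38]
positions = rank_positions(generation)
print(positions)
```

Under that reading, a negative mean difference for a mutation would mean that the pipelines it produces tend to sit closer to the top of their generation.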

Conclusions and results:

Changing the window is associated with higher fitness within a generation.
Parameter mutations, and window mutations in particular, hurt fitness once the window is selected automatically with WindowSizeSelector from Topological forecaster #1186.
PR add window size selector #1237 was created to bring automatic window selection into FEDOT master.
The new operations seem to make pipelines slightly better. The effect is small but statistically significant (provided the statistical tests were carried out correctly).

Code

from itertools import chain

import numpy as np
import scipy.stats

# fedots is a list of fitted Fedot objects
inds = [gen for gen in chain(*[fedot.history.generations for fedot in fedots]) if gen.generation_num != 0]
inds = [[x for x in gen if x.fitness.values[0] < 10 and x.parent_operator is not None] for gen in inds]
inds = [sorted(gen, key=lambda x: x.fitness.values[0]) for gen in inds]
t1 = dict()
ww, wow = list(), list()
strs = ('atomized', 'atomized_ts_differ')#, 'atomized_ts_to_time')
wstr, wostr = {k: list() for k in strs}, {k: list() for k in strs}
windows = dict()
print('there are', len(list(chain(*inds))), 'individuals')
for gen in inds:
    for _i, ind in enumerate(gen):
        mutation_name = ind.parent_operator.operators[0]
        individual_value = _i / len(gen)

        if 'parameter' in mutation_name:
            old_lagged = tuple([node.parameters['window_size']
                                for node in ind.parent_operator.parent_individuals[0].graph.nodes
                                if node.name == 'lagged' and node.parameters and 'window_size' in node.parameters])
            new_lagged = tuple([node.parameters['window_size']
                                for node in ind.graph.nodes
                                if node.name == 'lagged' and node.parameters and 'window_size' in node.parameters])
            if ind.native_generation not in windows:
                windows[ind.native_generation] = list()
            if new_lagged:
                windows[ind.native_generation].append(np.mean(new_lagged))
            (wow if new_lagged == old_lagged else ww).append(individual_value)

        for key in wstr:
            (wstr if key in str(ind.graph) else wostr)[key].append(individual_value)

        if mutation_name not in t1:
            t1[mutation_name] = list()
        t1[mutation_name].append(individual_value)

stat = lambda x, y: np.mean(x) - np.mean(y)
def test(name, *args):
    pvalues = [scipy.stats.permutation_test(args, stat, n_resamples=1000).pvalue for _ in range(10)]
    return f"{name} | {stat(*args):.2f} | {np.mean(pvalues):.1%} [{np.min(pvalues):.1%}-{np.max(pvalues):.1%}]"


for key in wstr:
    print(test(key.upper(), wstr[key], wostr[key]))
if ww and wow:
    print(test('WINDOW MUTATION', ww, wow))
for mutation in t1:
    s1, s2 = t1[mutation], list(chain(*[t1[x] for x in t1 if x != mutation]))
    if len(s1) > 5 and len(s2) > 5:
        print(test(mutation, s1, s2))
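
The `test` helper repeats a randomized permutation test 10 times and reports the mean p-value with its min-max range. The sketch below reproduces that reporting scheme on synthetic data with a hand-rolled, pure-Python permutation test. It is not SciPy's implementation, and the two-sided convention used here (comparing absolute differences) is an assumption; `permutation_pvalue` and the sample data are hypothetical.

```python
import random
from statistics import mean


def mean_diff(x, y):
    return mean(x) - mean(y)


def permutation_pvalue(x, y, n_resamples=1000, rng=random):
    """Two-sided p-value for the difference of means under label shuffling."""
    observed = abs(mean_diff(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_resamples + 1)


rng = random.Random(0)
# Synthetic rank positions: group `a` is shifted towards 0 on purpose.
a = [rng.random() * 0.3 for _ in range(50)]
b = [rng.random() for _ in range(50)]

# Repeat the randomized test to see how much the p-value itself fluctuates.
pvalues = [permutation_pvalue(a, b, rng=rng) for _ in range(10)]
print(f"{mean_diff(a, b):.2f} | {mean(pvalues):.1%} "
      f"[{min(pvalues):.1%}-{max(pvalues):.1%}]")
```

With 1000 resamples a randomized test cannot report a p-value below roughly 0.1%, which would explain why the strongest effects in the tables sit at a fixed floor value with a zero-width range.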

Output

INITIAL OUTPUT WITHOUT AUTOMATIC WINDOW SELECTION

Mutation	Change in within-generation metric quantile	p-value	Significant difference
WINDOW MUTATION	-0.2	0.0%	!!!!!!
insert_atomized_operation	-0.2	0.0%	!!!!!!
single_edge_mutation	0.2	0.0%	!!!!!!
single_change_mutation	-0.0	30.6%
single_add_mutation	0.0	65.1%
parameter_change_mutation	0.1	0.0%	!!!!!!
single_drop_mutation	-0.0	77.0%

WITH AUTOMATIC WINDOW SELECTION

Mutation or model type	Change in within-generation metric quantile	p-value
ATOMIZED	-0.01	44.6% [41.0%-48.2%]
WINDOW MUTATION	0.12	2.6% [1.6%-4.0%]
insert_atomized_operation	0.01	59.0% [51.3%-65.1%]
parameter_change_mutation	0.03	11.2% [9.6%-12.8%]
single_add_mutation	-0.05	13.2% [11.0%-16.0%]
single_change_mutation	-0.10	0.2% [0.2%-0.2%]
single_drop_mutation	0.05	9.7% [7.6%-11.4%]
single_edge_mutation	0.14	0.2% [0.2%-0.2%]

WITH AUTOMATIC WINDOW SELECTION AND WITHOUT UNHELPFUL MUTATIONS

Mutation or model type	Change in within-generation metric quantile	p-value
ATOMIZED	-0.04	0.2% [0.2%-0.2%]
insert_atomized_operation	-0.02	4.0% [3.4%-4.4%]
single_drop_mutation	0.05	0.3% [0.2%-0.4%]
single_add_mutation	-0.03	1.6% [1.2%-2.2%]
single_change_mutation	0.01	22.0% [18.6%-26.2%]

Comparing metrics of pipelines with and without atomized models

A single Fedot run is performed on the given data without fixing the seed.
The metrics of all models evaluated during optimization are collected per generation and split into two groups: metrics of pipelines containing atomized models and metrics of all other pipelines.

H0: the distributions of atomized and non-atomized metrics within one generation have equal expected values
H1: ~H0

Conclusion: H0 is rejected. Within a single generation with a large number of individuals, the expected values of the metrics differ between atomized and non-atomized pipelines.

Statistical test code listing

import logging
from random import random
from itertools import chain
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import f_oneway, ttest_ind, alexandergovern
from statistics import mean

from fedot.api.main import Fedot
from fedot.core.pipelines.pipeline import Pipeline
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams
from fedot.core.data.data import InputData
from fedot.core.repository.dataset_types import DataTypesEnum
from fedot.core.data.data_split import train_test_data_setup


RANDOM_SEED = 100
NUM_EXPERIMENTS = 10
ALPHA = 0.05


def get_data(data_length: int = 500, test_length: int = 100) -> InputData:
    # (amplitude, relative frequency) pairs of the sine components
    harmonics = [(0.1, 0.9), (0.1, 1), (0.1, 1.1), (0.05, 2), (0.05, 5), (1, 0.02)]
    for _ in range(5):
        harmonics += [(random() * 0.1 + 0.1, random() * 2)]
    time = np.linspace(0, 100, data_length)
    data = time * 0
    for h in harmonics:
        data += h[0] * np.sin(h[1] * 2 * np.pi / time[-1] * 25 * time)
    data += time * 0.1
    data = InputData(idx=np.arange(0, data.shape[0]),
                     features=data,
                     target=data,
                     task=Task(TaskTypesEnum.ts_forecasting,
                               TsForecastingParams(forecast_length=test_length)),
                     data_type=DataTypesEnum.ts)
    return train_test_data_setup(data,
                                 validation_blocks=1,
                                 split_ratio=(data_length - test_length) / ((data_length - test_length) + test_length))


def get_fitted_fedot(train: InputData, test: InputData, random_seed: int = RANDOM_SEED) -> Fedot:
    initial_assumption = None

    fedot = Fedot(problem='ts_forecasting',
                  task_params=TsForecastingParams(forecast_length=test.idx.shape[0]),
                  logging_level=logging.WARNING,
                  timeout=5,
                  pop_size=20,
                  num_of_generations=3,
                  n_jobs=10,
                  with_tuning=False,
                  initial_assumption=initial_assumption,
                  )
    fedot.fit(train)
    return fedot


if __name__ == '__main__':
    train, test = get_data()
    fedot = get_fitted_fedot(train, test)

    for population in fedot.history.generations:
        atomized_metrics, nonatomized_metrics = [], []
        for individual in population:
            if individual.fitness.value < 1:
                if 'atomized' in individual.graph.descriptive_id:
                    atomized_metrics.append(individual.fitness.value)
                else:
                    nonatomized_metrics.append(individual.fitness.value)

        if len(atomized_metrics) and len(nonatomized_metrics):
            _, p_anova = f_oneway(atomized_metrics, nonatomized_metrics)
            _, p_ttest = ttest_ind(atomized_metrics, nonatomized_metrics)
            p_agovern = alexandergovern(atomized_metrics, nonatomized_metrics).pvalue
            print(f'\nAtomized metrics length: {len(atomized_metrics)}')
            print(f'Atomized metrics mean: {mean(atomized_metrics)}\n')
            print(f'Non-Atomized metrics length: {len(nonatomized_metrics)}')
            print(f'Non-Atomized metrics mean: {mean(nonatomized_metrics)}\n')
            print(f'ALEXANDERGOVERN: H0 {p_agovern > ALPHA} (p-value: {p_agovern})')
            print(f'ANOVA: H0 {p_anova > ALPHA} (p-value: {p_anova})')
            print(f'TTEST: H0 {p_ttest > ALPHA} (p-value: {p_ttest})\n')

Results for a single Fedot run, one generation

Atomized metrics length: 23
Atomized metrics mean: 0.3193077976971991

Non-Atomized metrics length: 40
Non-Atomized metrics mean: 0.42342097047587063

ALEXANDERGOVERN: H0 False (p-value: 0.0009336975906277874)
ANOVA: H0 False (p-value: 0.0035480628380782806)
TTEST: H0 False (p-value: 0.0035480628380782525)

Results over several Fedot runs on fixed data

diff is the raw (not relative) difference: nonatomized mean minus atomized mean

`atomized mean metric`	`nonatomized mean metric`	`diff`
0.40310965163897533	0.5258254039784768	0.12271575233950144
0.38941925878670225	0.5029692174546525	0.11354995866795026
0.39899222822731417	0.4946715308181331	0.09567930259081892
0.3828210686266902	0.3825838994677389	-0.0002371691589512781
0.40596368379091746	0.4952406389814257	0.08927695519050821
0.3794686018324255	0.3911405440142768	0.011671942181851303
0.38953820082509855	0.49915707765067835	0.1096188768255798
0.38192409680048195	0.4793310851871796	0.09740698838669765
0.3824244353971997	0.35815678898201725	-0.024267646415182476
0.40142854838786285	0.5070242861220102	0.10559573773414738
0.3655888254449583	0.4032475104193803	0.037658684974421985
0.39817458028926306	0.5014504247826739	0.10327584449341082
0.3848995605685277	0.44268693656915314	0.057787376000625446
0.4030127105036263	0.5233831563043596	0.1203704458007333
0.3926023945342515	0.4266703314578744	0.03406793692362292
0.38813247164237114	0.49407357595967194	0.1059411043173008
0.3702466107812836	0.3582309620735185	-0.012015648707765114
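
One possible way to summarise the table above is an exact sign test on the `diff` column. This is not part of the original analysis; it is a sketch that assumes the runs are independent and uses only the values listed in the table:

```python
from math import comb

# `diff` column copied from the table (nonatomized mean - atomized mean).
diffs = [
    0.12271575233950144, 0.11354995866795026, 0.09567930259081892,
    -0.0002371691589512781, 0.08927695519050821, 0.011671942181851303,
    0.1096188768255798, 0.09740698838669765, -0.024267646415182476,
    0.10559573773414738, 0.037658684974421985, 0.10327584449341082,
    0.057787376000625446, 0.1203704458007333, 0.03406793692362292,
    0.1059411043173008, -0.012015648707765114,
]

n = len(diffs)
n_positive = sum(d > 0 for d in diffs)

# Exact two-sided sign test: under H0 each run is equally likely to give a
# positive or a negative diff, so the count follows Binomial(n, 0.5).
k = max(n_positive, n - n_positive)
p_one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
p_value = min(1.0, 2 * p_one_sided)

print(f"{n_positive} of {n} diffs are positive, sign-test p = {p_value:.4f}")
```

The sign test ignores the size of each difference, so it is a conservative check; a paired test on the two mean columns would use more of the information.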

pep8speaks · 2023-12-15T08:30:17Z

Hello @kasyanovse! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file fedot/core/operations/atomized_model/atomized_model.py:

Line 15:34: F821 undefined name 'Pipeline'
Line 27:92: F821 undefined name 'Pipeline'
Line 33:35: F821 undefined name 'Pipeline'
Line 44:43: F821 undefined name 'Pipeline'
Line 51:36: F821 undefined name 'MetricCallable'
Line 75:76: F821 undefined name 'PipelineNode'

In the file fedot/core/operations/evaluation/atomized.py:

Line 5:1: F401 'fedot.core.operations.evaluation.evaluation_interfaces.SkLearnEvaluationStrategy' imported but unused
Line 6:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.decompose.DecomposerRegImplementation' imported but unused
Line 8:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_filters.IsolationForestRegImplementation' imported but unused
Line 10:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_filters.LinearRegRANSACImplementation' imported but unused
Line 10:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_filters.NonLinearRegRANSACImplementation' imported but unused
Line 12:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_selectors.LinearRegFSImplementation' imported but unused
Line 12:1: F401 'fedot.core.operations.evaluation.operation_implementations.data_operations.sklearn_selectors.NonLinearRegFSImplementation' imported but unused
Line 22:1: F401 'fedot.core.operations.evaluation.operation_implementations.models.knn.FedotKnnRegImplementation' imported but unused
Line 24:1: F401 'fedot.utilities.random.ImplementationRandomStateHandler' imported but unused

In the file fedot/core/operations/evaluation/operation_implementations/models/atomized/atomized_ts_differ.py:

Line 1:1: F401 'typing.Union' imported but unused
Line 1:1: F401 'typing.Any' imported but unused
Line 1:1: F401 'typing.Dict' imported but unused
Line 6:1: F401 'fedot.core.operations.atomized_model.atomized_model.AtomizedModel' imported but unused
Line 9:1: F401 'fedot.core.operations.operation_parameters.OperationParameters' imported but unused
Line 12:1: F401 'fedot.core.pipelines.pipeline_node_factory.PipelineOptNodeFactory' imported but unused
Line 13:1: F401 'fedot.core.pipelines.random_pipeline_factory.RandomPipelineFactory' imported but unused
Line 14:1: F401 'fedot.core.repository.pipeline_operation_repository.PipelineOperationRepository' imported but unused
Line 15:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 15:1: F401 'fedot.core.repository.tasks.Task' imported but unused

In the file fedot/core/operations/evaluation/operation_implementations/models/atomized/atomized_ts_mixins.py:

Line 14:53: W292 no newline at end of file

In the file fedot/core/operations/evaluation/operation_implementations/models/atomized/atomized_ts_sampler.py:

Line 10:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 10:1: F401 'fedot.core.repository.tasks.Task' imported but unused
Line 37:21: F841 local variable 'target' is assigned to but never used

In the file fedot/core/operations/evaluation/operation_implementations/models/atomized/atomized_ts_scaler.py:

Line 3:1: F401 'numpy as np' imported but unused
Line 10:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 10:1: F401 'fedot.core.repository.tasks.Task' imported but unused
Line 56:39: E127 continuation line over-indented for visual indent
Line 57:39: E127 continuation line over-indented for visual indent
Line 58:39: E127 continuation line over-indented for visual indent

In the file fedot/core/operations/evaluation/operation_implementations/models/atomized/atomized_ts_transform_to_time.py:

Line 1:1: F401 'typing.Union' imported but unused
Line 1:1: F401 'typing.Any' imported but unused
Line 1:1: F401 'typing.Dict' imported but unused
Line 6:1: F401 'fedot.core.operations.atomized_model.atomized_model.AtomizedModel' imported but unused
Line 9:1: F401 'fedot.core.operations.operation_parameters.OperationParameters' imported but unused
Line 12:1: F401 'fedot.core.pipelines.pipeline_node_factory.PipelineOptNodeFactory' imported but unused
Line 13:1: F401 'fedot.core.pipelines.random_pipeline_factory.RandomPipelineFactory' imported but unused
Line 15:1: F401 'fedot.core.repository.pipeline_operation_repository.PipelineOperationRepository' imported but unused
Line 16:1: F401 'fedot.core.repository.tasks.TsForecastingParams' imported but unused
Line 32:78: E231 missing whitespace after ','

In the file fedot/core/operations/factory.py:

Line 6:1: F401 'fedot.core.repository.operation_types_repository.get_operation_type_from_id' imported but unused

In the file fedot/core/optimisers/genetic_operators/atomized_operators_wrapper.py:

Line 36:30: F541 f-string is missing placeholders

In the file fedot/core/optimisers/genetic_operators/crossover.py:

Line 5:1: F401 'typing.Callable' imported but unused
Line 9:1: F401 'fedot.core.pipelines.node.PipelineNode' imported but unused
Line 10:1: F401 'fedot.core.pipelines.pipeline.Pipeline' imported but unused
Line 11:1: F401 'fedot.core.repository.operation_types_repository.OperationTypesRepository' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_edge_mutation' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_add_mutation' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_change_mutation' imported but unused
Line 13:1: F401 'golem.core.optimisers.genetic.operators.base_mutations.single_drop_mutation' imported but unused
Line 18:1: F401 'golem.core.optimisers.optimization_parameters.GraphRequirements' imported but unused
Line 19:1: F401 'golem.core.optimisers.optimizer.GraphGenerationParams' imported but unused
Line 20:1: F401 'golem.core.optimisers.genetic.gp_params.GPAlgorithmParameters' imported but unused
Line 33:121: E501 line too long (125 > 120 characters)

In the file fedot/core/optimisers/genetic_operators/mutation.py:

Line 5:1: F401 'typing.Union' imported but unused
Line 55:1: E303 too many blank lines (3)
Line 83:5: E129 visually indented line with same indent as next logical line

In the file fedot/core/pipelines/template.py:

Line 319:13: F841 local variable 'ex' is assigned to but never used

In the file test/integration/api/test_main_api.py:

Line 4:1: F401 'itertools.chain' imported but unused

In the file test/integration/models/atomized_models/test_atomized_model.py:

Line 9:1: F401 'fedot.core.composer.metrics.RMSE' imported but unused

In the file test/integration/models/atomized_models/test_atomized_ts_operations.py:

Line 9:121: E501 line too long (130 > 120 characters)
Line 10:121: E501 line too long (132 > 120 characters)
Line 11:121: E501 line too long (130 > 120 characters)

Comment last updated at 2023-12-22 08:59:54 UTC

codecov · 2023-12-15T08:36:25Z

Codecov Report

Attention: 138 lines in your changes are missing coverage. Please review.

Comparison is base (299ffba) 79.47% compared to head (7c2e51a) 78.45%.
Report is 1 commits behind head on master.

❗ Current head 7c2e51a differs from pull request most recent head 713b1d5. Consider uploading reports for the commit 713b1d5 to get more accurate results

Files	Patch %	Lines
...e/operations/atomized_model/atomized_ts_sampler.py	0.00%	51 Missing ⚠️
...re/operations/atomized_model/atomized_ts_differ.py	0.00%	42 Missing ⚠️
...re/operations/atomized_model/atomized_ts_scaler.py	0.00%	39 Missing ⚠️
...edot/core/optimisers/genetic_operators/mutation.py	89.74%	4 Missing ⚠️
...t/core/operations/atomized_model/atomized_model.py	0.00%	1 Missing ⚠️
...ore/operations/atomized_model/atomized_template.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1227      +/-   ##
==========================================
- Coverage   79.47%   78.45%   -1.02%     
==========================================
  Files         145      149       +4     
  Lines        9928    10109     +181     
==========================================
+ Hits         7890     7931      +41     
- Misses       2038     2178     +140


kasyanovse added 5 commits December 14, 2023 16:21

move atomized to new folder

640fbce

series decompose model

d74e386

wip tests

0a85475

new model

37492bd

fixes

09f9828

kasyanovse added the enhancement New feature or request label Dec 15, 2023

kasyanovse self-assigned this Dec 15, 2023

kasyanovse added 11 commits December 15, 2023 13:32

delete ts decomposer

d6fdb34

add scaler

db18ed9

add diff operation

5d60648

wip, start work with mutations are adapted for atomized models

7c2e51a

pipeline adapter works with atomized models now

0ec6851

add mutations for atomized graphs

6bcdd7e

fix problem with functools.wraps

d4cf768

fix task type bug in atomized model

57be94f

Merge branch 'master' into atomized-model-operation

3cccc39

fix atomized mutation in fedot

3442a06

small fixes

1c489bf

kasyanovse mentioned this pull request Dec 18, 2023

Atomized model impovements #1232

Closed

kasyanovse added 5 commits December 19, 2023 10:36

fix tests

6645956

pep8

3363637

fix probability in test

a925074

intermediate save before create new model class

0cb15ec

create new operation type - atomized

0a5defd

kasyanovse mentioned this pull request Dec 21, 2023

Add node factory to graph aimclub/GOLEM#255

Closed

kasyanovse added 3 commits December 21, 2023 19:15

crossover

3756c63

new model

2dbf081

some adds for testing purpose

299af5f

small fix

7dfb639

kasyanovse assigned Lopa10ko Dec 26, 2023

add window size selector

d232831

kasyanovse mentioned this pull request Dec 27, 2023

add window size selector #1237

Merged

kasyanovse added 4 commits December 27, 2023 16:33

add decomposer

6d7a92e

fix atomized_ts_to_time model

d6f48bb

fix some errors, disable atomized_ts_decomposer

fcf9da5

small fix

713b1d5

kasyanovse closed this Mar 3, 2024

Lopa10ko mentioned this pull request Aug 23, 2024

fix: do not launch composition for atomized model #1324

Merged

DRMPN mentioned this pull request Aug 26, 2024

refactor: Implement new AtomizedModel #1325

Open
