A functional that takes a custom function and applies it to Data #69

Open
vultor33 opened this issue May 24, 2019 · 3 comments
Labels
documentation: Missing documentation or improvements in the existing one
enhancement: New feature or request

Comments

@vultor33
Contributor

vultor33 commented May 24, 2019

I am afraid I am being a little annoying, but I am just trying to apply fklearn to my problem. I guess with a couple more issues I will be done.

Instructions

I need a functional that takes a custom function and applies it to my data, for example to create a new column by adding two existing ones.

Describe the feature and the current state.

The functional "custom_transformer" receives a function and applies it to a column. I need a functional that can create and delete columns, just like onehot_categorizer does.
Keeping track of feature names can be a nuisance (#58). I have done something about that for myself too (#68).

Will this change a current behavior? How?

No, it only adds new functionality.

Proposed solution

Here is what I have done. Maybe it could be useful as inspiration.

(You should be able to copy and paste this code into a Jupyter notebook with fklearn installed and it should work.)

Generating data

import random
import pandas as pd
column1 = [random.choice(['a','b','c']) for x in range(100)]
column2 = [random.random() for x in range(100)]
column3 = [random.random() for x in range(100)]
training_data = pd.DataFrame({'cat_feature' : column1, 
                              'num_feature_1' : column2,
                              'num_feature_2' : column3})
FEATURES = training_data.columns.tolist()

DEFINING THE CUSTOM FUNCTION THAT ADDS TWO SPECIFIC COLUMNS:

  • 'num_feature_1' + 'num_feature_2'
  • This function has two additional attributes:
    • log : information that will be added to the pipeline log
    • update_features : function that keeps track of feature names after sum_two_features is applied
import pandas as pd
from typing import Dict, List

def sum_two_features(df: pd.DataFrame) -> pd.DataFrame:
    # Both input columns must be present before they can be summed.
    assert 'num_feature_1' in df.columns and 'num_feature_2' in df.columns
    new_df = df.copy()
    new_df['num_sum'] = new_df.loc[:,'num_feature_1'] + new_df.loc[:,'num_feature_2']
    new_df = new_df.drop(['num_feature_1','num_feature_2'], axis=1)
    return new_df

def sum_two_features_log() -> Dict[str,Dict]:
    return {'sum_two_features': {
        'removed_columns': ['num_feature_1','num_feature_2'],
        'added_columns': ['num_sum'],
        'more_info': 'num_sum column is num_feature_1 plus num_feature_2'}}

def sum_two_features_update_features(features: List[str]) -> List[str]:
    new_features = list(features)
    if 'num_feature_1' in new_features:
        new_features.remove('num_feature_1')
    if 'num_feature_2' in new_features:
        new_features.remove('num_feature_2')
    # The new feature is named after the column actually created by sum_two_features.
    new_features += ['num_sum']
    return new_features

setattr(sum_two_features,'log', sum_two_features_log)
setattr(sum_two_features,'update_features', sum_two_features_update_features)
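
The log and the feature update above describe the same change twice, so they can drift apart. As a minimal sketch, and only as an assumption about how one might keep them in sync, the feature list could be derived from the log entry itself. The helper name update_features_from_log is hypothetical and is not part of fklearn; it relies only on the 'removed_columns' and 'added_columns' keys used in the log above.

```python
from typing import Dict, List


def update_features_from_log(features: List[str], log_entry: Dict) -> List[str]:
    """Hypothetical helper: apply a log entry's column changes to a feature list."""
    removed = set(log_entry.get('removed_columns', []))
    # Keep the original order of the surviving features.
    kept = [f for f in features if f not in removed]
    # Append new columns, skipping any that are already listed.
    added = [c for c in log_entry.get('added_columns', []) if c not in kept]
    return kept + added


log_entry = {'removed_columns': ['num_feature_1', 'num_feature_2'],
             'added_columns': ['num_sum']}
features = ['cat_feature', 'num_feature_1', 'num_feature_2']
new_features = update_features_from_log(features, log_entry)
```

With this shape, a custom function would only need to supply its log, and the input feature list is left unmodified.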

DEFINING THE FUNCTIONAL TO BE ADDED TO "transformation.py"

from toolz import curry
import pandas as pd
from typing import Callable
from fklearn.training.utils import log_learner_time
from fklearn.types import LearnerReturnType
from fklearn.common_docstrings import learner_return_docstring, learner_pred_fn_docstring

@curry
@log_learner_time(learner_name='custom_data_transformer')
def custom_data_transformer(df: pd.DataFrame,
                            transformation_function: Callable[[pd.DataFrame], pd.DataFrame]) -> LearnerReturnType:
    """
    Applies a custom function to the dataset.

    Parameters
    ----------
    df : pandas.DataFrame
        A Pandas' DataFrame

    transformation_function : function(pandas.DataFrame) -> pandas.DataFrame
        A function that receives a DataFrame as input, performs a transformation
        and returns another DataFrame.
        
    Additional info
    ----------
    transformation_function must have log attribute generated by:
        
        setattr(transformation_function ,'log', transformation_function_log_function)

    This log will be saved in the pipeline logs
    
    Also, transformation_function should have its own way to update the feature names.
    """

    def p(new_data_set: pd.DataFrame) -> pd.DataFrame:
        new_data_set = transformation_function(new_data_set)

        return new_data_set

    p.__doc__ = learner_pred_fn_docstring("custom_data_transformer")

    log = transformation_function.log()

    return p, p(df), log

custom_data_transformer.__doc__ += learner_return_docstring("Custom Data Transformer")

TESTING THE FUNCTIONAL

my_feature_adder = custom_data_transformer(transformation_function = sum_two_features)
(function_out, data_applied, log) = my_feature_adder(training_data)
print(data_applied.head(3),'\n')
print(log)

@vultor33 vultor33 added the enhancement New feature or request label May 24, 2019
@caique-lima
Contributor

@vultor33 first, thanks for all the issues! We're always trying to improve the package usability!

Maybe I'm missing something, but to me this seems really similar to just implementing a new learner from scratch. The only advantages are that the proposed custom_data_transformer guarantees the return values (p, p(df) and log) and that we don't need to add the @curry decorator.

Plus, in this example I don't see why we need sum_two_features_update_features: we already know which column will appear in the output, so we just need to add it to the features list. I think this case is a little different from one-hot encoding, because here we already know which columns will be output.
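
To illustrate the point about writing a learner from scratch, here is a minimal sketch under a few assumptions. It follows the (p, p(df), log) return convention used in this thread, but the name sum_columns_learner and its parameters are hypothetical, and functools.partial stands in for toolz's @curry so that the snippet depends only on pandas. It is not fklearn code.

```python
from functools import partial
from typing import Callable, Dict, List, Tuple

import pandas as pd

# Assumed shape of a learner's return value: (prediction function, transformed data, log).
LearnerReturn = Tuple[Callable[[pd.DataFrame], pd.DataFrame], pd.DataFrame, Dict]


def sum_columns_learner(df: pd.DataFrame,
                        columns_to_sum: List[str],
                        output_column: str) -> LearnerReturn:
    """Hypothetical learner: replaces `columns_to_sum` with their row-wise sum."""

    def p(new_df: pd.DataFrame) -> pd.DataFrame:
        # skipna=False keeps the same NaN behaviour as adding the columns with `+`.
        summed = new_df[columns_to_sum].sum(axis=1, skipna=False)
        # drop() and assign() return new frames, so the input is not modified.
        return new_df.drop(columns=columns_to_sum).assign(**{output_column: summed})

    log = {'sum_columns_learner': {
        'removed_columns': list(columns_to_sum),
        'added_columns': [output_column]}}

    return p, p(df), log


# Fix the configuration first, then call the learner with the data.
learner = partial(sum_columns_learner,
                  columns_to_sum=['num_feature_1', 'num_feature_2'],
                  output_column='num_sum')

train = pd.DataFrame({'cat_feature': ['a', 'b'],
                      'num_feature_1': [1.0, 2.0],
                      'num_feature_2': [0.5, 0.25]})
predict_fn, transformed, log = learner(train)

# The output column is known up front, so the feature list is written by hand.
features = ['cat_feature', 'num_sum']
```

Because the output column name is a parameter, no separate feature-update function is needed: the caller already knows what to put in the features list.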

@vultor33
Contributor Author

vultor33 commented Jun 5, 2019

You are right.
So, maybe we could change the flag to "Documentation" and create a tutorial on how to implement a learner from scratch. I believe crazy people (like me) will want functions that are too specific and don't justify an official implementation.

@caique-lima
Contributor

> You are right.
> So, maybe we could change the flag to "Documentation" and create a tutorial on how to implement a learner from scratch. I believe crazy people (like me) will want functions that are too specific and don't justify an official implementation.

Sounds good to me

@caique-lima caique-lima added the documentation Missing documentation or improvements in the existing one label Jun 10, 2019

2 participants