Missing information and duplicated lines on splitting.py #62

Open
vultor33 opened this issue May 13, 2019 · 0 comments
Labels
documentation Missing documentation or improvements in the existing one

Comments

@vultor33
Contributor

Instructions

Some information is missing and some lines are duplicated in the splitting.py documentation.
Path: .\fklearn\src\fklearn\preprocessing\splitting.py

Describe the documentation issue

The problem is in the docstring of the space_time_split_dataset function:

    Returns
    ----------
    train_set : pandas.DataFrame
        The in ID sample and in time training set.

    intime_outspace_hdout : pandas.DataFrame
        The out of ID sample and in time hold out set.  # duplicated line

    outime_inspace_hdout : pandas.DataFrame
        The out of ID sample and in time hold out set.  # duplicated line

    holdout_space : pandas.DataFrame
        The out of ID sample and in time hold out set.  # duplicated line



Should it really return holdout_space?

Possible solutions

The following is my guess at what this function should return:

    Returns
    ----------
    train_set : pandas.DataFrame
        Samples with timestamp >= train_start_date and timestamp < train_end_date.
        All IDs are included except those selected for validation (holdout_space).

    intime_outspace_hdout : pandas.DataFrame
        Samples with the same timestamps as train_set.
        IDs are those selected in the holdout_space array.
        All rows with a selected ID within these timestamps are included.

    outime_inspace_hdout : pandas.DataFrame
        Samples with timestamp >= train_end_date and timestamp < holdout_end_date.
        All IDs are included.

    outime_outspace_hdout : pandas.DataFrame
        Samples with the same timestamps as outime_inspace_hdout.
        IDs are those selected in the holdout_space array.
        All rows with a selected ID within these timestamps are included.
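To make the proposed semantics concrete, here is a minimal pandas sketch of the four splits as described in the proposed docstring. It is a hypothetical re-implementation that follows the proposal literally, not fklearn's actual code; the function name `space_time_split_sketch` and its simplified parameters (an explicit `holdout_space` list instead of a random ID selection) are assumptions for illustration only.

```python
import pandas as pd


def space_time_split_sketch(dataset, train_start_date, train_end_date,
                            holdout_end_date, holdout_space,
                            space_column, time_column):
    """Hypothetical sketch of the proposed split semantics (not fklearn's code)."""
    time = dataset[time_column]
    # In-time window: train_start_date <= timestamp < train_end_date
    in_time = (time >= train_start_date) & (time < train_end_date)
    # Out-of-time window: train_end_date <= timestamp < holdout_end_date
    out_time = (time >= train_end_date) & (time < holdout_end_date)
    # Rows whose ID was selected for validation
    out_space = dataset[space_column].isin(holdout_space)

    train_set = dataset[in_time & ~out_space]
    intime_outspace_hdout = dataset[in_time & out_space]
    # The proposal says "All IDs are included", so no ID filter is applied here
    outime_inspace_hdout = dataset[out_time]
    outime_outspace_hdout = dataset[out_time & out_space]
    return train_set, intime_outspace_hdout, outime_inspace_hdout, outime_outspace_hdout


# Toy data: three IDs, each observed once in January and once in February
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "time": pd.to_datetime(["2019-01-05", "2019-02-05"] * 3),
})
train, in_out, out_in, out_out = space_time_split_sketch(
    df,
    train_start_date=pd.Timestamp("2019-01-01"),
    train_end_date=pd.Timestamp("2019-02-01"),
    holdout_end_date=pd.Timestamp("2019-03-01"),
    holdout_space=[3],
    space_column="id",
    time_column="time",
)
```

With ID 3 held out, train_set keeps the January rows for IDs 1 and 2, intime_outspace_hdout keeps ID 3 in January, outime_inspace_hdout keeps all three IDs in February, and outime_outspace_hdout keeps only ID 3 in February. Read literally, the proposal makes outime_outspace_hdout a subset of outime_inspace_hdout; if "inspace" is instead meant to exclude the holdout IDs, the third mask would become `out_time & ~out_space`.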

vultor33 added the documentation label on May 13, 2019
caique-lima added this to the 1.16.x milestone on Sep 2, 2019
Development

No branches or pull requests

2 participants