src/fklearn/preprocessing/splitting.py
Function: space_time_split_dataset
space_time_split_dataset splits the input DataFrame into 4 parts: train_set, intime_outspace_hdout, outime_inspace_hdout and outime_outspace_hdout.
The outime_inspace_hdout split is defined incorrectly in the function. It is filtered only by time, not by space:
outime_inspace_hdout = dataset[ (dataset[time_column] >= train_end_date) & (dataset[time_column] < holdout_end_date)]
outime_inspace_hdout should contain only the rows that are out of time and in space, not every out-of-time row.
We should rename this variable to outime_hdout and define the correct outime_inspace_hdout split from it, together with the other 3. E.g.:
train_set = train_period[~train_period[space_column].isin(holdout_space)]
intime_outspace_hdout = train_period[train_period[space_column].isin(holdout_space)]
outime_outspace_hdout = outime_hdout[outime_hdout[space_column].isin(holdout_space)]
outime_inspace_hdout = outime_hdout[~outime_hdout[space_column].isin(holdout_space)]
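To make the proposal concrete, here is a minimal runnable sketch of the four-way split on a toy DataFrame. It is not the fklearn implementation: the helper name split_space_time, the train_start_date parameter, and the toy dates and space IDs are assumptions made for illustration only.

```python
import pandas as pd


def split_space_time(dataset, time_column, space_column,
                     train_start_date, train_end_date,
                     holdout_end_date, holdout_space):
    """Hypothetical helper illustrating the proposed four-way split."""
    # In-time rows: inside the training period (train_start_date is assumed).
    train_period = dataset[(dataset[time_column] >= train_start_date) &
                           (dataset[time_column] < train_end_date)]
    # Out-of-time rows: from the end of training up to the end of the holdout.
    outime_hdout = dataset[(dataset[time_column] >= train_end_date) &
                           (dataset[time_column] < holdout_end_date)]

    # Boolean masks: True where the row belongs to a held-out space ID.
    train_in_holdout_space = train_period[space_column].isin(holdout_space)
    outime_in_holdout_space = outime_hdout[space_column].isin(holdout_space)

    train_set = train_period[~train_in_holdout_space]
    intime_outspace_hdout = train_period[train_in_holdout_space]
    outime_inspace_hdout = outime_hdout[~outime_in_holdout_space]
    outime_outspace_hdout = outime_hdout[outime_in_holdout_space]
    return (train_set, intime_outspace_hdout,
            outime_inspace_hdout, outime_outspace_hdout)


# Toy data (made up): two space IDs, one in-time and one out-of-time date.
df = pd.DataFrame({
    "space": ["space1", "space2", "space1", "space2"],
    "time": pd.to_datetime(["2016-10-01", "2016-10-01",
                            "2016-11-01", "2016-11-01"]),
})

train_set, intime_outspace, outime_inspace, outime_outspace = split_space_time(
    df,
    time_column="time",
    space_column="space",
    train_start_date=pd.Timestamp("2016-10-01"),
    train_end_date=pd.Timestamp("2016-10-15"),
    holdout_end_date=pd.Timestamp("2016-11-15"),
    holdout_space=["space1"],
)
```

With this definition, outime_inspace shares its space IDs with train_set, and the four parts together cover every row that falls inside the two time windows.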
This issue is a bit hard to understand, because I think the documentation is also messed up.
First of all, it seems we have a problem in the docstring:
intime_outspace_hdout : pandas.DataFrame
    The out of ID sample and in time hold out set.
has the same description as
outime_inspace_hdout : pandas.DataFrame
    The out of ID sample and in time hold out set.
The last one should be: The in ID sample and out time hold out set.
And just to clarify, the expected behaviour is that in this test:
https://github.com/nubank/fklearn/blob/master/tests/preprocessing/test_splitting.py#L62
this value should be:
'space': ['space2'],
just like the training dataset.
@marcelogdeandrade can you solve this bug? Your solution seems fine to me, but don't forget to update the docstring
@caique-lima Sure, I'll solve it
Related to #62