#129 data recipe to augment new features #131
base: rel-1.9.1
LGTM
data/augment.py
Outdated
@@ -0,0 +1,758 @@
"""

This data recipe lets the user to augment new features to the dataset using the Augment Cloud Service.
I would add more description of this recipe from the perspective of DAI, for example starting with the requirements:
Snowflake
Dataset
DAI
Also describe the augmentation, where the output of the augmentation is persisted, and how it is consumed by DAI.
Sure, will do.
data/augment.py
Outdated
6. The recipe polls the API for the completion of the table creation
6. The recipe exports the dataset back to user's snowflake account
7. The recipe downloads, saves the dataset from snowflake into driverlessai instance and returns the file path
8. A new dataset is created in DAI with the augmented columns
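The polling step in the list above can be sketched as follows. This is a minimal illustration, not the recipe's actual code: the function name `poll_until_complete`, the status strings `"COMPLETED"` and `"FAILED"`, and the `get_status` callable are all assumptions standing in for whatever the Augment API really returns.

```python
import time
from typing import Callable


def poll_until_complete(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> str:
    """Call get_status() repeatedly until it reports a terminal state.

    get_status is a hypothetical callable wrapping the status request;
    the terminal status names below are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"table creation still '{status}' after {timeout_s}s")
        # Wait before asking again so the service is not flooded with requests.
        time.sleep(interval_s)
```

A caller would pass a function that performs one status request, then continue with the export step only when the returned status indicates success.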
Since this is a customer-facing recipe, I would use the full product name.
@surenH2oai I have updated it now :)
return "", str(e)


class AugmentDataset(CustomData):
Where is `CustomData` used? I guess it is needed since this is a data recipe?
@surenH2oai `CustomData` is the base class, and we override its `create_data` method in the subclass `AugmentDataset`. DAI finds the subclass that derives from `CustomData`, which in this case is `AugmentDataset`, and invokes its `create_data` method to get the updated dataframe with the original columns plus the augmented columns.
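The mechanism described in this reply can be illustrated with a small, self-contained sketch. The `CustomData` class below is a stand-in defined only for this example (in a real recipe the base class is provided by Driverless AI), the rows are plain dictionaries rather than a dataframe, and `find_recipe_classes` and the `augmented_feature` column are hypothetical names, not DAI's actual loader or the recipe's real output.

```python
from typing import Dict, List, Optional, Type

Row = Dict[str, object]


class CustomData:
    """Stand-in base class for illustration; not the real DAI class."""

    @staticmethod
    def create_data(X: Optional[List[Row]] = None) -> List[Row]:
        raise NotImplementedError


class AugmentDataset(CustomData):
    """Overrides create_data to return original plus augmented columns."""

    @staticmethod
    def create_data(X: Optional[List[Row]] = None) -> List[Row]:
        rows = X or []
        # Hypothetical augmentation: add one derived column to every row.
        return [{**row, "augmented_feature": len(row)} for row in rows]


def find_recipe_classes(base: Type[CustomData]) -> List[Type[CustomData]]:
    """Assumed discovery step: list the direct subclasses of the base class."""
    return base.__subclasses__()


recipe_cls = find_recipe_classes(CustomData)[0]
augmented = recipe_cls.create_data([{"a": 1, "b": 2}])
```

The point of the design is that the recipe author only writes the subclass; the host application locates it through the shared base class and calls the overridden method.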
This PR adds a data recipe that lets users augment a dataset with new features by using the Augment service.
https://github.com/h2oai/h2oai/issues/20586