-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCI Hydraulic Preprocess Data Recipe #99
base: master
Are you sure you want to change the base?
Conversation
* Added data recipe for importing hydraulic condition monitoring dataset * Renamed the file to be shorter
* Added data recipe for importing hydraulic condition monitoring dataset * Renamed the file to be shorter * Updated Data Recipe based on Edgar's feedback Changed all places where I was using module level objects to use only import modules for code readability. Added type hints for each class method, which improves readability by specifying the expected argument and return types. Updated comments with keyword arguments. Replaced series of if/elif statements with a dictionary. Updated code that specifies the destination filepath for output file by using os join instead of file replace. Changed all my custom variable names to lowercase underscores instead of mixed case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Apologies for the delay. My GitHub pull request bot didn't tell me about this outstanding PR (but did for others). I only came across this when clearing out my github web-UI notification dashboard.
import os | ||
import zipfile | ||
# call on members Union, List | ||
import typing | ||
# call on members data.CustomData, systemutils_more.download, systemutils.config | ||
import h2oaicore | ||
import datatable as dt | ||
import numpy as np | ||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a style comment, very much only a nitpick:
Imports should be grouped in the following order:
Standard library imports.
Related third party imports.
Local application/library specific imports.
You should put a blank line between each group of imports.
Reference: https://www.python.org/dev/peps/pep-0008/#imports
In other words:
import os
import typing
import zipfile
import datatable as dt
import numpy as np
import pandas as pd
import h2oaicore
Also in alphabetical order and removed comments unless necessary.
# url contains hydraulic systems condition monitoring data set | ||
link = "https://archive.ics.uci.edu/ml/machine-learning-databases/00447/data.zip" | ||
# returns temp_path to filename with filename extracted from URL | ||
file = h2oaicore.systemutils_more.download(link, dest_path=temp_path) | ||
# Choose a target_label: cool_cond_y, valve_cond_y, pump_leak_y, acc_gas_leak_y, stable_y | ||
# The target_label you choose will get added on as the last column in the training dataset | ||
target_label = "cool_cond_y" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would consider moving the link and target_label strings to constants at the top of the file, along with their comments, so it would be easy for someone to go in and quickly change these (where people typically look for these types of configurable fields).
Then having link
and target_label
referencing those constants. It may also make this block short and easier to quickly read.
# The target_label you choose will get added on as the last column in the training dataset | ||
target_label = "cool_cond_y" | ||
# creates csv destination filepath to store processed hydraulic condition monitoring data | ||
output_file = os.path.join(temp_path, "hydCondMonitorData.csv") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq/nit: Is there a significance for the filename being camelCase
instead of under_scored
? It might be beneficial to use underscores as it's more typical in filesystem naming and also matches python variable naming.
Just curious, and sharing thoughts, I don't mean to push.
# import one of the hydraulic condition monitoring labels from profile based on target label | ||
df_hyd_cond_monitor_label_y = import_target_label(zip, "profile.txt", target_label) | ||
|
||
# Combine all sensor Dataframes and then add hydraulic condition monitoring label as last column | ||
df_hyd_cond_monitor_data_x = pd.concat([df_ps1, df_ps2, df_ps3, df_ps4, df_ps5, df_ps6, df_vf1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are great 👍 - thanks! Made the code easy to glance through for me while not having to be familiar with the name of the files in the zip. 🤩
"pump_leak_y": pd.DataFrame(pd_frame.iloc[:,2]), | ||
# store hydraulic accumulator gas leakage label data from column 3 into pd frame | ||
"acc_gas_leak_y": pd.DataFrame(pd_frame.iloc[:,3]), | ||
# store hydraulic stable flag label data from column 4 into pd frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very much just a nit/observation: Comments make up 50% of this block. I wonder if the could be combined into a single multi-line comment at the top, perhaps making the block flow more easily w/o being separated by comments each line.
The Hydraulic System Sensor Data Recipe does the following: