Feat/adapt dev to 3w dataset 2.0 #126
Open · castrokelly wants to merge 2 commits into main
This pull request adapts the dev.py sub-module to ensure full compatibility with the 3W Dataset 2.0. The main changes include updating the EventFolds class to correctly handle the new data loading process and removing the redundant extrai_arrays() function.

Changes made:

  • Removed extrai_arrays() function: This function was previously used to extract data from individual CSV files. With the new load_3w_dataset() function in base.py, which loads the entire dataset into a Pandas DataFrame, the extrai_arrays() function became redundant and was removed.
  • Updated EventFolds class:
    • The __init__() method now receives the complete DataFrame as a parameter instead of individual instance names, so the fold logic works on data that is already loaded rather than on names it must resolve to files.
    • The carregue_instancia() method was updated to use the load_3w_dataset() function for loading data, ensuring consistency and compatibility with the new data structure.
    • The logic for extracting training and test samples was adjusted to work with the DataFrame structure.
  • Updated Experiment class: The folds() method was adjusted to pass the DataFrame to the EventFolds class, ensuring the correct data flow.
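The shift from per-file extraction to a single DataFrame can be illustrated with plain pandas. This is a minimal sketch, not the actual load_3w_dataset() from base.py: the in-memory CSV strings, the column names, and the load_all_instances() helper are all hypothetical, and no particular on-disk layout of the 3W Dataset 2.0 is assumed. It only shows the idea of reading every instance once and keeping the instance name as a column.

```python
import io

import pandas as pd

# Two toy per-instance "files", standing in for the per-file layout that
# extrai_arrays() used to read one at a time. Column names are hypothetical.
RAW_FILES = {
    "instance_001": "timestamp,sensor_1,label\n0,1.0,0\n1,1.2,0\n",
    "instance_002": "timestamp,sensor_1,label\n0,3.4,1\n1,3.1,1\n2,2.9,1\n",
}


def load_all_instances(raw_files: dict) -> pd.DataFrame:
    """Hypothetical loader: read every instance and stack them into one DataFrame.

    The instance name is kept as an ordinary column, so later code can select
    instances with boolean masks instead of reopening files.
    """
    frames = [
        pd.read_csv(io.StringIO(text)).assign(instance=name)
        for name, text in raw_files.items()
    ]
    return pd.concat(frames, ignore_index=True)


df = load_all_instances(RAW_FILES)
print(df.shape)
print(df.groupby("instance").size().to_dict())
```

Passing a DataFrame like this to a fold builder, rather than a list of instance names, is the kind of data flow the EventFolds and Experiment changes above describe.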

Example usage:

The following code snippet demonstrates how to use the updated Experiment class with the 3W Dataset 2.0:

```python
import toolkit as tk

# Create an experiment for the "SPURIOUS_CLOSURE_OF_DHSV" event
experiment = tk.Experiment(event_name="SPURIOUS_CLOSURE_OF_DHSV")

# Generate the folds for the experiment
folds = experiment.folds()

# Access the training and test samples for each fold
for fold in folds:
    X_train, y_train = fold.extract_training_samples()
    X_test = fold.extract_test_samples()

    # ... your machine learning model training and evaluation code here ...
```
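To exercise the same loop shape without the toolkit or the dataset, a stand-in for the fold objects can be built from a single DataFrame. This is a hypothetical sketch: ToyEventFolds, ToyFold, the column names, and the leave-one-instance-out split are illustrative assumptions, not how dev.py necessarily defines its folds. Only the method names extract_training_samples() and extract_test_samples(), and their return shapes, are taken from the example.

```python
import pandas as pd

# Hypothetical bookkeeping columns; every other column is treated as a feature.
INSTANCE_COL = "instance"
LABEL_COL = "label"


class ToyFold:
    """One fold: a single held-out instance for testing, the rest for training."""

    def __init__(self, df: pd.DataFrame, test_instance: str):
        self.test_instance = test_instance
        is_test = df[INSTANCE_COL] == test_instance
        self._train = df[~is_test]
        self._test = df[is_test]

    @staticmethod
    def _features(part: pd.DataFrame) -> pd.DataFrame:
        return part.drop(columns=[INSTANCE_COL, LABEL_COL])

    def extract_training_samples(self):
        # Returns (X_train, y_train), matching the unpacking in the example.
        return self._features(self._train), self._train[LABEL_COL]

    def extract_test_samples(self):
        # Only features are returned, as in the example (X_test alone).
        return self._features(self._test)


class ToyEventFolds:
    """Receives the complete DataFrame rather than a list of instance names."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def __iter__(self):
        # One fold per distinct instance, in order of first appearance.
        for instance in self.df[INSTANCE_COL].unique():
            yield ToyFold(self.df, instance)


df = pd.DataFrame({
    "instance": ["a", "a", "b", "b", "c"],
    "label": [0, 0, 1, 1, 0],  # toy labels
    "sensor_1": [1.0, 1.1, 2.0, 2.1, 3.0],
    "sensor_2": [10.0, 10.5, 20.0, 20.5, 30.0],
})

for fold in ToyEventFolds(df):
    X_train, y_train = fold.extract_training_samples()
    X_test = fold.extract_test_samples()
    print(fold.test_instance, len(X_train), len(X_test))
```

Because every fold shares the one DataFrame, each fold here is just a pair of boolean masks over it; nothing is reread from disk per fold.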

Benefits:

  • Compatibility with 3W Dataset 2.0: dev.py works with the DataFrame produced by load_3w_dataset() for the latest version of the dataset.
  • Improved efficiency: EventFolds operates on the already-loaded DataFrame instead of extracting arrays from individual files.
  • Simplified workflow: Streamlines the process of accessing and preparing data for machine learning experiments.
  • Enhanced maintainability: Removing the redundant extraction code leaves a single data-loading path, which makes the module easier to read and maintain.

This contribution makes the 3W Toolkit usable with the 3W Dataset 2.0 and simplifies its data handling, supporting research and development of machine learning models for anomaly detection in oil wells.


By creating this pull request, I confirm that I have read and fully accept and agree with one of Petrobras' Contributor License Agreements (CLAs):

Our CLAs are based on the Apache Software Foundation's CLAs:
