update model_training process (train and export) #169

Merged
merged 2 commits into from
Sep 29, 2023

Contributor

@sunya-ch sunya-ch commented Sep 20, 2023

**Rebased on PR #172**

This PR also refers to the archived pipeline, which should be merged into kepler-model-db first via sustainable-computing-io/kepler-model-db#16

Need

to be merged first.


This PR includes

  • README auto-generation in the export script (the auto-generated document mentioned in "enrich PR template for adding trained pipeline", kepler-model-db#14)
  • sort metadata with the feature group first
  • handle a trainer class that is missing from the list
  • update get_url to also handle weight URLs
  • add an option to disable assure_path when getting a remote machine path
  • add a trainers_with_weight list to list models with weights
  • update the train script to use only the stressng benchmark
  • update the stressng benchmark according to the result of the community vote (19/09/2023)

Signed-off-by: Sunyanan Choochotkaew [email protected]

@sunya-ch sunya-ch marked this pull request as draft September 20, 2023 11:03
@sunya-ch sunya-ch added this to the kepler-release-0.6 milestone Sep 20, 2023
@sunya-ch sunya-ch marked this pull request as ready for review September 20, 2023 11:19
@@ -180,7 +181,7 @@ function quick_collect() {
}

function train() {
-train_model stressng_kepler_query,coremark_kepler_query,parsec_kepler_query ${VERSION}
+train_model stressng_kepler_query ${VERSION}
Contributor

Is this selecting the data only for training?

If it is also used for validation, should we also test on the coremark results to verify the model's accuracy on a different workload?

Contributor Author

The data is shuffled, and 10% of it is used for validation:

def normalize_and_split(X_values, y_values, scaler, test_size=0.1):

We need to refactor the code to use a fixed validation dataset. That should be tracked in a separate issue.
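For readers unfamiliar with this kind of split, here is a minimal sketch of shuffling paired samples and holding out 10% for validation. It is not the project's `normalize_and_split`: the `scaler` step is left out, and the name `shuffle_and_split` and the `seed` parameter are assumptions made for illustration. Passing a fixed seed is one simple way to get the same validation subset on every run, which is in the spirit of the refactor mentioned above, though the actual fix may look different.

```python
import random


def shuffle_and_split(X_values, y_values, test_size=0.1, seed=None):
    """Shuffle paired samples and hold out a fraction for validation.

    Hypothetical helper for illustration only; it does no normalization.
    """
    if len(X_values) != len(y_values):
        raise ValueError("X_values and y_values must have the same length")

    # Shuffle indices rather than the data so X and y stay aligned.
    indices = list(range(len(X_values)))
    random.Random(seed).shuffle(indices)

    # Hold out the first n_test shuffled indices for validation.
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X_values[i] for i in train_idx]
    X_test = [X_values[i] for i in test_idx]
    y_train = [y_values[i] for i in train_idx]
    y_test = [y_values[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: 100 samples with one feature each.
X = [[i] for i in range(100)]
y = [2 * i for i in range(100)]
X_train, X_test, y_train, y_test = shuffle_and_split(X, y, seed=0)
print(len(X_train), len(X_test))
```

With `seed=None` the split changes between runs, so validation scores from different training runs are not measured on the same samples. A fixed seed, or a validation set stored separately from the training data, removes that source of variation.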

@sunya-ch sunya-ch marked this pull request as draft September 27, 2023 06:15
Signed-off-by: Sunyanan Choochotkaew <[email protected]>
@rootfs
Contributor

rootfs commented Sep 28, 2023

@sunya-ch the CI passes now.

@sunya-ch sunya-ch marked this pull request as ready for review September 28, 2023 13:22
@rootfs rootfs merged commit 7e4e716 into sustainable-computing-io:main Sep 29, 2023