changes to collect metrics from Prometheus with benchmark run outside of kepler-model-server #191
Conversation
…mark run outside kepler-model-server Signed-off-by: Krishnasuri Narayanam <[email protected]>
Thank you for the PR. It is promising to extend the custom metrics for validation and training. I think we don't need to create a new function for them. We can add an option for specifying
Thank you so much for the PR. I think this PR is very helpful in introducing a new custom benchmark that is not based on the CPE operator. Without a CPE-defined benchmark, the trainer can directly set the start time and end time (or last interval) of the benchmark.
However, since it has some conflicts with the current design and the recently-merged PR (#185), I would like to have some more discussion on the changes.
After we reach a conclusion, it would be great if the contributor would also update the instructions in https://github.com/sustainable-computing-io/kepler-model-server/blob/main/contributing.md#introduce-new-benchmarks.
Signed-off-by: Krishnasuri Narayanam <[email protected]>
Signed-off-by: Krishnasuri Narayanam <[email protected]>
…he benchmark suite Signed-off-by: Krishnasuri Narayanam <[email protected]>
@knarayan Thank you so much for your contribution.
As there are multiple significant changes here, I would like to summarize the contribution of this PR again as below.
This PR includes
- introducing customBenchmark with variable-specified startTime, endTime for non-CPE benchmark (separate sample, stressng, and customBenchmark)
- supporting validation for all available queries
- removing required benchmark constraint on export function to allow reuse on customBenchmark
Please feel free to add or correct my summary.
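To make the first point concrete, here is a minimal sketch of how a variable-specified time window for a non-CPE customBenchmark could be resolved. The names `resolve_window`, `start_time`, `end_time`, and `last_interval` are assumptions for illustration, not the PR's actual API:

```python
# Hypothetical sketch: resolving the Prometheus query window for a
# customBenchmark run outside kepler-model-server. Function and parameter
# names are assumptions, not the actual kepler-model-server interface.
from datetime import datetime, timedelta, timezone

def resolve_window(start_time=None, end_time=None, last_interval=None):
    """Return (start, end) datetimes for the metric-collection window.

    Either an explicit start/end pair or a trailing interval in seconds
    (relative to now) may be given, mirroring the "start time and end time
    (or last interval)" option discussed above.
    """
    if start_time is not None and end_time is not None:
        return start_time, end_time
    end = datetime.now(timezone.utc)
    return end - timedelta(seconds=last_interval), end

# Example: collect the last hour of benchmark metrics
start, end = resolve_window(last_interval=3600)
```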
Users might want to run a custom benchmark outside of `kepler-model-server` and collect metrics using `kepler-model-server` to train a model subsequently:
- Collect metrics (only query Prometheus to fetch metrics in the specified time window)
- Validate the collected metrics
- Train on the custom benchmark metrics

Note that Prometheus is queried to collect both `kepler`-provided power consumption and `node-exporter`-provided CPU utilization metrics (like `node_cpu_seconds_total`) for each cluster node.
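The collection step above amounts to a range query against Prometheus's standard `/api/v1/query_range` HTTP API over the benchmark's time window. A stdlib-only sketch, assuming a Prometheus endpoint at `PROM_URL` (not part of the PR; adjust for your cluster):

```python
# Illustrative only: fetching a metric such as node_cpu_seconds_total over a
# fixed window via Prometheus's /api/v1/query_range endpoint.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROM_URL = "http://localhost:9090"  # assumption: cluster's Prometheus endpoint

def query_range(query, start, end, step="3s"):
    """Return the time series for `query` between `start` and `end`
    (Unix timestamps or RFC 3339 strings, as Prometheus accepts both)."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    with urlopen(f"{PROM_URL}/api/v1/query_range?{params}") as resp:
        return json.load(resp)["data"]["result"]

# e.g. per-node CPU utilization over the benchmark window:
# series = query_range("rate(node_cpu_seconds_total[1m])", start_ts, end_ts)
```

Because only the time window changes between runs, the same query logic can serve both the CPE-driven and the custom-benchmark paths.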