Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support third party metrics in model server #185

Merged

Conversation

LeiZhou-97
Copy link
Contributor

End user may have some specific metrics that are not included in kepler, so this patch enhance the model server to collect third party metrics

Limitation:
the pattern of thirdparty_metrics must align with kepler container level metrics, which contains containerid, containername and etc. Because it can reuse kepler container power consumption as output for training.

thridparty_metrics + other metrics(like cgroup....) -> input
container power consumption from kepler -> output

@sunya-ch
Copy link
Contributor

sunya-ch commented Nov 6, 2023

Thank you for your contribution.
I believe it is related to the issue #181.
This is very promising to dynamically add third-party feature group.
Please update the code to handle the None value of feature group in the case that it is not set to pass the CI test.
If all test is pass, this PR looks good to me.

Additional comment:
In my understanding, it will be set only when we set it in cmd argument.
I would be nice if we can also call update_thirdparty_metrics when the third party metrics are set via environment or configmap for the model server, online trainer, and estimator sidecar. However, this could be done on the another PR if you don't want to implement it here.

@LeiZhou-97
Copy link
Contributor Author

@sunya-ch
Thanks for your quick reply. I will update my code to fix the CI issue.

In my original design, I want to add the metrics into configmap and watch it in realtime. If it changes, the model server will update accordingly. This approach has some more effort and I have no much time. I can move it to the environment to align with existing configs as an work around.

@sunya-ch
Copy link
Contributor

sunya-ch commented Nov 6, 2023

@sunya-ch Thanks for your quick reply. I will update my code to fix the CI issue.

In my original design, I want to add the metrics into configmap and watch it in realtime. If it changes, the model server will update accordingly. This approach has some more effort and I have no much time. I can move it to the environment to align with existing configs as an work around.

Thanks a lot. Nice work.

Actually, if you call the function getConfig in util module , it will try reading environment value or config data if set.

You may define a key for the third-party metric such as THIRD_PARTY_METRICS then call getConfig('THIRD_PARTY_METRICS', "") to get the value. This key will be applied for both direct environment variable and config map data without any modification.

For handling None, I think you can define some never-valid word such as NON_EXISTENT_METRIC="NonExistentMetric" and set FeatureGroup.ThirdParty: [NON_EXISTENT_METRIC] instead of None.

End user may have some specific metrics that are not included
in kepler, so this patch enhance the model server to collect
third party metrics

Signed-off-by: LeiZhou-97 <[email protected]>
@LeiZhou-97
Copy link
Contributor Author

I've updated my code. Now it can get thirdparty_metrics from cmd and configmap/environment.

I also run the unittest on my server and it passed. It my need your approve to re-trigger the GHA.

@sunya-ch sunya-ch merged commit ea6bd96 into sustainable-computing-io:main Nov 8, 2023
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants