Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat: add managed model registry prometheus job, metrics, and alering rules, fixes RHOAIENG-4273 #318

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dhirajsb
Copy link

@dhirajsb dhirajsb commented Aug 1, 2024

Description

Adds managed model registry prometheus config for the following:

  • job
  • metrics
  • alering rules

Fixes RHOAIENG-4273

https://issues.redhat.com/browse/RHOAIENG-4273

How Has This Been Tested?

WIP

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link

openshift-ci bot commented Aug 1, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhirajsb
Once this PR has been reviewed and has the lgtm label, please assign lburgazzoli for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dhirajsb dhirajsb marked this pull request as draft August 1, 2024 03:49
@dhirajsb
Copy link
Author

dhirajsb commented Aug 1, 2024

This is a draft PR with prometheus config changes to the yaml file. I'm creating another PR to enable loading this config in odh incubation. This PR should be merged after the model registry component itself is merged into rhds.

Please review and let me know if there are any changes or anything else is needed.

- alert: Model Registry Operator Probe Success Burn Rate
annotations:
message: 'High error budget burn for {{ $labels.instance }} (current value: {{ $value }}).'
triage: "https://gitlab.cee.redhat.com/service/managed-tenants-sops/-/blob/main/RHODS/Model-Serving/rhods-model-registry-operator-probe-success-burn-rate.md"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing before merge this PR is to get https://gitlab.cee.redhat.com/service/managed-tenants-sops/-/blob/main/RHODS/Model-Serving/rhods-model-registry-operator-probe-success-burn-rate.md created.
preferably to name the file as rhoai-model-registry-operator-probe-success-burn-rate.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I noticed that sop link. Is there a process to contact someone for working on it, or can I just open a PR and go from there?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to that, I used the simple name Model Registry Operator in the config names and descriptions. I noticed that some components used ODH prefix. Does model registry need to use RHOAI Model Registry Operator instead?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think you can make a PR and ask (someone) in the project to review/approve it.
should be a owner file somewhere with name list there

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, i do not know the answer. I would say if ODH already exists by other component, then you can use it, or just skip ODH or any downstream name. but better check with project, if any hard/soft rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants