
Add "blocked/not blocked" in job count metrics #2945

Open
severo opened this issue Jun 24, 2024 · 5 comments
Labels
metrics P2 Nice to have

Comments

@severo
Collaborator

severo commented Jun 24, 2024

Now that we block datasets, the job count metrics are misleading: they still include the jobs of blocked datasets. We need to be able to filter those jobs out, because they are effectively outside the queue while the dataset is blocked.
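As an illustration only, here is a minimal sketch of counting jobs while skipping blocked datasets. It assumes an in-memory list of job records and a set of blocked dataset names; the `Job` record, its fields, and `count_jobs` are hypothetical and do not reflect the project's actual queue schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; the field names are assumptions for illustration.
    dataset: str
    job_type: str
    status: str  # e.g. "waiting" or "started"


def count_jobs(jobs, blocked_datasets):
    """Count jobs per (job_type, status), skipping jobs whose dataset is blocked."""
    return Counter(
        (job.job_type, job.status)
        for job in jobs
        if job.dataset not in blocked_datasets
    )


jobs = [
    Job("user/a", "split-first-rows", "waiting"),
    Job("user/a", "split-first-rows", "waiting"),
    Job("user/b", "split-first-rows", "waiting"),
    Job("user/c", "config-parquet", "started"),
]
# With "user/a" blocked, only one waiting split-first-rows job remains in the count.
print(count_jobs(jobs, blocked_datasets={"user/a"}))
```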

severo added the P1 (Not as needed as P0, but still important/wanted) and metrics labels on Jun 24, 2024
@severo
Collaborator Author

severo commented Jun 24, 2024

This is very important because these metrics drive the autoscaler. We don't want the number of pods to remain high while all the jobs are blocked.
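To make the autoscaler concern concrete, here is a minimal sketch of a generic target-tracking rule, assuming the pool is sized from the number of waiting jobs. The function `desired_replicas` and its parameters (`jobs_per_pod`, `min_pods`, `max_pods`) are hypothetical and do not describe the actual autoscaler configuration; the numbers are made up.

```python
import math


def desired_replicas(waiting_jobs: int, jobs_per_pod: int, min_pods: int, max_pods: int) -> int:
    """Hypothetical rule: one pod per `jobs_per_pod` waiting jobs, clamped to [min_pods, max_pods]."""
    wanted = math.ceil(waiting_jobs / jobs_per_pod)
    return max(min_pods, min(max_pods, wanted))


# Made-up scenario: 600 waiting jobs, all belonging to blocked datasets.
all_waiting = 600
unblocked_waiting = 0

# Counting blocked jobs keeps the pool at its maximum size...
print(desired_replicas(all_waiting, jobs_per_pod=10, min_pods=1, max_pods=20))  # prints 20
# ...while counting only unblocked jobs lets it shrink to the minimum.
print(desired_replicas(unblocked_waiting, jobs_per_pod=10, min_pods=1, max_pods=20))  # prints 1
```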

severo added the P0 (Highest priority) label and removed the P1 (Not as needed as P0, but still important/wanted) label on Jun 24, 2024
@severo
Collaborator Author

severo commented Jun 24, 2024

reference: #2279 (comment)

@AndreaFrancis
Contributor

Pending (#2949 (comment)):

  • Decrease the job counts when a dataset is blocked
  • Find a way to include the jobs in the counter again once the dataset has been released from the blocked list

@severo
Collaborator Author

severo commented Jun 26, 2024

> Find a way to include the jobs in the counter once the dataset has been released from the blocked list

I think it's OK to wait for the next cron job: it runs every 10 minutes in prod, which is short compared to the duration of a dataset blockage (6 hours).
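A minimal sketch of the approach discussed here, under stated assumptions: the counts are decremented immediately when a dataset is blocked, and released datasets reappear only at the next full recount (the periodic cron job). `JobCounts`, `recompute`, and `on_dataset_blocked` are hypothetical names, and the in-memory `Counter` stands in for wherever the real metrics are stored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; field names are assumptions.
    dataset: str
    job_type: str
    status: str


class JobCounts:
    """Hypothetical in-memory stand-in for the job count metrics."""

    def __init__(self):
        self.counts = Counter()

    def recompute(self, jobs, blocked_datasets):
        # Full recount, as the periodic cron job would do. Datasets released
        # from the blocked list are counted again here, and only here.
        self.counts = Counter(
            (j.job_type, j.status) for j in jobs if j.dataset not in blocked_datasets
        )

    def on_dataset_blocked(self, dataset, jobs):
        # Incremental update: remove the blocked dataset's jobs right away,
        # without waiting for the next recount. Assumes the dataset was not
        # already blocked (otherwise its jobs would be subtracted twice).
        for j in jobs:
            if j.dataset == dataset:
                self.counts[(j.job_type, j.status)] -= 1
        # Unary plus drops keys whose count reached zero or below.
        self.counts = +self.counts


jobs = [
    Job("user/a", "split-first-rows", "waiting"),
    Job("user/b", "split-first-rows", "waiting"),
]
metrics = JobCounts()
metrics.recompute(jobs, blocked_datasets=set())  # 2 waiting jobs
metrics.on_dataset_blocked("user/a", jobs)       # immediately down to 1
metrics.recompute(jobs, blocked_datasets=set())  # "user/a" released: back to 2 at the next cron run
```

The asymmetry is deliberate: blocking must be reflected quickly so the autoscaler can shrink, while the release path tolerates the cron delay described above.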

AndreaFrancis self-assigned this on Jul 1, 2024
severo added the P2 (Nice to have) label and removed the P0 (Highest priority) label on Jul 30, 2024
@severo
Collaborator Author

severo commented Jul 30, 2024

We still need to be able to filter the blocked jobs out of the charts in https://grafana.huggingface.tech/d/i7gwsO5Vz/global-view?orgId=1 (or to show two curves: blocked and not blocked), because the charts are misleading otherwise.
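For the "two curves" option, one possible sketch is to keep every job in the count but attach a `blocked` label, rendered here in the Prometheus text exposition format. It assumes the dashboard reads Prometheus-style metrics; the metric name `queue_jobs`, the `queue`/`status`/`blocked` labels, and the helper functions are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; field names are assumptions.
    dataset: str
    job_type: str
    status: str


def count_jobs_by_blocked(jobs, blocked_datasets):
    """Count jobs per (job_type, status, blocked) instead of dropping blocked ones."""
    return Counter((j.job_type, j.status, j.dataset in blocked_datasets) for j in jobs)


def to_exposition(counts, metric="queue_jobs"):
    """Render counts as text-format samples with a `blocked` label.

    Label values are not escaped; this assumes job types and statuses
    contain no quotes, backslashes, or newlines.
    """
    lines = []
    for (job_type, status, blocked), value in sorted(counts.items()):
        labels = f'queue="{job_type}",status="{status}",blocked="{str(blocked).lower()}"'
        lines.append(f"{metric}{{{labels}}} {value}")
    return "\n".join(lines)


jobs = [
    Job("user/a", "split-first-rows", "waiting"),
    Job("user/a", "split-first-rows", "waiting"),
    Job("user/b", "split-first-rows", "waiting"),
]
print(to_exposition(count_jobs_by_blocked(jobs, blocked_datasets={"user/a"})))
```

With such a label, a panel could plot for instance `sum by (blocked) (queue_jobs)` to get one curve per value, or filter on `blocked="false"` to hide blocked jobs entirely.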

2 participants