[YUNIKORN-2855] Rethink the call for update metrics with nil resource #960
Codecov Report
Coverage diff, master vs. #960:
            master     #960     +/-
Coverage    80.95%   80.96%   +0.01%
Files           97       97
Lines        12514    12527      +13
Hits         10131    10143      +12
Misses        2113     2115       +2
Partials       270      269       -1
I'm -1 on this approach. This adds significant complexity to the code and requires all callers to do the calculations... make the metrics functions handle the nil case properly themselves.
Thanks for the review, @craigcondit. I reworked the code in a more reasonable way: it now sets a zero resource in the nil update case.
Maybe I'm missing something entirely... Why are we setting arbitrary resources to zero here? What is it we're trying to accomplish?
@craigcondit, we support setting the max resource or the guaranteed resource to nil, but the metrics are then wrong: the previously reported value is never reset to zero/nil. For example:
yunikorn-core/pkg/scheduler/objects/queue.go Line 415 in 7c51e82
yunikorn-core/pkg/scheduler/objects/queue.go Line 441 in 7c51e82
Then it seems the proper solution is to retrieve the previous value of the metric and update the values there (an "unprune" so to speak). Missing values in the new map should be set to zero. Setting arbitrary resources is just wrong. Doing it this way also has the advantage of ensuring that any resource types that were emitted by the metric previously are still emitted, but with zero instead of missing.
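The "unprune" merge described above can be sketched as a small pure function. This is only an illustration, not code from the PR: the function name `unprune` and the use of plain `map[string]float64` values in place of the real metric and resource types are assumptions.

```go
package main

// unprune is a hypothetical helper: it merges a new resource map into the
// set of values a metric reported previously. Resource types that were
// present before but are missing now are kept with value zero, so the
// metric keeps emitting them instead of silently retaining a stale value.
// A nil updated map is treated as empty.
func unprune(previous, updated map[string]float64) map[string]float64 {
	merged := make(map[string]float64, len(previous)+len(updated))
	// Start from every previously reported type, reset to zero.
	for name := range previous {
		merged[name] = 0
	}
	// Overlay the new values; ranging over a nil map is a no-op in Go.
	for name, value := range updated {
		merged[name] = value
	}
	return merged
}
```

With this shape, a nil update needs no special case: every resource type that was reported before comes back with value zero.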
Thanks, @craigcondit. I like the unprune idea; I will address it this way.
Just... no. You already have the metric in the collector itself. Read it out to get the list of values that are previously set and build a new metric using a combination of the old and new values. There should be no code changes outside the collector at all.
I see... @craigcondit, let me change my code.
Addressed in latest PR.
The code is still too complex. See review comments.
pkg/metrics/queue.go
Outdated
}

func (m *QueueMetrics) SetQueueMaxResourceMetrics(resourceName string, value float64) {
	m.setQueueResource(QueueMax, resourceName, value)
func (m *QueueMetrics) SetQueueNilResourceMetrics(state string) {
This should be made an internal function once logic is moved here from scheduler/objects/queue.go.
672087d to dc6c29a
Add locking per comments.
Change lock type.
pkg/metrics/queue.go
Outdated
knownResourceTypes map[string]struct{}
lock locking.RWMutex
This doesn't need to be an RWMutex as we never lock for reads.
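To illustrate the point, here is a minimal sketch of a collector that tracks the resource types it has reported and guards that state with a plain mutex. It is an assumption-laden stand-in, not the PR's code: the `queueGauges` type, its `values` map (standing in for the exported gauges), and the use of the standard library `sync.Mutex` instead of the project's `locking` package are all hypothetical. Only `knownResourceTypes` and `lock` mirror the field names in the excerpt above.

```go
package main

import "sync"

// queueGauges is a hypothetical, simplified stand-in for a metrics collector.
type queueGauges struct {
	lock               sync.Mutex          // plain mutex: the only locked path writes
	knownResourceTypes map[string]struct{} // every resource type reported so far
	values             map[string]float64  // stand-in for the exported gauge values
}

func newQueueGauges() *queueGauges {
	return &queueGauges{
		knownResourceTypes: make(map[string]struct{}),
		values:             make(map[string]float64),
	}
}

// setResources publishes res. Any known resource type that is missing from
// res (including when res is nil) is reset to zero rather than left stale.
func (g *queueGauges) setResources(res map[string]float64) {
	g.lock.Lock()
	defer g.lock.Unlock()
	// Reading from a nil map is safe in Go and reports "not found".
	for name := range g.knownResourceTypes {
		if _, ok := res[name]; !ok {
			g.values[name] = 0
		}
	}
	for name, value := range res {
		g.knownResourceTypes[name] = struct{}{}
		g.values[name] = value
	}
}
```

Because the only method that takes the lock also modifies `knownResourceTypes`, there is no read-only critical section that a shared read lock could speed up, so an exclusive mutex is enough. All of the logic also stays inside the collector, so callers pass the new resource map, nil or not, without any extra calculation.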
Good catch, addressed in latest PR, thanks!
+1 LGTM.
What is this PR for?
Resource metrics are currently not updated when a resource is set to nil, so the previously reported value stays in place. This PR fixes that.
What is the Jira issue?
[YUNIKORN-2855] Rethink the call for update metrics with nil resource