Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

koord-scheduler: NodeNUMAResource supports filtering/scoring with node-level amplification ratios #1677

Conversation

eahydra
Copy link
Member

@eahydra eahydra commented Sep 22, 2023

Ⅰ. Describe what this PR does

Based on #1602, implement amplified CPU filtering/scoring in plugin NodeNUMAResource.

After a detailed discussion with @zqzten, the author of #1602, the core goal of #1602 is to consider that if there is a LSE/LSR Pod on a node with a CPU amplification ratio, the corresponding CPU requests must be amplified to ensure the runtime QoS of the LSE/LSR Pod. Of course, it can also work to implement a new plugin separately, but it is more reasonable to implement it in the NodeNUMAResource plugin, because the NodeNUMAResource plugin supports allocating CPU cores to LSE/LSR Pods. Therefore, in the Filter stage, the requested amount and the allocated amount are amplified (prerequisite is LSE/LSR Pod) to ensure that the capacity is legal; the situation is similar in the scoring stage.

Ⅱ. Does this pull request fix one issue?

part of #1539

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@codecov
Copy link

codecov bot commented Sep 22, 2023

Codecov Report

Patch coverage: 77.46% and project coverage change: +0.05% 🎉

Comparison is base (f3b3eb4) 66.05% compared to head (efc7a9d) 66.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1677      +/-   ##
==========================================
+ Coverage   66.05%   66.11%   +0.05%     
==========================================
  Files         384      384              
  Lines       41454    41519      +65     
==========================================
+ Hits        27384    27451      +67     
+ Misses      12059    12046      -13     
- Partials     2011     2022      +11     
Flag Coverage Δ
unittests 66.11% <77.46%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...duler/plugins/nodenumaresource/resource_manager.go 82.35% <0.00%> (+0.98%) ⬆️
pkg/scheduler/plugins/nodenumaresource/service.go 70.90% <ø> (ø)
pkg/scheduler/plugins/nodenumaresource/scoring.go 75.60% <76.47%> (-0.69%) ⬇️
pkg/scheduler/plugins/nodenumaresource/plugin.go 68.51% <80.55%> (+3.02%) ⬆️

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hormes
Copy link
Member

hormes commented Sep 22, 2023

/lgtm

Copy link
Member

@saintube saintube left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit fc26933 into koordinator-sh:main Sep 23, 2023
18 checks passed
}

node := nodeInfo.Node()
ratios, err := extension.GetNodeResourceAmplificationRatios(node.Annotations)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We can call the convenience method cpuAmplificationRatio, err := extension.GetNodeResourceAmplificationRatio(node.Annotations, corev1.ResourceCPU) here

func (p *Plugin) scoreWithAmplifiedCPUs(cycleState *framework.CycleState, state *preFilterState, pod *corev1.Pod, nodeInfo *framework.NodeInfo, topologyOptions TopologyOptions) (int64, *framework.Status) {
quantity := state.requests[corev1.ResourceCPU]
if quantity.IsZero() {
return 0, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should call the default scorer here rather than return 0 to keep the same behavior as the NodeResourceFit plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants