Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support new topologySpread scheduling constraints #852

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jmdeal
Copy link
Member

@jmdeal jmdeal commented Dec 8, 2023

Fixes #430

Description
This PR adds support for the following topology spread constraint fields:

  • matchLabelKeys
  • nodeAffinityPolicy
  • nodeTaintsPolicy

How was this change tested?
make test

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 8, 2023
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Dec 8, 2023
@jmdeal jmdeal changed the title feat: support new topologySpread scheduling constraints [WIP] feat: support new topologySpread scheduling constraints Dec 8, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 8, 2023
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from aed9f49 to 54af319 Compare December 8, 2023 19:34
@coveralls
Copy link

coveralls commented Dec 8, 2023

Pull Request Test Coverage Report for Build 10602954452

Details

  • 125 of 132 (94.7%) changed or added relevant lines in 10 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.1%) to 80.89%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/controllers/provisioning/provisioner.go 12 14 85.71%
pkg/controllers/provisioning/scheduling/topology.go 28 30 93.33%
pkg/utils/pod/scheduling.go 1 4 25.0%
Files with Coverage Reduction New Missed Lines %
pkg/controllers/node/termination/controller.go 2 66.47%
Totals Coverage Status
Change from base Build 10590933041: 0.1%
Covered Lines: 8432
Relevant Lines: 10424

💛 - Coveralls

@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch 7 times, most recently from 4329131 to f0127f2 Compare December 9, 2023 01:46
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch 3 times, most recently from b23a605 to 0408162 Compare December 11, 2023 21:59
@jmdeal jmdeal changed the title [WIP] feat: support new topologySpread scheduling constraints feat: support new topologySpread scheduling constraints Dec 11, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2023
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from 0408162 to 6673b5b Compare December 11, 2023 23:14
@jmdeal jmdeal changed the title feat: support new topologySpread scheduling constraints [WIP] feat: support new topologySpread scheduling constraints Dec 13, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 13, 2023
@jmdeal
Copy link
Member Author

jmdeal commented Dec 13, 2023

Holding to include matchLabelKeys for pod affinity with k8s v1.29.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 14, 2023
@k8s-ci-robot
Copy link
Contributor

@Bryce-Soghigian: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Bryce-Soghigian
Copy link
Member

Is there a way to keep stalebot away for PRs?

@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from 1961fac to b10543c Compare June 13, 2024 20:43
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmdeal
Once this PR has been reviewed and has the lgtm label, please assign njtran for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from b10543c to 28c14ac Compare June 13, 2024 20:50
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 13, 2024
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from 28c14ac to dae9cfd Compare June 13, 2024 21:05
@jmdeal
Copy link
Member Author

jmdeal commented Jun 13, 2024

Benchmarking results (against 37db06f4742eada19934f76e616f2b1860aca57b):

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/karpenter/pkg/controllers/provisioning/scheduling
                  │ old_results.txt │           new_results.txt            │
                  │     sec/op      │    sec/op      vs base               │
Scheduling1-10         99.67µ ± 42%   103.39µ ± 30%       ~ (p=0.989 n=20)
Scheduling50-10        305.6µ ±  6%    305.5µ ±  7%       ~ (p=0.265 n=20)
Scheduling100-10       488.9µ ±  6%    451.0µ ±  5%  -7.76% (p=0.001 n=20)
Scheduling500-10       1.929m ±  1%    1.859m ±  2%  -3.66% (p=0.002 n=20)
Scheduling1000-10      3.955m ±  3%    3.999m ±  7%       ~ (p=0.478 n=20)
Scheduling2000-10      7.518m ±  7%    8.072m ±  4%  +7.38% (p=0.013 n=20)
Scheduling5000-10      19.45m ±  9%    17.96m ±  4%  -7.69% (p=0.038 n=20)
geomean                1.494m          1.477m        -1.13%

                  │ old_results.txt │           new_results.txt           │
                  │      nodes      │   nodes     vs base                 │
Scheduling1-10           1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=20) ¹
Scheduling50-10          5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=20) ¹
Scheduling100-10         5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=20) ¹
Scheduling500-10         5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=20) ¹
Scheduling1000-10        5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=20) ¹
Scheduling2000-10        5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=20) ¹
Scheduling5000-10        5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=20) ¹
geomean                  3.973        3.973       +0.00%
¹ all samples are equal

                  │ old_results.txt │           new_results.txt            │
                  │      pods       │    pods      vs base                 │
Scheduling1-10          1.000 ±  0%   1.000 ±  0%       ~ (p=1.000 n=20) ¹
Scheduling50-10         29.00 ± 17%   29.00 ± 24%       ~ (p=0.814 n=20)
Scheduling100-10        25.50 ± 25%   26.00 ± 23%       ~ (p=0.732 n=20)
Scheduling500-10        28.00 ± 50%   25.00 ± 32%       ~ (p=0.587 n=20)
Scheduling1000-10       35.50 ± 21%   29.00 ± 21%       ~ (p=0.279 n=20)
Scheduling2000-10       29.50 ± 39%   28.00 ± 29%       ~ (p=0.815 n=20)
Scheduling5000-10       34.00 ± 41%   31.00 ± 35%       ~ (p=0.344 n=20)
geomean                 18.48         17.36        -6.10%
¹ all samples are equal

                  │ old_results.txt │           new_results.txt           │
                  │    pods/sec     │   pods/sec    vs base               │
Scheduling1-10        10.044k ± 30%   9.672k ± 24%       ~ (p=0.984 n=20)
Scheduling50-10        163.6k ±  6%   163.7k ±  8%       ~ (p=0.265 n=20)
Scheduling100-10       204.5k ±  6%   221.8k ±  5%  +8.42% (p=0.001 n=20)
Scheduling500-10       259.2k ±  1%   269.0k ±  2%  +3.80% (p=0.002 n=20)
Scheduling1000-10      252.9k ±  3%   250.0k ±  8%       ~ (p=0.478 n=20)
Scheduling2000-10      266.1k ±  7%   247.8k ±  4%  -6.91% (p=0.013 n=20)
Scheduling5000-10      257.0k ± 10%   278.4k ±  3%  +8.32% (p=0.038 n=20)
geomean                147.3k         149.0k        +1.12%

Ran this comparison a few times, this was the set of results with the largest difference. Marginally faster overall surprisingly, with a slight decrease in first round scheduled pods (though I noticed this stat varied wildly between runs).

@jmdeal
Copy link
Member Author

jmdeal commented Jun 13, 2024

/remove blocked

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 29, 2024
@abrazouski
Copy link

Sorry @jmdeal for bothering, but any plans for moving forward with this?
We're really rely on this feature

@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from dae9cfd to 1d46f34 Compare August 15, 2024 23:17
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2024
@jmdeal jmdeal force-pushed the support-affinity-and-taint-policy branch from 1d46f34 to 7b3827f Compare August 28, 2024 17:01
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 15, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hamishforbes
Copy link

Hi @jmdeal is there anything blocking this or any way I can help to move this on (testing a custom build or something)?

@engedaam
Copy link
Contributor

engedaam commented Dec 12, 2024

/assign @njtran

@k8s-ci-robot
Copy link
Contributor

@engedaam: GitHub didn't allow me to assign the following users: njtarn.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @njtarn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@engedaam
Copy link
Contributor

/assign @njtran

@jonathan-innis
Copy link
Member

/unassign njtran

@jonathan-innis
Copy link
Member

/assign jonathan-innis

@jonathan-innis
Copy link
Member

Ran this comparison a few times, this was the set of results with the largest difference. Marginally faster overall surprisingly, with a slight decrease in first round scheduled pods (though I noticed this stat varied wildly between runs).

And I think I figured out why! Turns out that we really haven't been doing our scheduling benchmarking correctly in a while! See #1930

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Unable to make progress due to some dependency cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support new topologySpread scheduling constraints