Sync with upstream v1.28.0 #260

Merged
Changes from 250 commits
656 commits
fab8ec7
feat(*): add more metrics
qianlei90 Jan 12, 2023
d62b483
Added the RBAC Permission to Linode.
Shubham82 May 29, 2023
3b16d93
CA - Correct Cloudprovider PR labelling to area/provider/<provider name>
gjtempleton May 29, 2023
97c12df
Merge pull request #5743 from vishalanarase/civo-support-pool-cinfig
k8s-ci-robot May 29, 2023
e9aba09
fix(volcengine): don't build all provider when volcengine tag exist
qianlei90 May 30, 2023
a67ca23
Merge pull request #5818 from gjtempleton/CA-Correct-Prow-CloudProvid…
k8s-ci-robot May 30, 2023
90c7725
chore: replace `github.com/ghodss/yaml` with `sigs.k8s.io/yaml`
Juneezee May 30, 2023
19ffbc5
Merge pull request #5816 from Shubham82/add_RBAC_permissions_linode
k8s-ci-robot May 31, 2023
18867db
Merge pull request #5819 from qianlei90/fix-volcengine-build
k8s-ci-robot May 31, 2023
615937a
Fixed Typo and Trailing-whitespace
Shubham82 May 31, 2023
035a9b9
Skip healthiness check for non-existing similar node groups
BigDarkClown May 30, 2023
de034db
Merge pull request #5823 from Shubham82/typo_fix_VPA
k8s-ci-robot May 31, 2023
69294fe
Merge pull request #5547 from jbartosik/addon-resizer-kep-proposal
k8s-ci-robot May 31, 2023
b5c1d68
Merge pull request #5824 from BigDarkClown/fix-similar-exist
k8s-ci-robot May 31, 2023
ad5e3f6
Merge pull request #5822 from Juneezee/chore/yaml
k8s-ci-robot Jun 2, 2023
49cfd18
BinpackingLimiter interface
kushagra98 Jun 2, 2023
567b082
fix comment and list format
kei-gnu Jun 4, 2023
1d0a75c
Merge pull request #5830 from kei-gnu/fix-comment-and-list-formatting
k8s-ci-robot Jun 5, 2023
5b0ad27
add more logging for balancing similar node groups
elmiko Jun 5, 2023
8a98124
Update VPA scripts to use v1.
Shubham82 Jun 6, 2023
7217870
fix: don't clean `CriticalAddonsOnly` taint from template nodes
vadasambar Jun 6, 2023
4165ccf
Updated the owners of civo cloudprovider
vishalanarase Jun 7, 2023
36b80fe
Bump golang from 1.20.4 to 1.20.5 in /vertical-pod-autoscaler/builder
dependabot[bot] Jun 7, 2023
f336239
Merge pull request #5841 from vishalanarase/civo-update-owners
k8s-ci-robot Jun 8, 2023
0634543
cluster-autoscaler: support Brightbox image pattern
NeilW Jan 25, 2023
f761f83
brightbox: set default docker registry
NeilW Jun 8, 2023
4150733
Update oci-ip-cluster-autoscaler-w-config.yaml
sourabhgupta385 Jun 9, 2023
f9fe543
Update oci-ip-cluster-autoscaler-w-principals.yaml
sourabhgupta385 Jun 9, 2023
1249cb8
address comments
kushagra98 Jun 12, 2023
94313dd
golint fix
kushagra98 Jun 12, 2023
9c3d2e0
Merge pull request #5755 from jbartosik/vpa-in-place-aep
k8s-ci-robot Jun 12, 2023
dd4da3b
Merge pull request #5845 from sourabhgupta385/master
k8s-ci-robot Jun 12, 2023
316cd12
Remove print condition for vpa-beta2-crd.
Shubham82 Jun 13, 2023
45862b9
Improvement: Modified the VPA content for the helm chart.
Shubham82 May 18, 2023
d47a800
Bump the Chart version to 9.29.1 and CA image to 1.27.2
Shubham82 Jun 13, 2023
3c5cc58
Merge pull request #5763 from Shubham82/modify_content_of_vpa
k8s-ci-robot Jun 13, 2023
db0c783
make no-op binpacking limiter as default + move mark nodegroups to it…
kushagra98 Jun 13, 2023
0fd52c8
Merge pull request #5837 from Shubham82/update_vpa_script
k8s-ci-robot Jun 14, 2023
6fcc5fd
Merge pull request #5843 from kubernetes/dependabot/docker/vertical-p…
k8s-ci-robot Jun 14, 2023
296207c
Drop projected volumes for init containers
azylinski Jun 14, 2023
33b4dcd
fix zonal gce outage breaking CA when only some of the zones are failed
damikag Jun 14, 2023
ec04940
Merge pull request #5852 from azylinski/drop-projected-volumes-for-in…
k8s-ci-robot Jun 15, 2023
e6f6211
Bump version to 0.14.0 as a preparation for release.
kgolab Jun 15, 2023
203654c
Update vendor to Kubernetes 1.28.0-alpha.2
BigDarkClown Jun 15, 2023
3221336
Merge pull request #5860 from kgolab/master
k8s-ci-robot Jun 15, 2023
65eda1c
Interface fixes after Kubernetes 1.28.0-alpha.2 vendor update
BigDarkClown Jun 15, 2023
463d7d6
Merge pull request #5861 from BigDarkClown/update-28-alpha2
k8s-ci-robot Jun 15, 2023
1944c82
Execute git commands to show the state of local clone of the repo.
kgolab Jun 15, 2023
72e67a7
Clarify and simplify the "build and stage images" step.
kgolab Jun 15, 2023
d7fa2d9
Merge pull request #5863 from kgolab/vpa-release-doc
k8s-ci-robot Jun 15, 2023
231868e
Merge pull request #5862 from kgolab/makefile-status
k8s-ci-robot Jun 15, 2023
ac9b589
Mention logs from #5862 in release instructions.
kgolab Jun 16, 2023
7a64653
Merge pull request #5853 from damikag/fix-zonal-gce-outage-breaking-ca
k8s-ci-robot Jun 16, 2023
2bd9016
Merge pull request #5866 from kgolab/master
k8s-ci-robot Jun 16, 2023
541affc
addressed comments
kushagra98 Jun 16, 2023
c7ddd5a
Merge pull request #5810 from kushagra98/master
k8s-ci-robot Jun 16, 2023
1107280
chore: remove unused func scaleFromZeroAnnotationsEnabled
dineshba Jun 17, 2023
4e964b7
add cluster-autoscaler name and version to the user agent
tzneal Jun 20, 2023
e84f770
Explicitly create and remove buildx builders
voelzmo Jun 16, 2023
7bc37f7
Merge pull request #5867 from voelzmo/fix/create-and-remove-buildx-bu…
k8s-ci-robot Jun 20, 2023
b91b127
Merge pull request #5873 from tzneal/make-user-agent-more-informative
k8s-ci-robot Jun 20, 2023
120bd68
Apply fixes to in place support VPA AEP
jbartosik Jun 21, 2023
9621400
Add voelzmo to VPA reviewers
jbartosik Jun 22, 2023
96cdd88
Bump default VPA version to 0.14.0
kgolab Jun 22, 2023
3b17e5e
Merge pull request #5880 from kgolab/vpa-014
k8s-ci-robot Jun 22, 2023
e08d969
Minor tweaks after preparing VPA 0.14.0 release.
kgolab Jun 21, 2023
2727a20
Merge pull request #5877 from jbartosik/in-place-kep-follow-up
k8s-ci-robot Jun 23, 2023
d17b1aa
Merge pull request #5879 from jbartosik/voelzmo-reviewer
k8s-ci-robot Jun 26, 2023
c51dc16
Merge pull request #5878 from kgolab/master
k8s-ci-robot Jun 26, 2023
0733535
Merge pull request #5868 from dineshba/remove-unused-func-scaleFromZe…
k8s-ci-robot Jun 26, 2023
3248bf3
Merge pull request #5764 from brightbox/brightbox-provider-additions
k8s-ci-robot Jun 26, 2023
265e57c
Merge pull request #5835 from elmiko/add-more-balance-logging
k8s-ci-robot Jun 26, 2023
adde651
fix: CA on fargate causing log flood
vadasambar Jun 23, 2023
ec01059
test: fix node names
vadasambar Jun 23, 2023
2aaa41d
Merge pull request #5766 from wu0407/add-status-subresource
k8s-ci-robot Jun 27, 2023
e291b71
Sort nodegroups in order of their ID
kushagra98 Jun 27, 2023
1847875
Move two util functions from actuator to delete_in_batch, where they …
kawych May 31, 2023
61b7958
Add support for atomic scale-down in node group options
kawych May 31, 2023
292a517
Extract cropNodesToBudgets function out of actuator file
kawych May 31, 2023
7e3e15b
Support atomic scale-down option for node groups
kawych May 31, 2023
1ecb843
Respond to readability-related comments from the review
kawych Jun 21, 2023
257e66c
Don't pass NodeGroup as a parameter to functions running asynchronously
kawych Jun 21, 2023
5448bd2
Add unit test for group_deletion_scheduler
kawych Jun 21, 2023
05e1fe1
Use single AtomicScaling option for scale up and scale down
kawych Jun 26, 2023
072317f
address comments
kushagra98 Jun 27, 2023
374cf61
Address next set of comments
kawych Jun 27, 2023
1dd1e3d
Merge pull request #5893 from kushagra98/master
k8s-ci-robot Jun 27, 2023
76a7d21
update agnhost image to pull from registry.k8s.io
kwiesmueller Jun 27, 2023
b24a41d
Revert "Add subresource status for vpa"
jbartosik Jun 28, 2023
1c8ec5c
Merge pull request #5897 from jbartosik/revert-subresource
jbartosik Jun 28, 2023
753b024
Bugfix for budget cropping
kawych Jun 28, 2023
72c0a7e
Remove unneeded node groups regardless of scale down being in cooldown.
olagacek Jun 28, 2023
03b45e2
Merge pull request #5895 from kwiesmueller/e2e-todo-1
k8s-ci-robot Jun 28, 2023
0e7d327
Update VPA vendor
jbartosik Jun 28, 2023
db507cd
Replace `BuildTestContainer` with use of builder
kwiesmueller Jun 27, 2023
5a23f69
Merge pull request #5904 from jbartosik/update-vpa-vendor
k8s-ci-robot Jun 28, 2023
8f83f7e
Merge pull request #5896 from kwiesmueller/test-util-builder-cleanup
k8s-ci-robot Jun 28, 2023
d3a1f4f
Quote temp folder name parameter to avoid errors
voelzmo Jun 29, 2023
67d3e7e
Include short unregistered nodes in calculation of incorrect node group
BigDarkClown Jun 27, 2023
973f9fd
Merge pull request #5894 from BigDarkClown/fix-cs
k8s-ci-robot Jun 29, 2023
4c55b17
Merge pull request #5695 from kawych/tpu
k8s-ci-robot Jun 30, 2023
6c7ae1a
Add BigDarkClown to Cluster Autoscaler approvers
BigDarkClown Jun 29, 2023
0dd63f1
Merge pull request #5915 from BigDarkClown/be-approver
k8s-ci-robot Jun 30, 2023
c887626
Merge pull request #5913 from voelzmo/fix/generate-crds-with-weird-te…
k8s-ci-robot Jun 30, 2023
7b8e0e6
Add support for scaling up ZeroToMaxNodesScaling node groups
hbostan May 31, 2023
c255aaa
Use appropriate logging levels
hbostan May 31, 2023
38d18c6
Remove unused field in expander and add comment about estimator
hbostan Jun 1, 2023
79c611c
Merge tests for ZeroToMaxNodesScaling into one table-driven test.
hbostan Jun 26, 2023
8ba34ea
Change handling of scale up options for ZeroToMaxNodeScaling in orche…
hbostan Jun 26, 2023
333a028
Rename the autoscaling option
hbostan Jun 29, 2023
e6397c6
Merge pull request #5826 from hbostan/master
k8s-ci-robot Jun 30, 2023
d6e016f
Record all vpa api versions in recommender metrics
kwiesmueller Jun 15, 2023
1f342ff
Add subresource status for vpa
wu0407 Jun 29, 2023
3eacc05
Merge pull request #5813 from qianlei90/feat-add-metrics
k8s-ci-robot Jul 3, 2023
136976e
Merge pull request #5864 from kwiesmueller/master
k8s-ci-robot Jul 3, 2023
3c32e77
Merge pull request #5911 from wu0407/add-status-subresource
k8s-ci-robot Jul 3, 2023
a947ec1
Implement threshold interface for use by threshold based limiter
ystryuchkov Jun 30, 2023
7c293ae
Merge branch 'master' into feature/runtime-limits
ystryuchkov Jul 3, 2023
0d0e3fc
Fix tests
ystryuchkov Jul 3, 2023
adb16c8
Merge pull request #5901 from olagacek/master
k8s-ci-robot Jul 4, 2023
363fd16
Merge branch 'kubernetes:master' into feature/runtime-limits
ystryuchkov Jul 4, 2023
5fed449
Add ClusterStateRegistry to the AutoscalingContext.
kisieland Jun 28, 2023
f3dfeee
Make signature of GetDurationLimit uniformed with GetNodeLimit
ystryuchkov Jul 4, 2023
b4213d8
Add support for negative binpacking duration limit in threshold based…
ystryuchkov Jul 5, 2023
2779677
Merge pull request #5917 from ystryuchkov/feature/runtime-limits
k8s-ci-robot Jul 5, 2023
4606cdf
Merge pull request #5905 from kisieland/auto-context
k8s-ci-robot Jul 5, 2023
481a733
update RBAC to only use verbs that exist for the resources
MaxRink Jul 5, 2023
dbff9be
Move powerState to azure_util, change default to powerStateUnknown
domenicbozzuto Jul 5, 2023
b569db4
Merge pull request #5767 from DataDog/bugfix-azure-prevent-unneeded-s…
k8s-ci-robot Jul 5, 2023
8de66e1
test: fix failing tests
vadasambar Jul 6, 2023
7941bab
feat: set `IgnoreDaemonSetsUtilization` per nodegroup
vadasambar Apr 10, 2023
7eb0910
test: fix merge conflicts in actuator tests
vadasambar Jul 5, 2023
eff7888
refactor: use `actuatorNodeGroupConfigGetter` param in `NewActuator`
vadasambar Jul 5, 2023
e1a22da
test: refactor eligibility tests
vadasambar Jul 5, 2023
8a73d8e
refactor: remove comment line (not relevant anymore)
vadasambar Jul 6, 2023
1e45078
fix: dynamic assignment of the scale down threshold flags. Setting ma…
Jun 26, 2023
0f8502c
Refactor autoscaler.go and static_autoscalar.go to move declaration o…
damikag Jul 6, 2023
1ce7553
Merge pull request #5887 from vadasambar/fix/5842/fargate-nodes-causi…
k8s-ci-robot Jul 10, 2023
d7fb388
Merge pull request #5933 from damikag/fix-bsp
k8s-ci-robot Jul 11, 2023
4e3e7c6
Fixed go:build tags for ovhcloud
Shubham82 Jul 12, 2023
18a7330
Update the go:build tag for missing cloud providers.
Shubham82 Jul 12, 2023
7c716dd
Adapt FAQ for Pods without controller
voelzmo Jul 12, 2023
c6893e9
Merge pull request #5672 from vadasambar/feat/5399/ignore-daemonsets-…
k8s-ci-robot Jul 12, 2023
3cad43a
Use strings instead of NodeGroups as map keys in budgets.go
kawych Jul 12, 2023
a23a8da
Delete dead code from budgets.go
kawych Jul 12, 2023
24194b2
Re-introduce asynchronous node deletion and clean node deletion logic.
kawych Jul 11, 2023
da96d89
Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-b…
k8s-ci-robot Jul 12, 2023
23f03e1
feat: support custom scheduler config for in-tree schedulr plugins (w…
vadasambar Apr 25, 2023
ec783d2
Merge pull request #5945 from kawych/tpu
k8s-ci-robot Jul 13, 2023
eb2accb
Use fixed version of golang image
krzysied Jul 14, 2023
dd951b0
Fix TestBinpackingLimiter flake
BigDarkClown Jul 14, 2023
38c2e89
Merge pull request #5954 from krzysied/master_golang_fix
k8s-ci-robot Jul 14, 2023
75c698c
Bump golang from 1.20.5 to 1.20.6 in /vertical-pod-autoscaler/builder
dependabot[bot] Jul 14, 2023
4d7ba81
Merge pull request #5956 from BigDarkClown/fix-conc
k8s-ci-robot Jul 14, 2023
e5bc070
Fix: Do not inject fakeNode for instance which has errors on create
azylinski Jul 10, 2023
f7acf94
Merge pull request #5939 from azylinski/fix-do-not-inject-fakeNode-fo…
k8s-ci-robot Jul 17, 2023
8d39bae
chore: add script to update vendored hcloud-go
apricote Jul 17, 2023
ab0096f
chore(deps): update vendored hcloud-go to 2.0.0
apricote Jul 17, 2023
686900b
Merge pull request #5838 from vadasambar/fix/4097/critical-addons-tai…
k8s-ci-robot Jul 17, 2023
9fc1b81
fix: balancer RBAC permission to update balancer status
a7i Jul 18, 2023
9acd722
CA - AWS Cloudprovider OWNERS Update
gjtempleton Jul 18, 2023
e87f793
Merge pull request #5944 from voelzmo/enh/doc-for-pods-without-contro…
k8s-ci-robot Jul 19, 2023
171021c
Merge pull request #5708 from vadasambar/feat/5106/support-custom-sch…
k8s-ci-robot Jul 20, 2023
a19caa2
Merge pull request #5971 from gjtempleton/CA-AWS-Maintainers-Update
k8s-ci-robot Jul 21, 2023
990cd65
Enable parallel drain by default.
x13n Jul 21, 2023
157c68f
Merge pull request #5976 from x13n/parallel-drain
k8s-ci-robot Jul 21, 2023
fb1a4c2
Merge pull request #5948 from kubernetes/dependabot/docker/vertical-p…
k8s-ci-robot Jul 24, 2023
e1644f9
Add BigDarkClown to patch releases schedule
BigDarkClown Jul 24, 2023
d077937
Merge pull request #5978 from BigDarkClown/change-release-schedule
k8s-ci-robot Jul 24, 2023
9c364f9
Update Cluster Autoscaler vendor to K8s 1.28.0-beta.0
BigDarkClown Jul 24, 2023
21229d3
Add EstimationAnalyserFunc to be run at the end of the estimation logic
azylinski Jul 20, 2023
954d1a0
Merge pull request #5943 from Shubham82/fix-build
k8s-ci-robot Jul 25, 2023
5c22eb7
Merge pull request #5941 from Shubham82/fix_ovhcloud_build_file
k8s-ci-robot Jul 25, 2023
8c749fc
Merge pull request #5980 from BigDarkClown/vendor
k8s-ci-robot Jul 25, 2023
734c268
Remove ChangeRequirements with `OrEqual`
voelzmo Jul 25, 2023
2a0dea3
Add EvictionRequirements to types
voelzmo Sep 7, 2022
1b35940
Run `generate-crd-yaml.sh`
voelzmo Mar 15, 2023
2eba540
Add metrics for improved observability:
kawych Jul 14, 2023
63eab4e
Merge pull request #5974 from azylinski/add-estimator-PostPackingFunc
k8s-ci-robot Jul 26, 2023
ac092a9
Add requirement for Custom Resources to VPA FAQ
voelzmo Jul 24, 2023
b0ddb47
Clarify Eviction Control for Pods with multiple Containers
voelzmo Jul 26, 2023
895ad14
Merge pull request #5981 from voelzmo/enh/eviction-control-remove-equ…
k8s-ci-robot Jul 27, 2023
5b129d1
Merge pull request #5970 from kawych/tpu
k8s-ci-robot Jul 27, 2023
66b56c5
Merge pull request #5989 from voelzmo/enh/eviction-control-aep-for-mu…
k8s-ci-robot Jul 27, 2023
ecfdc21
Fix broken hyperlink
droctothorpe Jul 26, 2023
66aa3bd
Update vertical-pod-autoscaler/FAQ.md
voelzmo Jul 28, 2023
fc575c6
Update vertical-pod-autoscaler/FAQ.md
voelzmo Jul 28, 2023
5d7d337
Reword AND/OR combinations for more clarity
voelzmo Jul 28, 2023
e777d79
Fix nil pointer exception for case when node is nil while processing …
jayantjain93 Aug 1, 2023
695aacf
feat: add prometheus basic auth
TessaIO Aug 1, 2023
95990f1
Add error code for invalid reservations to GCE client
hbostan Aug 1, 2023
702e968
Merge pull request #6006 from hbostan/master
k8s-ci-robot Aug 2, 2023
3efb678
Bump golang from 1.20.6 to 1.20.7 in /vertical-pod-autoscaler/builder
dependabot[bot] Aug 2, 2023
80053f6
Support ZeroOrMaxNodeScaling node groups when cleaning up unregistere…
kawych Jul 27, 2023
65342dd
Merge pull request #6009 from kubernetes/dependabot/docker/vertical-p…
k8s-ci-robot Aug 3, 2023
3b44c10
Merge pull request #6002 from kawych/down
k8s-ci-robot Aug 3, 2023
8e621b2
Don't pass nil nodes to GetGpuInfoForMetrics
kawych Aug 4, 2023
d1ad9fa
Merge pull request #6013 from kawych/fixing
k8s-ci-robot Aug 4, 2023
dd4263c
Merge pull request #6003 from jayantjain93/gpu-nil-pointer
k8s-ci-robot Aug 4, 2023
71dca88
Merge pull request #5964 from a7i/balancer-rbac-update-status
k8s-ci-robot Aug 4, 2023
c9b8eee
Merge pull request #5961 from hetznercloud/hcloud-go-v2
k8s-ci-robot Aug 4, 2023
b8d3009
Merge pull request #5979 from voelzmo/enh/doc-vpa-for-custom-resources
k8s-ci-robot Aug 4, 2023
bee7c0f
Merge pull request #5987 from droctothorpe/typos
k8s-ci-robot Aug 4, 2023
76b20e4
Revert "Fix nil pointer exception for case when node is nil while pro…
jayantjain93 Aug 4, 2023
9adef5a
Merge pull request #6014 from jayantjain93/revert-6003-gpu-nil-pointer
k8s-ci-robot Aug 4, 2023
e39d1b0
Clean up NodeGroupConfigProcessor interface
BigDarkClown Aug 3, 2023
c4e1681
docs: add kep to add fswatcher to nanny for automatic nanny configura…
TessaIO Aug 4, 2023
555fa4d
Allow using an external secret instead of using the one the Helm char…
mtougeron Aug 4, 2023
14655d2
Remove the MaxNodeProvisioningTimeProvider interface
BigDarkClown Aug 3, 2023
a3bcd98
Merge pull request #5176 from voelzmo/enh/add-eviction-requirements
k8s-ci-robot Aug 7, 2023
8ea13fe
Fixed the hyperlink for Node group auto discovery.
Shubham82 Aug 8, 2023
407f4bc
Update ResourcePolicy description and limit control README
sachintiptur Aug 8, 2023
3c38918
s390x image support
Saripalli-lavanya Aug 7, 2023
fc5870f
Merge pull request #6011 from BigDarkClown/cleanup
k8s-ci-robot Aug 9, 2023
513f962
Bump golang from 1.20.7 to 1.21.0 in /vertical-pod-autoscaler/builder
dependabot[bot] Aug 9, 2023
893a51b
Merge pull request #6030 from kubernetes/dependabot/docker/vertical-p…
k8s-ci-robot Aug 10, 2023
21223f3
Merge pull request #6026 from sachintiptur/resourcepolicy_comment_update
k8s-ci-robot Aug 10, 2023
6b20ca0
Merge pull request #6022 from Shubham82/fix-hyperlink-node_group_auto…
k8s-ci-robot Aug 10, 2023
1aaf4d0
Merge pull request #6020 from Saripalli-lavanya/sl-s390x
k8s-ci-robot Aug 10, 2023
d5757e1
test
aleksandra-malinowska Aug 10, 2023
76a1cef
Set batch size to target size for atomically scaled groups
aleksandra-malinowska Aug 10, 2023
57df63d
a little extra validation
aleksandra-malinowska Aug 10, 2023
aeaab27
test with 2 atomic groups
aleksandra-malinowska Aug 10, 2023
e1cc8ff
don't block draining other groups when one group has some empty nodes
aleksandra-malinowska Aug 10, 2023
f3ca6ef
Merge pull request #6034 from aleksandra-malinowska/fix-atomic
k8s-ci-robot Aug 11, 2023
e1b03fa
Merge pull request #6005 from AhmedGrati/feat-vpa-add-prometheus-basi…
k8s-ci-robot Aug 11, 2023
0ed75e4
fix: Broken links to testgrid dashboard
khareyash05 Aug 15, 2023
8313492
fix: scale down broken for providers not implementing NodeGroup.GetOp…
apricote Aug 15, 2023
7b12a78
feat(hetzner): use less requests while waiting for server create
apricote Aug 15, 2023
44820bd
Update in-place updates AEP adding details to consider
pbetkier Aug 16, 2023
f956800
Fix Doc with External gRPC
j13tw Aug 17, 2023
bb7c8a1
Merge pull request #6038 from hetznercloud/fix-get-options-calls
k8s-ci-robot Aug 18, 2023
3c9aecb
Add fetch reservations in specific project
jessicaochen Aug 18, 2023
5389947
Merge pull request #6051 from jessicaochen/master
k8s-ci-robot Aug 21, 2023
1f9e70e
Merge pull request #6044 from pbetkier/pbetkier-in-place-aep-consider
k8s-ci-robot Aug 22, 2023
58eed48
Merge pull request #6047 from j13tw/fix-doc
k8s-ci-robot Aug 22, 2023
5e20c44
Merge pull request #6016 from mtougeron/flexible-secret-name
k8s-ci-robot Aug 22, 2023
199570e
Merge pull request #6036 from khareyash05/docs
k8s-ci-robot Aug 23, 2023
533719c
Merge pull request #6039 from hetznercloud/hetzner-exponential-backoff
k8s-ci-robot Aug 23, 2023
6a207c8
kep: add config file format and structure notes
TessaIO Aug 23, 2023
0874fe6
Merge pull request #6015 from AhmedGrati/addon-resizer-kep-proposal-5700
k8s-ci-robot Aug 24, 2023
57d07df
Merge pull request #5927 from MaxRink/rbac
k8s-ci-robot Aug 24, 2023
cb95d46
CA - 1.28.0 k/k Vendor Update
gjtempleton Aug 24, 2023
5bcb526
Merge pull request #6058 from gjtempleton/cluster-autoscaler-release-…
k8s-ci-robot Aug 28, 2023
232385b
Merge branch 'upstream-release-1.28.0' into sync-upstream-v1.28.0
aaronfern Nov 21, 2023
42e7be6
Merge branch 'machine-controller-manager-provider' into sync-upstream…
aaronfern Dec 11, 2023
fdf82e8
Fix duplicate imports in IT
aaronfern Dec 18, 2023
7bfb2bf
re-add changes part of FORK-CHANGE
aaronfern Jan 22, 2024
db5a3ac
Re added a fork change command and updated sync change notes
aaronfern Jan 26, 2024
bd82a74
Update cluster-autoscaler/SYNC-CHANGES/SYNC_CHANGES-1.28.md
aaronfern Jan 27, 2024
@@ -0,0 +1,123 @@
# KEP-5546: Scaling based on container count

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->

## Summary

Currently Addon Resizer supports scaling based on the number of nodes. Some workloads use resources proportionally to
the number of containers in the cluster. Since the number of containers per node varies widely between clusters,
it's more resource-efficient to scale such workloads based directly on the container count.

### Goals

- Allow scaling workloads based on count of containers in a cluster.
- Allow this for Addon Resizer 1.8 ([used by metrics server]).

### Non-Goals

- Using both node and container count to scale workloads.
- Bringing this change to the `master` branch of Addon Resizer.

## Proposal

Add a `--scaling-mode` flag to Addon Resizer on the [`addon-resizer-release-1.8`] branch. The flag will
have two valid values:

- `node-proportional` - default, current behavior.
- `container-proportional` - Addon Resizer will set resources using the same algorithm it uses now, but with the number
of containers where it currently uses the number of nodes.
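
The flag handling described above could be sketched as follows. This is only an illustration under stated assumptions: it uses Go's standard `flag` package, and the constant and function names (`scalingModeNode`, `scalingModeContainer`, `validateScalingMode`) are hypothetical. Only the flag name and its two values come from the proposal.

```go
package main

import (
	"flag"
	"fmt"
	os "os"
)

// Hypothetical constant names; the string values are the two modes named in the proposal.
const (
	scalingModeNode      = "node-proportional"
	scalingModeContainer = "container-proportional"
)

// validateScalingMode returns an error for any value other than the two supported modes.
func validateScalingMode(mode string) error {
	switch mode {
	case scalingModeNode, scalingModeContainer:
		return nil
	default:
		return fmt.Errorf("invalid --scaling-mode %q: want %q or %q",
			mode, scalingModeNode, scalingModeContainer)
	}
}

func main() {
	// The default keeps the current, node-based behavior.
	mode := flag.String("scaling-mode", scalingModeNode,
		"how cluster size is measured: node-proportional or container-proportional")
	flag.Parse()

	if err := validateScalingMode(*mode); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("scaling mode:", *mode)
}
```

Rejecting unknown values at startup means a typo in the flag fails loudly instead of silently falling back to node-based scaling.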

### Notes

Addon Resizer 1.8 assumes in multiple places that it's scaling based on the number of nodes:

- [Flag descriptions] that directly reference node counts (`--extra-cpu`, `--extra-memory`, `--extra-storage`, and
`--minClusterSize`) will need to be updated to instead refer to cluster size.
- [README] will need to be updated to reference cluster size instead of node count and explain that cluster size refers
to either node count or container count, depending on the value of the `--scaling-mode` flag.
- Many variable names in code which now refer to node count will refer to cluster size and should be renamed accordingly.

In addition to implementing the feature we should also clean up the code and documentation.

### Risks and Mitigations

One potential risk is that Addon Resizer can obtain cluster size (node count or container count):
- from metrics or
- by querying the Cluster API server to list all objects of the appropriate type

depending on the configuration. There can be many times more containers in a cluster than there are nodes, so listing
all containers could result in higher load on the Cluster API server. Since Addon Resizer is requesting very few fields
I don't expect this effect to be noticeable.

Also I expect metrics-server to test for this before using the feature and any other users of Addon Resizer are likely
better off using metrics (which don't have this problem).

## Design Details

- Implement function `kubernetesClient.CountContainers()`. It will be analogous to the existing
  [`kubernetesClient.CountNodes()`] function.
  - If using metrics to determine the number of containers in the cluster:
    - Fetch pod metrics (similar to [fetching node metrics] but use the `/pods` URI instead of `/nodes`).
    - For each pod obtain the number of containers (length of the `containers` field).
    - Sum container counts for all pods.
  - If using the API server:
    - Fetch the list of pods (similar to [listing nodes]).
    - Fetch only the [`Spec.InitContainers`], [`Spec.Containers`], and [`Spec.EphemeralContainers`] fields.
    - Exclude pods in terminal states ([selector excluding pods in terminal states in VPA]).
    - Sum container counts over pods.
- Add the `--scaling-mode` flag, with two valid values:
  - `node-proportional` - default, current behavior, scaling based on the cluster's node count and
  - `container-proportional` - new behavior, scaling based on the cluster's container count.
- Pass a value indicating whether to use node count or container count to the [`updateResources()`] function.
  - In `updateResources()` use node count or container count, depending on the value.
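
The API-server counting steps above can be sketched in a few lines of Go. This is a rough illustration, not the proposed implementation: the `pod` and `podPhase` types are simplified, hypothetical stand-ins for the Kubernetes API types, and `countContainers` is a free function rather than the `kubernetesClient.CountContainers()` method the design calls for.

```go
package main

import "fmt"

// podPhase and pod are simplified stand-ins for the Kubernetes Pod types (assumption).
type podPhase string

const (
	phasePending   podPhase = "Pending"
	phaseRunning   podPhase = "Running"
	phaseSucceeded podPhase = "Succeeded"
	phaseFailed    podPhase = "Failed"
)

// pod keeps only the lengths of the three container lists named in the design.
type pod struct {
	phase               podPhase
	initContainers      int
	containers          int
	ephemeralContainers int
}

// isTerminal reports whether the pod has finished and should not count towards cluster size.
func isTerminal(p pod) bool {
	return p.phase == phaseSucceeded || p.phase == phaseFailed
}

// countContainers sums init, regular, and ephemeral containers over all non-terminal pods.
func countContainers(pods []pod) uint64 {
	var total uint64
	for _, p := range pods {
		if isTerminal(p) {
			continue
		}
		total += uint64(p.initContainers + p.containers + p.ephemeralContainers)
	}
	return total
}

func main() {
	pods := []pod{
		{phase: phaseRunning, initContainers: 1, containers: 2},
		{phase: phasePending, containers: 1},
		{phase: phaseSucceeded, containers: 4}, // terminal, skipped
		{phase: phaseRunning, containers: 1, ephemeralContainers: 1},
	}
	fmt.Println(countContainers(pods))
}
```

In a real client the terminal-state filter would more likely be applied server-side with a field selector, as in the VPA example linked above, so that finished pods are never transferred at all.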

Check that listing containers directly works

Consider listing pods, getting containers only for working pods

### Test Plan

In addition to unit tests we will run a manual e2e test:

- Create config based on [`example.yaml`] but scaling the deployment based on the number of containers in the cluster.
- Create config starting deployment with 100 `pause` containers.

Test the feature by:

- Starting the deployment scaled by Addon Resizer, based on node count.
- Observe size of the deployment and that it's stable.
- Start deployment with 100 `pause` containers.
- Observe the scaled deployment change resources appropriately.

Test the node-based scaling:

- Apply [`example.yaml`].
- Observe the amount and stability of assigned resources.
- Resize cluster.
- Observe change in assigned resources.

Both tests should be performed with metrics-based and API-based scaling.

[used by metrics server]: https://github.com/kubernetes-sigs/metrics-server/blob/0c47555e9b49cfe0719db1a0b7fb6c8dcdff3d38/charts/metrics-server/values.yaml#L121
[`addon-resizer-release-1.8`]: https://github.com/kubernetes/autoscaler/tree/addon-resizer-release-1.8
[Flag descriptions]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/main/pod_nanny.go#L47
[README]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/README.md?plain=1#L1
[`kubernetesClient.CountNodes()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L58
[fetching node metrics]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L150
[listing nodes]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/kubernetes_client.go#L71
[`Spec.InitContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3143
[`Spec.Containers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3150
[`Spec.EphemeralContainers`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L3158
[`Status.Phase`]: https://github.com/kubernetes/api/blob/1528256abbdf8ff2510112b28a6aacd239789a36/core/v1/types.go#L4011
[selector excluding pods in terminal states in VPA]: https://github.com/kubernetes/autoscaler/blob/04e5bfc88363b4af9fdeb9dfd06c362ec5831f51/vertical-pod-autoscaler/e2e/v1beta2/common.go#L195
[`updateResources()`]: https://github.com/kubernetes/autoscaler/blob/da500188188d275a382be578ad3d0a758c3a170f/addon-resizer/nanny/nanny_lib.go#L126
[`example.yaml`]: https://github.com/kubernetes/autoscaler/blob/c8d612725c4f186d5de205ed0114f21540a8ed39/addon-resizer/deploy/example.yaml
@@ -0,0 +1,61 @@
# KEP-5546: Automatic reload of nanny configuration when updated

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Notes](#notes)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
<!-- /toc -->


## Summary
- **Goals:** The goal of this enhancement is to improve the user experience for applying nanny configuration changes in the addon-resizer 1.8 when used with the metrics server. The proposed solution involves automatically reloading the nanny configuration whenever changes occur, eliminating the need for manual intervention and sidecar containers.
- **Non-Goals:** This proposal does not aim to update the functional behavior of the addon-resizer.

## Proposal
The proposed solution involves updating the addon-resizer with the following steps:
- Create a file system watcher using `fsnotify` under `utils/fswatcher` to watch for changes to the nanny configuration. It should run as a goroutine in the background.
- Detect changes to the nanny configuration file using the created `fswatcher` and trigger the reloading process when configuration changes are detected. Events should be sent on a channel.
- Re-execute `loadNannyConfiguration`, the method responsible for building the NannyConfiguration, to apply the updated configuration to the addon-resizer.
- Proper error handling should be implemented to manage scenarios where the configuration file is temporarily inaccessible or if there are parsing errors in the configuration file.

### Risks and Mitigations
- There is a potential risk of filesystem-related issues causing the file watcher to malfunction. Proper testing and error handling should be implemented to handle such scenarios gracefully.
- Errors in the configuration file could lead to unexpected behavior or crashes. The addon-resizer should handle parsing errors and fall back to the previous working configuration if necessary.

## Design Details
- Create a new package for the `fswatcher` under `utils/fswatcher`. It would contain the `fswatcher` struct and methods and unit-tests.
- `FsWatcher` struct would look similar to this:
```go
type FsWatcher struct {
	*fsnotify.Watcher

	Events    chan struct{}
	ratelimit time.Duration
	names     []string
	paths     map[string]struct{}
}
```
- Implement the following functions:
  - `CreateFsWatcher`: Instantiates a new `FsWatcher` and starts watching the file system.
  - `initWatcher`: Initializes the `fsnotify` watcher and the `paths` to be watched.
  - `add`: Adds a new file to watch.
  - `reset`: Re-initializes the `FsWatcher`.
  - `watch`: Watches the configured files.
- In the main function, we create a new `FsWatcher` and then wait in an infinite loop to receive events indicating
  filesystem changes. Based on these changes, we re-execute the `loadNannyConfiguration` function.
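
The reload loop described above, combined with the fallback behavior from the risks section, might look roughly like this. It is a sketch under assumptions: the `nannyConfig` type, the `reloadOnEvents` function, and the `errInvalidConfig` sentinel are hypothetical, the loader is passed in as a plain function instead of calling the real `loadNannyConfiguration`, and a pre-filled channel stands in for the `Events` channel of the `FsWatcher` so that the example runs without `fsnotify`.

```go
package main

import (
	"errors"
	"fmt"
)

// nannyConfig is a hypothetical, reduced stand-in for the NannyConfiguration.
type nannyConfig struct {
	baseMemory string
}

// errInvalidConfig is a hypothetical sentinel used to simulate a parsing failure.
var errInvalidConfig = errors.New("invalid configuration")

// reloadOnEvents calls load once per received event. A failed load keeps the
// previous working configuration. It returns the last good configuration once
// the events channel is closed.
func reloadOnEvents(events <-chan struct{}, initial nannyConfig, load func() (nannyConfig, error)) nannyConfig {
	current := initial
	for range events {
		next, err := load()
		if err != nil {
			fmt.Println("reload failed, keeping previous configuration:", err)
			continue
		}
		current = next
	}
	return current
}

func main() {
	// Two simulated filesystem events; a real watcher would keep the channel open.
	events := make(chan struct{}, 2)
	events <- struct{}{}
	events <- struct{}{}
	close(events)

	calls := 0
	load := func() (nannyConfig, error) {
		calls++
		if calls == 1 {
			return nannyConfig{baseMemory: "100Mi"}, nil
		}
		return nannyConfig{}, errInvalidConfig // second edit is broken
	}

	final := reloadOnEvents(events, nannyConfig{baseMemory: "300Mi"}, load)
	fmt.Println("active base memory:", final.baseMemory)
}
```

Keeping the last good configuration in a local variable means a broken edit to the file never leaves the addon-resizer without a usable configuration.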

> **Note:** The expected configuration file format is YAML. It has the same structure as the NannyConfiguration CRD.

### Test Plan
To ensure the proper functioning of the enhanced addon-resizer, the following test plan should be executed:
1. **Unit Tests:** Write unit tests to validate the file watcher's functionality and ensure it triggers events when the configuration file changes.
2. **Manual e2e Tests:** Deploy the addon-resizer with a `BaseMemory` of `300Mi`, then change `BaseMemory` to `100Mi`. We should observe a change in the behavior of the watched pod.


[fsnotify]: https://github.com/fsnotify/fsnotify
6 changes: 6 additions & 0 deletions balancer/deploy/controller.yaml
@@ -20,6 +20,12 @@ rules:
- watch
- patch
- update
- apiGroups:
- balancer.x-k8s.io
resources:
- balancers/status
verbs:
- update
- apiGroups:
- ""
resources:
4 changes: 2 additions & 2 deletions balancer/proposals/balancer.md
@@ -10,7 +10,7 @@ These domains may include:
* Cloud provider zones inside a single region, to ensure that the application is still up and running, even if one of the zones has issues.
* Different types of Kubernetes nodes. These may involve nodes that are spot/preemptible, or of different machine families.

A single Kuberentes deployment may either leave the placement entirely up to the scheduler
A single Kubernetes deployment may either leave the placement entirely up to the scheduler
(most likely leading to something not entirely desired, like all pods going to a single domain) or
focus on a single domain (thus not achieving the goal of being in two or more domains).

@@ -179,4 +179,4 @@ type BalancerStatus struct {
// +patchStrategy=merge
Conditions []metav1.Condition
}
```
```
2 changes: 1 addition & 1 deletion builder/Dockerfile
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

FROM golang:1.20
FROM golang:1.20.4
LABEL maintainer="Marcin Wielgus <[email protected]>"

ENV GOPATH /gopath/
4 changes: 2 additions & 2 deletions charts/cluster-autoscaler/Chart.yaml
@@ -1,5 +1,5 @@
apiVersion: v2
appVersion: 1.26.2
appVersion: 1.27.2
description: Scales Kubernetes worker nodes within autoscaling groups.
engine: gotpl
home: https://github.com/kubernetes/autoscaler
@@ -11,4 +11,4 @@ name: cluster-autoscaler
sources:
- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
type: application
version: 9.28.0
version: 9.29.2
8 changes: 6 additions & 2 deletions charts/cluster-autoscaler/README.md
@@ -320,7 +320,10 @@ Though enough for the majority of installations, the default PodSecurityPolicy _

### VerticalPodAutoscaler

The chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically or remediate containers that are being OOMKilled.
The CA Helm Chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) object from Chart version `9.27.0`
onwards for the Cluster Autoscaler Deployment to scale the CA as appropriate, but for that, we
need to install the VPA to the cluster separately. A VPA can help minimize wasted resources
when usage spikes periodically or remediate containers that are being OOMKilled.

The following example snippet can be used to install VPA that allows scaling down from the default recommendations of the deployment template:

@@ -383,7 +386,7 @@ vpa:
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy |
| image.pullSecrets | list | `[]` | Image pull secrets |
| image.repository | string | `"registry.k8s.io/autoscaling/cluster-autoscaler"` | Image repository |
| image.tag | string | `"v1.26.2"` | Image tag |
| image.tag | string | `"v1.27.2"` | Image tag |
| kubeTargetVersionOverride | string | `""` | Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands. |
| magnumCABundlePath | string | `"/etc/kubernetes/ca-bundle.crt"` | Path to the host's CA bundle, from `ca-file` in the cloud-config file. |
| magnumClusterName | string | `""` | Cluster name or ID in Magnum. Required if `cloudProvider=magnum` and not setting `autoDiscovery.clusterName`. |
@@ -408,6 +411,7 @@ vpa:
| rbac.serviceAccount.name | string | `""` | The name of the ServiceAccount to use. If not set and create is `true`, a name is generated using the fullname template. |
| replicaCount | int | `1` | Desired number of pods |
| resources | object | `{}` | Pod resource requests and limits. |
| secretKeyRefNameOverride | string | `""` | Overrides the name of the Secret to use when loading the secretKeyRef for AWS and Azure env variables |
| securityContext | object | `{}` | [Security context for pod](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/) |
| service.annotations | object | `{}` | Annotations to add to service |
| service.create | bool | `true` | If `true`, a Service will be created. |
5 changes: 4 additions & 1 deletion charts/cluster-autoscaler/README.md.gotmpl
@@ -320,7 +320,10 @@ Though enough for the majority of installations, the default PodSecurityPolicy _

### VerticalPodAutoscaler

The chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) for the Deployment if needed. A VPA can help minimize wasted resources when usage spikes periodically or remediate containers that are being OOMKilled.
The CA Helm Chart can install a [`VerticalPodAutoscaler`](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/README.md) object from Chart version `9.27.0`
onwards for the Cluster Autoscaler Deployment to scale the CA as appropriate, but for that, we
need to install the VPA to the cluster separately. A VPA can help minimize wasted resources
when usage spikes periodically or remediate containers that are being OOMKilled.

The following example snippet can be used to install VPA that allows scaling down from the default recommendations of the deployment template:

11 changes: 10 additions & 1 deletion charts/cluster-autoscaler/templates/clusterrole.yaml
@@ -151,13 +151,22 @@ rules:
- cluster.x-k8s.io
resources:
- machinedeployments
- machinedeployments/scale
- machinepools
- machines
- machinesets
verbs:
- get
- list
- update
- watch
- apiGroups:
- cluster.x-k8s.io
resources:
- machinedeployments/scale
- machinepools/scale
verbs:
- get
- patch
- update
{{- end }}
{{- end -}}
20 changes: 10 additions & 10 deletions charts/cluster-autoscaler/templates/deployment.yaml
@@ -132,36 +132,36 @@ spec:
valueFrom:
secretKeyRef:
key: AwsAccessKeyId
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- if .Values.awsSecretAccessKey }}
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: AwsSecretAccessKey
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- else if eq .Values.cloudProvider "azure" }}
- name: ARM_SUBSCRIPTION_ID
valueFrom:
secretKeyRef:
key: SubscriptionID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_RESOURCE_GROUP
valueFrom:
secretKeyRef:
key: ResourceGroup
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_VM_TYPE
valueFrom:
secretKeyRef:
key: VMType
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: AZURE_CLUSTER_NAME
valueFrom:
secretKeyRef:
key: ClusterName
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- if .Values.azureUseWorkloadIdentityExtension }}
- name: ARM_USE_WORKLOAD_IDENTITY_EXTENSION
value: "true"
@@ -173,22 +173,22 @@ spec:
valueFrom:
secretKeyRef:
key: TenantID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_CLIENT_ID
valueFrom:
secretKeyRef:
key: ClientID
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: ARM_CLIENT_SECRET
valueFrom:
secretKeyRef:
key: ClientSecret
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
- name: AZURE_NODE_RESOURCE_GROUP
valueFrom:
secretKeyRef:
key: NodeResourceGroup
name: {{ template "cluster-autoscaler.fullname" . }}
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- end }}
{{- end }}
{{- range $key, $value := .Values.extraEnv }}