kube-scheduler crashed after 300K pods in 10K nodes cluster #607

Open
sonyafenge opened this issue Aug 19, 2020 · 2 comments
@sonyafenge
Collaborator

What happened:
Started a cluster with 10K nodes and began creating pods in it. After 300K pods, kube-scheduler crashed.

What you expected to happen:
kube-scheduler should not crash.
How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Arktos version (use kubectl version):
commit 04298acdc2ca682bd54306aaa6fd816ba3018e57
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@sonyafenge
Collaborator Author

From Sindica:

This looks like the scheduler restart log:

25304 I0819 01:49:15.482326 1 cache.go:666] Couldn't expire cache for pod system/os8klemekk67w6j2xx68t2n7i9nwzqmx-ns/os8klemekk67w6j2xx68t2n7i9nwzqmx-pods-6744f97b99-jgqjn. Binding is still in progress.
25305 I0819 01:49:15.678020 1 leaderelection.go:281] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
25306 E0819 01:49:15.678062 1 server.go:258] lost master
25307 lost lease
25308 I0819 01:49:19.666685 1 feature_gate.go:216] feature gates: &{map[ExperimentalCriticalPodAnnotation:true]}
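
The sequence in the log above (a failed lease renewal with "context deadline exceeded", then "lost master", then a fresh startup line a few seconds later) is consistent with a leader-election renew loop timing out. The following is only a rough, self-contained simulation of that kind of loop, not the actual scheduler code. The function names are hypothetical, and the 10 s renew deadline and 2 s retry period are illustrative assumptions; the settings used in this cluster are not stated in this issue.

```python
# Hypothetical simulation of a leader-election renew loop.
# Names and timing values are assumptions for illustration only.

def renew_round(attempts, renew_deadline=10.0, retry_period=2.0):
    """Simulate one renewal round.

    attempts: list of (latency_seconds, succeeded) tuples, one per
    successive renewal attempt against the API server.
    Returns True if an attempt succeeded before renew_deadline elapsed.
    """
    elapsed = 0.0
    for latency, succeeded in attempts:
        elapsed += latency
        if elapsed > renew_deadline:
            # Corresponds to "context deadline exceeded" in the log.
            return False
        if succeeded:
            return True
        # Wait before retrying, as a polling loop would.
        elapsed += retry_period
    # Ran out of attempts without a successful renewal.
    return False


def run_leader(rounds, renew_deadline=10.0, retry_period=2.0):
    """Run renewal rounds in order; stop at the first round that fails."""
    for i, attempts in enumerate(rounds):
        if not renew_round(attempts, renew_deadline, retry_period):
            return f"lost lease in round {i}"
    return "still leader"


if __name__ == "__main__":
    # Round 0 renews quickly; round 1 sees slow, failing API calls.
    slow = [[(0.1, True)], [(3.0, False), (6.0, False)]]
    print(run_leader(slow))
```

In this sketch, slow or failing API responses alone are enough to lose the lease, which would fit a heavily loaded API server at 300K pods, though the log excerpt by itself does not confirm the root cause.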

@sonyafenge
Collaborator Author

sonyafenge commented Aug 19, 2020

This problem was first reported in:
https://github.com/futurewei-cloud/arktos/issues/571

@sonyafenge sonyafenge added this to the 930 milestone Sep 1, 2020
@zmn223 zmn223 modified the milestones: 930, 1130 Sep 2, 2020