500 nodes load run finished with error: DaemonSets timeout #1007

Open
sonyafenge opened this issue Mar 2, 2021 · 0 comments
What happened:
Started a load test with 2 TPs of 500 nodes each; the run for TP2 finished with this error:

E0227 04:01:12.283559   12850 clusterloader.go:213] Test Finished
E0227 04:01:12.283566   12850 clusterloader.go:214]   Test: testing/load/config.yaml
E0227 04:01:12.283571   12850 clusterloader.go:215]   Status: Fail
E0227 04:01:12.283576   12850 clusterloader.go:217]   Errors: [measurement call WaitForControlledPodsRunning - WaitForRunningDaemonSets error: 5 objects timed out: DaemonSets: zeta/iggvn7-testns/daemonset-0, zeta/0gzkyn-testns/daemonset-0, zeta/fcac84-testns/daemonset-0, zeta/b289mk-testns/daemonset-0, zeta/thn07g-testns/daemonset-0
measurement call WaitForControlledPodsRunning - WaitForRunningDaemonSets error: 5 objects timed out: DaemonSets: zeta/thn07g-testns/daemonset-0, zeta/iggvn7-testns/daemonset-0, zeta/0gzkyn-testns/daemonset-0, zeta/fcac84-testns/daemonset-0, zeta/b289mk-testns/daemonset-0]
E0227 04:01:12.283583   12850 clusterloader.go:219] --------------------------------------------------------------------------------
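
Both entries in the Errors list name the same five DaemonSets, only in a different order. A minimal sketch for pulling the unique set out of perf-run.log is below. The helper name `timed_out_daemonsets` is hypothetical (it is not part of clusterloader2), and reading each key as tenant/namespace/name is an assumption based on SCALEOUT_TEST_TENANT=zeta in the run command.

```python
import re

# Excerpt of the Errors line from the log above, kept verbatim.
SAMPLE = """Errors: [measurement call WaitForControlledPodsRunning - WaitForRunningDaemonSets error: 5 objects timed out: DaemonSets: zeta/iggvn7-testns/daemonset-0, zeta/0gzkyn-testns/daemonset-0, zeta/fcac84-testns/daemonset-0, zeta/b289mk-testns/daemonset-0, zeta/thn07g-testns/daemonset-0
measurement call WaitForControlledPodsRunning - WaitForRunningDaemonSets error: 5 objects timed out: DaemonSets: zeta/thn07g-testns/daemonset-0, zeta/iggvn7-testns/daemonset-0, zeta/0gzkyn-testns/daemonset-0, zeta/fcac84-testns/daemonset-0, zeta/b289mk-testns/daemonset-0]
"""


def timed_out_daemonsets(log_text):
    """Return the sorted, de-duplicated DaemonSet keys named in timeout errors.

    Hypothetical helper: it only pattern-matches the message text shown in
    this issue and assumes nothing else about the clusterloader2 log format.
    """
    found = set()
    # Each error entry ends at a newline or at the closing bracket of the list.
    for match in re.finditer(r"objects timed out: DaemonSets: ([^\]\n]+)", log_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                found.add(key)
    return sorted(found)


unique = timed_out_daemonsets(SAMPLE)
for key in unique:
    # Assumption: keys have the form tenant/namespace/name.
    tenant, namespace, name = key.split("/")
    print(tenant, namespace, name)
```

Running it on the excerpt yields five distinct keys, one per test namespace, all for daemonset-0 under tenant zeta.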

What you expected to happen:
The load test finishes without any DaemonSet timeout errors.
How to reproduce it (as minimally and precisely as possible):

sonyali@sonya-uswest2:~/go/src/k8s.io/arktos$ export KUBEMARK_NUM_NODES=1000 NUM_NODES=12 SCALEOUT_TP_COUNT=2
sonyali@sonya-uswest2:~/go/src/k8s.io/arktos$ export MASTER_DISK_SIZE=300GB MASTER_ROOT_DISK_SIZE=300GB KUBE_GCE_ZONE=us-west2-b MASTER_SIZE=n1-highmem-64 NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=300GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false ETCD_QUOTA_BACKEND_BYTES=8589934592 SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2 SCALEOUT_CLUSTER=true

sonyali@sonya-uswest2:~/go/src/k8s.io/arktos$ ./cluster/kube-up.sh
sonyali@sonya-uswest2:~/go/src/k8s.io/arktos$ ./test/kubemark/start-kubemark.sh


sonyali@sonya-uswest2:~/go/src/k8s.io/arktos$ SCALEOUT_TEST_TENANT=zeta RUN_NAME=rel022621-2x500 TENANT_PERF_LOG_DIR=/home/sonyali/logs/perf-test/gce-500/arktos/${RUN_NAME}/${SCALEOUT_TEST_TENANT} perf-tests/clusterloader2/run-e2e.sh --nodes=500 --provider=kubemark --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-proxy --report-dir=${TENANT_PERF_LOG_DIR} --testconfig=testing/density/config.yaml --testconfig=testing/load/config.yaml --testoverrides=./testing/experiments/disable_pvs.yaml > ${TENANT_PERF_LOG_DIR}/perf-run.log  2>&1  &

Anything else we need to know?:
Logs are in GCP project workload-controller-manager, on host sonya-uswest2, under /home/sonyali/logs/perf-test/gce-500/arktos/rel022621-2x500.
Environment:

  • Arktos version (use kubectl version):
32849a616d5 (HEAD, upstream/master) Promote admissionreview to v1 (#998)
103eaa9e749 Scheduler to connect to RP directly via separated client (#991)
2a8a5a5b5e7 fix kubelet permission issue in kube-up (#996)
08b0449c661 Promote admission webhook API to v1 (#981)
15e3e1e71fe (master, kubeletfailedregisternodes) add hack/arktos_cherrypick.sh to ensure cherrypick comments added when cherrypick from kubernetes (#990)
b0500d6dddb fix issue 655 (#992)
d11162827dc Consistent etcd key path (#989)
e679af091ff Use HTTPS as etcd-apiserver protocol when mTLS is enabled (#987)
3ade1409039 Use secure communication for client <--> proxy communication, cleanup unnecessary config files (#986)
1ffad5ca6a0 Read Arktos version from file instead of environment variable (#985)
b1b051adfdd Fix GCE deployment and arktos-up to correctly enable Vertical Scaling feature (#979)
54bb67006c9 Show more detailed message for file-updated-check (#984)
21c6205b999 Openapi addimportalias (#965)
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others: