
RELEASE 2022-0130 Perf Test Results


[Scale-up] Common Test Setup (test-specific changes are noted below)

export MASTER_DISK_SIZE=200GB MASTER_ROOT_DISK_SIZE=200GB KUBE_GCE_ZONE=us-west2-b MASTER_SIZE=n1-standard-32 NODE_SIZE=n1-standard-16 NUM_NODES=6 NODE_DISK_SIZE=200GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true ETCD_QUOTA_BACKEND_BYTES=8589934592 TEST_CLUSTER_LOG_LEVEL=--v=2 ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false SHARE_PARTITIONSERVER=false  LOGROTATE_FILES_MAX_COUNT=50 LOGROTATE_MAX_SIZE=200M KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBEMARK_NUM_NODES=500 KUBE_GCE_INSTANCE_PREFIX=release43021-500-scaleup KUBE_GCE_NETWORK=release43021-500-scaleup
  • Cmd:
GOPATH=$HOME/go nohup ./perf-tests/clusterloader2/run-e2e.sh --nodes=500 --provider=kubemark --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark --report-dir=/home/sonyali/logs/perf-test/gce-500/arktos/release43021-500-scaleup --testconfig=testing/density/config.yaml --testconfig=testing/load/config.yaml --testoverrides=./testing/experiments/disable_pvs.yaml
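
For reference, the end-to-end scale-up flow these settings feed into is sketched below. This is a minimal sketch assuming the standard kube-up/kubemark helper scripts in the Arktos tree (cluster/kube-up.sh and test/kubemark/start-kubemark.sh); only the final clusterloader2 invocation repeats the Cmd above.

# Sketch of the scale-up run flow; script paths assume the standard Arktos/Kubernetes
# tree layout and may need adjusting for your checkout.
cd $HOME/go/src/k8s.io/arktos

# 1. Bring up the admin cluster with the env vars exported above.
./cluster/kube-up.sh

# 2. Start the hollow-node kubemark cluster (KUBEMARK_NUM_NODES=500).
./test/kubemark/start-kubemark.sh

# 3. Run clusterloader2 against the kubemark kubeconfig (same as the Cmd above).
GOPATH=$HOME/go nohup ./perf-tests/clusterloader2/run-e2e.sh --nodes=500 --provider=kubemark \
  --kubeconfig=$HOME/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark \
  --report-dir=$HOME/logs/perf-test/gce-500/arktos/release43021-500-scaleup \
  --testconfig=testing/density/config.yaml --testconfig=testing/load/config.yaml \
  --testoverrides=./testing/experiments/disable_pvs.yaml > perf-run.log 2>&1 &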

[Scale-out] Common Test Setup (test-specific changes are noted below)

export KUBEMARK_NUM_NODES=15000 NUM_NODES=310 SCALEOUT_TP_COUNT=2 SCALEOUT_RP_COUNT=2 RUN_PREFIX=poc430-041621-2x2x15k

export MASTER_DISK_SIZE=1000GB MASTER_ROOT_DISK_SIZE=1000GB KUBE_GCE_ZONE=us-central1-b MASTER_SIZE=n1-highmem-96 NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=1000GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false ETCD_QUOTA_BACKEND_BYTES=8589934592 SHARE_PARTITIONSERVER=false LOGROTATE_FILES_MAX_COUNT=200 LOGROTATE_MAX_SIZE=200M KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2 SCALEOUT_CLUSTER=true 
  • Perf Cmd:
SCALEOUT_TEST_TENANT=arktos RUN_PREFIX=poc430-041621-2x2x15k PERF_LOG_DIR=/home/sonyali/logs/perf-test/gce-15000/arktos/${RUN_PREFIX}/${SCALEOUT_TEST_TENANT} nohup perf-tests/clusterloader2/run-e2e.sh --nodes=15000 --provider=kubemark --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-proxy --report-dir=${PERF_LOG_DIR} --testconfig=testing/density/config.yaml --testoverrides=./testing/experiments/disable_pvs.yaml > ${PERF_LOG_DIR}/perf-run.log  2>&1  &

SCALEOUT_TEST_TENANT=zeta RUN_PREFIX=poc430-041621-2x2x15k PERF_LOG_DIR=/home/sonyali/logs/perf-test/gce-15000/arktos/${RUN_PREFIX}/${SCALEOUT_TEST_TENANT}  nohup perf-tests/clusterloader2/run-e2e.sh --nodes=15000 --provider=kubemark --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-proxy --report-dir=${PERF_LOG_DIR} --testconfig=testing/density/config.yaml --testoverrides=./testing/experiments/disable_pvs.yaml > ${PERF_LOG_DIR}/perf-run.log  2>&1  &
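
Since the two per-tenant invocations above differ only in SCALEOUT_TEST_TENANT, they can be driven by a small loop. The following is a sketch, not the command that was actually recorded:

RUN_PREFIX=poc430-041621-2x2x15k
for tenant in arktos zeta; do
  PERF_LOG_DIR=/home/sonyali/logs/perf-test/gce-15000/arktos/${RUN_PREFIX}/${tenant}
  mkdir -p ${PERF_LOG_DIR}
  # One clusterloader2 run per test tenant, through the TP proxy kubeconfig.
  SCALEOUT_TEST_TENANT=${tenant} nohup perf-tests/clusterloader2/run-e2e.sh --nodes=15000 --provider=kubemark \
    --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-proxy \
    --report-dir=${PERF_LOG_DIR} --testconfig=testing/density/config.yaml \
    --testoverrides=./testing/experiments/disable_pvs.yaml > ${PERF_LOG_DIR}/perf-run.log 2>&1 &
done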

11/29/2021 [Scale-out][POC130] 3TP3RP 25k Nodes with QPS 100/25

11756a4be7f (HEAD, upstream/master) Support multiple RPs in Mizar node controller (#1225)
ccd60276544 add csr to rp controller (#1228)
e9d658bd1c3 Daemonset controller supports multi resource partitions (#1224)
bb529334f2a kunsupported cgroup setup causes kubelet to emit a warning rather than exiting (#1220)
c4697f43324 Rename tech doc name - CI bot complains (#1218)
2f1e4277b38 Add a brief introduction to Google Anthos Overall Architecture
6336ea98e7f Move proxy setup logic from dev machines to proxy VM (#1212)
c74b94cc998 (master) fix flannel to v0.14.0 (#1214)
8f427844acf concurrency related code adjustment (#1209)
306c4472071 (tag: v0.9) Bump Arktos to v0.9.0 (#1204)
  • Additional env vars:
KUBE_CONTROLLER_EXTRA_ARGS="--kube-api-qps=100 --kube-api-burst=150" 
KUBE_SCHEDULER_EXTRA_ARGS="--kube-api-qps=300 --kube-api-burst=450" 
KUBE_FEATURE_GATES=ExperimentalCriticalPodAnnotation=true,QPSDoubleGCController=true
  • Additional perf config: set the pod startup latency threshold to 6s; skip deleting saturation pods; skip deleting latency pods. These are applied as extra --testoverrides flags (see the sketch after this list):
--testoverrides=./testing/density/25k_nodes/override.yaml
--testoverrides=./testing/experiments/deleting_saturation_pods.yaml
--testoverrides=./testing/experiments/deleting_latency_pods.yaml
  • Logs can be found under GCP project workload-controller-manager on sonyadev4: /home/sonyali/logs/perf-test/gce-25k/arktos/rel130-112921-3x3x25k
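
The override files listed above are simply appended to the common per-tenant perf command as additional --testoverrides flags. A sketch for the arktos tenant is below; the --nodes value is assumed from the run's 25k node count, and the per-tenant report directory layout follows the common setup above.

SCALEOUT_TEST_TENANT=arktos RUN_PREFIX=rel130-112921-3x3x25k \
PERF_LOG_DIR=/home/sonyali/logs/perf-test/gce-25k/arktos/${RUN_PREFIX}/${SCALEOUT_TEST_TENANT} \
nohup perf-tests/clusterloader2/run-e2e.sh --nodes=25000 --provider=kubemark \
  --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/test/kubemark/resources/kubeconfig.kubemark-proxy \
  --report-dir=${PERF_LOG_DIR} --testconfig=testing/density/config.yaml \
  --testoverrides=./testing/experiments/disable_pvs.yaml \
  --testoverrides=./testing/density/25k_nodes/override.yaml \
  --testoverrides=./testing/experiments/deleting_saturation_pods.yaml \
  --testoverrides=./testing/experiments/deleting_latency_pods.yaml \
  > ${PERF_LOG_DIR}/perf-run.log 2>&1 &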

  • [tenant: arktos] Test Result-density: Test finished with Status: Fail

E1129 21:29:22.815530   20985 clusterloader.go:219] Test Finished
E1129 21:29:22.815534   20985 clusterloader.go:220]   Test: testing/density/config.yaml
E1129 21:29:22.815539   20985 clusterloader.go:221]   Status: Fail
E1129 21:29:22.815543   20985 clusterloader.go:223]   Errors: [measurement call PodStartupLatency - PodStartupLatency error: pod startup: too high latency 99th percentile: got 9.770964654s expected: 6s]

PodStartupLatency:

"data": {
        "Perc50": 1846.034238,
        "Perc90": 2868.577272,
        "Perc99": 9770.964654
      },
      "unit": "ms",

SaturationPodStartupLatency:

"data": {
        "Perc50": 8247.532652,
        "Perc90": 23718.363873,
        "Perc99": 32777.744936
      },
      "unit": "ms",

SchedulingThroughput:

{
  "perc50": 114.4,
  "perc90": 130.4,
  "perc99": 220.6,
  "max": 433.4
}
  • [tenant: monkey] Test Result-density: Test finished with Status: Fail
E1129 21:28:38.918370   18897 clusterloader.go:220]   Test: testing/density/config.yaml
E1129 21:28:38.918375   18897 clusterloader.go:221]   Status: Fail
E1129 21:28:38.918379   18897 clusterloader.go:223]   Errors: [namespace oyhkg4-testns object latency-deployment-58 creation error: the server is currently unable to handle the request
namespace oyhkg4-testns object latency-deployment-59 creation error: the server is currently unable to handle the request
namespace oyhkg4-testns object latency-deployment-60 creation error: the server is currently unable to handle the request
namespace oyhkg4-testns object latency-deployment-61 creation error: the server is currently unable to handle the request

PodStartupLatency:

"data": {
        "Perc50": 1838.01862,
        "Perc90": 2887.80774,
        "Perc99": 11781.367567
      },
      "unit": "ms",

SaturationPodStartupLatency:

"data": {
        "Perc50": 7441.018805,
        "Perc90": 15135.40746,
        "Perc99": 23663.440455
      },
      "unit": "ms",

SchedulingThroughput:

{
  "perc50": 116.2,
  "perc90": 128.8,
  "perc99": 222.8,
  "max": 389.4
}
  • [tenant: zeta] Test Result-density: Test finished with Status: Fail
E1129 21:28:33.288976   29181 clusterloader.go:219] Test Finished
E1129 21:28:33.288981   29181 clusterloader.go:220]   Test: testing/density/config.yaml
E1129 21:28:33.288985   29181 clusterloader.go:221]   Status: Fail
E1129 21:28:33.288990   29181 clusterloader.go:223]   Errors: [measurement call PodStartupLatency - PodStartupLatency error: pod startup: too high latency 99th percentile: got 9.695355474s expected: 6s]

PodStartupLatency:

"data": {
        "Perc50": 1846.659701,
        "Perc90": 2882.973899,
        "Perc99": 9695.355474
      },
      "unit": "ms",

SaturationPodStartupLatency:

"data": {
        "Perc50": 7939.144063,
        "Perc90": 22108.767104,
        "Perc99": 39638.286394
      },
      "unit": "ms",

SchedulingThroughput:

{
  "perc50": 112,
  "perc90": 125.4,
  "perc99": 198.2,
  "max": 499.4
}
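
The PodStartupLatency and SchedulingThroughput numbers quoted throughout this page are taken from the JSON artifacts clusterloader2 writes into --report-dir. The following is a sketch for pulling them out with jq; the per-tenant subdirectory and the measurement-prefixed file names are assumptions about the report layout.

# Per-tenant report directory for this run (tenant subdirectory name assumed).
REPORT_DIR=/home/sonyali/logs/perf-test/gce-25k/arktos/rel130-112921-3x3x25k/arktos
for f in ${REPORT_DIR}/PodStartupLatency_*.json ${REPORT_DIR}/SchedulingThroughput_*.json; do
  echo "== ${f}"
  # Print the percentile data (latency values are in milliseconds); files that are
  # not in the dataItems format are printed as-is.
  jq 'if .dataItems then (.dataItems[] | {labels, unit, data}) else . end' ${f}
done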

12/02/2021 [Scale-out][POC130] 3TP3RP 20k Nodes with QPS 80/20

f4f9af2a616 (HEAD, arktos-perf/poc20220130-perf-1202) add 20k_nodes override.yaml for density perf test
11756a4be7f (upstream/master) Support multiple RPs in Mizar node controller (#1225)
ccd60276544 add csr to rp controller (#1228)
e9d658bd1c3 Daemonset controller supports multi resource partitions (#1224)
bb529334f2a kunsupported cgroup setup causes kubelet to emit a warning rather than exiting (#1220)
c4697f43324 Rename tech doc name - CI bot complains (#1218)
2f1e4277b38 (master) Add a brief introduction to Google Anthos Overall Architecture
6336ea98e7f Move proxy setup logic from dev machines to proxy VM (#1212)
c74b94cc998 (sindica/master) fix flannel to v0.14.0 (#1214)
8f427844acf concurrency related code adjustment (#1209)
306c4472071 (tag: v0.9, arktos-perf/master) Bump Arktos to v0.9.0 (#1204)
  • Additional env vars (see the flag-verification sketch after this list):
KUBE_CONTROLLER_EXTRA_ARGS="--kube-api-qps=80 --kube-api-burst=120" 
KUBE_SCHEDULER_EXTRA_ARGS="--kube-api-qps=300 --kube-api-burst=450" 
KUBE_FEATURE_GATES=ExperimentalCriticalPodAnnotation=true,QPSDoubleGCController=true
  • Additional perf config: set the pod startup latency threshold to 6s; skip deleting saturation pods; skip deleting latency pods (applied as extra --testoverrides flags, as in the 11/29 run):
--testoverrides=./testing/density/20k_nodes/override.yaml
--testoverrides=./testing/experiments/deleting_saturation_pods.yaml
--testoverrides=./testing/experiments/deleting_latency_pods.yaml
  • Logs can be found under GCP project workload-controller-manager on sonyadev4: /home/sonyali/logs/perf-test/gce-20k/arktos/rel130-120221-3x3x20k
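
To confirm the extra qps/burst flags actually reached the running control-plane components, something like the following can be run against each TP master. This is a sketch: the instance name is a placeholder, and the project/zone must match the cluster's actual location.

MASTER_INSTANCE=REPLACE_WITH_TP_MASTER_NAME   # placeholder for the TP master instance of this run
gcloud compute ssh ${MASTER_INSTANCE} --project workload-controller-manager --zone us-central1-b \
  --command 'pgrep -af kube-controller-manager; pgrep -af kube-scheduler' \
  | tr " " "\n" | grep -E "kube-api-(qps|burst)"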

  • [tenant: arktos] Test Result-density: Test finished with Status: Fail

E1202 22:03:20.015506    2712 clusterloader.go:220]   Test: testing/density/config.yaml
E1202 22:03:20.015510    2712 clusterloader.go:221]   Status: Fail
E1202 22:03:20.015514    2712 clusterloader.go:223]   Errors: [measurement call PodStartupLatency - PodStartupLatency error: pod startup: too high latency 99th percentile: got 6.75555489s expected: 6s]

PodStartupLatency:

"data": {
        "Perc50": 1808.89176,
        "Perc90": 2716.544209,
        "Perc99": 6755.55489
      },

SaturationPodStartupLatency:

"data": {
        "Perc50": 2000.715777,
        "Perc90": 5337.728881,
        "Perc99": 8900.365149
      },

SchedulingThroughput:

{
  "perc50": 84.6,
  "perc90": 119.4,
  "perc99": 141.2,
  "max": 169.6
}

  • [tenant: monkey] Test Result-density: Test finished with Status: Fail
E1202 22:03:05.119264   24248 clusterloader.go:219] Test Finished
E1202 22:03:05.119268   24248 clusterloader.go:220]   Test: testing/density/config.yaml
E1202 22:03:05.119273   24248 clusterloader.go:221]   Status: Fail
E1202 22:03:05.119286   24248 clusterloader.go:223]   Errors: [measurement call PodStartupLatency - PodStartupLatency error: pod startup: too high latency 99th percentile: got 6.367232281s expected: 6s]

PodStartupLatency:

"data": {
        "Perc50": 1812.364377,
        "Perc90": 2720.207293,
        "Perc99": 6367.232281
      },

SaturationPodStartupLatency:

"data": {
        "Perc50": 1953.705991,
        "Perc90": 4494.297597,
        "Perc99": 7879.681238
      },

SchedulingThroughput:

{
  "perc50": 83.8,
  "perc90": 107.2,
  "perc99": 141.8,
  "max": 175.6
}
  • [tenant: zeta] Test Result-density: Test finished with Status: Fail
E1202 22:03:00.116689   26378 clusterloader.go:219] Test Finished
E1202 22:03:00.116694   26378 clusterloader.go:220]   Test: testing/density/config.yaml
E1202 22:03:00.116698   26378 clusterloader.go:221]   Status: Fail
E1202 22:03:00.116702   26378 clusterloader.go:223]   Errors: [measurement call PodStartupLatency - PodStartupLatency error: pod startup: too high latency 99th percentile: got 6.644877432s expected: 6s]

PodStartupLatency:

"data": {
        "Perc50": 1804.040981,
        "Perc90": 2702.087421,
        "Perc99": 6644.877432
      },

SaturationPodStartupLatency:

"data": {
        "Perc50": 1980.46348,
        "Perc90": 4878.498122,
        "Perc99": 8302.981652
      },

SchedulingThroughput:

{
  "perc50": 84,
  "perc90": 113.8,
  "perc99": 139,
  "max": 153.4
}
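
All three tenants fail on the same check: the density test's Perc99 pod startup latency is compared against the 6 s threshold set by the override, and each miss is marginal. A quick sketch of that comparison (values copied from the Perc99 fields above, in ms; illustrative arithmetic only, not clusterloader2's code):

THRESHOLD_MS=6000
for perc99 in 6755.55489 6367.232281 6644.877432; do   # arktos, monkey, zeta
  awk -v p=${perc99} -v t=${THRESHOLD_MS} \
    'BEGIN { printf "Perc99 = %.0f ms, over the threshold by %.0f ms (%.1f%%)\n", p, p - t, (p - t) / t * 100 }'
done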

03/11/2022 [Scale-out][REL130] 2TP2RP 500 Nodes

d7323cd9376 (HEAD, origin/master-scaleout-serviceiprange) Different service-cluster-ip-range for different TP
b97a1ff2b0e (origin/master-kubeupsupportvpcrange, master-kubeupsupportvpcrange) kube-up support vpc range (#1397)
1e34a15b2ee Distinct VPC range, passing VPC start/end from cmd arg for scale out (#1398)
c6b37c3a605 [Arktos] The scripts for scale-up + workers environment on AWS Ubuntu1804&Ubuntu2004 and scale-out 2x2 + workers environment on AWS Ubuntu 2004  (#1382)
b509faba333 static pods on different nodes are assigned unique uid (#1393)
95c0f4e9a8c Design doc for Mizar-Arktos Integration (#1347)
5d8567ddaa7 Kubeup scaleout mizar support (#1385)
8a545a48b57 scale-up mizar support (#1377)
c3e1ece1df9 Mizar VPC support for service, add TP master to mizar droplets (#1371)
29a4be4e249 update golang version in setup-dev-env.md (#1376)
...
  • Logs can be found under GCP project workload-controller-manager on sonyadev4: /home/sonyali/logs/perf-test/gce-500/arktos/rel130-031122-2x2x500
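
To copy the referenced logs off the sonyadev4 dev VM for offline analysis, something like the following works. This is a sketch; the zone is an assumption and must match where the VM actually runs.

gcloud compute scp --recurse --project workload-controller-manager --zone us-central1-b \
  sonyadev4:/home/sonyali/logs/perf-test/gce-500/arktos/rel130-031122-2x2x500 ./rel130-031122-2x2x500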

  • [tenant: arktos] Test Result-density: Test finished with Status: Success

PodStartupLatency:

"data": {
        "Perc50": 1705.550843,
        "Perc90": 2445.832458,
        "Perc99": 2916.089723
      },
      "unit": "ms",

SaturationPodStartupLatency:

 "data": {
        "Perc50": 1768.615967,
        "Perc90": 2466.832409,
        "Perc99": 2923.741039
      },
      "unit": "ms",

SchedulingThroughput:

{
  "perc50": 20,
  "perc90": 20,
  "perc99": 20.2,
  "max": 20.2
}

  • [tenant: zeta] Test Result-density: Test finished with Status: Success

PodStartupLatency:

"data": {
        "Perc50": 1785.975631,
        "Perc90": 2514.09808,
        "Perc99": 2872.509759
      },
      "unit": "ms",

SaturationPodStartupLatency:

"data": {
        "Perc50": 1814.88997,
        "Perc90": 2515.723203,
        "Perc99": 2985.930222
      },
      "unit": "ms",

SchedulingThroughput:

{
  "perc50": 20,
  "perc90": 20,
  "perc99": 20.2,
  "max": 20.2
}

03/11/2022 [Scale-up][REL130] 500 Nodes

d7323cd9376 (HEAD, origin/master-scaleout-serviceiprange) Different service-cluster-ip-range for different TP
b97a1ff2b0e (origin/master-kubeupsupportvpcrange, master-kubeupsupportvpcrange) kube-up support vpc range (#1397)
1e34a15b2ee Distinct VPC range, passing VPC start/end from cmd arg for scale out (#1398)
c6b37c3a605 [Arktos] The scripts for scale-up + workers environment on AWS Ubuntu1804&Ubuntu2004 and scale-out 2x2 + workers environment on AWS Ubuntu 2004  (#1382)
b509faba333 static pods on different nodes are assigned unique uid (#1393)
95c0f4e9a8c Design doc for Mizar-Arktos Integration (#1347)
5d8567ddaa7 Kubeup scaleout mizar support (#1385)
8a545a48b57 scale-up mizar support (#1377)
c3e1ece1df9 Mizar VPC support for service, add TP master to mizar droplets (#1371)
29a4be4e249 update golang version in setup-dev-env.md (#1376)
...
  • Logs can be found under GCP project workload-controller-manager on sonyadev4: /home/sonyali/logs/perf-test/gce-500/arktos/rel130-031121-up

  • Test Result-density: Test finished with Status: Success

PodStartupLatency:

"data": {
        "Perc50": 1746.612489,
        "Perc90": 2453.854045,
        "Perc99": 2887.546415
      },
      "unit": "ms",

SaturationPodStartupLatency:

"data": {
        "Perc50": 1787.943107,
        "Perc90": 2493.532866,
        "Perc99": 2960.929454
      },
      "unit": "ms",

SchedulingThroughput:

{
  "perc50": 20,
  "perc90": 20,
  "perc99": 20.2,
  "max": 20.2
}