API Benchmarking with OPA and Envoy Sidecar

Jmeter Cluster

To benchmark the APIs, a Jmeter cluster (1 master + 4 slaves) was set up so that API tests could be run and improvements verified in parallel.

APIs invoked in this benchmarking

| API Name | API path | Description |
| --- | --- | --- |
| Update Batch | /api/course/v1/batch/update | PATCH API used to update batch details |
| List Course Enrollments | /api/course/v1/user/enrollment/list | GET API used to list user's course enrollments |
| Read Content State | /api/course/v1/content/state/read | POST API used to read user's course progress details |
| Update Content State | /api/course/v1/content/state/update | PATCH API used to update user's course progress details |
| Course Enrollment | /api/course/v1/enroll | POST API used to enroll to a course |
| Course Un-Enrollment | /api/course/v1/unenrol | POST API used to un-enroll from a course |
| Assign Role | /api/user/v1/role/assign | POST API used to assign roles to a user |
| Assign Role V2 | /api/user/v2/role/assign | POST API used to assign roles to a user (v2 version) |
| Accept Terms and Conditions | /api/user/v1/tnc/accept | POST API used to accept the terms and conditions |
| Update User | /api/v1/user/update | PATCH API used to update a user's details |
| Submit Data Exhaust Request | /api/dataset/v1/request/submit | POST API used to submit a request to obtain various types of reports |
| Get Data Exhaust Request | /api/dataset/v1/request/read | GET API used to read the request submitted for report generation |
| List Data Exhaust Request | /api/dataset/v1/request/list | GET API used to list the requests submitted for report generation |
| Private User Read | /private/user/v1/read | Private GET API used to read user details |
| Private User Lookup | /private/user/v1/lookup | Private POST API used to search user |

Run and Infra details

  • Release 4.5.0 code
  • Each run duration is approximately 20 - 30 minutes
  • Each run was done twice
    • With OPA and Envoy
    • Without OPA and Envoy
  • Some runs were repeated several times to verify that throughput, latencies and errors were consistent across runs
  • OPA and Envoy were added as sidecars for the following services on Kubernetes
    • Analytics
    • CertRegistry
    • Content
    • KnowledgeMW
    • Learner
    • LMS
  • Private APIs are invoked directly by calling the service endpoint, since those APIs are not available on the API gateway (Kong API metrics are therefore also not available for these APIs)
  • The overall infra CPU usage was captured only for certain high-throughput APIs

Benchmark details

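| API Name and URL | Scenario | API TPS (max) | API TPS (avg) | Total Latency max (ms) | Total Latency avg (ms) | Upstream Latency max (ms) | Upstream Latency avg (ms) | Jmeter TPS | Jmeter Avg Latency (ms) | Overall CPU Usage in Cores (max) | Overall CPU Usage in Cores (avg) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| updateBatch /course/v1/batch/update | Without OPA | 72 | 64 | 3500 | 2880 | 3500 | 2880 | 66.6 | 2907 | NA | NA |
| updateBatch /course/v1/batch/update | With OPA | 73 | 62 | 3490 | 2780 | 3490 | 2780 | 67.2 | 2824 | NA | NA |
| listCourseEnrollments /course/v1/user/enrollment/list | Without OPA | 417 | 381 | 517.54 | 491.44 | 516.52 | 490.41 | 399.5 | 497 | NA | NA |
| listCourseEnrollments /course/v1/user/enrollment/list | With OPA | 411 | 381 | 510.37 | 492.1 | 509.37 | 491.08 | 398.3 | 498 | NA | NA |
| readContentState /course/v1/content/state/read | Without OPA | 3563 | 3296 | 64.53 | 54.8 | 64.16 | 54.42 | 3496.4 | 56 | 49.92 | 39.77 |
| readContentState /course/v1/content/state/read | With OPA | 3624 | 3365 | 59.43 | 53.56 | 59.02 | 53.18 | 3558 | 55 | 40.37 | 29.29 |
| updateContentState /course/v1/content/state/update | Without OPA | 1965 | 1826 | 102.07 | 97.57 | 101.67 | 97.07 | 1944.2 | 101 | 40.04 | 31.16 |
| updateContentState /course/v1/content/state/update | With OPA | 1979 | 1826 | 101.1 | 97.93 | 100.72 | 97.54 | 1960.7 | 100 | 41.98 | 32.00 |
| courseUnEnrolment /course/v1/unenrol | Without OPA | 69 | 53 | 3910 | 3470 | 3910 | 3470 | 55.7 | 3498 | NA | NA |
| courseUnEnrolment /course/v1/unenrol | With OPA | 54 | 52 | 3810 | 3660 | 3810 | 3660 | 53.3 | 3678 | NA | NA |
| courseEnrolment /course/v1/enroll | Without OPA | 2108 | 1901 | 97.34 | 89.06 | 96.98 | 88.21 | 2079.9 | 94 | 23.75 | 15.24 |
| courseEnrolment /course/v1/enroll | With OPA | 2059 | 1846 | 107.4 | 97.77 | 106.97 | 97.4 | 1993.6 | 98 | 28.12 | 18.08 |
| assignRole /user/v1/role/assign | Without OPA | 1910 | 1058 | 281.78 | 196.88 | 281.33 | 196.44 | 1088.3 | 180 | 22.57 | 15.34 |
| assignRole /user/v1/role/assign | With OPA | 1644 | 1039 | 310.03 | 188.36 | 309.37 | 187.94 | 1078.3 | 181 | 25.99 | 18.92 |
| assignRoleV2 /user/v2/role/assign | Without OPA | 2096 | 1124 | 247.13 | 176.58 | 246.68 | 176.15 | 1181.8 | 166 | 34.74 | 22.31 |
| assignRoleV2 /user/v2/role/assign | With OPA | 1752 | 1147 | 240.56 | 173.06 | 240.1 | 172.63 | 1189.4 | 165 | 36.46 | 24.32 |
| acceptTermsAndCondition /user/v1/tnc/accept | Without OPA | 793 | 709 | 191.94 | 28.93 | 191.27 | 28.54 | 741.03 | 27.88 | NA | NA |
| acceptTermsAndCondition /user/v1/tnc/accept | With OPA | 761 | 687 | 266.98 | 36.04 | 264.16 | 35.61 | 726.48 | 33.17 | NA | NA |
| updateUser /v1/user/update | Without OPA | 1300 | 950 | 239.14 | 188.21 | 238.69 | 187.78 | 1005.6 | 190 | 40.05 | 24.43 |
| updateUser /v1/user/update | With OPA | 1330 | 936 | 242.11 | 190.72 | 241.68 | 190.29 | 991 | 190 | 42.76 | 25.36 |
| Private User Read /private/user/v1/read | Without OPA | NA | NA | NA | NA | NA | NA | 1237.5 | 153 | 27.21 | 21.14 |
| Private User Read /private/user/v1/read | With OPA | NA | NA | NA | NA | NA | NA | 1320.1 | 141 | 29.27 | 22.67 |
| Private User Lookup /private/user/v1/lookup | Without OPA | NA | NA | NA | NA | NA | NA | 2396.7 | 78 | 23.28 | 18.99 |
| Private User Lookup /private/user/v1/lookup | With OPA | NA | NA | NA | NA | NA | NA | 2767.2 | 68 | 28.58 | 22.59 |
| submitDataExhaustRequest /dataset/v1/request/submit | Without OPA | 236 | 206 | 3350 | 939.22 | 3340 | 938.4 | 217 | 889 | NA | NA |
| submitDataExhaustRequest /dataset/v1/request/submit | With OPA | 288 | 260 | 993.19 | 699.74 | 990.61 | 699.08 | 274.8 | 708 | NA | NA |
| getDataExhaustRequest /dataset/v1/request/read | Without OPA | 5545 | 5200 | 35.66 | 34.52 | 35.31 | 34.18 | 5490.9 | 35 | NA | NA |
| getDataExhaustRequest /dataset/v1/request/read | With OPA | 5334 | 4869 | 106.55 | 37.6 | 106.13 | 37.25 | 5230.6 | 37 | NA | NA |
| listDataExhaustRequest /dataset/v1/request/list | Without OPA | 332 | 296 | 732.66 | 613.22 | 731.66 | 612.66 | 312 | 623 | NA | NA |
| listDataExhaustRequest /dataset/v1/request/list | With OPA | 332 | 307 | 627.9 | 597.27 | 627.35 | 596.6 | 325.2 | 605 | NA | NA |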
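
One way to read the Jmeter TPS and Jmeter average latency columns together is Little's law (requests in flight = throughput × time in system). The sketch below applies it to a few "Without OPA" rows copied from the table above. It assumes a closed-loop test in which the load generator keeps a roughly fixed number of requests in flight, which the report does not state, and the names `WITHOUT_OPA` and `implied_concurrency` are hypothetical helpers, not part of the benchmark tooling.

```python
# (Jmeter TPS, Jmeter average latency in ms) for the "Without OPA" scenario,
# copied from the benchmark table above.
WITHOUT_OPA = {
    "updateBatch": (66.6, 2907),
    "listCourseEnrollments": (399.5, 497),
    "readContentState": (3496.4, 56),
    "updateContentState": (1944.2, 101),
    "courseEnrolment": (2079.9, 94),
}


def implied_concurrency(tps, avg_latency_ms):
    """Little's law: requests in flight = throughput * time in system.

    ASSUMPTION: the test is closed-loop (fixed number of in-flight requests).
    """
    return tps * (avg_latency_ms / 1000.0)


for api, (tps, latency_ms) in WITHOUT_OPA.items():
    in_flight = implied_concurrency(tps, latency_ms)
    print(f"{api}: ~{in_flight:.0f} requests in flight")
```

If the implied concurrency is similar across APIs, throughput and latency are coupled: a lower average latency shows up as a higher TPS and vice versa, so the two columns are best compared as a pair rather than independently.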

Conclusion

  • After introducing OPA and Envoy, average throughput and average latency for each API remained similar, with a slight increase in infra resource usage
  • Some APIs performed better after introducing OPA and Envoy (we repeated the runs with and without OPA and Envoy to validate this and got the same results)
  • In general, an increase of 5 ms - 10 ms in average latency is expected due to the OPA and Envoy sidecars
  • The overall infra CPU usage (Kubernetes cluster, databases and other VMs involved in the run) also increased, either because of the additional resources used by OPA and Envoy or because of higher throughput
  • In some cases the overall CPU usage decreased after introducing OPA and Envoy
  • In general, an increase of 10% - 20% in average overall CPU usage is expected due to the OPA and Envoy sidecars
  • The increase in overall memory usage is negligible compared to regular workloads, so it was not taken into consideration
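
To compare the measured CPU figures with the 10% - 20% expectation, the relative change per API can be computed from the avg column of Overall CPU Usage in Cores. The sketch below is illustrative only: the values are copied from the benchmark table, while `AVG_CPU_CORES`, `cpu_change_percent` and `summarize` are hypothetical names introduced here.

```python
# Average overall CPU usage in cores as (without OPA, with OPA),
# copied from the benchmark table for the APIs where it was captured.
AVG_CPU_CORES = {
    "readContentState": (39.77, 29.29),
    "updateContentState": (31.16, 32.00),
    "courseEnrolment": (15.24, 18.08),
    "assignRole": (15.34, 18.92),
    "assignRoleV2": (22.31, 24.32),
    "updateUser": (24.43, 25.36),
    "Private User Read": (21.14, 22.67),
    "Private User Lookup": (18.99, 22.59),
}


def cpu_change_percent(without_opa, with_opa):
    """Relative change in CPU usage, in percent of the baseline run."""
    return (with_opa - without_opa) / without_opa * 100.0


def summarize(cpu):
    """Return per-API changes and their unweighted mean."""
    changes = {api: cpu_change_percent(*pair) for api, pair in cpu.items()}
    mean_change = sum(changes.values()) / len(changes)
    return changes, mean_change


changes, mean_change = summarize(AVG_CPU_CORES)
for api, change in changes.items():
    print(f"{api}: {change:+.1f}%")
print(f"unweighted mean: {mean_change:+.1f}%")
```

A negative value marks one of the cases where overall CPU usage decreased after introducing OPA and Envoy. The unweighted mean treats every API equally and ignores differences in throughput between runs, so it is only a rough summary.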

Long Run Soak Test

  • We ran a 60-hour benchmark on the APIs that are part of the soak test (the most used APIs), and the results were as expected
  • Overall TPS
    • 22850 (with OPA and Envoy)
    • 16800 (without OPA and Envoy)
  • API latencies and infra resource utilization increased due to:
    • Higher throughput
    • OPA and Envoy sidecars
  • All times are in milliseconds (avg, min, max, median, etc.)
  • KO indicates the number of errors (NOT OK)
  • The Received and Sent columns indicate network bandwidth in KB/s
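
The overall TPS figures appear to be the sum of the Transactions/s values in the Total row of each Jmeter cluster. A minimal sketch that reproduces this arithmetic, with the per-cluster values copied from the tables below (`CLUSTER_TPS` and `overall_tps` are hypothetical names):

```python
# Transactions/s from the "Total" row of Jmeter Clusters 1 to 5,
# copied from the soak test tables.
CLUSTER_TPS = {
    "with_opa_envoy": [12714.29, 5299.18, 3742.99, 888.1, 205.36],
    "without_opa_envoy": [9389.78, 4627.65, 2368.15, 234.18, 180.34],
}


def overall_tps(per_cluster_tps):
    """Sum the per-cluster throughput into one overall figure."""
    return sum(per_cluster_tps)


with_total = overall_tps(CLUSTER_TPS["with_opa_envoy"])
without_total = overall_tps(CLUSTER_TPS["without_opa_envoy"])
increase_pct = (with_total - without_total) / without_total * 100.0

print(f"with OPA and Envoy:    {with_total:.0f} TPS")
print(f"without OPA and Envoy: {without_total:.0f} TPS")
print(f"relative difference:   {increase_pct:.0f}%")
```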

With OPA and Envoy

Jmeter Cluster 1

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 2561300139 | 38439 | 0.00% | 49.45 | 0 | 30501 | 4 | 5 | 11 | 18 | 12714.29 | 13768.93 | 24417.97 |
| dialAssemble | 35148980 | 376 | 0.00% | 66.45 | 0 | 15578 | 17 | 41 | 57 | 94.99 | 174.56 | 316.45 | 250.53 |
| getCourseHierarchy | 45751717 | 4744 | 0.01% | 3.29 | 0 | 29808 | 1 | 2 | 3 | 11 | 227.22 | 1583.19 | 245.32 |
| readContent | 78248056 | 7413 | 0.01% | 3.33 | 0 | 30501 | 1 | 1 | 2 | 10 | 388.6 | 837.47 | 367.01 |
| readForm | 28499660 | 1269 | 0.00% | 432.08 | 0 | 23856 | 16 | 27 | 43 | 758.99 | 141.54 | 242.04 | 139.28 |
| refreshToken-to-accessToken | 209631562 | 494 | 0.00% | 111.82 | 0 | 15552 | 12 | 22 | 39 | 61 | 1041.08 | 1901.28 | 1335.91 |
| searchContent | 77921394 | 22020 | 0.03% | 6.29 | 0 | 15434 | 1 | 7 | 15 | 41 | 386.98 | 2141.36 | 622.15 |
| sendTelemetry | 2086098770 | 2123 | 0.00% | 42.03 | 0 | 15775 | 4 | 5 | 6 | 15 | 10355.4 | 6750.33 | 21459.14 |

Jmeter Cluster 2

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 1067045559 | 5323 | 0.00% | 247.62 | 0 | 17766 | 18 | 24 | 29.95 | 49 | 5299.18 | 10336.62 | 12630.64 |
| getUserProfile | 37435237 | 329 | 0.00% | 270.49 | 0 | 16592 | 26 | 88 | 96 | 115 | 185.91 | 477.93 | 445.46 |
| getUserProfileV2 | 37436204 | 376 | 0.00% | 270.45 | 0 | 16288 | 25 | 87 | 96 | 114 | 185.92 | 477.95 | 445.48 |
| getUserProfileV3 | 200735508 | 589 | 0.00% | 263.71 | 1 | 15746 | 19 | 25 | 31 | 54 | 996.9 | 2562.74 | 2388.45 |
| getUserProfileV4 | 211993666 | 607 | 0.00% | 259.36 | 1 | 16114 | 18 | 24 | 30 | 52 | 1052.81 | 2463.9 | 2522.39 |
| getUserProfileV5 | 210203687 | 609 | 0.00% | 261.58 | 1 | 16368 | 19 | 24 | 31 | 52.99 | 1043.92 | 2532.11 | 2501.08 |
| readUserConsent | 135193272 | 900 | 0.00% | 296.59 | 0 | 16101 | 10 | 16 | 29 | 57 | 671.4 | 678.83 | 1768.09 |
| searchUser | 40856041 | 436 | 0.00% | 147.33 | 0 | 15868 | 16 | 74 | 83 | 97 | 202.9 | 336.46 | 377.67 |
| updateUser | 38110078 | 466 | 0.00% | 244.46 | 0 | 16113 | 24 | 89 | 98 | 126 | 189.26 | 153.16 | 407.29 |
| updateUserConsent | 35428348 | 554 | 0.00% | 353.86 | 0 | 17766 | 18 | 80 | 90 | 106 | 175.95 | 153.03 | 415.92 |
| userFeed | 119653518 | 457 | 0.00% | 109.5 | 0 | 15785 | 12 | 37 | 52 | 72 | 594.23 | 500.54 | 1358.84 |

Jmeter Cluster 3

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 753688224 | 4330 | 0.00% | 149.66 | 0 | 25998 | 15 | 69 | 92 | 113 | 3742.99 | 4572.6 | 7917.85 |
| getBatch | 36676527 | 312 | 0.00% | 1.19 | 0 | 15362 | 1 | 1 | 2 | 11 | 182.15 | 204.58 | 352.37 |
| listCourseEnrollments | 37004452 | 2130 | 0.01% | 1473.99 | 1 | 25998 | 62 | 108 | 126 | 168 | 183.77 | 1304.85 | 382.89 |
| readContentState | 331970043 | 751 | 0.00% | 80.34 | 0 | 15664 | 7 | 10 | 12 | 17 | 1648.64 | 1560.65 | 3512.76 |
| searchCourseBatches | 45157049 | 385 | 0.00% | 19.16 | 0 | 15499 | 8 | 13 | 17 | 30 | 224.26 | 239.92 | 180.02 |
| updateContentState | 302880153 | 752 | 0.00% | 101.27 | 0 | 15551 | 15 | 20 | 23 | 36 | 1504.17 | 1262.62 | 3489.83 |

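Jmeter Cluster 4

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 178827023 | 520 | 0.00% | 10.02 | 0 | 15516 | 5 | 8 | 12 | 18 | 888.1 | 814.09 | 763.35 |
| deviceProfile | 88530862 | 170 | 0.00% | 10.64 | 0 | 15503 | 5 | 8 | 11 | 17 | 439.67 | 353.89 | 269.33 |
| deviceRegister | 44992301 | 164 | 0.00% | 11.51 | 0 | 15516 | 6 | 9 | 13 | 19 | 223.44 | 188.59 | 321.7 |
| registerMobileDevicev2 | 45303860 | 186 | 0.00% | 7.32 | 0 | 15508 | 4 | 6 | 8 | 15 | 224.99 | 271.61 | 172.32 |
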
Jmeter Cluster 5

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 41307270 | 1589 | 0.00% | 810.07 | 0 | 122179 | 52 | 157 | 176 | 211 | 205.36 | 459.08 | 262.89 |
| /auth/realms/sunbird/protocol/openid-connect/auth | 5949387 | 265 | 0.00% | 433.01 | 1 | 85307 | 10 | 13 | 15 | 25 | 29.58 | 114.65 | 20.81 |
| /home | 5949421 | 47 | 0.00% | 99.96 | 1 | 14929 | 52 | 58 | 61 | 76.99 | 29.58 | 40.63 | 11.44 |
| AuthCallBackRedirect | 5949263 | 227 | 0.00% | 59.61 | 0 | 13442 | 9 | 13 | 15 | 40 | 29.58 | 56.49 | 21.07 |
| keycloakloginaction | 5949312 | 532 | 0.01% | 2685.96 | 1 | 122179 | 200 | 279 | 310 | 688.98 | 29.58 | 124.56 | 105.15 |
| keycloakloginaction-0 | 5863756 | 0 | 0.00% | 1451.27 | 2 | 74360 | 99 | 147 | 158 | 202 | 29.15 | 49.04 | 51.75 |
| keycloakloginaction-1 | 5863756 | 505 | 0.01% | 855.23 | 1 | 84624 | 72 | 135 | 156 | 248 | 29.15 | 18.24 | 29.77 |
| keycloakloginaction-2 | 5782360 | 13 | 0.00% | 74.68 | 1 | 60002 | 10 | 25 | 44 | 59 | 28.75 | 55.49 | 22.9 |
| keycloakloginaction-3 | 15 | 0 | 0.00% | 3007.73 | 16 | 32414 | 34 | 20289.2 | 32414 | 32414 | 0 | 0 | 0 |

Without OPA and Envoy

Jmeter Cluster 1

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 101131119 | 1597 | 0.00% | 45.59 | 0 | 30295 | 5 | 8 | 14 | 46 | 9389.78 | 10198.96 | 17959.81 |
| dialAssemble | 1376044 | 91 | 0.01% | 68.82 | 0 | 30295 | 19 | 45 | 62 | 141 | 128 | 227.56 | 184.69 |
| getCourseHierarchy | 1794291 | 84 | 0.00% | 2.96 | 0 | 7036 | 1 | 2 | 3 | 12 | 166.9 | 1157.61 | 181.49 |
| readContent | 3046244 | 77 | 0.00% | 8.29 | 0 | 7303 | 1 | 2 | 8 | 21 | 283.34 | 600.14 | 269.8 |
| readForm | 1167798 | 99 | 0.01% | 348.76 | 0 | 7239 | 17 | 30 | 41 | 699.99 | 108.63 | 185.37 | 107.74 |
| refreshToken-to-accessToken | 10346083 | 132 | 0.00% | 58.11 | 0 | 5882 | 6 | 8 | 10 | 16 | 962.26 | 1631.68 | 1242.28 |
| searchContent | 3012650 | 948 | 0.03% | 16.21 | 0 | 6555 | 1 | 16 | 25 | 58 | 280.21 | 1539.69 | 452.77 |
| sendTelemetry | 80388009 | 166 | 0.00% | 42.65 | 0 | 7376 | 5 | 8 | 13 | 48 | 7463.95 | 4866.41 | 15525.56 |

Jmeter Cluster 2

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 49759212 | 963 | 0.00% | 191.38 | 0 | 7376 | 19 | 80 | 87 | 96 | 4627.65 | 8967.63 | 11031.99 |
| getUserProfile | 1502452 | 82 | 0.01% | 236.54 | 0 | 7331 | 27 | 184 | 219 | 412 | 139.76 | 349.34 | 333.27 |
| getUserProfileV2 | 1502206 | 111 | 0.01% | 236.78 | 0 | 7304 | 27 | 185 | 222.95 | 406 | 139.74 | 349.3 | 333.21 |
| getUserProfileV3 | 9321593 | 101 | 0.00% | 220.9 | 1 | 7320 | 21 | 94 | 105 | 182 | 866.94 | 2167.04 | 2067.4 |
| getUserProfileV4 | 9930027 | 97 | 0.00% | 216.96 | 1 | 7146 | 20 | 95 | 103 | 179 | 923.52 | 2096.21 | 2202.35 |
| getUserProfileV5 | 9897625 | 109 | 0.00% | 217.67 | 1 | 7376 | 20 | 95 | 103 | 179 | 920.52 | 2134.05 | 2195.18 |
| readUserConsent | 7768212 | 88 | 0.00% | 167.1 | 0 | 7260 | 10 | 83 | 95 | 185 | 722.48 | 705.72 | 1894.38 |
| searchUser | 1679566 | 81 | 0.00% | 84.99 | 0 | 3050 | 14 | 106 | 183 | 292 | 156.23 | 239.81 | 289 |
| updateUser | 1553572 | 95 | 0.01% | 189.27 | 0 | 3075 | 25 | 191 | 260 | 387 | 144.51 | 111.89 | 309.32 |
| updateUserConsent | 1521447 | 92 | 0.01% | 218.63 | 0 | 3843 | 17 | 172 | 207 | 390 | 141.52 | 118.12 | 332.91 |
| userFeed | 5082512 | 107 | 0.00% | 74 | 0 | 7199 | 11 | 86 | 98 | 182 | 472.71 | 696.62 | 1075.57 |

Jmeter Cluster 3

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 25463353 | 473 | 0.00% | 188.71 | 0 | 16240 | 14 | 78 | 86 | 121.99 | 2368.15 | 2287.59 | 4936.76 |
| getBatch | 1438082 | 86 | 0.01% | 1.25 | 0 | 3681 | 1 | 1 | 2 | 12 | 133.77 | 145.98 | 257.26 |
| listCourseEnrollments | 2087628 | 94 | 0.00% | 1024.16 | 1 | 16240 | 50 | 119 | 146 | 217 | 194.15 | 355.37 | 403.07 |
| readContentState | 10567790 | 99 | 0.00% | 118.8 | 0 | 3451 | 5 | 9 | 12 | 20 | 982.86 | 896.73 | 2083.11 |
| searchCourseBatches | 1779958 | 93 | 0.01% | 12.8 | 0 | 3061 | 6 | 16 | 23 | 42 | 165.57 | 171.7 | 134.19 |
| updateContentState | 9589895 | 101 | 0.00% | 144.64 | 0 | 3190 | 13 | 20 | 24 | 41 | 891.9 | 717.91 | 2059.3 |

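Jmeter Cluster 4

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 2514801 | 57284 | 2.28% | 752.85 | 0 | 96574 | 6 | 62 | 90.95 | 1597.72 | 234.18 | 225.96 | 194.95 |
| deviceProfile | 560376 | 29939 | 5.34% | 1629.25 | 0 | 96574 | 6 | 88 | 101 | 9426.61 | 52.18 | 39.36 | 33.44 |
| deviceRegister | 270504 | 27300 | 10.09% | 3369.64 | 0 | 70060 | 17 | 210 | 588 | 31560.99 | 25.19 | 19.26 | 36.98 |
| registerMobileDevicev2 | 1683921 | 45 | 0.00% | 40.85 | 0 | 13043 | 5 | 7 | 9 | 16 | 156.81 | 167.35 | 124.54 |
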
Jmeter Cluster 5

| API | #Samples | KO | Error % | Average | Min | Max | Median | 90th pct | 95th pct | 99th pct | Transactions/s | Received | Sent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total | 1931486 | 32 | 0.00% | 282.49 | 3 | 70321 | 51 | 152 | 192 | 274.99 | 180.34 | 403.02 | 231.37 |
| /auth/realms/sunbird/protocol/openid-connect/auth | 275934 | 2 | 0.00% | 19.1 | 6 | 1662 | 10 | 14 | 18 | 27 | 25.77 | 99.88 | 18.13 |
| /home | 275939 | 3 | 0.00% | 73.54 | 8 | 2911 | 53 | 61 | 65 | 76 | 25.77 | 35.39 | 9.96 |
| AuthCallBackRedirect | 275902 | 1 | 0.00% | 13.83 | 3 | 1020 | 9 | 14 | 17 | 27 | 25.76 | 49.72 | 18.36 |
| keycloakloginaction | 275933 | 15 | 0.01% | 935.54 | 30 | 70321 | 234 | 406 | 492 | 752.99 | 25.77 | 109.02 | 92.47 |
| keycloakloginaction-0 | 275929 | 0 | 0.00% | 554.77 | 73 | 3146 | 125 | 230 | 261 | 314 | 25.77 | 43.58 | 45.74 |
| keycloakloginaction-1 | 275929 | 9 | 0.00% | 363.18 | 17 | 70003 | 78 | 227 | 304.95 | 535 | 25.77 | 15.72 | 26.21 |
| keycloakloginaction-2 | 275920 | 2 | 0.00% | 17.43 | 4 | 1089 | 9 | 17 | 27 | 51 | 25.76 | 49.73 | 20.52 |
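
The Error % column in the soak test tables is consistent with KO divided by #Samples, rounded to two decimals. A small sketch of that calculation, using three rows copied from the tables above (`SOAK_ROWS` and `error_percent` are hypothetical names introduced for illustration):

```python
# (#Samples, KO) pairs copied from the soak test tables above.
SOAK_ROWS = {
    "deviceProfile (without OPA and Envoy)": (560376, 29939),
    "deviceRegister (without OPA and Envoy)": (270504, 27300),
    "searchContent (with OPA and Envoy)": (77921394, 22020),
}


def error_percent(samples, ko):
    """Share of failed (KO) samples, in percent of all samples."""
    if samples == 0:
        return 0.0  # avoid division by zero for APIs that were never called
    return ko / samples * 100.0


for api, (samples, ko) in SOAK_ROWS.items():
    print(f"{api}: {error_percent(samples, ko):.2f}%")
```

Because the percentage is rounded to two decimals, rows with very large sample counts can show 0.00% even when KO is in the hundreds or thousands, so the KO column is worth reading alongside it.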