Stress Test our API #1737

jpellizzari · 2022-03-16T15:45:31Z

We have a "demo" environment in GCP. I think the plan is to have more?

This presents an opportunity to stress test various features of our API, for example:

- multi-cluster: how many clusters can we handle with our current caching?
- namespaces: how does our code react in a system with 2k+ namespaces?
- pagination: does our solution make things better?
- users + RBAC: how many user+namespace+object combinations can we currently support without slowing down?
- etc.

Steps:

1. Gather a couple of people on the team to come up with a prioritized list of things we want to see tested
2. Work with whoever owns that environment to set up some scenarios which test those points, as well as a place to record results
3. Run the scenarios (in priority order, if we can't do all at once)
4. Collate the results
5. Create new tickets/spikes/investigation in the case that we need to change any implementation

This ticket can be seen through by multiple people as it runs; if you start it with step one you don't have to finish it.
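
As a rough illustration of the "run in priority order, then collate" steps above, here is a minimal Python sketch. Nothing in it comes from our codebase: `Scenario`, `run_in_priority_order` and `to_csv` are hypothetical names, and the `run` callable stands in for whatever a scenario ends up doing against the demo environment.

```python
import csv
import io
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    priority: int            # lower number runs first
    name: str
    run: Callable[[], dict]  # returns free-form observations for the results sheet


def run_in_priority_order(scenarios: List[Scenario]) -> List[dict]:
    """Run each scenario once, lowest priority number first, timing each run."""
    results = []
    for s in sorted(scenarios, key=lambda s: s.priority):
        start = time.perf_counter()
        try:
            observations = s.run()
            status = "ok"
        except Exception as exc:  # record the failure instead of aborting the whole run
            observations = {"error": repr(exc)}
            status = "failed"
        results.append({
            "priority": s.priority,
            "scenario": s.name,
            "status": status,
            "seconds": round(time.perf_counter() - start, 3),
            "observations": observations,
        })
    return results


def to_csv(results: List[dict]) -> str:
    """Collate results into CSV text that can be pasted into a shared sheet."""
    fields = ["priority", "scenario", "status", "seconds", "observations"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()
```

Because a failed scenario is recorded rather than raised, whoever picks the ticket up next can see from the CSV which scenarios still need a rerun.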

Callisto13 · 2022-04-19T15:29:09Z

Set up a call with Sam/Robin and some of us to discuss what we want to test here

JamWils · 2022-05-03T17:08:31Z

I've used this tool in the past and it was really helpful https://www.artillery.io/

jpellizzari · 2022-05-03T18:54:41Z

> I've used this tool in the past and it was really helpful https://www.artillery.io/

I think we would need almost the opposite of this: instead of thousands/millions of requests as a client, we need thousands of clusters+namespaces in the "database" and only a couple of client requests per second.

Or at least that is the scaling dimension that most worries me.
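
To make that shape of test concrete, here is a minimal Python sketch of a low-rate probe: a couple of requests per second against an API whose backing data has been made large separately. `probe` and `summarize` are hypothetical helpers, and `fetch` is a placeholder for whichever client call is under test; none of these exist in our code.

```python
import statistics
import time
from typing import Callable, List


def probe(fetch: Callable[[], object], requests: int, per_second: float,
          sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Call fetch() `requests` times at roughly `per_second`; return latencies in seconds."""
    interval = 1.0 / per_second
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        fetch()
        elapsed = time.perf_counter() - start
        latencies.append(elapsed)
        # Sleep off the rest of the interval so client load stays low and steady.
        sleep(max(0.0, interval - elapsed))
    return latencies


def summarize(latencies: List[float]) -> dict:
    """Median and worst-case latency, enough to compare runs at different data sizes."""
    return {
        "count": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }
```

The idea would be to run the same probe with, say, 10, 1,000 and 10,000 namespaces seeded and compare the summaries, so the variable is data size and not request volume.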

SamLR · 2022-05-04T10:01:49Z

> > I've used this tool in the past and it was really helpful https://www.artillery.io/
>
> I think we would need almost the opposite of this: instead of thousands/millions of requests as a client, we need thousands of clusters+namespaces in the "database" and only a couple of client requests per second.
>
> Or at least that is the scaling dimension that most worries me.

From a resilience PoV some questions I'd be interested in investigating are:

- When does the UI/app become unusable? (E.g. if I'm trying to show 10,000 HelmRelease resources, does it take minutes to respond, or is the page just too big?)
- Does use of the app start blocking interactions with kube-api? Can I DoS kube-api by querying 10,000 resources and thereby force timeouts in Flux when it tries to deploy new resources?
- Can I trigger resource exhaustion on a node running the app if the cluster can't scale any more?

Most of the high-risk scenarios I see involving the app centre on it either blocking actions against kube-api during an incident, or becoming a hindrance (or outright unusable) when there is a large number of resources to display. So it feels like some tooling to easily generate large numbers of resources that can then be inspected via the app would be useful.

This could probably be some sort of bash script which spits out large artificial Flux configurations, or something fancier with some basic templating/config options.
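
A minimal sketch of such a generator, written in Python here instead of bash, might look like the following. The function names, the `stress-` namespace prefix, the `podinfo` chart and repository names, and the `apiVersion` are all assumptions for illustration; the `apiVersion` in particular would need to match the Flux version installed in the target cluster.

```python
import json
from typing import Iterable, Iterator


def namespace(name: str) -> dict:
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def helm_release(name: str, ns: str) -> dict:
    # apiVersion is an assumption; match it to the Flux version in the cluster.
    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2beta1",
        "kind": "HelmRelease",
        "metadata": {"name": name, "namespace": ns},
        "spec": {
            "interval": "10m",
            "chart": {
                "spec": {
                    "chart": "podinfo",  # placeholder chart name
                    "sourceRef": {
                        "kind": "HelmRepository",
                        "name": "podinfo",  # placeholder source, assumed to exist
                        "namespace": "flux-system",
                    },
                }
            },
        },
    }


def generate(namespaces: int, releases_per_namespace: int) -> Iterator[dict]:
    """Yield each Namespace followed by the HelmReleases that live in it."""
    for i in range(namespaces):
        ns = f"stress-{i:05d}"
        yield namespace(ns)
        for j in range(releases_per_namespace):
            yield helm_release(f"release-{j:04d}", ns)


def render(objects: Iterable[dict]) -> str:
    """Emit one JSON document per object, separated by '---' (YAML parsers accept JSON)."""
    return "\n---\n".join(json.dumps(o, sort_keys=True) for o in objects) + "\n"
```

For example, `print(render(generate(100, 100)))` would produce 100 namespaces and 10,000 HelmRelease objects, which could be written to a file and committed or applied. If the referenced HelmRepository does not exist the releases will not reconcile, which may still be enough for testing how the app lists and displays them.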

Callisto13 · 2022-05-04T14:41:59Z

@joshri would also like to know that the UI (graph) still looks pretty with a ton of crap in there

SamLR · 2022-05-04T15:49:07Z

I had a chat with @JamWils about this and I'm going to look at options for building flux bucket-sources that can be set up for a couple of basic scenarios and we can go from there.

jpellizzari added the team/mauvelous label Mar 16, 2022

Callisto13 changed the title ~~Stress Test Placeholder~~ Stress Test our API Apr 26, 2022

lasomethingsomething removed the team/mauvelous label Sep 7, 2023
