
The dashboard DOS attacks the service during initial load #372

Closed
SGudbrandsson opened this issue Jun 21, 2023 · 10 comments

Comments

@SGudbrandsson

Description

helm-dashboard seems to fetch all the information for all the services on the initial call.
This effectively DOSes the service itself: the CPU spikes, and I never get results for the resources view.

This seems to be initiated from datadog-rum-v4.js, which is set up by the analytics part.
We're using the komodorio/helm-dashboard:1.3.2 Docker image.

This causes the whole system to stop responding until Kubernetes restarts the dashboard.

I'm trying to modify the deployment to disable tracking, but this is very annoying. If you want tracking, you should probably change it so it doesn't DOS the service.

Screenshots

(three screenshots attached)


@SGudbrandsson
Author

Disabling analytics didn't do anything.

Looks like it's https://github.com/komodorio/helm-dashboard/blob/main/pkg/dashboard/static/list-view.js#L14 looping through all deployments, calling https://github.com/komodorio/helm-dashboard/blob/main/pkg/dashboard/static/list-view.js#L111 in a recursion.
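
For illustration, here's a minimal sketch of capping how many health requests are in flight at once, instead of firing one per release immediately on load (the `fetchHealth` helper and endpoint path are assumptions for the example, not the project's actual API):

```typescript
// Hypothetical helper; the real endpoint shape may differ.
async function fetchHealth(release: string): Promise<string> {
  const res = await fetch(`/api/helm/releases/${release}/health`);
  return res.text();
}

// Run fn over items with at most `limit` promises in flight at a time.
async function mapLimited<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++; // index is claimed synchronously, so workers never collide
      results[i] = await fn(items[i]);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}

// e.g. at most 5 concurrent health checks instead of one request per release:
// const statuses = await mapLimited(releaseNames, 5, fetchHealth);
```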

@undera
Collaborator

undera commented Jun 22, 2023

Hi
It has nothing to do with analytics. It's just how the app works: it queries the health status for each of the releases. There's no recursion, just a single loop over the list; the longer your list, the longer it takes.

I guess you have many releases installed, don't you? How many?

At the moment I'm not sure what the best solution would be. Probably we should only query the health status for those releases that are visible on the list. That could be part of the V2 effort (#233).
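
As a rough illustration of that idea, health could be fetched only when a row scrolls into view, using the browser's standard IntersectionObserver (the markup, the `data-release` attribute, and the `fetchHealth` helper are assumptions, not the dashboard's actual code):

```typescript
// Assumed helper; see the earlier sketch.
declare function fetchHealth(release: string): Promise<string>;

const seen = new Set<string>();

const observer = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (!entry.isIntersecting) continue;
    const el = entry.target as HTMLElement;
    const release = el.dataset.release; // hypothetical data attribute on each row
    if (!release || seen.has(release)) continue;
    seen.add(release); // query each release's health at most once
    fetchHealth(release).then((status) => {
      el.textContent = status;
    });
    observer.unobserve(el); // stop watching once the request has been issued
  }
});

// Observe one health cell per release row:
document
  .querySelectorAll<HTMLElement>("[data-release]")
  .forEach((el) => observer.observe(el));
```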

One workaround is to narrow down the list of namespaces in scope via the --namespace parameter.

Another immediate option is Komodor's platform, where you get the same functionality without the scalability issues.

@SGudbrandsson
Author

:/

We deploy multiple times a day, so the list is long.

Looking at the code, it's doing a lot of things that can be optimized.

To get the health of a single service, the app fetches all Helm apps and all Helm releases and parses them before checking the health status of that one app.

There are some architectural decisions you could make to improve this, such as:

  1. Cache Helm releases for ~20 seconds (helps with bursts).
  2. Collapse requests: if there are multiple in-flight requests to the Kubernetes API for the same data, fold them into one and hand the same result back to every caller (see the sketch after this list).
  3. For a health check, only check health. Don't download the whole catalog for every single application.
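
To illustrate points 1 and 2 concretely, here's a minimal sketch of a TTL cache combined with in-flight request collapsing; all names here are hypothetical, not the project's code:

```typescript
type Entry<V> = { value: V; expires: number };

// Cache with a TTL, plus collapsing: concurrent callers asking for the
// same key while a request is in flight all share that one request.
class CollapsingCache<V> {
  private cache = new Map<string, Entry<V>>();
  private inflight = new Map<string, Promise<V>>();

  constructor(private ttlMs: number) {}

  async get(key: string, load: () => Promise<V>): Promise<V> {
    const hit = this.cache.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // fresh cache hit

    const pending = this.inflight.get(key);
    if (pending) return pending; // collapse into the in-flight request

    const p = load()
      .then((value) => {
        this.cache.set(key, { value, expires: Date.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inflight.delete(key));
    this.inflight.set(key, p);
    return p;
  }
}

// Usage: bursts of callers share one upstream call, and repeats within
// 20 seconds are served from cache (fetchAllReleases is hypothetical).
// const releases = new CollapsingCache<unknown[]>(20_000);
// const all = await releases.get("releases", () => fetchAllReleases());
```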

This is a cool concept that could be improved quite a bit with some small performance optimizations.

I don't think we'll be able to use this or Komodor's platform (I guess it's running the same thing behind the scenes).
We'll find another solution 😢

@undera
Collaborator

undera commented Jun 23, 2023

@SGudbrandsson About the Komodor platform: no, it does not use the same code or the same approach, so it does not suffer from this problem.

Thanks for sharing your observations; they will help improve the product.

@undera
Collaborator

undera commented Jun 23, 2023

You were absolutely right about fetching all releases to find a single one; I'm already fixing that in #373.

@SGudbrandsson
Author

Wow, you're amazing!
Thanks for checking this out.

I'm also in contact with sales at Komodor.io about the platform, to see if it fits our use case (we use GKE Autopilot, so node-based pricing doesn't work).

@seboudry

Hi!

Hope to see a release with these improvements 😉

Running on an on-prem cluster, this DOSes the API server and control-plane nodes.

With around 200 releases on our cluster, there were ~2k packets per second sent and 5 MB/s of received bandwidth. Huge!

Can't try this tool for now 😢

@seboudry

hi @undera !

any chance to get a new release? 😉

@undera
Collaborator

undera commented Nov 13, 2024

> hi @undera !
> any chance to get a new release? 😉

Hell yeah!! I've opened this Pandora's box

@undera
Collaborator

undera commented Nov 13, 2024

There you go; after 2 retries it completed.
