Poor UI performance with lots of workflows viewing all. Random sort order when viewing paginated. #4546

Closed
snowzach opened this issue Nov 17, 2020 · 9 comments

Comments

@snowzach
Contributor

Summary

When viewing hundreds of workflows with pagination, the sort order doesn't seem to make sense. When viewing all workflows at once, the load time is very high.

Diagnostics

What Kubernetes provider are you using? GKE

What version of Argo Workflows are you running? 2.11.7

I am also using a Postgres database to archive workflows. When I view workflows in the list, I can select "all", in which case the listing takes around 13 seconds to load and is sorted correctly. With pagination, the list loads quickly, but which workflows appear on each page seems quite random.

I see the message "Workflows cannot be globally sorted when paginated".

Is this an "It is what it is" type of issue? Since I am archiving all workflows, is it possible to pull the listing from the database instead, in the proper order?


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@simster7
Member

When viewing hundreds of workflows with pagination, the sort order doesn't seem to make sense.

This is known. We use K8s' pagination and simply propagate its result, and K8s pagination does not have "global sorting" between pages. See #2926.

Is this an "It is what it is" type of issue?

For now, yes.

Since I am archiving all workflows, is it possible to pull the listing from the database instead, in the proper order?

That would actually be the intended use case. Ideally the live workflows view is meant for recent workflows. Those workflows would then have short TTLs after completion and if their information is needed afterwards, it would be available in the workflow archive.
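As a rough illustration of the "short TTLs after completion" idea, here is a minimal sketch of a single workflow that opts in to its own TTL through `spec.ttlStrategy`. The workflow name, the container image, and the specific durations are hypothetical and chosen only for the example.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ttl-example-      # hypothetical name
spec:
  entrypoint: main
  ttlStrategy:
    secondsAfterCompletion: 3600  # delete 1 hour after the workflow finishes
    secondsAfterSuccess: 600      # delete sooner if it succeeded
    secondsAfterFailure: 86400    # keep failures for a day for debugging
  templates:
    - name: main
      container:
        image: alpine:3.12        # hypothetical image
        command: [echo, "hello"]
```

Once the TTL expires, the Workflow resource is removed from the cluster, so anything needed afterwards has to come from the workflow archive, if one is configured.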

Is there a reason why you need to persist a large number of workflows after they are completed?

@snowzach
Contributor Author

snowzach commented Nov 17, 2020

Okay... This makes sense... I guess I didn't follow that I need to make sure my jobs get archived after a certain amount of time to remove them from the list. I am still learning.

So, for others: I added this to my ConfigMap so that a TTL is automatically applied to all of my workflows, and the list now contains only workflows that are still running or finished within the past hour. It is now safe to use the "all" option, which sorts correctly (assuming you have a sane number of workflows from the past hour).

workflowDefaults:
  spec:
    ttlStrategy:
      secondsAfterCompletion: 3600

And then I can use the Workflow Archive section, which loads data from my Postgres database quite quickly and in the right order.
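For anyone copying this, here is a sketch of where the `workflowDefaults` fragment above might sit in the workflow controller's ConfigMap. The name `workflow-controller-configmap` and the `argo` namespace are the usual install defaults but are assumptions here, as is nesting the settings under the `config: |` key. Adjust all three to match your install.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap  # assumed default name
  namespace: argo                      # assumed install namespace
data:
  config: |
    # Defaults merged into every workflow the controller handles
    workflowDefaults:
      spec:
        ttlStrategy:
          # Delete each workflow 1 hour after it completes
          secondsAfterCompletion: 3600
```

Without an archive configured, workflows removed by this TTL are gone for good, so it is worth confirming that archiving works before relying on it.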

Thanks!

@snowzach
Contributor Author

@simster7 do you think it would be a good idea to set some sort of sane workflow TTL default (24 hours)? I am starting to wonder if poor performance of my cluster/kubectl CLI was due to so many workflows sitting in etcd. It was not at all clear to me that I should be clearing out old workflows manually. I probably missed it in the docs somewhere. Just food for thought.

@simster7
Member

simster7 commented Nov 17, 2020

We actually have nudges in the UI that display this information (#3089)

image

I'm thinking that we should also add nudges on the CLI. I'll work on that soon.

@simster7
Member

do you think it would be a good idea to set some sort of sane workflow TTL default (24 hours)

It might be useful, but I don't think we'd want to add code that deletes K8s resources without the user taking an explicit action to enable it, especially since the workflow archive may not be enabled, in which case information could be lost.

@snowzach
Contributor Author

@simster7 I saw the nudge in the UI about the number of workflows and followed the link to the documentation. I guess I didn't read it closely enough. Until now I didn't realize that the "active workflows" and the "workflow archive" are two separate holding tanks, and that you need to manage the transition between them somewhat manually. My suggestion would still be a sane default for archiving. It is a trade-off between the "where did my workflow go?" confusion and "my cluster is running like crap and it's really slow to load". Although I guess if you don't have an archive, that becomes a different problem. I dunno...

@snowzach
Contributor Author

@simster7 how about this: #4549?

@or-shachar
Contributor

Do we have a docs section of known issues and limitations?

Maybe we should add such a section to the operator manual and include this issue, as I think many users will hit this problem.

WDYT?

@agilgur5
Member

We use K8s' pagination and simply propagate its result, and K8s pagination does not have "global sorting" between pages. See #2926.

Noting here that server-side sorting should be possible after #12736 (running an in-memory SQLite DB)
