Task monitor daemon process may limit scalability #194

Open
uniqueg opened this issue Jul 21, 2020 · 0 comments
Labels
priority: medium (Medium priority); type: maintenance (Related to general repository maintenance); workload: weeks (Likely takes weeks to resolve)


uniqueg commented Jul 21, 2020

Is your feature request related to a problem? Please describe.

Currently, updating the run status in the database involves sending a Celery signal that is picked up by a single task monitor daemon process spawned by the main application. Status updates can be numerous when many workflow runs are managed in parallel, and they may also contain long log messages. Because all of them pass through one process, this architecture may become a serious bottleneck when scaling up run throughput.

Describe the solution you'd like

To improve scalability, status updates could be handled by worker processes instead. A status update could be posted to the broker queue and picked up by a worker, rather than by the task monitor, which would then update the database. To ensure that ongoing workflow runs do not block status updates (which would effectively leave the service stuck indefinitely), a dedicated worker pool of some guaranteed minimum size would need to be set aside for this purpose.
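To illustrate the idea, here is a minimal sketch that uses only the Python standard library as a stand-in for the broker and the database. It is not the project's code: `process_runs`, the two queues, the in-memory `run_status` dictionary and the state names are all hypothetical. In a Celery deployment, the equivalent would presumably be routing status update tasks to their own queue, consumed by separate workers.

```python
import queue
import threading
import time


def process_runs(run_ids, n_run_workers=2, n_status_workers=1):
    """Simulate runs whose status updates are handled by a dedicated pool."""
    run_queue = queue.Queue()     # stand-in for the workflow run queue
    status_queue = queue.Queue()  # stand-in for a dedicated status update queue
    run_status = {}               # stand-in for the run records in the database
    db_lock = threading.Lock()

    def run_worker():
        # Executes workflow runs; never touches the "database" itself.
        while True:
            run_id = run_queue.get()
            if run_id is None:    # sentinel: shut this worker down
                break
            status_queue.put((run_id, "RUNNING"))
            time.sleep(0.01)      # simulated workflow execution
            status_queue.put((run_id, "COMPLETE"))

    def status_worker():
        # Only applies status updates, so long runs cannot starve it.
        while True:
            update = status_queue.get()
            if update is None:    # sentinel: shut this worker down
                break
            run_id, state = update
            with db_lock:
                run_status[run_id] = state

    run_pool = [threading.Thread(target=run_worker) for _ in range(n_run_workers)]
    status_pool = [threading.Thread(target=status_worker) for _ in range(n_status_workers)]
    for thread in run_pool + status_pool:
        thread.start()

    for run_id in run_ids:
        run_queue.put(run_id)
    for _ in run_pool:            # one sentinel per run worker
        run_queue.put(None)
    for thread in run_pool:
        thread.join()
    for _ in status_pool:         # all updates are queued before the sentinels
        status_queue.put(None)
    for thread in status_pool:
        thread.join()
    return run_status
```

Note that with more than one status worker, two updates for the same run could be applied out of order, so a real implementation would likely need some ordering guard (for example, refusing to move a run from a final state back to a non-final one).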

Describe alternatives you've considered

As an alternative to setting aside a dedicated worker pool for status updates, status updates could also be handled directly by the worker processes that are already handling the workflow runs.
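For comparison, this alternative might look roughly like the sketch below, in which each worker writes its own status updates while it executes a run. Again, this is only an illustration under assumptions: `RunStore` is a hypothetical in-memory stand-in for the database, and `execute_run`, `run_all` and the state names do not come from the project.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RunStore:
    """Hypothetical in-memory stand-in for the run database."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def update(self, run_id, state):
        with self._lock:
            self._status[run_id] = state

    def snapshot(self):
        with self._lock:
            return dict(self._status)


def execute_run(store, run_id):
    # The worker handling the run writes its own status updates,
    # so no separate monitor process or status pool is involved.
    store.update(run_id, "RUNNING")
    # ... workflow execution would happen here ...
    store.update(run_id, "COMPLETE")
    return run_id


def run_all(run_ids, max_workers=4):
    store = RunStore()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consume the iterator so that any worker exception is raised here.
        list(pool.map(lambda run_id: execute_run(store, run_id), run_ids))
    return store.snapshot()
```

The trade-off in this variant is that every worker needs its own access to the database, whereas the dedicated pool keeps database writes in one place.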

Additional context

It is important that the chosen solution is conceptually compatible with a future callback mechanism for status updates (see #57, ga4gh/task-execution-schemas#121, ga4gh/workflow-execution-service-schemas#133 & ga4gh/cloud-interop-testing#98 (comment)).

uniqueg added the priority: medium, type: maintenance and workload: weeks labels Jul 21, 2020