diff --git a/teps/0013-limt-pipeline-concurrency.md b/teps/0013-limt-pipeline-concurrency.md
new file mode 100644
index 000000000..f1af2193e
--- /dev/null
+++ b/teps/0013-limt-pipeline-concurrency.md
@@ -0,0 +1,282 @@
+---
+title: pipeline-concurrency
+authors:
+  - "@NikeNano"
+creation-date: 2020-10-07
+last-updated: 2020-11-15
+status: proposed
+---
+
+# TEP-0013: Limit Pipeline concurrency
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Requirements](#requirements)
+- [Proposal](#proposal)
+  - [User Stories](#user-stories)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+  - [Risks and Mitigations](#risks-and-mitigations)
+  - [Performance](#performance)
+- [Design Details](#design-details)
+- [Test Plan](#test-plan)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Upgrade & Migration Strategy](#upgrade--migration-strategy)
+- [References](#references)
+
+## Summary
+
+Enable users to define the concurrency of a Pipeline to limit how many tasks are run simultaneously.
+
+## Motivation
+
+Enable users to limit the number of tasks that can run simultaneously in a pipeline, which could help with:
+
+- Tracking and limiting how many resources a Pipeline is consuming, and thus how much it costs.
+
+### Goals
+
+- Limit how many tasks can run concurrently in a Pipeline.
+
+### Non-Goals
+- Limit the number of concurrent Pipelines, as described in [pipeline issue #1305](https://github.com/tektoncd/pipeline/issues/1305).
+
+## Requirements
+
+- Users can specify the maximum number of Tasks that can run concurrently in a Pipeline.
+
+## Proposal
+
+We propose to extend the Tekton pipeline ecosystem with a separate service, called `Limit Service`, which will control when `TaskRuns` are allowed to be executed by the controller, while also allowing users to extend the `Limit Service` according to their needs.
+This is discussed further in [Design Details](#design-details) below.
+
+### User Stories
+
+#### Story 1
+A user has a Pipeline with 100 independent Tasks, but they don't want all 100 Tasks to run at once.
+#### Story 2
+A user wants to limit the amount of resources used by a Pipeline at a given time.
+### Risks and Mitigations
+
+What if a user mistakenly sets the maximum number of concurrent tasks to zero or less? Does this mean no tasks are run until the pipeline times out? To mitigate this, we will require the maximum number of concurrent tasks to be greater than zero, and add validation that returns an error if it is set to zero or less.
+
+### Performance
+
+Given that this allows users to limit the number of concurrent `TaskRuns` in a given `PipelineRun`, the execution time of the `PipelineRun` could increase. However, this allows users to limit the resources used and save costs.
+## Design Details
+
+We propose to extend the logic of the `PipelineRun` controller to create all `TaskRuns` with `spec.status.Pending`, so that an external service called `Limit Service` can control when a `TaskRun` is allowed to be considered by the `Task` controller for execution. This requires extending the `Task` controller to only consider `TaskRuns` which don't have `spec.status.Pending`. The `Limit Service` will update `TaskRuns` and remove `spec.status.Pending` when they are considered ready for execution.
+
+The following example aims to describe the proposed solution:
+
+1. `PipelineRun` is created
+2. Pipelines controller sees `PipelineRun`, starts creating `TaskRuns` (each `TaskRun` is created with `spec.status.Pending`, as proposed in [TEP 15](https://github.com/tektoncd/community/pull/203))
+3. Pipelines controller sees the new `TaskRuns`, but they all have `spec.status.Pending`; it doesn't do anything with them
+4. `Limit Service` also sees the `TaskRun` with `spec.status.Pending`.
+5. When `Limit Service` decides the `TaskRun` can run, it removes `spec.status.Pending` from the `TaskRun`(s)
+6. Pipelines controller now sees the `TaskRuns` are no longer pending, and it starts executing them
+
+Separating the decision of whether a `TaskRun` is allowed to run from the `Task` controller makes it possible to add custom logic to the `Limit Service`.
+
+As suggested [here](https://github.com/tektoncd/pipeline/issues/2591#issuecomment-647754800), we can add a field - `MaxParallelTasks` - to `PipelineRunSpec` which is an integer that represents the maximum number of `Tasks` that can run concurrently in the `Pipeline`.
+
+```go
+type PipelineRunSpec struct {
+	PipelineSpec *PipelineSpec `json:"pipelineSpec,omitempty"`
+	// ... (other fields elided)
+	// MaxParallelTasks holds the maximum count of parallel taskruns
+	// +optional
+	MaxParallelTasks int `json:"maxParallelTasks,omitempty"`
+}
+```
+
+The `Limit Service` could run similarly to a control loop, checking `TaskRuns` against the `MaxParallelTasks` restriction of the related `Pipeline`. If the count of running `TaskRuns` is less than `MaxParallelTasks`, a `TaskRun` would be updated and `spec.status.Pending` removed. If the count of running `TaskRuns` equals `MaxParallelTasks`, no `TaskRun` would be updated until another `TaskRun` completes.
+
+When specified, `MaxParallelTasks` has to be greater than zero, as described in [Risks and Mitigations](#risks-and-mitigations). If `MaxParallelTasks` is not specified, there should be no limit to how many `TaskRuns` can run in parallel, and thus `spec.status.Pending` should be removed from all `TaskRuns`.
+
+In order to not end up with a deadlock, the order of the `Tasks` in a `Pipeline` has to be respected and accounted for by the `Limit Service`.
+
+## Test Plan
+
+e2e and unit tests
+## Drawbacks
+
+It could affect the performance of the scheduling by increasing the execution time of the `PipelineRuns`.
+
+## Alternatives
+1. Limit the number of concurrent tasks by setting the resource limitations of each task high enough that there is not enough resource to run more than a certain number of tasks concurrently. However, this is not easily configurable, and it is complicated because users have to compute the relation between resources and tasks.
+2. Utilizing a [pod quota per namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-pod-namespace/). However, this would limit all resources in the namespace, not only the `PipelineRun` that the limit is meant for.
+3. Add logic to the `PipelineRun` controller to check how many `TaskRuns` are running in the `PipelineRun`. This would make the controller logic more complex, but has the advantage that the controller would have all the logic combined. However, it would allow less flexibility for users to implement custom logic.
+
+## Upgrade & Migration Strategy
+
+The `MaxParallelTasks` field in `PipelineRunSpec` will be optional, and if it is not set, `spec.status.Pending` will be removed from all `TaskRuns` immediately by the `Limit Service`. An alternative is to not set `spec.status.Pending` when `MaxParallelTasks` is not specified.
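The release decision that the `Limit Service` makes, as described in Design Details and in the migration note above, can be sketched roughly as follows. This is a minimal illustration and not the proposed implementation: the `TaskRun` struct is a simplified, hypothetical stand-in for the real Tekton type, `Release` is a hypothetical helper name, and a `maxParallel` of zero is assumed to mean that `MaxParallelTasks` was not set. Ordering between `Tasks` is not modelled; the sketch assumes the slice only contains `TaskRuns` whose predecessors have already completed.

```go
package main

// TaskRun is a simplified, hypothetical stand-in for Tekton's TaskRun type.
type TaskRun struct {
	Name    string
	Pending bool // true while spec.status.Pending is set
	Done    bool // true once the TaskRun has completed
}

// Release returns the names of the pending TaskRuns that may start now.
// A maxParallel of 0 is treated as "no limit" (assumption: field not set).
func Release(runs []TaskRun, maxParallel int) []string {
	// Count TaskRuns that are neither pending nor completed, i.e. running.
	running := 0
	for _, r := range runs {
		if !r.Pending && !r.Done {
			running++
		}
	}

	var released []string
	for _, r := range runs {
		if !r.Pending {
			continue
		}
		// Stop once running plus newly released TaskRuns reach the limit.
		if maxParallel > 0 && running+len(released) >= maxParallel {
			break
		}
		released = append(released, r.Name)
	}
	return released
}
```

A control loop would call a helper like this on every reconcile and remove `spec.status.Pending` from the returned `TaskRuns`. When the limit is already reached, nothing is returned until a running `TaskRun` completes.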
+
+## References
+
+- Issue: https://github.com/tektoncd/pipeline/issues/2591
+- POC: https://github.com/tektoncd/pipeline/pull/3112
diff --git a/teps/0013-pipeline-conecurrency.md b/teps/0013-pipeline-conecurrency.md
new file mode 100644
index 000000000..ffc782cfd
--- /dev/null
+++ b/teps/0013-pipeline-conecurrency.md
@@ -0,0 +1,278 @@
+
+# TEP-0013: Limit Pipeline concurrency
+
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Requirements](#requirements)
+- [Proposal](#proposal)
+  - [User Stories (optional)](#user-stories-optional)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+  - [Notes/Constraints/Caveats (optional)](#notesconstraintscaveats-optional)
+  - [Risks and Mitigations](#risks-and-mitigations)
+  - [User Experience (optional)](#user-experience-optional)
+  - [Performance (optional)](#performance-optional)
+- [Design Details](#design-details)
+- [Test Plan](#test-plan)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (optional)](#infrastructure-needed-optional)
+- [Upgrade & Migration Strategy (optional)](#upgrade--migration-strategy-optional)
+- [References (optional)](#references-optional)
+
+## Summary
+
+Enable users to define the concurrency of a Pipeline to limit how many tasks are run simultaneously.
+
+## Motivation
+
+Limiting Pipeline concurrency could help with the following:
+
+- Keeping track of how much a Pipeline costs (how many resources it consumes) and, as a result, limiting how frequently it can run.
+- Limiting the amount of resources used simultaneously.
+- Improving the user experience when using pipelines for CI systems, for example by avoiding race conditions. This is described in more depth in the following [issue](https://github.com/tektoncd/pipeline/issues/1305).
+
+### Goals
+
+- Limit the concurrency of tasks belonging to a Pipeline.
+
+### Non-Goals
+- Limit the concurrency of Pipelines
+
+## Requirements
+
+Acknowledgement and approval by Tekton's governance board.
+
+## Proposal
+
+Suggestions for limiting the concurrency include, but are not limited to:
+
+- Let the PipelineRun controller check how many TaskRuns are running for a specific PipelineRun label before scheduling a new TaskRun as part of a PipelineRun. The concurrency limit could in this case be specified on the pipelinesrun_types. Later, when a TaskRun is completed, the PipelineRun controller will do the reconciliation again, and the creation of a TaskRun will be re-evaluated. Suggested [here](https://github.com/tektoncd/pipeline/issues/2591#issuecomment-647754800)
+- Limit the number of concurrent tasks by setting the resource limitations of each task high enough that there is not enough resource to run more than X tasks concurrently.
+- Utilizing a [pod quota per namespace](https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/quota-pod-namespace/)
+- Set the concurrency on a group task level where each grouptask would get a limit. [More info](https://github.com/tektoncd/pipeline/issues/2591#issuecomment-626778025)
+
+### User Stories (optional)
+
+#### Story 1
+
+#### Story 2
+
+### Notes/Constraints/Caveats (optional)
+
+### Risks and Mitigations
+
+### User Experience (optional)
+
+### Performance (optional)
+
+## Design Details
+
+## Test Plan
+
+## Drawbacks
+
+Depends on the strategy of implementation. If strategies based upon Kubernetes scheduling resources are pursued, they will affect all resources in the namespace and slow down the performance of the cluster in general, since resources are hoarded to limit concurrency. If the scheduling is updated for PipelineRuns, it could affect the performance of the scheduling.
+
+## Alternatives
+
+## Infrastructure Needed (optional)
+
+## Upgrade & Migration Strategy (optional)
+
+## References (optional)
+