DOC-831 Add tasks and update scaling information (#156)
* Add tasks and update scaling information

* refinement

* add backticks to processor name

* review comments

* review comments

* review comments

* review updates
asimms41 authored Dec 18, 2024
1 parent 3a3a053 commit cdd63e7
Showing 4 changed files with 197 additions and 69 deletions.
2 changes: 1 addition & 1 deletion modules/ROOT/nav.adoc
@@ -73,8 +73,8 @@
*** xref:develop:connect/configuration/field_paths.adoc[]
*** xref:develop:connect/configuration/secret-management.adoc[Manage Secrets]
*** xref:develop:connect/configuration/processing_pipelines.adoc[]
*** xref:develop:connect/configuration/resource-management.adoc[Manage Pipeline Resources]
*** xref:develop:connect/configuration/monitor-connect.adoc[Monitor Data Pipelines]
*** xref:develop:connect/configuration/scale-pipelines.adoc[Scale Data Pipelines]
*** xref:develop:connect/configuration/unit_testing.adoc[]
** xref:develop:connect/components/about.adoc[]
193 changes: 193 additions & 0 deletions modules/develop/pages/connect/configuration/resource-management.adoc
@@ -0,0 +1,193 @@
= Manage Pipeline Resources on BYOC and Dedicated Clusters
:description: Learn how to set an initial resource limit for a standard data pipeline (excluding Ollama AI components) and how to manually scale the pipeline’s resources to improve performance.
:page-aliases: develop:connect/configuration/scale-pipelines.adoc

{description}

== Prerequisites

- A running xref:get-started:cluster-types/byoc/index.adoc[BYOC] or xref:get-started:cluster-types/dedicated/create-dedicated-cloud-cluster.adoc[Dedicated cluster]
- An estimate of the throughput of your data pipeline. You can get some basic statistics by running your data pipeline locally using the xref:redpanda-connect:components:processors/benchmark.adoc[`benchmark` processor].
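
For example, a minimal local configuration like the following (a sketch: the `generate` input and `drop` output are stand-ins for your real components) inserts the `benchmark` processor into the pipeline to report rolling throughput statistics:

[,yaml]
----
input:
  generate:
    interval: "" # An empty interval generates messages as fast as possible
    mapping: 'root.id = uuid_v4()'

pipeline:
  processors:
    - benchmark:
        interval: 5s # Print rolling throughput statistics every 5 seconds

output:
  drop: {}
----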

== Understanding tasks

A task is a unit of computation that allocates a specific amount of CPU and memory to a data pipeline to handle message throughput. By default, each pipeline is allocated one task, which includes 0.1 CPU (100 milliCPU or `100m`) and 400 MB (`400M`) of memory, and provides a message throughput of approximately 1 MB/sec. You can allocate up to a maximum of 18 tasks per pipeline.

|===
| Number of Tasks | CPU | Memory

| 1
| 0.1 CPU (`100m`)
| 400 MB (`400M`)

| 2
| 0.2 CPU (`200m`)
| 800 MB (`800M`)

| 3
| 0.3 CPU (`300m`)
| 1.2 GB (`1200M`)

| 4
| 0.4 CPU (`400m`)
| 1.6 GB (`1600M`)

| 5
| 0.5 CPU (`500m`)
| 2.0 GB (`2000M`)

| 6
| 0.6 CPU (`600m`)
| 2.4 GB (`2400M`)

| 7
| 0.7 CPU (`700m`)
| 2.8 GB (`2800M`)

| 8
| 0.8 CPU (`800m`)
| 3.2 GB (`3200M`)

| 9
| 0.9 CPU (`900m`)
| 3.6 GB (`3600M`)

| 10
| 1.0 CPU (`1000m`)
| 4.0 GB (`4000M`)

| 11
| 1.1 CPU (`1100m`)
| 4.4 GB (`4400M`)

| 12
| 1.2 CPU (`1200m`)
| 4.8 GB (`4800M`)

| 13
| 1.3 CPU (`1300m`)
| 5.2 GB (`5200M`)

| 14
| 1.4 CPU (`1400m`)
| 5.6 GB (`5600M`)

| 15
| 1.5 CPU (`1500m`)
| 6.0 GB (`6000M`)

| 16
| 1.6 CPU (`1600m`)
| 6.4 GB (`6400M`)

| 17
| 1.7 CPU (`1700m`)
| 6.8 GB (`6800M`)

| 18
| 1.8 CPU (`1800m`)
| 7.2 GB (`7200M`)

|===

NOTE: For pipelines with embedded Ollama AI components, one GPU task is automatically allocated to the pipeline, which is equivalent to 30 tasks or 3.0 CPU (`3000m`) and 12 GB of memory (`12000M`).
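
The table follows a simple linear rule: each task adds 0.1 CPU (`100m`) and 400 MB (`400M`) of memory. As an illustration of the arithmetic only (this helper is not a Redpanda tool), the mapping can be sketched as:

[,bash]
----
#!/usr/bin/env bash
# Convert a task count (1-18) into the CPU and memory allocated to a pipeline.
tasks_to_resources() {
  local tasks=$1
  if [ "$tasks" -lt 1 ] || [ "$tasks" -gt 18 ]; then
    echo "error: tasks must be between 1 and 18" >&2
    return 1
  fi
  # Each task adds 100 milliCPU and 400 MB of memory.
  echo "cpu=$((tasks * 100))m memory=$((tasks * 400))M"
}

tasks_to_resources 3   # cpu=300m memory=1200M
----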

== Set an initial resource limit

When you create a data pipeline, you can allocate a fixed amount of compute resources to it using tasks.

[NOTE]
====
If your pipeline reaches the CPU limit, it becomes throttled, which reduces the data processing rate. If it reaches the memory limit, the pipeline restarts.
====

To set an initial resource limit:

. Log in to https://cloud.redpanda.com[Redpanda Cloud].
. On the **Clusters** page, select the cluster where you want to add a pipeline.
. Go to the **Connect** page.
. Select the **Redpanda Connect** tab.
. Click **Create pipeline**.
. Enter details for your pipeline, including a short name and description.
. In the **Tasks** box, keep the default of **1** task when experimenting with pipelines that produce low message volumes. For higher throughputs, you can allocate up to a maximum of 18 tasks.
. Add your pipeline configuration and click **Create** to run it.

== Scale resources

You can view the compute resources allocated to a data pipeline, and manually scale those resources to improve performance or reduce resource consumption.

To view resources already allocated to a data pipeline:

[tabs]
=====
Cloud UI::
+
--
. Log in to https://cloud.redpanda.com[Redpanda Cloud^].
. Go to the cluster where the pipeline is set up.
. On the **Connect** page, select your pipeline and look at the value for **Resources**.
+
* CPU resources are displayed first, in milliCPU. For example, `1` task is `100m`, or 0.1 CPU.
* Memory is displayed next, in megabytes. For example, `1` task is `400M`, or 400 MB.
--
Data Plane API::
+
--
. xref:manage:api/cloud-api-quickstart.adoc#try-the-cloud-api[Authenticate and get the base URL] for the Data Plane API.
. Make a request to xref:api:ROOT:cloud-api.adoc#get-/v1alpha2/redpanda-connect/pipelines[`GET /v1alpha2/redpanda-connect/pipelines`], which lists details of all pipelines on your cluster by ID.
+
* Memory (`memory_shares`) is displayed in megabytes. For example, `1` task is `400M`, or 400 MB.
* CPU resources (`cpu_shares`) are displayed in milliCPU. For example, `1` task is `100m`, or 0.1 CPU.
--
=====
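
As a sketch, the Data Plane API request described above looks like the following `curl` call (the URL and token are placeholders). Each pipeline in the response includes a `resources` object containing the `cpu_shares` and `memory_shares` values:

[,bash,role="no-placeholders"]
----
curl -X GET "https://<data-plane-api-url>/v1alpha2/redpanda-connect/pipelines" \
  -H 'accept: application/json' \
  -H 'authorization: Bearer xxx...'
----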

To scale the resources for a pipeline:

[tabs]
=====
Cloud UI::
+
--
. Log in to https://cloud.redpanda.com[Redpanda Cloud^].
. Go to the cluster where the pipeline is set up.
. On the **Connect** page, select your pipeline and click **Edit**.
. In the **Tasks** box, update the number of tasks. One task provides a message throughput of approximately 1 MB/sec. For higher throughputs, you can allocate up to a maximum of 18 tasks per pipeline.
. Click **Update** to apply your changes. The specified resources are available immediately.
--
Data Plane API::
+
--
When you use the Data Plane API, you can update CPU resources only. For every 0.1 CPU that you allocate, Redpanda Cloud automatically reserves 400 MB of memory for the exclusive use of the pipeline.

. xref:manage:api/cloud-api-quickstart.adoc#try-the-cloud-api[Authenticate and get the base URL] for the Data Plane API, if you haven't already.
. Make a request to xref:api:ROOT:cloud-api.adoc#get-/v1alpha2/redpanda-connect/pipelines/-id-[`GET /v1alpha2/redpanda-connect/pipelines/\{id}`], including the ID of the pipeline you want to update. You'll use the returned values in the next step.
. Now make a request to xref:api:ROOT:cloud-api.adoc#put-/v1alpha2/redpanda-connect/pipelines/-id-[`PUT /v1alpha2/redpanda-connect/pipelines/\{id}`], to update the pipeline resources:
+
* Reuse the values returned by your `GET` request to populate the request body.
* Replace the `cpu_shares` value with the resources you want to allocate, and enter any valid value for `memory_shares`.
+
This example allocates 0.2 CPU (200 milliCPU) to a data pipeline. The minimum allocation for `cpu_shares` is 0.1 CPU (`100m`).
+
[,bash,role="no-placeholders"]
----
curl -X PUT "https://<data-plane-api-url>/v1alpha2/redpanda-connect/pipelines/xxx..." \
-H 'accept: application/json' \
-H 'authorization: Bearer xxx...' \
-H "content-type: application/json" \
-d '{"config_yaml":"input:\n  generate:\n    interval: 1s\n    mapping: |\n      root.id = uuid_v4()\n      root.user.name = fake(\"name\")\n      root.user.email = fake(\"email\")\n      root.content = fake(\"paragraph\")\n\npipeline:\n  processors:\n    - mutation: |\n        root.title = \"PRIVATE AND CONFIDENTIAL\"\n\noutput:\n  kafka_franz:\n    seed_brokers:\n      - seed-j888.byoc.prd.cloud.redpanda.com:9092\n    sasl:\n      - mechanism: SCRAM-SHA-256\n        password: password\n        username: connect\n    topic: processed-emails\n    tls:\n      enabled: true\n",
"description":"Email processor",
"display_name":"emailprocessor-pipeline",
"resources":{
"memory_shares":"800M",
"cpu_shares":"200m"
}
}'
----
+
A successful response shows the updated resource allocations with the `cpu_shares` value returned in milliCPU.
. Make a request to xref:api:ROOT:cloud-api.adoc#get-/v1alpha2/redpanda-connect/pipelines[`GET /v1alpha2/redpanda-connect/pipelines`] to verify your pipeline resource updates.
--
=====
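
To inspect just the updated allocations from that verification `GET` request, you can filter the response with a tool such as `jq`. This is a sketch: the URL and token are placeholders, and it assumes the list response wraps the results in a `pipelines` array:

[,bash,role="no-placeholders"]
----
curl -s "https://<data-plane-api-url>/v1alpha2/redpanda-connect/pipelines" \
  -H 'authorization: Bearer xxx...' \
  | jq '.pipelines[] | {id: .id, resources: .resources}'
----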
66 changes: 0 additions & 66 deletions modules/develop/pages/connect/configuration/scale-pipelines.adoc

This file was deleted.

5 changes: 3 additions & 2 deletions modules/develop/pages/connect/connect-quickstart.adoc
@@ -82,6 +82,7 @@ All Redpanda Connect configurations use a YAML file split into three sections:

. Go to the **Connect** page on your cluster and click **Create pipeline**.
. In **Pipeline name**, enter **emailprocessor-pipeline** and add a short description. For example, **Transforms email data using a mutation processor**.
. In the **Tasks** box, leave the default value of **1**. Tasks are used to allocate resources to a pipeline. One task is equivalent to 0.1 CPU and 400 MB of memory, and provides a message throughput of approximately 1 MB/sec.
. In the **Configuration** box, paste the following configuration.

+
@@ -234,6 +235,6 @@
* Try one of our xref:cookbooks:index.adoc[Redpanda Connect cookbooks].
* Choose xref:develop:connect/components/catalog.adoc[connectors for your use case].
* Learn how to xref:develop:connect/configuration/secret-management.adoc[add secrets to your pipeline].
* Learn how to xref:develop:connect/configuration/monitor-connect.adoc[monitor a data pipeline on a BYOC cluster].
* Learn how to xref:develop:connect/configuration/scale-pipelines.adoc[manually scale resources for a pipeline on a BYOC cluster].
* Learn how to xref:develop:connect/configuration/monitor-connect.adoc[monitor a data pipeline on a BYOC or Dedicated cluster].
* Learn how to xref:develop:connect/configuration/scale-pipelines.adoc[manually scale resources for a pipeline on a BYOC or Dedicated cluster].
* Learn how to xref:redpanda-connect:guides:getting_started.adoc[configure, test, and run a data pipeline locally].
