Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(dask): Dask integration (#821) #821

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Alputer
Copy link
Member

@Alputer Alputer commented Aug 13, 2024

This pull request contains the ongoing work of dask integration into REANA.

@Alputer Alputer self-assigned this Aug 13, 2024
Copy link

codecov bot commented Aug 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 31.08%. Comparing base (467be7a) to head (c6faf71).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #821   +/-   ##
=======================================
  Coverage   31.08%   31.08%           
=======================================
  Files          26       26           
  Lines        2487     2487           
=======================================
  Hits          773      773           
  Misses       1714     1714           

@Alputer Alputer changed the title Dask integration feat(dask): Dask integration (#821) Aug 13, 2024
@Alputer Alputer force-pushed the dask-integration branch 2 times, most recently from bd35dcc to aed3588 Compare August 28, 2024 15:24
Alputer added a commit to Alputer/reana that referenced this pull request Aug 28, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Aug 28, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Aug 29, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 5, 2024
@@ -22,6 +22,16 @@ rules:
- apiGroups: ["metrics.k8s.io"]
resources: ["pods", "nodes"]
verbs: ["get", "list", "watch"]
# Custom dask kubernetes resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to wrap the Dask stuff into a conditional variable such as dask.enabled -- similarly as we have traefik.enabled -- because some REANA deployments may not want to (or may not be allowed to) support Dask operators in their deployments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when dask.enabled is true, it would be good to deploy Dask operator automatically, in quite the same way as we do for traefik; see the dependencies in Chart.yaml.

Alputer added a commit to Alputer/reana that referenced this pull request Sep 9, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 9, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 9, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 9, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 10, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 10, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 10, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 10, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 10, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 15, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 19, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 19, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 19, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 19, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Sep 20, 2024
@@ -60,6 +60,13 @@ This Helm automatically prefixes all names using the release name to avoid colli
| `components.reana_workflow_engine_snakemake.environment` | [REANA-Workflow-Engine-Snakemake](https://github.com/reanahub/reana-workflow-engine-snakemake) environment variables | `{}` |
| `components.reana_workflow_engine_snakemake.image` | [REANA-Workflow-Engine-Snakemake image](https://hub.docker.com/r/reanahub/reana-workflow-engine-snakemake) to use | `docker.io/reanahub/reana-workflow-engine-snakemake:<chart-release-version>` |
| `compute_backends` | List of supported compute backends (kubernetes, htcondorcern, slurmcern) | "kubernetes" |
| `dask.enabled` | Install dask-kubernetes-operator custom resources in the cluster to enable dask workflows | true |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `dask.enabled` | Install dask-kubernetes-operator custom resources in the cluster to enable dask workflows | true |
| `dask.enabled` | Install dask-kubernetes-operator custom resources in the cluster to support Dask workflows? | false |
  • minor verbiage change;
  • dunno if we want to keep it on true yet; better to let it on false until the feature is stable.

@@ -60,6 +60,13 @@ This Helm automatically prefixes all names using the release name to avoid colli
| `components.reana_workflow_engine_snakemake.environment` | [REANA-Workflow-Engine-Snakemake](https://github.com/reanahub/reana-workflow-engine-snakemake) environment variables | `{}` |
| `components.reana_workflow_engine_snakemake.image` | [REANA-Workflow-Engine-Snakemake image](https://hub.docker.com/r/reanahub/reana-workflow-engine-snakemake) to use | `docker.io/reanahub/reana-workflow-engine-snakemake:<chart-release-version>` |
| `compute_backends` | List of supported compute backends (kubernetes, htcondorcern, slurmcern) | "kubernetes" |
| `dask.enabled` | Install dask-kubernetes-operator custom resources in the cluster to enable dask workflows | true |
| `dask.cluster_default_cores_limit` | Default cores limit for dask clusters | 4 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cosmetics: s/dask/Dask/g here and elsewhere, the term is usually written with capital D.

@@ -60,6 +60,13 @@ This Helm automatically prefixes all names using the release name to avoid colli
| `components.reana_workflow_engine_snakemake.environment` | [REANA-Workflow-Engine-Snakemake](https://github.com/reanahub/reana-workflow-engine-snakemake) environment variables | `{}` |
| `components.reana_workflow_engine_snakemake.image` | [REANA-Workflow-Engine-Snakemake image](https://hub.docker.com/r/reanahub/reana-workflow-engine-snakemake) to use | `docker.io/reanahub/reana-workflow-engine-snakemake:<chart-release-version>` |
| `compute_backends` | List of supported compute backends (kubernetes, htcondorcern, slurmcern) | "kubernetes" |
| `dask.enabled` | Install dask-kubernetes-operator custom resources in the cluster to enable dask workflows | true |
| `dask.cluster_default_cores_limit` | Default cores limit for dask clusters | 4 |
| `dask.cluster_max_cores_limit` | Max cores limit for dask clusters | 16 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to better distinguish between (individual) worker-based specs and (overall) cluster-based specs, what about renaming the values to something like:

  • dask.cluster_cores
  • dask.cluster_memory
  • dask.worker_cores
  • dask.worker_memory

The similar terms would be closer in the naming, and the fact that some of these are "defaults" we can express in the documentation verbiage.

(Dunno about the word "limit" that we could perhaps leave out or perhaps include, e.g. worker_cores_limit.)

And, if we expose all these for the user to possibly modify in their reana.yaml files, then we'd need the user limits for all these:

  • dask.cluster_cores_max_user_limit
  • dask.cluster_memory_max_user_limit
  • dask.worker_cores_max_user_limit
  • dask.worker_memory_max_user_limit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example: a user asks for:

  resources:
    dask:
      image: coffeateam/coffea-dask-almalinux8
      number_of_workers: 3
      single_worker_memory: "3Gi"

We would need just two max_user_limit values for these, whilst the admin will set desired default values in values.yaml.

(And if we don't need to specify cores, and Dask uses "auto" detect or some such, then we don't even have to specify them.)

Alputer added a commit to Alputer/reana that referenced this pull request Oct 16, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 16, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 16, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 16, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 16, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 16, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Alputer added a commit to Alputer/reana that referenced this pull request Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In review
Development

Successfully merging this pull request may close these issues.

2 participants