feat(analysis): add initial version (#1) #1

Alputer · 2024-09-18T15:25:02Z

This pull request implements the initial version of the Dask demo example.

tiborsimko · 2024-10-31T14:27:49Z

reana.yaml

+version: 0.9.3
+inputs:
+  files:
+    - codes/analysis.py


The directory should be named just `code`.

However, considering that Dask itself also provides the workflow, so to speak, and that we don't have any other input or data files, we could simply host the sole `analysis.py` file in the root directory.

tiborsimko · 2024-10-31T14:28:39Z

reana.yaml

@@ -0,0 +1,18 @@
+version: 0.9.3


You can remove the `version` clause, which does not really serve any purpose and whose meaning has confused some people. (We might remove it everywhere later.)

tiborsimko · 2024-10-31T14:30:30Z

reana.yaml

+      image: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
+  specification:
+    steps:
+      - name: mystep


You can name the step "process".

tiborsimko · 2024-10-31T14:31:37Z

reana.yaml

+        environment: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
+        commands:
+        - python codes/analysis.py
+outputs:


Please also introduce some behavioural tests checking that the histogram output file is present and that the logs contain the expected messages, such as:

`all events 53446198 number of chunks 534`
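In REANA these checks would typically be written as `.feature` (Gherkin) files listed under `tests:` in `reana.yaml`; the README suggestion below references `tests/workspace-files.feature`. Purely to illustrate what the two requested checks assert, here is a minimal Python sketch. It assumes the workspace is available as a local directory and the job log as a plain string; the function names are hypothetical, and this is not REANA's testing API.

```python
from __future__ import annotations

import re
from pathlib import Path

# Values taken from the log line quoted in the review comment above.
EXPECTED_EVENTS = 53446198
EXPECTED_CHUNKS = 534

SUMMARY_RE = re.compile(r"all events (\d+) number of chunks (\d+)")


def parse_summary(log_text: str) -> tuple[int, int] | None:
    """Return (events, chunks) from the first 'all events N number of chunks M' match, or None."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_run(workspace: Path, log_text: str) -> list[str]:
    """Return a list of failure messages; an empty list means both checks passed."""
    failures = []
    # Check 1: the histogram output file exists in the (hypothetical) local workspace.
    if not (workspace / "histogram.png").is_file():
        failures.append("histogram.png is missing from the workspace")
    # Check 2: the log reports the expected event and chunk counts.
    if parse_summary(log_text) != (EXPECTED_EVENTS, EXPECTED_CHUNKS):
        failures.append(
            f"log does not report {EXPECTED_EVENTS} events in {EXPECTED_CHUNKS} chunks"
        )
    return failures


if __name__ == "__main__":
    import tempfile

    log = "all events 53446198 number of chunks 534\n"
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        print(check_run(ws, log))  # ['histogram.png is missing from the workspace']
        (ws / "histogram.png").write_bytes(b"")
        print(check_run(ws, log))  # []
```

Comparing the parsed numbers rather than the raw string keeps the check tolerant of surrounding log output, while still failing if the event or chunk count changes.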

tiborsimko · 2024-10-31T14:32:02Z

README.md

+expected outputs:
+
+```yaml
+version: 0.9.3


After you modify reana.yaml, please update this section accordingly.

tiborsimko · 2024-11-13T13:30:16Z

README.md

@@ -4,3 +4,114 @@
 [![image](https://img.shields.io/badge/discourse-forum-blue.svg)](https://forum.reana.io)
 [![image](https://img.shields.io/github/license/reanahub/reana-demo-dask-coffea.svg)](https://github.com/reanahub/reana-demo-dask-coffea/blob/master/LICENSE)
 [![image](https://www.reana.io/static/img/badges/launch-on-reana-at-cern.svg)](https://reana.cern.ch/launch?url=https%3A%2F%2Fgithub.com%2Freanahub%2Freana-demo-dask-coffea&specification=reana.yaml&name=reana-demo-dask-coffea)
+


Here are suggestions for the README file:

diff --git a/README.md b/README.md
index 9ac7e7a..4b329fc 100644
--- a/README.md
+++ b/README.md
@@ -7,57 +7,73 @@
 ## About

-## Analysis structure
+This [REANA](http://www.reana.io/) reproducible analysis example provides a
+simple example how to run Dask workflows using Coffea. The example was adapted
+from
+[Coffea Casa tutorials](https://github.com/CoffeaTeam/coffea-casa-tutorials/blob/master/examples/example1.ipynb)
+repository.

-Making a research data analysis reproducible basically means to provide "runnable
-recipes" addressing (1) where is the input data, (2) what software was used to analyse
-the data, (3) which computing environments were used to run the software and (4) which
-computational workflow steps were taken to run the analysis. This will permit to
-instantiate the analysis on the computational cloud and run the analysis to obtain (5)
-output results.
+## Analysis structure
+Making a research data analysis reproducible basically means to provide
+"runnable recipes" addressing (1) where is the input data, (2) what software was
+used to analyse the data, (3) which computing environments were used to run the
+software and (4) which computational workflow steps were taken to run the
+analysis. This will permit to instantiate the analysis on the computational
+cloud and run the analysis to obtain (5) output results.

 ### 1. Input data

-In this example, we are using the file whose url is given below which is hosted at eospublic.
-- `root://eospublic.cern.ch//eos/root-eos/benchmark/Run2012B_SingleMu.root`
+In this example, we are using a single CMS open data set file
+`Run2012B_SingleMu.root` which is hosted at EOSPUBLIC XRootD server.

 ### 2. Analysis code

-The analysis code consists of a single python file called `analysis.py` which connects to a dask cluster and then conducts the analysis.
+The analysis code consists of a single Python file called `analysis.py` which
+connects to a Dask cluster and then conducts the analysis and prints MET
+histogram.

 ### 3. Compute environment

-In order to be able to rerun the analysis even several years in the future, we need to "encapsulate the current compute environment". We shall achieve this by preparing a [Docker](https://www.docker.com/) container image for our analysis steps.
+In order to be able to rerun the analysis even several years in the future, we
+need to "encapsulate the current compute environment". We shall achieve this by
+preparing a [Docker](https://www.docker.com/) container image for our analysis
+steps.

-This example makes use of the coffea platform and the specific image for the platform we are using in this example can be found [here](https://hub.docker.com/r/coffeateam/coffea-dask-cc7).
+This example makes use of the Coffea platform image with the specific version
+0.7.22. The container image can be found on Docker Hub at
+[docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049](https://hub.docker.com/r/coffeateam/coffea-dask-cc7).

 ### 4. Analysis workflow

-The analysis workflow is simple and consists of a single step. We simply run the script `python analysis.py` to run the example. However, realize that the actual analysis is relatively heavy and parallelized by dask behind the scenes. As a user, the task graphs and the parallel steps are hidden to us.
+The analysis workflow is simple and consists of a single command. We simply run
+the script `python analysis.py` to run the example. The command will then use
+the Dask behind the scenes to possibly launch parallel computations. As a user,
+we do not have to specify the computational graph ourselves; the Dask library
+will take care of dispatching computations.

 ### 5. Output results

-The example produces the given histogram as an output.
-![](https://github.com/user-attachments/assets/e52c2391-626d-4556-90ca-75248516cc95)
+The example produces the following MET event-level histogram as an output.
+![](https://github.com/user-attachments/assets/e52c2391-626d-4556-90ca-75248516cc95)

 ## Running the example on REANA cloud

 There are two ways to execute this analysis example on REANA.

-If you would like to simply launch this analysis example on the REANA instance at CERN
-and inspect its results using the web interface, please click on the following
-badge:
+If you would like to simply launch this analysis example on the REANA instance
+at CERN and inspect its results using the web interface, please click on the
+following badge:

-[![Launch with Serial on REANA@CERN badge](https://www.reana.io/static/img/badges/launch-with-serial-on-reana-at-cern.svg)](https://reana.cern.ch/launch?url=https://github.com/reanahub/reana-demo-dask-coffea&specification=reana.yaml&name=reana-demo-dask-coffea)
+[![Launch on REANA@CERN badge](https://www.reana.io/static/img/badges/launch-on-reana-at-cern.svg)](https://reana.cern.ch/launch?url=https://github.com/reanahub/reana-demo-dask-coffea&specification=reana.yaml&name=reana-demo-dask-coffea)

-If you would like a step-by-step guide on how to use the REANA command-line client to
-launch this analysis example, please read on.
+If you would like a step-by-step guide on how to use the REANA command-line
+client to launch this analysis example, please read on.

-We start by creating a [reana.yaml](reana.yaml) file describing the above analysis
-structure with its inputs, code, runtime environment, computational workflow steps and
-expected outputs:
+We start by creating a [reana.yaml](reana.yaml) file describing the above
+analysis structure with its inputs, code, runtime environment, computational
+workflow steps and expected outputs: ```yaml inputs: @@ -73,7 +89,7 @@ workflow: - name: process environment: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049 commands: - - python analysis.py + - python analysis.py outputs: files: - histogram.png @@ -83,11 +99,11 @@ tests: - tests/workspace-files.feature

-In this example we are using a simple Serial workflow engine to represent our sequential
-computational workflow steps.
+In this example we are using a simple Serial workflow engine to launch our
+Dask-based computations.

-We can now install the REANA command-line client, run the analysis and download the
-resulting plots:
+We can now install the REANA command-line client, run the analysis and download
+the resulting plots:

 $ # create new virtual environment
@@ -113,5 +129,6 @@ $ # download output results
 $ reana-client download

-Please see the REANA-Client documentation for
-more detailed explanation of typical reana-client usage scenarios.
\ No newline at end of file
+Please see the REANA-Client
+documentation for more detailed explanation of typical reana-client usage
+scenarios.

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Sep 19, 2024

ci(code): upload initial version (reanahub#1)

f5f8df9

Alputer force-pushed the initial-version branch from 29dd98c to f5f8df9 Compare September 19, 2024 08:01

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Sep 19, 2024

ci(code): upload initial version (reanahub#1)

29dd98c

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Sep 25, 2024

ci(code): upload initial version (reanahub#1)

0f1eca9

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Sep 25, 2024

ci(code): upload initial version (reanahub#1)

730d702

Alputer force-pushed the initial-version branch from f5f8df9 to 730d702 Compare September 25, 2024 12:28

tiborsimko assigned tiborsimko and Alputer Oct 29, 2024

tiborsimko reviewed Oct 31, 2024


Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Oct 31, 2024

ci(code): upload initial version (reanahub#1)

845ec04

Alputer force-pushed the initial-version branch from 730d702 to 845ec04 Compare October 31, 2024 15:44

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Oct 31, 2024

ci(code): upload initial version (reanahub#1)

faacaad

Alputer force-pushed the initial-version branch from 845ec04 to faacaad Compare October 31, 2024 15:50

tiborsimko reviewed Nov 13, 2024


Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

9339c1c

Alputer force-pushed the initial-version branch from faacaad to 9339c1c Compare November 13, 2024 13:40

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

386c896

Alputer force-pushed the initial-version branch from 9339c1c to 386c896 Compare November 13, 2024 13:43

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

5a5bffc

Alputer force-pushed the initial-version branch from 386c896 to 5a5bffc Compare November 13, 2024 13:54

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

eca77b8

Alputer force-pushed the initial-version branch from 5a5bffc to eca77b8 Compare November 13, 2024 14:04

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

40d3344

Alputer force-pushed the initial-version branch from eca77b8 to 40d3344 Compare November 13, 2024 15:07

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

3965f04

Alputer force-pushed the initial-version branch from 40d3344 to 3965f04 Compare November 13, 2024 15:07

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024

ci(code): upload initial version (reanahub#1)

53cf1ba

Alputer force-pushed the initial-version branch from 3965f04 to 53cf1ba Compare November 13, 2024 15:08

feat(analysis): add initial version (reanahub#1)

9a07769

Alputer force-pushed the initial-version branch from 53cf1ba to 9a07769 Compare November 13, 2024 15:13

tiborsimko changed the title ~~ci(code): upload initial version (#1)~~ feat(analysis): add initial version (#1) Nov 13, 2024

tiborsimko approved these changes Nov 13, 2024


tiborsimko merged commit 9a07769 into reanahub:master Nov 13, 2024
2 checks passed
