feat(analysis): add initial version (#1) #1

Merged on Nov 13, 2024 (1 commit)


@Alputer commented on Sep 18, 2024

This pull request implements the initial version of the Dask demo example.

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Sep 19, 2024
Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Sep 25, 2024
reana.yaml Outdated
version: 0.9.3
inputs:
  files:
    - codes/analysis.py

The directory should be named just "code".

However, since Dask itself effectively provides the workflow, and since we don't have any other input files or data files, we could simply host the sole analysis.py file in the root directory.

reana.yaml Outdated
@@ -0,0 +1,18 @@
version: 0.9.3

You can remove the version clause; it does not really serve any purpose, and some people were confused by its meaning. (We might remove it everywhere later.)

reana.yaml Outdated
image: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
specification:
steps:
- name: mystep

You can name the step "process".

        environment: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
        commands:
          - python codes/analysis.py
outputs:

Please also introduce some behavioural tests for the presence of the histogram output file and for log messages such as:

all events 53446198
number of chunks 534
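
The two checks requested in this comment can be sketched in plain Python. This is only an illustration of what such tests would verify, not the format REANA uses for its test files (the README diff further down refers to a `tests/workspace-files.feature` file). The helper names `missing_outputs` and `missing_log_lines` are hypothetical; the file name `histogram.png` and the two log lines are taken from this pull request.

```python
from pathlib import Path

# Log lines quoted in the review comment above.
EXPECTED_LOG_LINES = ("all events 53446198", "number of chunks 534")


def missing_outputs(workspace, expected_files=("histogram.png",)):
    """Return the expected output files that are absent from the workspace.

    Hypothetical helper: `workspace` is any local directory holding the
    downloaded workflow outputs.
    """
    root = Path(workspace)
    return [name for name in expected_files if not (root / name).is_file()]


def missing_log_lines(log_text, expected_lines=EXPECTED_LOG_LINES):
    """Return the expected messages that do not appear in the captured log."""
    return [line for line in expected_lines if line not in log_text]
```

A run would count as passing when both helpers return an empty list, for instance `missing_outputs("workspace")` for a directory that contains `histogram.png`, and `missing_log_lines(captured_log)` for the text of the step's log.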

README.md Outdated
expected outputs:

```yaml
version: 0.9.3

After you modify reana.yaml, please update this section accordingly.

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Oct 31, 2024
@@ -4,3 +4,114 @@
[![image](https://img.shields.io/badge/discourse-forum-blue.svg)](https://forum.reana.io)
[![image](https://img.shields.io/github/license/reanahub/reana-demo-dask-coffea.svg)](https://github.com/reanahub/reana-demo-dask-coffea/blob/master/LICENSE)
[![image](https://www.reana.io/static/img/badges/launch-on-reana-at-cern.svg)](https://reana.cern.ch/launch?url=https%3A%2F%2Fgithub.com%2Freanahub%2Freana-demo-dask-coffea&specification=reana.yaml&name=reana-demo-dask-coffea)


Here are suggestions for the README file:

diff --git a/README.md b/README.md
index 9ac7e7a..4b329fc 100644
--- a/README.md
+++ b/README.md
@@ -7,57 +7,73 @@
 
 ## About
 
-## Analysis structure
+This [REANA](http://www.reana.io/) reproducible analysis example shows how to
+run Dask workflows using Coffea. The example was adapted
+from the
+[Coffea Casa tutorials](https://github.com/CoffeaTeam/coffea-casa-tutorials/blob/master/examples/example1.ipynb)
+repository.
 
-Making a research data analysis reproducible basically means to provide "runnable
-recipes" addressing (1) where is the input data, (2) what software was used to analyse
-the data, (3) which computing environments were used to run the software and (4) which
-computational workflow steps were taken to run the analysis. This will permit to
-instantiate the analysis on the computational cloud and run the analysis to obtain (5)
-output results.
+## Analysis structure
 
+Making a research data analysis reproducible basically means to provide
+"runnable recipes" addressing (1) where is the input data, (2) what software was
+used to analyse the data, (3) which computing environments were used to run the
+software and (4) which computational workflow steps were taken to run the
+analysis. This will permit to instantiate the analysis on the computational
+cloud and run the analysis to obtain (5) output results.
 
 ### 1. Input data
 
-In this example, we are using the file whose url is given below which is hosted at eospublic.
-- `root://eospublic.cern.ch//eos/root-eos/benchmark/Run2012B_SingleMu.root`
+In this example, we are using a single CMS open data set file
+`Run2012B_SingleMu.root` which is hosted on the EOSPUBLIC XRootD server.
 
 ### 2. Analysis code
 
-The analysis code consists of a single python file called `analysis.py` which connects to a dask cluster and then conducts the analysis.
+The analysis code consists of a single Python file called `analysis.py` which
+connects to a Dask cluster, conducts the analysis and prints a MET
+histogram.
 
 ### 3. Compute environment
 
-In order to be able to rerun the analysis even several years in the future, we need to "encapsulate the current compute environment". We shall achieve this by preparing a [Docker](https://www.docker.com/)  container image for our analysis steps.
+In order to be able to rerun the analysis even several years in the future, we
+need to "encapsulate the current compute environment". We shall achieve this by
+preparing a [Docker](https://www.docker.com/) container image for our analysis
+steps.
 
-This example makes use of the coffea platform and the specific image for the platform we are using in this example can be found [here](https://hub.docker.com/r/coffeateam/coffea-dask-cc7).
+This example makes use of the Coffea platform image with the specific version
+0.7.22. The container image can be found on Docker Hub at
+[docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049](https://hub.docker.com/r/coffeateam/coffea-dask-cc7).
 
 ### 4. Analysis workflow
 
-The analysis workflow is simple and consists of a single step. We simply run the script `python analysis.py` to run the example. However, realize that the actual analysis is relatively heavy and parallelized by dask behind the scenes. As a user, the task graphs and the parallel steps are hidden to us.
+The analysis workflow is simple and consists of a single command. We simply run
+`python analysis.py` to run the example. The command will then use
+Dask behind the scenes to possibly launch parallel computations. As users,
+we do not have to specify the computational graph ourselves; the Dask library
+will take care of dispatching computations.
 
 ### 5. Output results
 
-The example produces the given histogram as an output.
-![](https://github.com/user-attachments/assets/e52c2391-626d-4556-90ca-75248516cc95)
+The example produces the following MET event-level histogram as an output.
 
+![](https://github.com/user-attachments/assets/e52c2391-626d-4556-90ca-75248516cc95)
 
 ## Running the example on REANA cloud
 
 There are two ways to execute this analysis example on REANA.
 
-If you would like to simply launch this analysis example on the REANA instance at CERN
-and inspect its results using the web interface, please click on the following
-badge:
+If you would like to simply launch this analysis example on the REANA instance
+at CERN and inspect its results using the web interface, please click on the
+following badge:
 
-[![Launch with Serial on REANA@CERN badge](https://www.reana.io/static/img/badges/launch-with-serial-on-reana-at-cern.svg)](https://reana.cern.ch/launch?url=https://github.com/reanahub/reana-demo-dask-coffea&specification=reana.yaml&name=reana-demo-dask-coffea)
+[![Launch on REANA@CERN badge](https://www.reana.io/static/img/badges/launch-on-reana-at-cern.svg)](https://reana.cern.ch/launch?url=https://github.com/reanahub/reana-demo-dask-coffea&specification=reana.yaml&name=reana-demo-dask-coffea)
 
-If you would like a step-by-step guide on how to use the REANA command-line client to
-launch this analysis example, please read on.
+If you would like a step-by-step guide on how to use the REANA command-line
+client to launch this analysis example, please read on.
 
-We start by creating a [reana.yaml](reana.yaml) file describing the above analysis
-structure with its inputs, code, runtime environment, computational workflow steps and
-expected outputs:
+We start by creating a [reana.yaml](reana.yaml) file describing the above
+analysis structure with its inputs, code, runtime environment, computational
+workflow steps and expected outputs:
 
 ```yaml
 inputs:
@@ -73,7 +89,7 @@ workflow:
       - name: process
         environment: docker.io/coffeateam/coffea-dask-cc7:0.7.22-py3.10-g7f049
         commands:
-        - python analysis.py
+          - python analysis.py
 outputs:
   files:
     - histogram.png
@@ -83,11 +99,11 @@ tests:
     - tests/workspace-files.feature

-In this example we are using a simple Serial workflow engine to represent our sequential
-computational workflow steps.
+In this example we are using a simple Serial workflow engine to launch our
+Dask-based computations.

-We can now install the REANA command-line client, run the analysis and download the
-resulting plots:
+We can now install the REANA command-line client, run the analysis and download
+the resulting plots:

$ # create new virtual environment
@@ -113,5 +129,6 @@ $ # download output results
$ reana-client download

-Please see the REANA-Client documentation for
-more detailed explanation of typical reana-client usage scenarios.
\ No newline at end of file
+Please see the REANA-Client
+documentation for more detailed explanation of typical reana-client usage
+scenarios.
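
The suggested README text says that a single command hands the work to Dask, which dispatches the parallel computations, and the log excerpt quoted earlier in this review reports 534 chunks for 53446198 events. The chunked map-and-merge pattern behind this can be sketched with the Python standard library. This is a stand-in only: it uses `concurrent.futures` instead of Dask, the even splitting rule is an assumption and not necessarily how Coffea or Dask partition the input, and `count_events` is a hypothetical placeholder for the real per-chunk analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(n_events, n_chunks):
    """Return (start, stop) ranges covering n_events in n_chunks pieces.

    Assumption: chunks are as even as possible; the real splitting rule
    used by the analysis framework may differ.
    """
    size, extra = divmod(n_events, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def count_events(chunk):
    """Hypothetical per-chunk work: here we only count the events."""
    start, stop = chunk
    return stop - start


def process_all(n_events, n_chunks, workers=4):
    """Process every chunk in a worker pool and merge the partial results."""
    chunks = split_into_chunks(n_events, n_chunks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_events, chunks))
```

With the figures from the log, `process_all(53446198, 534)` merges 534 partial counts back into the total of 53446198 events. The caller never describes the task graph explicitly, which is the point the README paragraph makes about Dask.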

Alputer added a commit to Alputer/reana-demo-dask-coffea that referenced this pull request Nov 13, 2024
@tiborsimko changed the title from "ci(code): upload initial version (#1)" to "feat(analysis): add initial version (#1)" on Nov 13, 2024
@tiborsimko merged commit 9a07769 into reanahub:master on Nov 13, 2024
2 checks passed