This REANA reproducible analysis example demonstrates the reconstruction procedure of the CMS collaboration from raw data to Analysis Object Data (AOD), for the year 2011 and the data set DoubleElectron.
The workflow consists of the steps needed for sample reconstruction, as taken from the CMS legacy validation repository.
Any raw input data from the CERN open data platform should be valid for reconstruction. In this example, the input is taken from:
root://eospublic.cern.ch//eos/opendata/cms/Run2011A/DoubleElectron/RAW/v1/000/160/433/C046161E-0D4E-E011-BCBA-0030487CD906.root
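To point the workflow at a different RAW file, the same URL pattern can be reused. The following is a minimal sketch, assuming the file is served by the same public EOS host as the one above; the helper name `xrootd_url` is hypothetical and not part of any CMS or REANA tool.

```shell
# Hypothetical helper: build an XRootD URL from an EOS path.
# Assumes the file lives on the public EOS instance used in this example.
xrootd_url() {
    eos_path=$1
    case "$eos_path" in
        /eos/opendata/*)
            # The double slash after the host name is part of the URL form used above.
            printf 'root://eospublic.cern.ch/%s\n' "$eos_path"
            ;;
        *)
            echo "not an open data EOS path: $eos_path" >&2
            return 1
            ;;
    esac
}

# Reproduces the input URL used in this example.
xrootd_url /eos/opendata/cms/Run2011A/DoubleElectron/RAW/v1/000/160/433/C046161E-0D4E-E011-BCBA-0030487CD906.root
```

Rejecting paths outside `/eos/opendata/` is a design choice of this sketch, intended to catch typos early rather than at `cmsRun` time.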
The reconstruction step can be repeated with a configuration file that depends on the analysed data, such as the one used in this example. Alternatively, we can create our own configuration file in a CMS VM with the command below and then change the script accordingly:
$ cmsDriver.py reco -s RAW2DIGI,L1Reco,RECO,USER:EventFilter/HcalRawToDigi/hcallaserhbhehffilter2012_cff.hcallLaser2012Filter --data --conditions FT_R_53_LV5::All --eventcontent AOD --customise Configuration/DataProcessing/RecoTLR.customisePrompt --no_exec --python reco_cmsdriver2011.py
In order to be able to rerun the analysis even several years in the future, we need to "encapsulate the current compute environment", for example to freeze the software package versions our analysis is using. We shall achieve this by preparing a Docker container image for our analysis steps.
This analysis example runs within the CMSSW analysis framework that was packaged for Docker in cmsopendata. The different images correspond to data sets taken in different years. Instructions can be found in that repository.
Moreover, the re-reconstruction task needs run-time access to the condition database. Inside a CMS VM, this is achieved with the commands:
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA FT_53_LV5_AN1
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db FT_53_LV5_AN1_RUNA.db
For REANA, the condition database on CVMFS can be accessed from any container. The only requirement is that the user specifies the necessary CVMFS volumes to be live-mounted in the resources section of reana.yaml, as described in the REANA documentation.
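A minimal sketch of how such a `reana.yaml` might look is written out below. The `resources.cvmfs` list names the CVMFS repository to mount. The specification version, the image name `cmsopendata/cmssw_5_3_32` and the single listing step are assumptions made for illustration, not the file shipped with this example.

```shell
# Write an illustrative reana.yaml (a sketch under the assumptions stated above).
cat > reana.yaml <<'EOF'
version: 0.6.0
workflow:
  type: serial
  resources:
    cvmfs:
      # CVMFS repository that holds the condition database
      - cms-opendata-conddb.cern.ch
  specification:
    steps:
      # assumed image name; check it against the cmsopendata images
      - environment: 'cmsopendata/cmssw_5_3_32'
        commands:
          # list the mounted repository to confirm that CVMFS is visible
          - ls -l /cvmfs/cms-opendata-conddb.cern.ch
EOF
```

Starting with a step that only lists the mounted repository is a cheap way to confirm that the mount works before the much longer reconstruction job is submitted.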
First, we have to set up the environment variables for CMSSW. Although the Docker image already sets them, REANA overrides them, so they need to be set again. This is done by copying the commands from the CMS entrypoint.sh script:
$ source /opt/cms/cmsset_default.sh
$ scramv1 project CMSSW CMSSW_5_3_32
$ cd CMSSW_5_3_32/src
$ eval `scramv1 runtime -sh`
The commands needed to carry out the analysis in the CMS-specific environment are then:
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA FT_53_LV5_AN1
$ ln -sf /cvmfs/cms-opendata-conddb.cern.ch/FT_53_LV5_AN1_RUNA.db FT_53_LV5_AN1_RUNA.db
$ ls -l
$ ls -l /cvmfs/
$ cmsRun reco_cmsdriver2011.py
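`ln -sf` succeeds even when its target does not exist, so a missing CVMFS mount would only show up later as a failure inside `cmsRun`. It can help to check the two links first. The following is a minimal sketch; the function name `check_conddb_links` is hypothetical, and the usage part assumes it is run in the directory where the links were created.

```shell
# Hypothetical pre-flight check: succeed only if every given link resolves.
check_conddb_links() {
    status=0
    for link in "$@"; do
        # [ -e ] follows symbolic links, so it is false for a dangling link
        if [ ! -e "$link" ]; then
            echo "missing or dangling: $link" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: report whether both links point at mounted CVMFS content.
if check_conddb_links FT_53_LV5_AN1 FT_53_LV5_AN1_RUNA.db; then
    echo "condition database links OK"
else
    echo "CVMFS is probably not mounted; do not start cmsRun yet" >&2
fi
```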
This demo provides a "workflow factory" script that produces REANA workflows for the CMS RAW to AOD reconstruction procedure from given parameters.
Following successful tests (see other branches), we know that REANA is able to run CMS reconstruction for a variety of RAW samples (e.g. dataset SingleMu) and data-taking years (e.g. 2011).
Before running the example, you may want to install the necessary packages:
$ # create new virtual environment
$ virtualenv ~/.virtualenvs/myreana
$ source ~/.virtualenvs/myreana/bin/activate
$ # install reana-commons and reana-client
$ pip install git+https://github.com/reanahub/reana-demo-cms-reco.git@master#egg=cms-reco
Afterwards, the following commands generate the workflow for a given record ID, with its metadata retrieved using the COD Client. The workflow is created in an output directory that contains the reana.yaml file together with all necessary inputs:
$ cernopendata-client get-record --recid 39 | tee cms-reco-config.json
$ # use the values from the 'cms-reco-config.json' file
$ cms-reco --create-workflow
Created `cms-reco-SingleElectron-2011` directory.
$ cd cms-reco-SingleElectron-2011
$ reana-client run