nanoAOD-tools Custom

Tools for working with NanoAOD (requiring only python + root, not CMSSW)

Set up CMSSW (with Python 3), NanoAOD-tools and NanoNN

Python 3 is needed to re-run the taggers with ONNXRuntime. NanoNN is needed for the taggers/regression and PF inputs; it also contains the module for the hh4b analysis selection.

cmsrel CMSSW_11_1_0_pre5_PY3
cd CMSSW_11_1_0_pre5_PY3/src
cmsenv
git clone git@github.com:mstamenk/nanoAOD-tools.git PhysicsTools/NanoAODTools
git clone git@github.com:mstamenk/NanoNN.git PhysicsTools/NanoNN
cd PhysicsTools/NanoAODTools
scram b -j 10
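
To verify that the Python 3 environment provides ONNXRuntime (a quick sanity check; the onnxruntime module is assumed to be available in this release):

python3 -c "import onnxruntime; print(onnxruntime.__version__)"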

HHH6b producer

To run the HHH6b producer, copy one of the HHH NanoAOD v9 files from lxplus:

/afs/cern.ch/work/m/mstamenk/public/HHH/HHH6b_RunIISummer20UL17/
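
For example, from a machine with SSH access to lxplus (the file name matches the one used in the producer command below):

scp lxplus.cern.ch:/afs/cern.ch/work/m/mstamenk/public/HHH/HHH6b_RunIISummer20UL17/RunIISummer20UL17NANOAODSIM_1.root .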

To run the producer:

python scripts/nano_postproc.py tmp RunIISummer20UL17NANOAODSIM_1.root -I PhysicsTools.NanoNN.producers.hhh6bProducer hhh6bProducerFromConfig -N 500 --bo scripts/branch_hhh6b_output.txt

Samples path

The official CMS samples are stored on /store/ and do not need to be copied locally. This concerns the data (JetHT) as well as the QCD, V+jets, VV, VVV and TT backgrounds.

For the signal samples, the following path can be used:

/eos/user/m/mstamenk/CxAOD31run/hhh-samples/HHH6b_RunIISummer20UL17 # 250k events (used by me so far)
/eos/user/m/mstamenk/CxAOD31run/hhh-6b/run_hhh6b # 10 million events - to be tested for 2016, 2017 and 2018

For local tests using 500 events, copy one sample locally and pass it directly to the framework.
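
For example (the file name is a placeholder; pick any file from the signal directory), copy a sample and pass it to the producer as in the command above:

cp /eos/user/m/mstamenk/CxAOD31run/hhh-samples/HHH6b_RunIISummer20UL17/<file>.root .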

For running on condor with the full production, the samples are given to the framework through a list:

NanoAODTools/condor/samples/hhh6b_2017_DATA.yaml
NanoAODTools/condor/samples/hhh6b_2017_MC.yaml
NanoAODTools/condor/samples/hhh6b_2017_signalMC.yaml
NanoAODTools/condor/samples/xSections.dat # with corresponding cross-sections

These lists need to be updated for 2016 and 2018.

In addition to the yaml config files for the samples, a list of sample paths needs to be created and provided with the correct name matching the yaml:

NanoAODTools/condor/list/nano/v9/2017
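
Each list is expected to be a plain-text file with one input file path per line, e.g. (illustrative contents; the actual layout may differ):

/store/mc/.../NANOAODSIM/.../file_1.root
/store/mc/.../NANOAODSIM/.../file_2.root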

These lists of files can be created using the following scripts for official and private MC productions (the scripts need to be updated if running on lxplus):

NanoAODTools/condor/fileset/fileset_nanoaodv9.py
NanoAODTools/condor/fileset/fileset_eos_hhh.py
NanoAODTools/condor/fileset/fileset_signal_HHH.py
NanoAODTools/condor/fileset/fileset_qcd6b_HHH.py

Launching full production on condor

To simplify launching the production, the following scripts are available:

source launch_data.sh
source launch_signal.sh 
source launch_qcd6b.sh

Modify the commands inside to increase or decrease the number of jobs; the current configuration runs about 2000 jobs per year.
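
The job count is controlled by the -n option of the underlying runPostProcessing.py (number of input files per job); as an illustration, grouping more files per job reduces the number of jobs:

python runPostProcessing.py ... -n 10 # 10 input files per job, roughly 10x fewer jobs than -n 1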

Once the samples are done, they need to be post-processed to add the MC weight and merge them into a single file. Use the following scripts:

source post_process_data.sh
source post_process_signal.sh  
source post_process_qcd6b.sh

The output paths need to be modified accordingly to write to your own storage area.

Analysis definition

The analysis is defined in:

NanoNN/python/producers/hhh6bProducer.py

This is where all the main Higgs boson variables are built and defined. This is also where the reconstruction is done: the pairing of the jets and all the event-level variables are built here.

In order to run on condor, don't forget to tar the CMSSW project. Only the version in the tarball is used on the condor batch, and the path to it needs to be modified (see the instructions below in this README):

cd $CMSSW_BASE/../;
tar -zvcf CMSSW_11_1_0_pre5_PY3.tgz CMSSW_11_1_0_pre5_PY3 --exclude="*.pdf" --exclude="*.pyc" --exclude=tmp --exclude-vcs --exclude-caches-all --exclude="*err*" --exclude=*out_* --exclude=condor --exclude=.git --exclude=hhh-bdt --exclude=plottting

HH4b producer

Testing the post-processing step locally

The instructions to run the usual NanoAODTools post-processing step can be found in the nanoAOD-tools repo.

In our case we use e.g. the hh4bProducer. To test it locally you can use:

python scripts/nano_postproc.py tmp/ /eos/uscms//store/group/lpcdihiggsboost/NanoTuples/V2p0//MC_Autumn18/v1/GluGluToHHTo4B_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/NanoTuples-V2p0_RunIIAutumn18MiniAOD-102X_v15-v1/200801_231026/0000/nano_1.root -I PhysicsTools.NanoNN.producers.hh4bProducer hh4bProducer_2017 --cut "(FatJet_pt>250)" -N 1000 --bo scripts/branch_hh4b_output.txt

or, with an hh_cfg.json file in the same directory, you can use:

python scripts/nano_postproc.py tmp/ /eos/uscms//store/group/lpcdihiggsboost/NanoTuples/V2p0//MC_Autumn18/v1/GluGluToHHTo4B_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/NanoTuples-V2p0_RunIIAutumn18MiniAOD-102X_v15-v1/200801_231026/0000/nano_1.root -I PhysicsTools.NanoNN.producers.hh4bProducer hh4bProducerFromConfig -N 50 --bo scripts/branch_hh4b_output.txt

Here:

  • tmp is the output directory
  • root://cmseos.fnal.gov//store/group/lpcdihiggsboost/NanoTuples/V2p0/MC_Fall17/v1/GluGluToHHTo4B_node_cHHH1_TuneCP5_PSWeights_13TeV-powheg-pythia8/NanoTuples-V2p0_RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_v14-v1/200801_230741/0000/nano_16.root is the input file (either a local path or an xrootd URL works)
  • -I PhysicsTools.NanoNN.producers.hh4bProducer hh4bProducer_2017 is the module and function to run
  • the -c,--cut option is used to pass a string expression (using the same syntax as in TTree::Draw) that will be used to select events.
  • the -J,--json option is used to pass the name of a JSON file that will be used to select events. It is used for data events.
  • the -N option selects only 1000 events for this test.
  • --bi and --bo allow specifying the keep/drop file separately for the input and output trees. For hh4b we use the output branches listed in scripts/branch_hh4b_output.txt (see the example below)
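
A keep/drop file is a plain-text list of keep and drop directives matched against branch names and processed in order. A minimal sketch (the patterns here are illustrative, not the actual contents of branch_hh4b_output.txt):

drop *
keep run
keep luminosityBlock
keep event
keep FatJet_*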

Scripts to create jobs

Go to the condor directory:

cd PhysicsTools/NanoAODTools/condor

All the samples are listed in the samples directory, in yaml files that point to lists of files.

The main script to produce condor jobs (and later submit them) is [runPostProcessing.py](https://github.com/cmantill/nanoAOD-tools/blob/master/condor/runPostProcessing.py), e.g.:

python runPostProcessing.py [-i /path/of/input] -o /path/to/output -d datasets.yaml -I PhysicsTools.NanoNN.producers.hh4bProducer hh4bProducer_2017 -n 1

However, the runHH4b.py script lets you preset some fixed options for the HH4b analysis.

Inside runHH4b.py you can specify the samples you want to run for each year, or keep samples=None to run over all the samples listed under --sample-dir (by default samples/).
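
As an illustration, the per-year selection inside runHH4b.py might look like this (the dictionary layout is an assumption, and the sample name is taken from the test command above):

# Hypothetical sketch of the per-year sample selection in runHH4b.py
samples = {
    '2018': ['GluGluToHHTo4B_node_cHHH1'],  # run only this sample for 2018
}
# samples = None  # run over everything found under --sample-dir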

To run, and create jobs:

python runHH4b.py --option OPTION -o EOSOUTPUTDIR --year YEAR

Here:

  • --option is equivalent to the selection option in the HHBoostedAnalyzer; for now only option=5 (signal region) has been implemented.
  • -o is the output directory in eos.
  • --year is the sample year.

Preparing to run jobs

First, you need to re-tar the CMSSW environment (this needs to be re-done if you modify the producer or any files):

cd $CMSSW_BASE/../
tar -zvcf CMSSW_11_1_0_pre5_PY3.tgz CMSSW_11_1_0_pre5_PY3 --exclude="*.pdf" --exclude="*.pyc" --exclude=tmp --exclude-vcs --exclude-caches-all --exclude="*err*" --exclude=*out_* --exclude=condor --exclude=.git

and then copy it to your EOS directory ($USER resolves to your username):

mv CMSSW_11_1_0_pre5_PY3.tgz /eos/uscms/store/user/$USER/

You will also need to update run_processor.sh, the condor script that points to this tarball.

Running jobs

Once you have made these changes you can run runHH4b.py. For example, for the year 2018:

python runHH4b.py --option 5 -o /eos/uscms/store/user/cmantill/analyzer/v0 --year 2018

which will create a metadata json file in jobs_v0_ak8_option5_2018/mc/metadata.json and tell you the command to submit the condor jobs:

condor_submit jobs_v0_ak8_option5_2018/mc/submit.cmd

Command line options:

  • the preselection for each option is coded in runHH4b.py.
  • add --run-data to make data trees
  • can run data and MC for multiple years together, e.g. --year 2016,2017,2018; the --run-data option will be ignored in this case. Also add --run-syst to make the systematic trees. (TODO)
  • use --sample-dir to specify the directory containing the sample lists; the main use case is running over the HH4b NanoAOD datasets listed in lists.
  • the --batch option will submit jobs to condor automatically without confirmation
  • remove -i to run over remote files (e.g., official NanoAOD, or private NanoAOD published on DAS); consider adding --prefetch to copy files first before running
  • add --run-mass-regression to run new ParticleNet mass regression on-the-fly.
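
For example, a multi-year MC submission over remote files (the output path is illustrative):

python runHH4b.py --option 5 -o /eos/uscms/store/user/$USER/analyzer/v0 --year 2016,2017,2018 --prefetch --batch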

e.g. to submit data:

python runHH4b.py --option 5 -o /eos/uscms/store/user/cmantill/analyzer/v0 --year 2018 --run-data -n 10

Adding and Re-weighting samples

The --post option will hadd the output of the condor jobs into OUTPUTDIR/pieces/ and add the weight branch (computed from the sum of genWeights) to the tree.

python runHH4b.py --option 5 -o /eos/uscms/store/user/cmantill/analyzer/v0 --year 2018 --post
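
Conceptually, the weight branch normalizes each event's genWeight by the sum of genWeights over the whole sample and scales by the cross section from condor/samples/xSections.dat. A minimal sketch of the idea (not the actual implementation in the framework):

# Illustrative only: typical normalization of an MC event weight.
def event_weight(xsec_pb, gen_weight, sum_genweights):
    # Normalize by the sample's total generator weight;
    # luminosity scaling is usually applied downstream.
    return xsec_pb * gen_weight / sum_genweights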

Producing training samples

Preparing to run jobs

First, you need to re-tar the CMSSW environment (this needs to be re-done if you modify the producer in NanoNN or add any files):

cd $CMSSW_BASE/../
tar -zvcf CMSSW_11_1_0_pre5_PY3.tgz CMSSW_11_1_0_pre5_PY3 --exclude="*.pdf" --exclude="*.pyc" --exclude=tmp --exclude-vcs --exclude-caches-all --exclude="*err*" --exclude=*out_* --exclude=condor --exclude="*.tgz" --exclude="*.tar.gz"

and then copy it to your EOS directory ($USER resolves to your username):

mv CMSSW_11_1_0_pre5_PY3.tgz /eos/uscms/store/user/$USER/

You will also need to update run_skim_input.sh, the condor script that points to this tarball. While you are there, make sure you change the output directory to your username.

Testing locally:

For AK8:

mkdir tmp/
python scripts/nano_postproc_custom.py tmp/  /eos/uscms/store/user/lpcdihiggsboost/cmantill/PFNano/2017_preUL/GluGluZH_HToWW_M125_13TeV_powheg_pythia8_TuneCP5/RunIIFall17Jan22-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/210202_002923/0000/nano_mc2017_1.root  -I PhysicsTools.NanoNN.producers.inputProducer inputProducer_AK8 --cut "(FatJet_pt>300)&&(FatJet_msoftdrop>20)" --bi scripts/branch_inputs.txt --bo scripts/branch_inputs_output.txt --perJet -N 50000

python scripts/nano_postproc_custom.py tmp/ /eos/uscms/store/user/lpcdihiggsboost/cmantill/PFNano/2017_preUL_private/GravitonToHHToWWWW/apresyan-crab_PrivateProduction_Fall17_DR_step3_GravitonToHHToWWWW_batch1_v2-5f646ecd4e1c7a39ab0ed099ff55ceb9_Jan22/210202_164913/0000/nano_mc2017_93.root -I PhysicsTools.NanoNN.producers.inputProducer inputProducer_AK8  --bi scripts/branch_inputs.txt --bo scripts/branch_inputs_output.txt --perJet -N 10

python scripts/nano_postproc_custom.py tmp/ /eos/uscms/store/user/lpcpfnano/cmantill/v2_2/2017/HWW/GluGluHToWWToLNuQQ_M125_TuneCP5_PSweight_13TeV-powheg2-jhugen727-pythia8/GluGluHToWWToLNuQQ/211115_173633/0000/nano_mc2017_1-1.root -I PhysicsTools.NanoNN.producers.inputProducer inputProducer_AK8 --cut "(FatJet_pt>300)&&(FatJet_msoftdrop>20)" --bi scripts/branch_inputs.txt --bo scripts/branch_inputs_output.txt --perJet -N 50000

python scripts/nano_postproc_custom.py tmp/ /eos/uscms/store/user/lpcpfnano/cmantill/v2_2/2017/HWW/GluGluHToWWToLNuQQ_M125_TuneCP5_PSweight_13TeV-powheg2-jhugen727-pythia8/GluGluHToWWToLNuQQ/211115_173633/0000/nano_mc2017_1-1.root -I PhysicsTools.NanoNN.producers.inputProducer inputProducer_AK8_PFNano --bi scripts/branch_inputs.txt --bo scripts/branch_inputs_output.txt --perJet -N 50000

For AK15:

python scripts/nano_postproc_custom.py tmp/ /eos/uscms/store/user/lpcdihiggsboost/cmantill/PFNano/2017_preUL_private_ak15/GravitonToHHToWWWW/apresyan-crab_PrivateProduction_Fall17_DR_step3_GravitonToHHToWWWW_batch1_v2-5f646ecd4e1c7a39ab0ed099ff55ceb9_Mar16/210317_160124/0000/nano_mc2017_1.root  -I PhysicsTools.NanoNN.producers.inputProducer inputProducer_AK15  --cut "(FatJetAK15_pt>300)&&(FatJetAK15_msoftdrop>20)" --bi scripts/branch_inputs.txt --bo scripts/branch_inputs_output.txt --perJet -N 50000

python scripts/nano_postproc_custom.py tmp/ /eos/uscms/store/user/lpcpfnano/cmantill/v2_2/2017/HWW/GluGluHToWWToLNuQQ_M125_TuneCP5_PSweight_13TeV-powheg2-jhugen727-pythia8/GluGluHToWWToLNuQQ/211115_173633/0000/nano_mc2017_1-1.root -I PhysicsTools.NanoNN.producers.inputProducer inputProducer_AK15_PFNano --cut "(FatJetAK15_pt>300)&&(FatJetAK15_msoftdrop>20)" --bi scripts/branch_inputs.txt --bo scripts/branch_inputs_output.txt --perJet -N 50000

Running jobs

Run:

python runSkim.py --tag $TAG --jet $JET_TYPE 

where:

  • $TAG is the tag name for the output directory, e.g. ak8_v01hww_30Apr21
  • $JET_TYPE is the type of jet; the default is AK8
  • --test allows you to run test jobs in condor (recommended)
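
For example, combining the options above:

python runSkim.py --tag ak8_v01hww_30Apr21 --jet AK8 --test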

Running hh4bWWProducer

To test locally:

python scripts/nano_postproc.py tmp/ /eos/uscms/store/user/lpcdihiggsboost/cmantill/PFNano/2017_preUL_private_ak15/HHToBBVVToBBQQQQ_cHHH1/apresyan-crab_PrivateProduction_Fall17_DR_step3_HHToBBVVToBBQQQQ_cHHH1_batch2_v1-5f646ecd4e1c7a39ab0ed099ff55ceb9_Mar16/210317_160337/0000/nano_mc2017_7.root -I PhysicsTools.NanoNN.producers.hhbbWWProducer hhbbWWProducer --bo scripts/branch_hh4b_output.txt

Running jobs:

cd condor/
python runHHbbWW.py --option 1 -o /eos/uscms/store/user/cmantill/analyzer/v0bbWW --year 2017 # (for mc)
python runHHbbWW.py --option 1 -o /eos/uscms/store/user/cmantill/analyzer/v0bbWW --year 2017 --run-signal # (for signal)
python runHHbbWW.py --option 1 -o /eos/uscms/store/user/cmantill/analyzer/v0bbWW --year 2017 --run-data # (for data)

Make sure to re-tar the directory and copy to your eos space if there are any changes.

You will also need to update run_processor.sh, the condor script that points to this tarball.