A repository comprising user-customized "Analyzer" modules that loop over events and produce histograms of whichever quantities are desired. Supporting plotting scripts and tools are provided.
To have easy access to TensorFlow and uproot, we need to work in a CMSSW_11_2_0_pre5 release:
export SCRAM_ARCH=slc7_amd64_gcc820
cmsrel CMSSW_11_2_0_pre5
cd CMSSW_11_2_0_pre5/src/
cmsenv
Then, check out the latest tagged version of the top tagger repository.
git clone [email protected]:susy2015/TopTagger.git
cd TopTagger/TopTagger/test
./configure
make -j4
Now also check out our repositories if not done already:
cd $CMSSW_BASE/src
git clone [email protected]:StealthStop/Framework.git
git clone -b Stealth [email protected]:susy2015/TopTaggerTools.git
git clone [email protected]:susy2015/NTupleReader.git
git clone [email protected]:StealthStop/Analyzer.git
cd Analyzer/Analyzer/test
source setup.sh #.csh if in tcsh
./configure
make -j4
We set up separate top tagger cfg files for each year because different b tagger working points (WPs) are used. We have now switched to DeepFlavor for b tagging, and the top tagger implements the medium DeepFlavor WP for each year. So we have two sets of cfg releases implementing the DeepFlavor WPs. The first set is the normal version, where the merged and resolved top tagger working points are passed to the tagger at run time.
A second set of configs is available, in which both the DeepResolved and DeepAK8 WPs are set to 0.00 so that all top candidates with discriminant values between 0 and 1 are kept for calculating the SFs.
In `Framework/Framework/include/RunTopTagger.h`, the resolved and merged working points are then explicitly applied when counting the number of tops.
cmsenv
getTaggerCfg.sh -t StealthStop_DeepFlavorWp0.2598_DeepResolvedwp0.95_DeepAK8wp0.937_2016preVFPUL -f TopTaggerCfg_2016preVFP.cfg -o
getTaggerCfg.sh -t StealthStop_DeepFlavorWp0.2489_DeepResolvedwp0.95_DeepAK8wp0.937_2016postVFPUL -f TopTaggerCfg_2016postVFP.cfg -o
getTaggerCfg.sh -t StealthStop_DeepFlavorWp0.3040_DeepResolvedwp0.95_DeepAK8wp0.895_2017UL -f TopTaggerCfg_2017.cfg -o
getTaggerCfg.sh -t StealthStop_DeepFlavorWp0.2783_DeepResolvedwp0.95_DeepAK8wp0.895_2018UL -f TopTaggerCfg_2018.cfg -o
We have two sets of Double DisCo NNs, one for the RPV model and another for the SYY model. To get all of them, run the commands below. Note that any release with patch number 1 (e.g. v3.0.1) contains optimized bin edges, whereas patch number 0 (e.g. v3.0.0) has non-optimized bin edges.
cmsenv
getDeepESMCfg.sh -t DoubleDisCo_Reg_0l_Run2_RPV_v3.4.1_MassExclusion -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_0l_RPV_Run2_MassExclusion
getDeepESMCfg.sh -t DoubleDisCo_Reg_0l_Run2_SYY_v3.4.1_MassExclusion -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_0l_SYY_Run2_MassExclusion
getDeepESMCfg.sh -t DoubleDisCo_Reg_1l_Run2_RPV_v3.4.1_MassExclusion -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_1l_RPV_Run2_MassExclusion
getDeepESMCfg.sh -t DoubleDisCo_Reg_1l_Run2_SYY_v3.4.1_MassExclusion -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_1l_SYY_Run2_MassExclusion
getDeepESMCfg.sh -t DoubleDisCo_Reg_2l_Run2_RPV_v3.4.1_MassExclusion -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_2l_RPV_Run2_MassExclusion
getDeepESMCfg.sh -t DoubleDisCo_Reg_2l_Run2_SYY_v3.5.1_MassExclusion -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_2l_SYY_Run2_MassExclusion
getDeepESMCfg.sh -t DoubleDisCo_Reg_0l_Run2_RPV_v3.4.1_MaxSig -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_0l_RPV_Run2_MaxSig
getDeepESMCfg.sh -t DoubleDisCo_Reg_0l_Run2_SYY_v3.4.1_MaxSig -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_0l_SYY_Run2_MaxSig
getDeepESMCfg.sh -t DoubleDisCo_Reg_1l_Run2_RPV_v3.4.1_MaxSig -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_1l_RPV_Run2_MaxSig
getDeepESMCfg.sh -t DoubleDisCo_Reg_1l_Run2_SYY_v3.4.1_MaxSig -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_1l_SYY_Run2_MaxSig
getDeepESMCfg.sh -t DoubleDisCo_Reg_2l_Run2_RPV_v3.5.1_MaxSig -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_2l_RPV_Run2_MaxSig
getDeepESMCfg.sh -t DoubleDisCo_Reg_2l_Run2_SYY_v3.5.1_MaxSig -o -m DoubleDisCo_Reg.cfg -M DoubleDisCo_Reg_NonIsoMuon.cfg -f Keras_Tensorflow -F Keras_Tensorflow_NonIsoMuon -s DoubleDisCo_Reg_2l_SYY_Run2_MaxSig
Analyzer modules are run via the main "controlling" script `MyAnalysis.C`, which handles several different arguments.
Options:
-c : runOnCondor An internally specified argument which is used to signify when running on a condor cluster node
-v : isQuiet When specified, print logging information while running (custom to a specific analyzer)
-s : fastMode Run the analyzer in fast mode, where the module pipeline can be terminated early
-A : analyzer The name of the analyzer to be executed
-H : histFile Name of the output ROOT file to contain any histograms created by the analyzer
-D : dataSets Comma-separated list of data set names to run over
-N : nFiles Number of files (per data set) to process
-M : startFile Which file in the data set filelist to start processing at
-E : maxEvts Absolute maximum number of events to process for each data set
An example of running MyAnalysis interactively is
cd $CMSSW_BASE/src/Analyzer/Analyzer/test/
./MyAnalysis -A AnalyzeDoubleDisCo -H myoutputfile.root -D 2017_TTToSemiLeptonic -E 1001 -M 2 -s
The `condor` subdirectory contains some scripts to help submit jobs via condor on the cmslpc cluster.
The requirements for condor submission are:
1. A shell script to run on the worker node. This script should set up the working area, copy any needed files, call `MyAnalysis.C` with the right options, and make sure the output gets copied to the user's EOS area.
- The example included here is [run_Analyzer_condor.sh](Analyzer/test/condor/run_Analyzer_condor.sh)
2. One or more tarballs to unpack on the worker node; these usually contain a slimmed-down CMSSW area and the `MyAnalysis` executable with any needed libraries
3. A so-called jdl file that contains the condor setup and specifies the jobs to be submitted
- The last two items are produced by a python script called [condorSubmit.py](Analyzer/test/condor/condorSubmit.py).
An example call to the condor submission script would be:
python condorSubmit.py --analyze AnalyzeDoubleDisCo --output DisCoAnaOutput -d "2016preVFP_TT,2016preVFP_QCD" -n 20
where possible arguments are
Usage: condorSubmit.py [options]
Options:
-h, --help show this help message and exit
-n NUMFILE number of files per job
-d DATASETS List of datasets, comma separated
-l List all datacollections
-L List all datacollections and sub collections
-c Do not submit jobs. Only create condor_submit.txt.
-s Run Analyzer in fast mode
-u USEROVERRIDE Override username with something else
--output=OUTPATH Name of directory where output of each condor job goes
--analyze=ANALYZE AnalyzeBackground, AnalyzeEventSelection, Analyze0Lep,
Analyze1Lep, MakeNJetDists
With the `-n` option one can specify how many files to run over per job.
The `--analyze` option lets the user pick which analyzer to use.
MyAnalysis incorporates dedicated code modules to keep track of datasets, their cross sections, and their names.
To see a list of available datasets, one can call the submission script with the `-l` or `-L` options.
Pass the list of datasets to run over to the script with the `-d` option.
Before submitting jobs, make sure to have called `voms-proxy-init`.
In the event that jobs fail and do not send their output to EOS, a script is provided that can resubmit these missing jobs.
All that is required is the original job folder in `condor` as well as the corresponding output folder in EOS.
The cleanup script compares the total possible jobs (via `.log` files) to the output `.root` files to identify jobs that did not complete correctly.
Given the example call to `condorSubmit.py` up above, a corresponding call to the cleanup script would be:
python cleanupSubmit.py --analyze AnalyzeDoubleDisCo --jobdir DisCoAnaOutput
with available options given as
Usage: cleanupSubmit.py [options]
Options:
-h, --help show this help message and exit
-c Do not submit jobs. Only create condor_submit.txt.
-s Run Analyzer in fast mode
-u USEROVERRIDE Override username with something else
--jobdir=JOBDIR Name of directory where output of each condor job goes
--analyze=ANALYZE AnalyzeBackground, AnalyzeEventSelection, Analyze0Lep,
Analyze1Lep, MakeNJetDists
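The core check performed by the cleanup script can be sketched as a set difference between submitted and completed job indices. The file-name patterns below are illustrative assumptions, not the script's actual conventions:

```python
# Sketch of the cleanup logic: compare job indices inferred from condor .log
# files with the indices of .root output files on EOS, and report which jobs
# are missing. File-name patterns here are illustrative.
import re

def missing_jobs(log_names, root_names):
    """Return job indices that have a .log file but no matching .root output."""
    idx = re.compile(r"_(\d+)\.(?:log|root)$")
    submitted = {m.group(1) for n in log_names if (m := idx.search(n))}
    completed = {m.group(1) for n in root_names if (m := idx.search(n))}
    return sorted(submitted - completed, key=int)

logs  = ["job_0.log", "job_1.log", "job_2.log"]
roots = ["out_0.root", "out_2.root"]
```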
A script that wraps around the `TTree->Draw()` concept is provided in the form of `miniTupleDrawer.py`.
It makes it easy to plot from TTrees produced by an analyzer derived from the `MiniTupleMaker` class.
Current analyzers that produce simple TTrees are `MakeMiniTree`, `MakeNNVariables`, and `MakeQCDValTree`.
The TTree drawer script requires a "sidecar" auxiliary file that specifies a dictionary of histogram names mapped to a subdictionary of options.
An example aux file is `miniTupleDrawer_aux.py`.
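The shape of such an aux dictionary might look like the following. The option names shown are hypothetical; consult `miniTupleDrawer_aux.py` for the schema the script actually reads:

```python
# Hypothetical sketch of a miniTupleDrawer "sidecar" aux file: histogram names
# mapped to a subdictionary of drawing options. The option keys below are
# illustrative assumptions, not the script's actual schema.
histograms = {
    "h_njets": {
        "variable": "NJets",        # branch/expression to draw
        "selection": "NJets >= 7",  # TTree selection string
        "bins": 10, "min": 0, "max": 20,
    },
    "h_ht": {
        "variable": "HT",
        "selection": "HT > 500",
        "bins": 50, "min": 0, "max": 3000,
    },
}
```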
usage: %miniTupleDrawer [options] [-h] --inputDir INPUTDIR
[--outputDir OUTPUTDIR] [--tree TREE]
[--year YEAR] [--options OPTIONS]
optional arguments:
-h, --help show this help message and exit
--inputDir INPUTDIR Path to ntuples
--outputDir OUTPUTDIR
path to output
--tree TREE TTree name to draw
--year YEAR which year
--options OPTIONS options file
An example call to this script would be:
python Plotters/General/miniTupleDrawer.py --options miniTupleDrawer_aux \
--inputDir ~/path/to/minituples/ \
--outputDir subdir/structure/in/condor/folder \
--tree PreSelection \
--year Run2UL
Output ROOT files with the drawn histograms are contained in the `outputDir` subdirectory structure and placed automatically in the `condor` folder.
This placement of the output makes it intuitive to then use the other plotting tools (the `stackPlotter` mentioned below) for making final, pretty plots.
A generic plotter is available for making stack plots with or without a data/MC ratio panel.
Information on which backgrounds and signals to plot, in addition to visualization options, should be specified in the `stackPlotter_aux.py` file. There, one also specifies which histograms to extract from the ROOT file and plot.
usage: %stackPlotter [options] [-h] [--noRatio] [--approved]
[--printNEvents] [--normMC2Data]
[--normalize] [--printSign] --inpath
INPATH --outpath OUTPATH --year YEAR
[--options OPTIONS]
optional arguments:
-h, --help show this help message and exit
--noRatio No ratio plot
--approved Plot is approved
--printNEvents Show number of events
--normMC2Data Normalize MC to data
--normalize Normalize all to unity
--printSign Print simple significance
--inpath INPATH Path to root files
--outpath OUTPATH Where to put plots
--year YEAR which year
--options OPTIONS options file
An example call to the stack plotter could be:
python stackPlotter.py --year 2016 --inpath ./condor/2016_DisCo_0L_1L_hadd/ --outpath plot_histos --normMC2Data
A tool is provided to load information from ROOT files output by the `AnalyzeDoubleDisCo` analyzer and make a table of event yields (and fractional yields) for different background and signal processes.
The user can choose which channel to process as well as toggle the QCD CR selection.
The tables are output in standalone `.tex` format that can be simply input into another `.tex` document.
usage: %tableYields [options] [-h] --channel CHANNEL --inputDir
INPUTDIR --outputDir OUTPUTDIR [--QCDCR]
[--year YEAR]
optional arguments:
-h, --help show this help message and exit
--channel CHANNEL which channel to process
--inputDir INPUTDIR directory for input ROOT
--outputDir OUTPUTDIR
where to put tex table files
--QCDCR do for QCD CR
--year YEAR which year to process
An example call to the tool would be:
python Tools/makeYieldsTables.py --inputDir ~/nobackup/outputsPath/DataVsMC_Run2 --outputDir MyOutput --channel 1l --year Run2
The main script for generating file lists and the corresponding sample set is `Tools/makeFilelist.py`.
usage: makeFilelist.py [-h] [--prod PROD] [--tag TAG] [--skim]
optional arguments:
-h, --help show this help message and exit
--prod PROD Unique tag for output
--tag TAG Path to PU file
--skim Make for skim
The `--prod` argument refers to the version of the folder on EOS that contains ntuple ROOT files for all MC and data samples.
The latest version is `V20` and can be found at `/eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/`.
Running the script without any arguments will generate a folder `filelists_Kevin_UL_v2` and a file `sampleSet_UL_v2.cfg`, where the default `--tag="UL_v2"` has been used.
The filelist folder contains a text file for each MC sample (found in the directory of ntuple files), which lists the full paths to all the ntuple ROOT files for the corresponding sample.
The filelist folder is intended to be placed in the `StealthStop` area of the `lpcsusystealth` group space on EOS.
The `sampleSet_UL_v2.cfg` file contains a mapping between "friendly" sample names and the corresponding text file listing all ntuple ROOT files for the sample.
This config file should be placed in the `cfg` area of `Framework`.
In order to pick up this new cfg in the `Analyzer` area automatically, the appropriate lines in the `getSamplesCfg.sh` script in `Framework/scripts` need to be modified and `source setup.sh` rerun.
Additionally, a new `sampleCollection_UL_v2.cfg` needs to be constructed (easiest by hand), which creates groups of samples to be referenced when running analyzers.
Note that when running `makeFilelist.py` it is most effective to have an up-to-date `TreeMaker` checkout to reference in the script.
This allows populating each sample line with total event numbers, cross sections, and k factors.
When specifying the option `--skim`, the script is configured to look for `Skims/{2016preVFP,2016postVFP,2017,2018}` folders in the `lpcsusystealth/StealthStop` area.
These folders contain ROOT files from the `MakeMiniTree` analyzer, which contain a smaller event TTree.
The filelist script handles these differences when looking for skimmed ROOT files, but returns the same sort of outputs as described above.
With a new `sampleSets.cfg` symlink in `Analyzer/test` pointing to `sampleSets_UL_v2.cfg` in `Framework`, the number of events in each sample can be measured.
This is useful for verifying exactly how many events are present for a given sample, in order to guarantee an accurate calculation of event weights.
`nEvt.py` jobs can be submitted with `nEvtsCondorSubmit.py`, which will read in `sampleSets.cfg` and spawn a job for each sample that loops through all of its files.
An output text file reporting the total positive and negative event counts for each sample is generated at the end and returned to the user.
Available arguments for the script are provided below.
Usage: nEvtsCondorSubmit.py [options]
Options:
-h, --help show this help message and exit
--noSubmit Do not submit jobs. Only create condor_submit.txt
--outPath=OUTPATH Name of directory where output of each condor job goes
--sampleSets=SAMPLESETS
Sample sets config file
--wildcard=WILDCARD Wildcard expression for picking only some sample sets
An example call to the script would be
python nEvtsCondorSubmit.py --outPath nEvtsOutput --sampleSets sampleSets --wildcard "*2016pre*"
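The positive and negative event counts feed the standard MC event weight; a sketch of that calculation (the formula is the usual cross-section normalization, and the numbers below are purely illustrative):

```python
# Sketch of the per-event MC weight built from nEvt.py counts. Generators
# with NLO weights produce negative-weight events, so the effective event
# count is N_pos - N_neg. All numbers here are illustrative.
def event_weight(xsec_pb, lumi_invpb, n_pos, n_neg):
    """Per-event weight normalizing MC to the data luminosity."""
    n_eff = n_pos - n_neg
    return xsec_pb * lumi_invpb / n_eff

# e.g. a hypothetical sample with 5% negative-weight events
w = event_weight(xsec_pb=88.29, lumi_invpb=59_830.0, n_pos=100_000_000, n_neg=5_000_000)
```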
A helper script `checkNevents.py` is available to compare the numbers reported in the `nEvt.py` job output and the original `sampleSets.cfg`, whose numbers were sourced directly from the `TreeMaker` repository.
Discrepancies are printed to screen for investigation.
Additionally, a new `sampleSets_new.cfg` is written, with the numbers measured by `nEvt.py` inserted into the original `sampleSets.cfg`.
usage: checkNevents.py [-h] [--sampleSet SAMPLESET] [--nEvtsDir NEVTDIR]
optional arguments:
-h, --help show this help message and exit
--sampleSet SAMPLESET
Path to sample set file
--nEvtsDir NEVTDIR Directory to nEvt output
The UL signal samples are produced such that all mass points from 300 to 1400 can appear in a single ROOT file. To restore the behavior from the legacy analysis, where a given ROOT file contains events for only a single mass point, some code infrastructure is provided to disentangle the mass points.
Jobs can be submitted to condor to run over ROOT files in a user-specified directory on EOS. The output ROOT files are sent to a user-specified area in their own EOS area.
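The splitting amounts to grouping events by their generated mass point and writing one output per mass. A minimal sketch in plain Python (the mass field and event records are illustrative; the real scripts read and write ROOT TTrees):

```python
# Sketch of disentangling mass points: group events by a hypothetical
# generator-mass field and emit one collection per mass point, mimicking
# the legacy one-mass-per-file layout. Real code operates on ROOT TTrees.
from collections import defaultdict

def split_by_mass(events, mass_key="stop_mass"):
    """Group event records by their generated mass point."""
    by_mass = defaultdict(list)
    for evt in events:
        by_mass[evt[mass_key]].append(evt)
    return dict(by_mass)

events = [{"stop_mass": 300, "njets": 8},
          {"stop_mass": 800, "njets": 10},
          {"stop_mass": 300, "njets": 7}]
```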
usage: submitSplitSignal.py [-h] --eosPath EOSPATH [--outPath OUTPATH]
[--era ERA] [--model MODEL]
[--ttreePath TTREEPATH] [--noSubmit]
optional arguments:
-h, --help show this help message and exit
--eosPath EOSPATH Path to files on EOS
--outPath OUTPATH Output path for jobs
--era ERA Era for signal samples
--model MODEL Signal model to split
--ttreePath TTREEPATH
TTree name to read
--noSubmit Do not submit to condor
Companion scripts `runSplitSignal.py` and `runSplitSignal.sh` are provided to process each file.
A full suite of calls to this script to split our signals would be:
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL16 --model RPV
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL16APV --model RPV
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL17 --model RPV
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL18 --model RPV
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL16 --model StealthSYY
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL16APV --model StealthSYY
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL17 --model StealthSYY
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL18 --model StealthSYY
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL16 --model StealthSHH
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL16APV --model StealthSHH
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL17 --model StealthSHH
python submitSplitSignal.py --eosPath /eos/uscms/store/user/lpcsusyhad/SusyRA2Analysis2015/Run2ProductionV20/ --outPath SusyRA2Analysis2015/Run2ProductionV20/ --era Summer20UL18 --model StealthSHH
Here the split signals are automatically returned to the `lpcsusyhad` EOS space in the `SusyRA2Analysis2015/Run2ProductionV20/` subdirectory.