class: middle, center, title-slide
count: false
(for the dev team)
Matthew Feickert
[email protected]
October 18th, 2019
.grid[
.kol-1-4.center[
.circle.width-80[]
CERN
]
.kol-1-4.center[
.circle.width-80[]
Illinois
]
.kol-1-4.center[
.circle.width-80[]
UCSC SCIPP
]
.kol-1-4.center[
.circle.width-70[]
NYU
]
]
.kol-3-4.center.bold[Core Developers] .kol-1-4.center.bold[Advising]
.kol-1-2.width-90[
- High information-density summary of analysis
- Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
   - Trigger
   - Detector
   - Systematic Uncertainties
   - Event Selection
- Unique representation of the analysis to preserve
]
.kol-1-2.width-90[
]
.center[...making good on the 19-year-old agreement to publish likelihoods]
.center[(1st Workshop on Confidence Limits, CERN, 2000)]
.bold[This hadn't been done in HEP until now]
- In an "open world" of statistics this is a difficult problem to solve
- What to preserve and how? All of ROOT?
- Idea: Focus on a single more tractable binned model first
- A flexible p.d.f. template to build statistical models from binned distributions and data
- Developed by Cranmer, Lewis, Moneta, Shibata, and Verkerke [1]
- Widely used by the HEP community for Standard Model measurements and BSM searches
.kol-1-1.center[
.width-100[]
]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions, with multiple samples contributing to each and additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
   - .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
   - encoding systematic uncertainties (normalization, shape, etc.)

$\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars
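Putting the two pieces together, the full probability model in the notation above (this is the standard HistFactory form; the underbraces are added here for emphasis):

$$
f(\vec{n}, \vec{a} \mid \vec{\eta}, \vec{\chi}) = \underbrace{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois}\left(n_{cb} \mid \nu_{cb}(\vec{\eta}, \vec{\chi})\right)}_{\textrm{Main Poisson p.d.f.}} \cdot \underbrace{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi}(a_{\chi} \mid \chi)}_{\textrm{Constraint p.d.f.}}
$$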
.bold[This is a mathematical representation!] Nowhere is any software spec defined
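To see how small the core of that mathematical representation is, here is a minimal stdlib-only sketch of the main Poisson piece for a hypothetical single-channel, two-bin model (illustrative yields, not pyhf's implementation; constraint terms omitted):

```python
import math

def poisson_logpdf(n, lam):
    # log Pois(n | lam), using lgamma for log(n!)
    return n * math.log(lam) - lam - math.lgamma(n + 1)

# Hypothetical two-bin yields: expected rate nu_b = mu * s_b + b_b
signal = [12.0, 11.0]
background = [50.0, 52.0]
observed = [51.0, 48.0]

def log_likelihood(mu):
    # Sum of per-bin Poisson log-probabilities for signal strength mu
    return sum(
        poisson_logpdf(n, mu * s + b)
        for n, s, b in zip(observed, signal, background)
    )

print(log_likelihood(1.0))
```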
Until now, the only implementation of HistFactory has been in RooStats+RooFit
- Preservation: Likelihood stored in the binary ROOT format
   - Challenge for long-term preservation (e.g. HEPData)
   - Why is a histogram needed for an array of numbers?
- To start using HistFactory p.d.f.s one first has to learn ROOT, RooFit, RooStats
   - Problem for our theory colleagues (who generally don't want to)
- Difficult to use for reinterpretation
.kol-1-2.width-95[
- First non-ROOT implementation of the HistFactory p.d.f. template
- pure-Python library as second implementation of HistFactory
`pip install pyhf`
- No dependence on ROOT!
]
.kol-1-2.center.width-90[
]
.kol-1-1[
- Has a JSON spec that .blue[fully] describes the HistFactory model
- JSON: Industry standard, parsable by every language, human & machine readable, versionable and easily preserved (HEPData is JSON)
- Open source tool for all of HEP
JSON defining a single channel, two-bin counting experiment with systematics
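A workspace of this kind looks roughly like the two-bin example in the pyhf documentation (a sketch; the names and numbers are illustrative, not the exact contents of the slide's figure):

```json
{
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [12.0, 11.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": null}
                    ]
                },
                {
                    "name": "background",
                    "data": [50.0, 52.0],
                    "modifiers": [
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [3.0, 7.0]}
                    ]
                }
            ]
        }
    ],
    "observations": [
        {"name": "singlechannel", "data": [51.0, 48.0]}
    ],
    "measurements": [
        {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0"
}
```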
.kol-1-2[
.center.width-100[]
.center[Original model]
]
.kol-1-2[
.center.width-100[]
.center[New Signal (JSON Patch file)]
]
.kol-1-1[
.center.width-80[
]
.center[Reinterpretation]
]
.center.width-80[]
.kol-1-2[
.center.width-70[
]
.center[Original analysis (model A)]
]
.kol-1-2[
.center.width-70[
]
.center[Recast analysis (model B)]
]
- Background-only model stored as JSON
- Signal models stored as JSON Patch files
- Together these fully preserve the model
.footnote[Updated on 2019-10-21]
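The background-only-plus-patch workflow can be sketched with only the standard-library `json` module; this hypothetical example applies a single JSON Patch (RFC 6902) `replace` operation by hand (in practice a JSON Patch library or `pyhf cls --patch` does this, and the workspace here is a toy fragment):

```python
import json

# Hypothetical background-only workspace fragment (illustrative numbers)
workspace = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {"name": "signal", "data": [12.0, 11.0], "modifiers": []},
                {"name": "background", "data": [50.0, 52.0], "modifiers": []},
            ],
        }
    ]
}

# A JSON Patch swapping in a new signal model's expected yields
patch = [
    {"op": "replace", "path": "/channels/0/samples/0/data", "value": [5.0, 6.0]}
]

def apply_patch(doc, ops):
    """Apply the 'replace' operations of a JSON Patch to a document (sketch)."""
    doc = json.loads(json.dumps(doc))  # deep copy via JSON round-trip
    for op in ops:
        assert op["op"] == "replace", "only 'replace' handled in this sketch"
        *parents, leaf = op["path"].lstrip("/").split("/")
        target = doc
        for key in parents:
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            target[int(leaf)] = op["value"]
        else:
            target[leaf] = op["value"]
    return doc

patched = apply_patch(workspace, patch)
print(patched["channels"][0]["samples"][0]["data"])  # [5.0, 6.0]
```

The original `workspace` is left untouched, so one background-only document can be combined with many signal patches.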
- ATLAS PUB note on the JSON schema for serialization and reproduction of results (ATL-PHYS-PUB-2019-029)
- Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
- Overlay of contours gives a nice visualization of near-perfect agreement
- Serialized likelihood and reproduced results of ATLAS Run-2 search for sbottom quarks (CERN-EP-2019-142) and published to HEPData
- Shown to reproduce the published results, but faster! .bold[ROOT:] 10+ hours .bold[pyhf:] < 30 minutes
- Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
.kol-1-2.center.width-100[
]
.kol-1-2.right.width-75[
]
.center.bold[Just click the button!]
Through pyhf we are able to provide:
- .bold[JSON specification] of likelihoods
- human/machine readable, versionable, HEPData friendly, orders of magnitude smaller
- .bold[Bidirectional translation] of likelihood specifications
- ROOT workspaces ↔ JSON
- Independent .bold[pure-Python implementation] of HistFactory + hypothesis testing
- Publication for the first time of the .bold[full likelihood] of a search for new physics
.kol-1-2.center.width-100[
(1st Workshop on Confidence Limits, CERN, 2000)
]
.kol-1-2.center.width-95[
(ATLAS, 2019)
]
class: end-slide, center
Backup
$ pyhf cls example.json | jq .CLs_obs
0.3599845631401913
$ cat new_signal.json
[{
"op": "replace",
"path": "/channels/0/samples/0/data",
"value": [5.0, 6.0]
}]
$ pyhf cls example.json --patch new_signal.json | jq .CLs_obs
0.4764263982925686
# One signal model
$ curl -sL https://bit.ly/33TVZ5p | \
tar -O -xzv RegionA/BkgOnly.json | \
pyhf cls --patch <(curl -sL https://bit.ly/33TVZ5p | \
tar -O -xzv RegionA/patch.sbottom_1300_205_60.json) | \
jq .CLs_obs
0.24443635754482018
# A different signal model
$ curl -sL https://bit.ly/33TVZ5p | \
tar -O -xzv RegionA/BkgOnly.json | \
pyhf cls --patch <(curl -sL https://bit.ly/33TVZ5p | \
tar -O -xzv RegionA/patch.sbottom_1300_230_100.json) | \
jq .CLs_obs
0.040766025813435774
1. ROOT collaboration: K. Cranmer, G. Lewis, L. Moneta, A. Shibata, and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
2. L. Heinrich, H. Schulz, J. Turner, and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.
class: end-slide, center
count: false
The end.