
Commit

Adding new directory skeletons
vincecr0ft committed Apr 3, 2018
1 parent 3b6133c commit 7dbed00
Showing 8 changed files with 218 additions and 2 deletions.
74 changes: 74 additions & 0 deletions docs/source/documentation.rst
@@ -0,0 +1,74 @@
#######################
Usage and Documentation
#######################

This section discusses the specific tools recommended for common statistical tasks. In addition, it identifies requirements that these tools need to meet regarding documentation, testing, and deployment, as well as other current shortcomings that will become goals/deliverables for the (ATLAS) statistics forum.

Statistical Modeling with RooFit
================================
The first step in any statistical method is to specify the statistical model used to describe the
data - schematically :math:`f`(data|parameters). This statistical model can be used to generate pseudo-data (toy
Monte Carlo) for a specific parameter point, and with specific observed data it defines the likelihood
function :math:`L`(parameters) :math:`\equiv f`(observed data|parameters). A properly defined statistical model can be used
in conjunction with any statistical method; however, several tools (`BAT`, `MCLimit`, `Collie`, `BILL`, etc.)
provide both the modeling stage and a specific statistical method. The `RooFit` and `RooStats` projects
are designed to separate these two distinct stages: `RooFit` provides the modeling language and
`RooStats` provides the statistical tests.
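
A minimal sketch of these two ingredients, using low-level `RooFit` classes, is shown below; the model, parameter ranges, and event count are purely illustrative.

.. code-block:: cpp

   // Illustrative sketch: a one-dimensional Gaussian model f(x | mean, width),
   // used to generate pseudo-data (toy Monte Carlo) and to build the likelihood.
   #include "RooRealVar.h"
   #include "RooGaussian.h"
   #include "RooDataSet.h"
   #include "RooAbsReal.h"

   void toy_and_likelihood()
   {
      RooRealVar x("x", "observable", -5, 5);
      RooRealVar mean("mean", "mean", 0, -5, 5);
      RooRealVar width("width", "width", 1, 0.1, 5);
      RooGaussian model("model", "Gaussian model", x, mean, width);

      // Pseudo-data (toy MC) generated at the current parameter point
      auto toy = model.generate(x, 1000);

      // The likelihood L(parameters) = f(observed data | parameters),
      // here as a negative log-likelihood object that can be minimised or scanned
      auto nll = model.createNLL(*toy);
      nll->Print();
   }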

Statistical models implemented within the RooFit framework can be serialized (written) to `ROOT`
files – often referred to as workspaces – for subsequent use by RooStats tools or other packages such as
BAT. Given the successful use of `RooFit` for the Higgs discovery and wide use within the SUSY group,
we see it as the primary tool for statistical modeling.
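
Continuing the sketch above, serializing such a model together with a dataset might look like the following; the file name and object names are hypothetical.

.. code-block:: cpp

   // Illustrative sketch: import a pdf and a dataset into a RooWorkspace and
   // write the workspace to a ROOT file for later use by RooStats or other tools.
   #include "RooWorkspace.h"
   #include "RooAbsPdf.h"
   #include "RooDataSet.h"
   #include "RooGlobalFunc.h"

   void write_workspace(RooAbsPdf &model, RooDataSet &data)
   {
      RooWorkspace w("w", "workspace");
      w.import(model);                            // the model f(data|parameters)
      w.import(data, RooFit::Rename("obsData"));  // the (pseudo-)observed data
      w.writeToFile("workspace.root");
   }

   // Elsewhere, e.g. in a RooStats tool:
   //   TFile f("workspace.root");
   //   RooWorkspace *w = f.Get<RooWorkspace>("w");
   //   RooAbsPdf *pdf = w->pdf("model");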

In addition to the low-level classes that RooFit provides for statistical modeling, there are higher-level
tools (factories) such as `RooWorkspace::factory`, `HistFactory`, `HistFitter`, and other analysis-specific tools that produce RooFit models as output. Analysis-specific tools that produce `RooFit`
workspaces are compatible with these recommendations, though if the statistical model can be implemented with a more general-purpose high-level tool, that is encouraged. For example, `HistFactory`
is a fairly general-purpose tool for histogram-based analyses that is well optimized and widely used.
The `HistFitter` tool, which uses `HistFactory` for modeling and provides additional functionality,
is similarly encouraged.
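
For illustration, the following sketch builds a simple signal-plus-background model entirely from `RooWorkspace::factory` strings; the component shapes, names, and yields are invented for the example.

.. code-block:: cpp

   // Illustrative sketch: a Gaussian signal plus exponential background model
   // expressed through the higher-level RooWorkspace::factory interface.
   #include "RooWorkspace.h"

   void factory_model()
   {
      RooWorkspace w("w");
      w.factory("Gaussian::sig(x[0,10], mass[5,0,10], width[0.5,0.1,2])");
      w.factory("Exponential::bkg(x, slope[-0.3,-2,0])");
      w.factory("SUM::model(nsig[50,0,1000]*sig, nbkg[500,0,10000]*bkg)");
      w.writeToFile("factory_workspace.root");
   }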

Improvements Needed/Planned
---------------------------

The core modeling framework of RooFit is quite mature, though a number of optimizations are still planned.

Frequentist
===========

Methodology
-----------

Upper-Limits
------------

Improvements Needed/Planned
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Background-only p-values for Searches
-------------------------------------

Measurements and Confidence Intervals / Parameter Contours
-----------------------------------------------------------

Improvements Needed/Planned
~~~~~~~~~~~~~~~~~~~~~~~~~~~


Bayesian
========

Methodology
-----------

Prior
-----

Sampling and Marginalization Tools
----------------------------------

Improvements Needed/Planned
~~~~~~~~~~~~~~~~~~~~~~~~~~~


Diagnostic Tools
================
24 changes: 24 additions & 0 deletions docs/source/examples.rst
@@ -0,0 +1,24 @@
#######################
Analysis-Ready Examples
#######################

Analysis One
============

Generate Data
-------------

Optimise Analysis
-----------------

Cut-based Analysis
~~~~~~~~~~~~~~~~~~

Multivariate Analysis
~~~~~~~~~~~~~~~~~~~~~

Prepare Workspace
-----------------

Statistical Tests
-----------------
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -17,6 +17,7 @@ This is a demonstration of the ongoing effort to document and improve understand
statisticaltests
documentation
recommendations
examples

For CERN users
--------------
6 changes: 6 additions & 0 deletions docs/source/readthedocs-environment.yml
@@ -0,0 +1,6 @@
dependencies:
- python==2.7
- sphinx>=1.5.1
- pandoc
- nbconvert
- ipykernel
111 changes: 111 additions & 0 deletions docs/source/recommendations.rst


4 changes: 2 additions & 2 deletions docs/source/statisticaltests.rst
@@ -494,8 +494,8 @@ The text-book case of the construction of confidence intervals as shown in Fig r
to define a confidence belt. Whereas the text-book confidence belt of Fig ref nmconstr provided an intuitive graphical illustration of the concept of acceptance intervals on :math:`x` and confidence intervals in :math:`\mu`, a confidence belt based on a likelihood-ratio ordering rule may seem at first more obscure, but in reality isn't.
Figure ref nmconstr2 compares side-by-side the text-book confidence belt of :math:`f(x|\mu)` with a LLR-based confidence belt of :math:`\lambda(\vec{N}|\mu)`. We observe the following differences

- The variable on the horizontal axis is :math:`\lambda(\vec{N}|\mu)` instead of :math:`f(x|mu)`. As :math:`\lambda(\vec{N}|\mu)` is a scalar quantity regardless of the complexity of the observable :math:`\vec{N}` this allows us to make this confidence belt construction for any model :math:`f(\vec{N}|\mu)` of arbitrary complexity.
- The confidence belt has a different shape. Whereas the expected distribution :math:`f(x|mu)` is typically different for each value of :math:`\mu`, the expected distribution of :math:`\lambda(\vec{N}|\mu)` typically is *independent of :math:`\mu`*. The reason for this is the asymptotic distribution of :math:`\lambda(\vec{N}|\mu)` that will be discussed further in a moment. The result is though that a LLR-based confidence belt is usually a rectangular region starting at :math:`\lambda=0`.
- The variable on the horizontal axis is :math:`\lambda(\vec{N}|\mu)` instead of :math:`f(x|\mu)`. As :math:`\lambda(\vec{N}|\mu)` is a scalar quantity regardless of the complexity of the observable :math:`\vec{N}` this allows us to make this confidence belt construction for any model :math:`f(\vec{N}|\mu)` of arbitrary complexity.
- The confidence belt has a different shape. Whereas the expected distribution :math:`f(x|\mu)` is typically different for each value of :math:`\mu`, the expected distribution of :math:`\lambda(\vec{N}|\mu)` typically is *independent of* :math:`\mu`. The reason for this is the asymptotic distribution of :math:`\lambda(\vec{N}|\mu)` that will be discussed further in a moment. The result is though that a LLR-based confidence belt is usually a rectangular region starting at :math:`\lambda=0`.
- The observed quantity :math:`\lambda(\vec{N}|\mu)_{obs}` depends on :math:`\mu` unlike the observed quantity :math:`x_{obs}` in the textbook case. The reason for this is simply the form of Eq.\ref{eq:llr} that is an explicit function of :math:`\mu`. Asymptotically the dependence of :math:`\lambda(\vec{N}|\mu)` on :math:`\mu` is quadratic, as shown in the illustration.

The confidence belt construction shown in Fig ref nmconstr2, when rotated 90 degrees counterclockwise looks of course very much like an interval
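
As an illustrative aside (not part of the diff above): such a profile-likelihood-ratio interval is typically obtained in practice with `RooStats`. A minimal sketch, assuming a workspace file and object names like those used earlier ("workspace.root", "w", "model", "obsData", parameter of interest "mean" - all hypothetical), might look as follows.

.. code-block:: cpp

   // Illustrative sketch: a 68% CL profile-likelihood interval with RooStats.
   // Workspace, pdf, data, and parameter names are hypothetical.
   #include <iostream>
   #include "TFile.h"
   #include "RooWorkspace.h"
   #include "RooRealVar.h"
   #include "RooAbsData.h"
   #include "RooArgSet.h"
   #include "RooStats/ProfileLikelihoodCalculator.h"
   #include "RooStats/LikelihoodInterval.h"

   void profile_interval()
   {
      TFile f("workspace.root");
      RooWorkspace *w = f.Get<RooWorkspace>("w");
      RooAbsData *data = w->data("obsData");
      RooRealVar *mu = w->var("mean");

      RooStats::ProfileLikelihoodCalculator plc(*data, *w->pdf("model"), RooArgSet(*mu));
      plc.SetConfidenceLevel(0.68);  // interval from the lambda(mu)-based belt
      RooStats::LikelihoodInterval *interval = plc.GetInterval();
      std::cout << "mean in [" << interval->LowerLimit(*mu) << ", "
                << interval->UpperLimit(*mu) << "]" << std::endl;
   }
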
Binary file modified workspaces/HgammagammaWorkspace.root
Binary file modified workspaces/model.root
