Neural networks and their application at CAST

CERN Axion Solar Telescope

CAST

~/org/Talks/Jamboree_Feb2018/figs/CAST-Panorama.JPG

the experiment

  • search for solar axions, a hypothetical pseudoscalar particle that would solve the strong $\mathcal{CP}$ problem
  • potential dark matter candidate
  • coupling to transverse $B$ fields, production in the Sun!

CERN Axion Solar Telescope

CAST

~/org/Talks/Jamboree_Feb2018/figs/CAST-Panorama.JPG

to take away…

  • expected signal rate: $≤ \num{0.1}$ $γ$ \si{\per\hour}
  • background rate: $∼ \SI{0.1}{\per \s}$
  • need very good background suppression

Neural network intro

Artificial Neural Networks (ANNs)

ANN primer

  • type of multivariate analysis object providing highly non-linear, multidimensional representations of input data
  • simplest type: feed-forward multilayer perceptron

MLP example

~/Documents/Talks/figs/skizze_ann_clean.png

Artificial Neural Networks (ANNs)

Producing an output and training

Neuron output: \[ y_k = \varphi\left( ∑_{j = 0}^m w_{kj} x_j \right) \] \( \varphi \): activation function, \( w_k \): weight vector

Training minimizes error function \[ E(\mathbf{x_1}, …, \mathbf{x_N} | \mathbf{w}) = ∑_{a=1}^N \frac{1}{2}\left(y_{\text{ANN},a} - \hat{y}_a\right)^2 \] using gradient descent \[ \mathbf{w}^{n+1} = \mathbf{w}^n - η ∇_{w} E \]
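To make the two formulas concrete, here is a minimal NumPy sketch of one neuron's output and a single gradient descent step, assuming a sigmoid activation and toy inputs, weights, target, and learning rate invented purely for illustration:

import numpy as np

def phi(z):
    # sigmoid as an example activation function (an assumption)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -0.3])   # toy input vector (x_0 = 1 acts as bias)
w = np.array([0.1, -0.2, 0.4])   # toy weight vector w_k
y = phi(w @ x)                   # y_k = phi( sum_j w_kj * x_j )

y_hat, eta = 1.0, 0.5            # toy target output and learning rate
grad = (y - y_hat) * y * (1 - y) * x   # dE/dw for E = 1/2 (y - y_hat)^2, sigmoid' = y(1 - y)
w = w - eta * grad               # w^{n+1} = w^n - eta * grad_w E
print(y, w)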

Convolutional Neural Networks

CNN schematic

convolutional and pooling layers alternating:

~/Documents/Talks/figs/mylenet.png

where a convolutional layer is:

~/Documents/Talks/figs/conv_1D_nn.png

Convolution example in python

Python calc of 2D convolution (instead of a gif…)

import numpy as np
from scipy.signal import convolve2d

A = np.identity(6)             # 6x6 identity matrix as toy "image"
B = np.array([[0, 0, 0],
              [0, 5, 0],
              [0, 0, 0]])      # kernel: scales each pixel by 5
C = convolve2d(A, B, 'same')   # 'same' keeps the output at A's shape
print(C)
[[5. 0. 0. 0. 0. 0.]
 [0. 5. 0. 0. 0. 0.]
 [0. 0. 5. 0. 0. 0.]
 [0. 0. 0. 5. 0. 0.]
 [0. 0. 0. 0. 5. 0.]
 [0. 0. 0. 0. 0. 5.]]

Convolution example in python

Python calc of 2D convolution (instead of a gif…)

import numpy as np
from scipy.signal import convolve2d

A = np.identity(6)             # 6x6 identity matrix as toy "image"
B = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])      # kernel: center pixel plus its diagonals
C = convolve2d(A, B, 'same')   # 'same' keeps the output at A's shape
print(C)
[[2. 0. 1. 0. 0. 0.]
 [0. 3. 0. 1. 0. 0.]
 [1. 0. 3. 0. 1. 0.]
 [0. 1. 0. 3. 0. 1.]
 [0. 0. 1. 0. 3. 0.]
 [0. 0. 0. 1. 0. 2.]]

Convolution example in pictures

A picture is worth a thousand words?

~/CastData/ExternCode/NeuralNetworkLiveDemo/media/songhoa/combined.png

\tiny source: http://www.songho.ca/dsp/convolution/convolution2d_example.html

Live demo of MLP training on MNIST

Simple demo of training simple ANN on MNIST

  • MNIST: a dataset of \num{70000} handwritten digits, size normalized to $\num{28}×\num{28}$ pixels and centered
    • historically used to benchmark image classification; nowadays even simple networks quickly reach accuracies $\geq\SI{90}{\percent}$
  • network layout (sketched in code below):
    • input neurons: $\num{28}×\num{28}$ neurons (note: flattened to \num{1}D!)
    • 1 hidden layer: \num{1000} neurons
    • output layer: \num{10} neurons (\num{1} for each digit)
    • activation function: rectified linear unit (ReLU):

\[ f(x) = \max(0, x) \]
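A minimal NumPy sketch of this layout, shapes only: this is not the Nim/Arraymancer demo code, and the random weights and input are placeholders.

import numpy as np

rng = np.random.default_rng(0)
W1 = 0.01 * rng.standard_normal((28 * 28, 1000))   # input -> hidden layer
W2 = 0.01 * rng.standard_normal((1000, 10))        # hidden -> output layer

def relu(z):
    # rectified linear unit: f(x) = max(0, x)
    return np.maximum(0, z)

x = rng.random(28 * 28)          # one 28x28 digit, flattened to 1D
scores = relu(x @ W1) @ W2       # 10 outputs, one per digit
print(scores.argmax())           # index of the predicted digit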

Live demo of MLP training on MNIST

What do I mean by live demo? Two programs:

  • Program 1: trains multilayer perceptron (MLP)
    • written in Nim (C backend), using Arraymancer
      • linear algebra + neural network library
    • trains on \num{60000} digits, performs validation on \num{10000} digits
  • after every 10 batches (1 batch: 64 digits) it sends to program 2:
    • random test digit
    • predicted output
    • current error
  • Program 2 plots data live: written in Nim (JS backend), plots using plotly.js

Neural networks at CAST

Back to CAST

Requirements for detectors at CAST

  • CAST is a very low rate experiment!
  • detectors should reach: $f_{\text{Background}} ≤ \SI{e-6}{\per \keV \per \cm \squared \per \s}$
  • background / signal ratio: $\frac{f_{\text{Background}}}{f_{\text{Signal}}} > \num{e5}$
    • need very good signal / background classification!

Background example

X-ray example

Back to CAST

Requirements for detectors at CAST

  • \textcolor{gray}{CAST is a very low rate experiment!}
  • \textcolor{gray}{detectors should reach:} $f_{\text{Background}} ≤ \SI{e-6}{\per \keV \per \cm \squared \per \s}$
  • \textcolor{gray}{background / signal ratio:} $\frac{f_{\text{Background}}}{f_{\text{Signal}}} > \num{e5}$
    • \textcolor{gray}{need very good signal / background classification!}
  • events (as on previous slides) can be interpreted as images
  • Convolutional Neural Networks extremely good at image classification

$⇒$ use Convolutional Neural Networks?

Old analysis - data and likelihood method

  • comparing background and X-ray events shows that their geometric shapes are very different
  • utilize that to remove as much background as possible

Likelihood analysis

  • energy range: \SIrange{0}{10}{\kilo \electronvolt}
  • split into 8 unequal energy bins, each with distinct event properties

Baseline analysis

The analysis pipeline is as follows:

$⇒$ raw events
$→$ filter ‘clusters’
$→$ calc (geometric) properties (see sketch below)
$→$ calc likelihood distribution from:
  • eccentricity
  • length / transverse RMS
  • fraction within transverse RMS

Property distributions: eccentricity, length / $\text{RMS}_{\text{trans}}$, # pix in $\text{RMS}_{\text{trans}}$
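As an illustration of such geometric properties, here is a minimal NumPy sketch computing an eccentricity-like quantity from the covariance of cluster pixel positions; the pixel values and the exact property definitions are assumptions for illustration, not the actual CAST analysis code.

import numpy as np

# hypothetical cluster of (x, y) pixel hits, values made up
pix = np.array([[10, 12], [11, 13], [12, 14], [13, 14], [14, 16]], float)
centered = pix - pix.mean(axis=0)
cov = np.cov(centered.T)                        # 2x2 covariance of the hits
evals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # long / transverse variances
rms_long, rms_trans = np.sqrt(evals)
eccentricity = rms_long / rms_trans             # ~1 for round X-ray clusters
frac_in_trans = np.mean(np.linalg.norm(centered, axis=1) < rms_trans)
print(eccentricity, frac_in_trans)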

Current analysis - data and likelihood method

Likelihood analysis & CNN analysis

  • energy range: \SIrange{0}{10}{\kilo \electronvolt}
  • split into 8 unequal energy bins, each with distinct event properties
  • only based on properties of X-rays
  • set a cut on the likelihood distribution, s.t. \SI{80}{\percent} of X-rays are recovered (sketched below)
  • now: use an artificial neural network to classify events as X-ray or background
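A small sketch of how such a cut can be chosen, assuming toy likelihood values (the distribution is invented; only the 80th-percentile logic mirrors the slide):

import numpy as np

# toy likelihood values for calibration X-ray events (made-up distribution)
lnL_xray = np.random.default_rng(0).normal(10.0, 2.0, 10000)
cut = np.percentile(lnL_xray, 80)   # 80th percentile of the X-ray values
passed = lnL_xray <= cut            # events kept by the cut
print(passed.mean())                # ~0.80 by construction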

ANNs applied to CAST

Two ANN approaches

  1. calculate properties of each event, use these properties as input neurons
  2. use whole events (\(\num{256} × \num{256}\) pixels) as input layer
  • approach 1:
    • small layout \( ⇒ \) fast to train
    • potentially biased, not all information usable
  • approach 2:
    • huge layout \( ⇒ \) only trainable on GPU
    • all information available

CNN implementation details

8 networks in total, one for each $E$ bin (a sketch of the architecture follows below)

  • input size: $\num{256}×\num{256}$ neurons
  • 3 convolutional and pooling layers alternating w/ 30, 70, 100 kernels using $\num{15} × \num{15}$ filters
  • pooling layers perform $\num{2}×\num{2}$ max pooling
  • $\tanh$ activation function
  • 1 fully connected feed-forward layer: (1800, 30) neurons
  • logistic regression layer: \num{2} output neurons
  • training w/ \num{12000} events per type on Nvidia GTX 1080
  • training time: $∼ \SIrange[range-phrase={\text{to}}]{1}{10}{\hour}$
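A hedged PyTorch sketch of an architecture matching these numbers, not the original implementation: padding and stride choices are assumptions, so the flattened size feeding the fully connected layer (listed as (1800, 30) on the slide) is inferred at runtime via LazyLinear.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 30, kernel_size=15), nn.Tanh(), nn.MaxPool2d(2),
    nn.Conv2d(30, 70, kernel_size=15), nn.Tanh(), nn.MaxPool2d(2),
    nn.Conv2d(70, 100, kernel_size=15), nn.Tanh(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(30), nn.Tanh(),   # fully connected layer -> 30 neurons
    nn.Linear(30, 2),               # logistic regression layer: 2 outputs
)
out = model(torch.randn(1, 1, 256, 256))   # one 256x256 event image
print(out.shape)                           # torch.Size([1, 2])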

CNN example output distribution

CNN output distribution: bad

CNN example output distribution

CNN output distribution: good

Potential improvements via CNNs

Signal eff. vs background rej.

Potential improvements via CNNs

baseline vs. CNNs: $5×$ background reduction (2014/15 data)

“Summary”