structure:
    input layer:
        30 neurons, one per normalized feature value
    hidden layers:
        two layers of x perceptrons each
    output layer:
        2 neurons corresponding to P(M) and P(B)
with:
    l: the layer index
    a: the activation matrix of the layer
    b: the bias matrix of the layer
    w: the weight matrix between the layer and the previous one
the forward pass for each layer is:
    a(l) = sigmoid(w(l) * a(l - 1) + b(l))
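
Rough sketch of this forward pass (Python/NumPy is an assumption, the notes name no language; the hidden width x is left open above, so the 24 below is only a placeholder):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 30 inputs, two hidden layers (width x unspecified
# in these notes; 24 is an arbitrary placeholder), 2 outputs for P(M)/P(B).
layer_sizes = [30, 24, 24, 2]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n, m)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((n, 1)) for n in layer_sizes[1:]]

def forward(x):
    # apply a(l) = sigmoid(w(l) @ a(l-1) + b(l)) layer by layer
    a = x  # a(0): 30x1 column of the normalized feature values
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a  # 2x1 column: estimates of P(M) and P(B)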
cost function of a single sample:
    (output - expected) ^ 2
Compute it on all samples and average to get the cost of the network.
Once done, follow the negative gradient of the cost function (over all samples) to descend and minimize it.
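
Sketch of the averaged cost, reusing the hypothetical forward() above; each expected target is assumed to be a 2x1 one-hot column (1 for M or for B):

def network_cost(samples, targets):
    # mean of (output - expected)^2 over the whole sample set
    total = 0.0
    for x, y in zip(samples, targets):
        out = forward(x)                 # 2x1 network output
        total += np.sum((out - y) ** 2)  # squared error of one sample
    return total / len(samples)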
Backpropagation:
    Used to compute the gradient of the cost function, i.e. the partial derivatives of the cost C with respect to each weight and bias matrix; the negative of this gradient gives the descent direction.
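
Sketch of backpropagation for one sample under the same assumptions (sigmoid everywhere, squared-error cost, the hypothetical weights/biases above); averaged over all samples and negated, these partials give the descent step:

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y):
    # forward pass, keeping pre-activations z and activations a
    a, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # output layer: dC/da = 2 * (output - expected) for the squared error
    delta = 2.0 * (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_w = [None] * len(weights)
    grad_b = [None] * len(biases)
    grad_w[-1] = delta @ activations[-2].T
    grad_b[-1] = delta
    # propagate the error backwards through the hidden layers
    for l in range(2, len(layer_sizes)):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grad_w[-l] = delta @ activations[-l - 1].T
        grad_b[-l] = delta
    return grad_w, grad_b  # dC/dw and dC/db, one entry per layer

Averaging these per-sample gradients and updating each matrix by -learning_rate * gradient is the descent described above.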
dataset division:
    60% for training
    20% for validation
    20% for test
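
Sketch of the split, assuming the dataset fits in a NumPy array and is shuffled before slicing:

def split_dataset(data, rng=np.random.default_rng(0)):
    # shuffle, then slice into 60% train / 20% validation / 20% test
    idx = rng.permutation(len(data))
    n_train = int(0.6 * len(data))
    n_val = int(0.2 * len(data))
    train = data[idx[:n_train]]
    val = data[idx[n_train:n_train + n_val]]
    test = data[idx[n_train + n_val:]]
    return train, val, test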