The goal was to remove noise and irregularities from MNIST digits using TensorFlow (reproducing results originally obtained in 2006). Below are two digits, referred to as A and B.
Original (Digit A): Corrupted (Digit A):
Original (Digit B): Corrupted (Digit B):
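The post does not specify how the digits were corrupted, so the following is only an illustrative sketch: salt-and-pepper noise applied to flattened MNIST images in [0, 1]. The function name `corrupt` and the `noise_fraction` parameter are hypothetical, not from the original experiment.

```python
import numpy as np

def corrupt(images, noise_fraction=0.3, seed=0):
    """Illustrative corruption (an assumption, not the post's method):
    flip a random fraction of pixels to extreme values (salt-and-pepper).
    `images` is a float array in [0, 1] with shape (batch, 784)."""
    rng = np.random.default_rng(seed)
    corrupted = images.copy()
    mask = rng.random(images.shape) < noise_fraction
    # Half of the masked pixels become 1.0 ("salt"), the rest 0.0 ("pepper").
    salt = rng.random(images.shape) < 0.5
    corrupted[mask & salt] = 1.0
    corrupted[mask & ~salt] = 0.0
    return corrupted

# Example: corrupt a batch of two blank 28x28 digits.
batch = np.zeros((2, 784))
noisy = corrupt(batch)
```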
The network, with layers of size [784, 400, n, 400, 784] (MNIST images are 28 × 28 = 784 pixels; n is the size of the code layer), then attempts to reconstruct the original from the corrupted version:
| Code Layer Size (n) | Asymptotic Error (20 epochs) | Reconstructed Digit A | Reconstructed Digit B |
|---|---|---|---|
| 10 | 2.2e5 | | |
| 20 | 1.5e5 | | |
| 30 | 1.0e5 | | |
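The post does not define the error metric. One reading consistent with the magnitudes above is summed squared pixel error over the 10,000-image MNIST test set; this is an assumption, sketched below.

```python
import numpy as np

def total_squared_error(originals, reconstructions):
    """Sum of squared pixel differences over a whole evaluation set
    (an assumed metric). With 10,000 MNIST test images of 784 pixels in
    [0, 1], totals on the order of 1e5 correspond to a mean per-pixel
    squared error of roughly 0.01-0.03."""
    return float(np.sum((originals - reconstructions) ** 2))

# e.g. two 3-pixel "images", every pixel off by 1:
# total_squared_error(np.zeros((2, 3)), np.ones((2, 3))) -> 6.0
```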
It is evident that a code layer of size 10 is insufficient to reconstruct the original images: the '4' resembles a '9', and the '5' resembles a '6'.
A code layer of size 20 accurately reconstructs the digits and removes irregularities, such as the swish on the tail of the original '4'.
A code layer of size 30 also reconstructs the digits accurately, but begins to retain irrelevant detail, such as the tail of the original '4'.
At least for this architecture, [784, 400, n, 400, 784], the best n found was around 20, where 'best' means balancing accurate reconstruction against retaining irrelevant features of the original.
I have since been informed that sigmoid activations are somewhat outdated, and that ReLU provides sufficient non-linearity while training faster.
Pre-training the weights (G. E. Hinton and R. R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks", 2006), rather than initializing the decoder as a mirror of the encoder, may allow smaller code layers to succeed within 20 epochs of training.
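Hinton and Salakhutdinov pre-trained with stacked restricted Boltzmann machines; a simpler, commonly used stand-in is greedy layer-wise pretraining with shallow autoencoders, sketched below. The function name and training settings are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

def pretrain_greedy(layer_sizes, data, epochs=5):
    """Greedy layer-wise pretraining with shallow autoencoders (a stand-in
    for the RBM pretraining in the 2006 paper). Each encoder layer is
    trained to reconstruct the previous layer's activations, then its
    outputs feed the next stage."""
    encoders = []
    features = data
    in_dim = data.shape[1]
    for size in layer_sizes:
        enc = tf.keras.layers.Dense(size, activation="sigmoid")
        dec = tf.keras.layers.Dense(in_dim, activation="sigmoid")
        ae = tf.keras.Sequential([tf.keras.Input(shape=(in_dim,)), enc, dec])
        ae.compile(optimizer="adam", loss="mse")
        ae.fit(features, features, epochs=epochs, verbose=0)
        # Propagate the data through the newly trained encoder.
        features = enc(features).numpy()
        encoders.append(enc)
        in_dim = size
    return encoders

# Tiny demo on random data standing in for MNIST batches.
data = np.random.default_rng(0).random((32, 784)).astype("float32")
encoders = pretrain_greedy([400, 20], data, epochs=1)
```

The pretrained encoder layers (and their transposed counterparts, per the paper's "unrolling") would then initialize the full autoencoder before fine-tuning.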