The goals and steps of this project were the following:
- Load the dataset (see below for links to the project dataset)
- Explore, summarize and visualize the dataset
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Here I will consider the rubric points individually and describe how I addressed each point in my implementation.
The submission includes a writeup, which you're reading right now!
And here is a link to my project code.
Summary statistics of the traffic signs dataset:
- The size of the training set is 34799
- The size of the validation set is 4410
- The size of the test set is 12630
- The shape of a traffic sign image is (32, 32, 3)
- The number of unique classes/labels in the dataset is 43
Here is a bar chart showing how the traffic sign classes are distributed in the training dataset. You can see that some classes (e.g. 1 and 2) are much more common than others (e.g. 0 and 19).
To visualize the dataset, I also printed one image of each traffic sign class in the Jupyter notebook.
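The per-class counts behind such a bar chart can be tallied in a few lines. This is a minimal sketch, not the notebook's code: `class_distribution` is a hypothetical helper and the toy `y_train` list stands in for the real training labels.

```python
from collections import Counter


def class_distribution(labels):
    """Return (class_id, count) pairs sorted by class id."""
    counts = Counter(labels)
    return sorted(counts.items())


# Toy labels standing in for the real training labels (assumption).
y_train = [1, 2, 1, 0, 2, 1, 2, 19]

distribution = class_distribution(y_train)
n_classes = len(distribution)
# Class with the highest count (first one wins on ties).
most_common_class = max(distribution, key=lambda item: item[1])[0]
```

The sorted pairs can be passed directly to a plotting library to draw the bar chart.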
For preprocessing the data I used the following steps:
- shuffle the training data, to get a random order of the images
- normalization of the image data: [0 .. 255] --> [0.1 .. 0.9]
I did not convert the images to grayscale, in order to keep the color information. With my setup, converting the images to grayscale lowered the prediction accuracy on the validation dataset by 0.01.
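The two preprocessing steps listed above, shuffling and the linear mapping from [0 .. 255] to [0.1 .. 0.9], can be sketched as follows. This is an illustration under assumptions rather than the notebook's code: the helper names `normalize` and `shuffle_together` and the toy data are hypothetical.

```python
import random


def normalize(pixels, low=0.1, high=0.9, max_value=255.0):
    """Linearly map values from [0, max_value] to [low, high].

    Works on a single number and, unchanged, on a NumPy array.
    """
    return low + pixels * (high - low) / max_value


def shuffle_together(images, labels, seed=None):
    """Shuffle images and labels with the same random order."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]


# Toy "images" with two pixels each, standing in for real data (assumption).
images = [[0, 255], [128, 64], [255, 0]]
labels = [0, 1, 2]

images, labels = shuffle_together(images, labels, seed=0)
normalized = [[normalize(p) for p in image] for image in images]
```

Shuffling through a shared index list keeps every image paired with its label.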
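I used the LeNet architecture as a starting point, which works very well on 32x32 images. To adapt it to the color images and the larger number of output classes, I doubled the size of every layer except the output layer (6 to 12 and 16 to 32 convolution filters, 120 to 240 and 84 to 168 fully connected units).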
My final model consisted of the following layers:
Layer | Description |
---|---|
Input | 32x32x3 RGB image |
Convolution 5x5 | 1x1 stride, VALID padding, outputs 28x28x12 |
RELU | simple activation function |
Max pooling | 2x2 stride, outputs 14x14x12 |
Convolution 5x5 | 1x1 stride, VALID padding, outputs 10x10x32 |
RELU | simple activation function |
Max pooling | 2x2 stride, outputs 5x5x32 |
Flatten | change output shape from 5x5x32 to 800 |
Fully connected with dropout | output: 240 |
RELU | simple activation function |
Fully connected with dropout | output: 168 |
RELU | simple activation function |
Output layer | output: 43 |
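The output sizes in the table follow from the VALID-padding formula `(input - kernel) // stride + 1`. The sketch below checks them under two assumptions: the convolutions use a 1x1 stride (the only stride that maps 32 to 28 and 14 to 10 with a 5x5 kernel) and the pooling window is 2x2. The function name `valid_output_size` is hypothetical.

```python
def valid_output_size(input_size, kernel_size, stride):
    """Spatial output size of a convolution or pooling with VALID padding."""
    return (input_size - kernel_size) // stride + 1


size = 32                                 # input: 32x32x3
conv1 = valid_output_size(size, 5, 1)     # first 5x5 convolution
pool1 = valid_output_size(conv1, 2, 2)    # 2x2 max pooling, stride 2
conv2 = valid_output_size(pool1, 5, 1)    # second 5x5 convolution
pool2 = valid_output_size(conv2, 2, 2)    # 2x2 max pooling, stride 2
flat = pool2 * pool2 * 32                 # flatten 32 feature maps

# For comparison: a 2x2 convolution stride would shrink 32 to 14, not 28.
conv1_stride2 = valid_output_size(size, 5, 2)
```

The intermediate values reproduce the table: 28, 14, 10, 5, and a flattened length of 800.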
For training the model, I used the following parameters:
- optimizer: AdamOptimizer (TensorFlow)
- batch size: 128
- number of epochs: 20
- learning rate: 0.0005
- dropout rate: 0.5
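To put these parameters in perspective, the following sketch counts the optimizer steps they imply for the 34799 training images. It assumes the last, smaller batch of each epoch is kept rather than dropped; `batch_slices` is a hypothetical helper, not a function from the notebook.

```python
def batch_slices(n_examples, batch_size):
    """Yield (start, end) index pairs covering all examples once."""
    for start in range(0, n_examples, batch_size):
        yield start, min(start + batch_size, n_examples)


N_TRAIN, BATCH_SIZE, EPOCHS = 34799, 128, 20

batches = list(batch_slices(N_TRAIN, BATCH_SIZE))
steps_per_epoch = len(batches)                        # 272
last_batch_size = batches[-1][1] - batches[-1][0]     # 111
total_steps = steps_per_epoch * EPOCHS                # 5440
```

Each `(start, end)` pair can be used to slice the shuffled images and labels for one training step.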
My final model results were:
- validation set accuracy of 0.960
- test set accuracy of 0.944
I started with the standard LeNet architecture from class, because it classifies 32x32 pixel images quite well by default (as discussed in class). With standard LeNet, I reached a validation set accuracy of about 0.89.
To improve the accuracy, I added dropout to the fully connected layers of the network. I found a dropout rate of 0.5 to work best for this architecture. I also doubled the size of each layer in the network, for two reasons: the German traffic sign dataset has far more output classes (n_classes = 43) than the MNIST dataset (n_classes = 10), and the color traffic sign images carry more information than the grayscale images of the MNIST dataset.
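For readers unfamiliar with dropout, here is a minimal plain-Python sketch of the "inverted dropout" scheme, which is also the scaling convention used by `tf.nn.dropout`. It is an illustration only, not the model code: the `dropout` function and the toy activations are hypothetical.

```python
import random


def dropout(activations, rate=0.5, rng=random):
    """Inverted dropout on a list of activations.

    Each value is zeroed with probability `rate`; survivors are scaled by
    1 / (1 - rate) so the expected value of every activation is unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


hidden = [0.2, 1.5, 0.0, 3.1]                                # toy activations
train_out = dropout(hidden, rate=0.5, rng=random.Random(0))  # training
eval_out = dropout(hidden, rate=0.0)                         # evaluation
```

Dropout is only active during training; for validation and testing the rate is set to 0 so that every unit contributes.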
The validation set accuracy of 0.96, which is 0.03 above the required minimum of 0.93, shows that the model works well.
Here are not five but ten German traffic signs that I found on the web:
The first image might be difficult to classify because it shows the traffic sign a little bit from the side, so the image is slightly distorted. The other nine images should be easier to classify.
Here are the results of the prediction:
Image (class and name) | Prediction (class and name) |
---|---|
(1) Speed limit (30km/h) | (1) Speed limit (30km/h) |
(1) Speed limit (30km/h) | (1) Speed limit (30km/h) |
(1) Speed limit (30km/h) | (1) Speed limit (30km/h) |
(2) Speed limit (50km/h) | (2) Speed limit (50km/h) |
(11) Right-of-way at the next intersection | (11) Right-of-way at the next intersection |
(12) Priority road | (12) Priority road |
(40) Roundabout mandatory | (40) Roundabout mandatory |
(9) No passing | (9) No passing |
(9) No passing | (9) No passing |
(2) Speed limit (50km/h) | (2) Speed limit (50km/h) |
The model correctly classified all 10 traffic signs, which gives an accuracy of 100%. This compares favorably to the test set accuracy of 94.4%, although ten images are too small a sample for a precise comparison.
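The accuracy figure is simply the fraction of matching rows in the table above. A stand-alone sketch, separate from the notebook code, with the class ids copied from the table and a hypothetical helper `accuracy`:

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Class ids from the prediction table, in row order.
true_labels = [1, 1, 1, 2, 11, 12, 40, 9, 9, 2]
predicted_labels = [1, 1, 1, 2, 11, 12, 40, 9, 9, 2]

web_accuracy = accuracy(true_labels, predicted_labels)
```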
The code for making predictions with my final model is located in the 15th and 16th cells of the Jupyter notebook.
For the first image, the model is very sure that this is a 30km/h speed limit (probability of 94.3%), and this prediction is correct. The top five softmax probabilities were:
Probability [%] | Prediction (class and name) |
---|---|
94.3 | (1) Speed limit (30km/h) |
3.2 | (2) Speed limit (50km/h) |
2.3 | (25) Road work |
0.1 | (5) Speed limit (80km/h) |
0.1 | (31) Bicycles crossing |
Three of the top five predictions are very similar signs: speed limit (30km/h), speed limit (50km/h) and speed limit (80km/h). Even for a human, these traffic signs can be difficult to distinguish from a distance.
For the second and all following images, the model is 100% sure about its prediction, as you can see in the report, and all of these predictions are correct.
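The top-five probabilities reported above come from applying softmax to the network's output logits and keeping the five largest values. The sketch below shows this in plain Python with made-up logits; in TensorFlow the corresponding operations are `tf.nn.softmax` and `tf.nn.top_k`. The helper names and the logit values are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probabilities, k=5):
    """Return the k most probable (class_id, probability) pairs."""
    ranked = sorted(enumerate(probabilities),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for a 7-class toy problem (the real model has 43).
logits = [0.5, 4.0, 0.6, -1.0, 0.1, -2.0, 0.0]

probabilities = softmax(logits)
top5 = top_k(probabilities, k=5)
```

Subtracting the largest logit before exponentiating avoids overflow without changing the resulting probabilities.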