fix: Bring back learning rate scheduler
For some reason the learning rate scheduler function definition was deleted in one of the last commits. Fixed now.
pierluigiferrari committed Apr 19, 2018
1 parent 25c3d79 commit 1be6ebb
Showing 1 changed file with 30 additions and 6 deletions.
36 changes: 30 additions & 6 deletions ssd300_training.ipynb
@@ -6,7 +6,10 @@
"source": [
"# SSD300 Training Tutorial\n",
"\n",
"This tutorial explains how to train an SSD300 on the Pascal VOC datasets. Training SSD512 works simiarly, so there's no extra tutorial for that. The same goes for training on other datasets."
"This tutorial explains how to train an SSD300 on the Pascal VOC datasets. The preset parameters reproduce the training of the original SSD300 \"07+12\" model. Training SSD512 works simiarly, so there's no extra tutorial for that. The same goes for training on other datasets.\n",
"\n",
"You can find a summary of a full training here to get an impression of what it should look like:\n",
"[SSD300 \"07+12\" training summary](https://github.com/pierluigiferrari/ssd_keras/blob/master/training_summaries/ssd300_pascal_07%2B12_training_summary.md)"
]
},
{
@@ -119,7 +122,9 @@
"2. It then loads the weights file that is found at `weights_path` into the model. You could load the trained VGG-16 weights or you could load the weights of a trained model. If you want to reproduce the original SSD training, load the pre-trained VGG-16 weights. In any case, you need to set the path to the weights file you want to load on your local machine. Download links to all the trained weights are provided in the [README](https://github.com/pierluigiferrari/ssd_keras/blob/master/README.md) of this repository.\n",
"3. Finally, it compiles the model for the training. In order to do so, we're defining an optimizer (Adam) and a loss function (SSDLoss) to be passed to the `compile()` method.\n",
"\n",
"It should be mentioned that I'm using an Adam optimizer here, while the original implementation uses plain SGD with momentum. If you want to stick with the original implementation's SGD optimizer, set the momentum to 0.9. Note that the learning rate that is being set here doesn't matter, because further below we'll pass a learning rate scheduler to the training function, which will overwrite any learning rate set here, i.e. what matters are the learning rates that are defined by the learning rate scheduler.\n",
"Normally, the optimizer of choice would be Adam (commented out below), but since the original implementation uses plain SGD with momentum, we'll do the same in order to reproduce the original training. Adam is generally the superior optimizer, so if your goal is not to have everything exactly as in the original training, feel free to switch to Adam. You might need to adjust the learning rate scheduler below slightly in case you use Adam.\n",
"\n",
"Note that the learning rate that is being set here doesn't matter, because further below we'll pass a learning rate scheduler to the training function, which will overwrite any learning rate set here, i.e. what matters are the learning rates that are defined by the learning rate scheduler.\n",
"\n",
"`SSDLoss` is a custom Keras loss function that implements the multi-task that consists of a log loss for classification and a smooth L1 loss for localization. `neg_pos_ratio` and `alpha` are set as in the paper."
]
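As a side note on step 2: the weight loading itself is plain Keras. Below is a minimal sketch, assuming `model` has already been built by the model-building cell and that `weights_path` (a placeholder name and path) points at your local copy of the pre-trained VGG-16 weights.

```python
# Hypothetical local path; replace it with wherever you saved the downloaded weights file.
weights_path = 'path/to/VGG_ILSVRC_16_layers_fc_reduced.h5'

# Standard Keras weight loading: `by_name=True` matches layers by name, so any layers
# that have no counterpart in the weights file simply keep their fresh initialization.
model.load_weights(weights_path, by_name=True)
```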
@@ -162,12 +167,12 @@
"# If you want to follow the original Caffe implementation, use the SGD optimizer\n",
"# that is commented out instead of the Adam optimizer.\n",
"\n",
"adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)\n",
"#sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)\n",
"#adam = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)\n",
"sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)\n",
"\n",
"ssd_loss = SSDLoss(neg_pos_ratio=3, alpha=1.0)\n",
"\n",
"model.compile(optimizer=adam, loss=ssd_loss.compute_loss)"
"model.compile(optimizer=sgd, loss=ssd_loss.compute_loss)"
]
},
{
@@ -391,6 +396,25 @@
"I'll set only a few essential Keras callbacks below, feel free to add more callbacks if you want TensorBoard summaries or whatever. We obviously need the learning rate scheduler and we want to save the best models during the training. It also makes sense to continuously stream our training history to a CSV log file after every epoch, because if we didn't do that, in case the training terminates with an exception at some point or if the kernel of this Jupyter notebook dies for some reason or anything like that happens, we would lose the entire history for the trained epochs. Finally, we'll also add a callback that makes sure that the training terminates if the loss becomes `NaN`. Depending on the optimizer you use, it can happen that the loss becomes `NaN` during the first iterations of the training. In later iterations it's less of a risk. For example, I've never seen a `NaN` loss when I trained SSD using an Adam optimizer, but I've seen a `NaN` loss a couple of times during the very first couple of hundred training steps of training a new model when I used an SGD optimizer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Define a learning rate schedule.\n",
"\n",
"def lr_schedule(epoch):\n",
" if epoch < 80:\n",
" return 0.001\n",
" elif epoch < 100:\n",
" return 0.0001\n",
" else:\n",
" return 0.00001"
]
},
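The next cell of the notebook defines the callbacks described above. As a rough sketch of what that setup looks like with standard Keras callbacks (the file names below are placeholders, and the exact arguments may differ from the actual notebook cell):

```python
from keras.callbacks import LearningRateScheduler, ModelCheckpoint, CSVLogger, TerminateOnNaN

# Applies `lr_schedule` (defined above) at the beginning of every epoch.
learning_rate_scheduler = LearningRateScheduler(schedule=lr_schedule)

# Saves the model whenever the validation loss improves (placeholder file name pattern).
model_checkpoint = ModelCheckpoint(filepath='ssd300_epoch-{epoch:02d}_val_loss-{val_loss:.4f}.h5',
                                   monitor='val_loss',
                                   save_best_only=True,
                                   mode='min')

# Streams the training history to a CSV log file after every epoch (placeholder file name).
csv_logger = CSVLogger(filename='ssd300_training_log.csv',
                       separator=',',
                       append=True)

# Terminates the training immediately if the loss becomes NaN.
terminate_on_nan = TerminateOnNaN()

callbacks = [model_checkpoint,
             csv_logger,
             learning_rate_scheduler,
             terminate_on_nan]
```

This `callbacks` list would then be passed to the training function via its `callbacks` argument.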
{
"cell_type": "code",
"execution_count": null,
@@ -437,7 +461,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to reproduce the training of the \"07+12\" model mentioned above, at 1,000 training steps per epoch you'd have to train for 120 epochs. That is going to take really long though, so you might not want to do all 120 epochs in one go and instead train only for a few epochs at a time.\n",
"In order to reproduce the training of the \"07+12\" model mentioned above, at 1,000 training steps per epoch you'd have to train for 120 epochs. That is going to take really long though, so you might not want to do all 120 epochs in one go and instead train only for a few epochs at a time. You can find a summary of a full training [here](https://github.com/pierluigiferrari/ssd_keras/blob/master/training_summaries/ssd300_pascal_07%2B12_training_summary.md).\n",
"\n",
"In order to only run a partial training and resume smoothly later on, there are a few things you should note:\n",
"1. Always load the full model if you can, rather than building a new model and loading previously saved weights into it. Optimizers like SGD or Adam keep running averages of past gradient moments internally. If you always save and load full models when resuming a training, then the state of the optimizer is maintained and the training picks up exactly where it left off. If you build a new model and load weights into it, the optimizer is being initialized from scratch, which, especially in the case of Adam, leads to small but unnecessary setbacks every time you resume the training with previously saved weights.\n",
