From 6a7387632a983911eec639892bf30827b07f73fc Mon Sep 17 00:00:00 2001 From: Ava Amini Date: Sun, 5 Jan 2025 14:26:55 -0500 Subject: [PATCH] finalizing pt l2p1 --- lab2/solutions/PT_Part1_MNIST_Solution.ipynb | 180 +++++++++++-------- 1 file changed, 104 insertions(+), 76 deletions(-) diff --git a/lab2/solutions/PT_Part1_MNIST_Solution.ipynb b/lab2/solutions/PT_Part1_MNIST_Solution.ipynb index 78499e83..7c1ca2bb 100644 --- a/lab2/solutions/PT_Part1_MNIST_Solution.ipynb +++ b/lab2/solutions/PT_Part1_MNIST_Solution.ipynb @@ -10,9 +10,9 @@ " \n", " \n", " Visit MIT Deep Learning\n", - " \n", + " \n", " Run in Google Colab\n", - " \n", + " \n", " View Source on GitHub\n", "\n", "\n", @@ -27,7 +27,7 @@ }, "outputs": [], "source": [ - "# Copyright 2024 MIT Introduction to Deep Learning. All Rights Reserved.\n", + "# Copyright 2025 MIT Introduction to Deep Learning. All Rights Reserved.\n", "#\n", "# Licensed under the MIT License. You may not use this file except in compliance\n", "# with the License. Use and/or modification of this code outside of MIT Introduction\n", @@ -140,7 +140,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "G1Bryi5ssUNX" + }, "outputs": [], "source": [ "# Download and transform the MNIST dataset\n", @@ -156,7 +158,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "D_AhlQB4sUNX" + }, "source": [ "The MNIST dataset object in PyTorch is not a simple tensor or array. It's an iterable dataset that loads samples (image-label pairs) one at a time or in batches. In a later section of this lab, we will define a handy DataLoader to process the data in batches." ] @@ -164,7 +168,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "LpxeLuaysUNX" + }, "outputs": [], "source": [ "image, label = train_dataset[0]\n", @@ -226,9 +232,9 @@ }, "source": [ "### Fully connected neural network architecture\n", - "To define the architecture of this first fully connected neural network, we'll once again use the the torch.nn modules, defining the model using [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html). Note how we first use a [`nn.Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n", + "To define the architecture of this first fully connected neural network, we'll once again use the the `torch.nn` modules, defining the model using [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html). Note how we first use a [`nn.Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n", "\n", - "In this next block, you'll define the fully connected layers of this simple work." + "In this next block, you'll define the fully connected layers of this simple network." ] }, { @@ -252,7 +258,7 @@ " # '''TODO: Define the second Linear layer to output the classification probabilities'''\n", " nn.Linear(128, 10),\n", " nn.Softmax(dim=1) # Softmax activation for probabilities\n", - " # TODO: FC layer and activation to output classification probabilities\n", + " # '''TODO: Linear layer and activation to output classification probabilities'''\n", " )\n", " return fc_model\n", "\n", @@ -276,21 +282,25 @@ "source": [ "Let's take a step back and think about the network we've just created. 
The first layer in this network, `nn.Flatten`, transforms the format of the images from a 2d-array (28 x 28 pixels), to a 1d-array of 28 * 28 = 784 pixels. You can think of this layer as unstacking rows of pixels in the image and lining them up. There are no learned parameters in this layer; it only reformats the data.\n", "\n", - "After the pixels are flattened, the network consists of a sequence of two `nn.Linear` layers. These are fully-connected ('Dense') neural layers. The first `nn.Linear` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.\n", + "After the pixels are flattened, the network consists of a sequence of two `nn.Linear` layers. These are fully-connected neural layers. The first `nn.Linear` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.\n", "\n", "That defines our fully connected model!" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "kquVpHqPsUNX" + }, "source": [ "### Embracing subclassing in PyTorch" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "RyqD3eJgsUNX" + }, "source": [ "Recall that in Lab 1, we explored creating more flexible models by subclassing [`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). This technique of defining models is more commonly used in PyTorch. We will practice using this approach of subclassing to define our models for the rest of the lab." ] @@ -298,7 +308,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "7JhFJXjYsUNX" + }, "outputs": [], "source": [ "# Define the fully connected model\n", @@ -314,7 +326,8 @@ " # '''TODO: Define the second Linear layer to output the classification probabilities'''\n", " self.fc2 = nn.Linear(128, 10)\n", " self.softmax = nn.Softmax(dim=1)\n", - " # self.softmax = TODO\n", + " # self.fc2 = # TODO\n", + " # self.softmax = # TODO\n", "\n", " def forward(self, x):\n", " x = self.flatten(x)\n", @@ -323,16 +336,16 @@ " # '''TODO: Implement the rest of forward pass of the model using the layers you have defined above'''\n", " x = self.relu(x)\n", " x = self.fc2(x)\n", + " # '''TODO'''\n", "\n", - " # In Pytorch, softmax is omitted in training since CrossEntropyLoss includes\n", - " # LogSoftmax; using both would result in incorrect loss values.\n", - " # Since we will train with CrossEntropyLoss, the line below can be commented out\n", - "\n", - " # x = self.softmax(x)\n", + " '''NOTE: In Pytorch, softmax is omitted in training since CrossEntropyLoss includes\n", + " LogSoftmax; using both would result in incorrect loss values.\n", + " Since we will train with CrossEntropyLoss, we do not need something like:\n", + " x = self.softmax(x) '''\n", "\n", " return x\n", "\n", - "fc_model = FullyConnectedModel().to(device)" + "fc_model = FullyConnectedModel().to(device) # send the model to GPU" ] }, { @@ -380,9 +393,9 @@ "\n", "We're now ready to train our model, which will involve feeding the training data (`train_dataset`) into the model, and then asking it to learn the associations between images and labels. 
We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training. This dataset consists of a (image, label) tuples that we will iteratively access in batches.\n", "\n", - "In Lab 1, we saw how we can use the [`.backward()`](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html) method to optimize losses and train models with stochastic gradient descent. In this section, we will define a train function to\n", + "In Lab 1, we saw how we can use the [`.backward()`](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html) method to optimize losses and train models with stochastic gradient descent. In this section, we will define a function to train the model using `.backward()` and `optimizer.step()` to automatically update our model parameters (weights and biases) as we saw in Lab 1.\n", "\n", - "After defining the hyperparameters, we can proceed to train the model using `.backward()` and `optimizer.step()` to automatically update our model parameters (weights and biases) as we saw in Lab 1. Recall, we mentioned in 1.1 that the MNIST dataset can be accessed iteratively in batches. Here, we will define the DataLoader that will enable us to do that." + "Recall, we mentioned in Section 1.1 that the MNIST dataset can be accessed iteratively in batches. Here, we will define a PyTorch [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) that will enable us to do that." ] }, { @@ -402,7 +415,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "dfnnoDwEsUNY" + }, "outputs": [], "source": [ "def train(model, dataloader, criterion, optimizer, epochs):\n", @@ -441,13 +456,15 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "kIpdv-H0sUNY" + }, "outputs": [], "source": [ "# TODO: Train the model by calling the function appropriately\n", "EPOCHS = 5\n", "train(fc_model, trainset_loader, loss_function, optimizer, EPOCHS)\n", - "# train (TODO)\n", + "# train('''TODO''') # TODO\n", "\n", "comet_model_1.end()" ] @@ -496,25 +513,27 @@ " for images, labels in testset_loader:\n", " # TODO: ensure evalaution happens on the GPU\n", " images, labels = images.to(device), labels.to(device)\n", - " # images, labels = TODO\n", - " \n", - " #TODO: feed the images into the model and obtain the predictions (forward pass)\n", + " # images, labels = # TODO\n", + "\n", + " # TODO: feed the images into the model and obtain the predictions (forward pass)\n", " outputs = model(images)\n", - " # outputs = TODO\n", + " # outputs = # TODO\n", "\n", " loss = loss_function(outputs, labels)\n", "\n", - " # Calculate test loss\n", + " # TODO: Calculate test loss\n", " test_loss += loss.item() * images.size(0)\n", - " # test_loss += TODO\n", + " # test_loss += # TODO\n", "\n", - " # Calculate accuracy\n", + " '''TODO: make a prediction and determine whether it is correct!'''\n", + " # TODO: identify the digit with the highest probability prediction for the images in the test dataset.\n", " predicted = torch.argmax(outputs, dim=1)\n", - " #TODO: identify the digit with the highest confidence prediction for the first image in the test dataset.\n", + " # predicted = # TODO\n", "\n", + " # TODO: tally the number of correct predictions\n", " correct_pred += (predicted == labels).sum().item()\n", " # correct_pred += TODO\n", - "\n", + " # TODO: tally the total number of predictions\n", " total_pred += labels.size(0)\n", " # total_pred 
+= TODO\n", "\n", @@ -523,8 +542,9 @@ " test_acc = correct_pred / total_pred\n", " return test_loss, test_acc\n", "\n", + "# TODO: call the evaluate function to evaluate the trained model!!\n", "test_loss, test_acc = evaluate(fc_model, trainset_loader, loss_function)\n", - "# test_loss, test_acc = TODO\n", + "# test_loss, test_acc = # TODO\n", "\n", "print('Test accuracy:', test_acc)" ] @@ -581,63 +601,58 @@ }, "outputs": [], "source": [ + "### Basic CNN in PyTorch ###\n", + "\n", "class CNN(nn.Module):\n", " def __init__(self):\n", " super(CNN, self).__init__()\n", " # TODO: Define the first convolutional layer\n", " self.conv1 = nn.Conv2d(1, 24, kernel_size=3)\n", - " # self.conv1 = TODO\n", + " # self.conv1 = # TODO\n", "\n", " # TODO: Define the first max pooling layer\n", " self.pool1 = nn.MaxPool2d(kernel_size=2)\n", - " # self.pool1 = TODO\n", + " # self.pool1 = # TODO\n", "\n", " # TODO: Define the second convolutional layer\n", " self.conv2 = nn.Conv2d(24, 36, kernel_size=3)\n", - " # self.conv2 = TODO\n", + " # self.conv2 = # TODO\n", "\n", " # TODO: Define the second max pooling layer\n", " self.pool2 = nn.MaxPool2d(kernel_size=2)\n", - " # self.pool2 = TODO\n", + " # self.pool2 = # TODO\n", "\n", " self.flatten = nn.Flatten()\n", " self.fc1 = nn.Linear(36 * 5 * 5, 128)\n", " self.relu = nn.ReLU()\n", "\n", - " # TODO: Define the Dense layers that outputs the classification\n", - " # probabilities.\n", + " # TODO: Define the Linear layer that outputs the classification\n", + " # logits over class labels. Remember that CrossEntropyLoss operates over logits.\n", " self.fc2 = nn.Linear(128, 10)\n", - "\n", - " # self.softmax = nn.Softmax(dim=1)\n", - " # [TODO Dense layer to output classification probabilities]\n", + " # self.fc2 = # TODO\n", "\n", "\n", " def forward(self, x):\n", - " # Convolutional and pooling layers\n", - " # print(x)\n", + " # First convolutional and pooling layers\n", " x = self.conv1(x)\n", - " # print(x)\n", " x = self.relu(x)\n", " x = self.pool1(x)\n", - " # print(x)\n", "\n", " # '''TODO: Implement the rest of forward pass of the model using the layers you have defined above'''\n", + " # '''hint: this will involve another set of convolutional/pooling layers and then the linear layers'''\n", " x = self.conv2(x)\n", - " # print(x)\n", " x = self.relu(x)\n", " x = self.pool2(x)\n", - " # print(x)\n", "\n", " x = self.flatten(x)\n", - " # print(x)\n", " x = self.fc1(x)\n", - " # print(x)\n", " x = self.relu(x)\n", " x = self.fc2(x)\n", - " # print(x)\n", "\n", - " # Remember that we comment out softmax since we will use CrossEntropy for training\n", - " # x = self.softmax(x)\n", + " '''NOTE: Remember that we do not need to define/execute softmax (self.softmax(x))\n", + " in the forward pass since we will use CrossEntropyLoss for training,\n", + " which operates directly on logits'''\n", + "\n", " return x\n", "\n", "# Instantiate the model\n", @@ -660,7 +675,7 @@ "\n", "Earlier in the lab, we defined a `train` function. The body of the function is quite useful because it allows us to have control over the training model, and to record differentiation operations during training by computing the gradients using `loss.backward()`. You may recall seeing this in Lab 1 Part 1.\n", "\n", - "We'll use this same framework to train our `cnn_model` using stochastic gradient descent. You are free to implement the following parts with or without the train and evaluate functions we defined above. 
What is most important is understanding how to manipulate the bodies of those functions to train and test models. \n", + "We'll use this same framework to train our `cnn_model` using stochastic gradient descent. You are free to implement the following parts with or without the train and evaluate functions we defined above. What is most important is understanding how to manipulate the bodies of those functions to train and test models.\n", "\n", "As we've done above, we can define the loss function, optimizer, and calculate the accuracy of the model. Define an optimizer and learning rate of choice. Feel free to modify as you see fit to optimize your model's performance." ] @@ -680,7 +695,10 @@ "batch_size = 64\n", "epochs = 7\n", "optimizer = optim.SGD(cnn_model.parameters(), lr=1e-2)\n", + "\n", + "# TODO: instantiate the cross entropy loss function\n", "loss_function = nn.CrossEntropyLoss()\n", + "# loss_function = # TODO\n", "\n", "# Redefine trainloader with new batch size parameter (tweak as see fit if optimizing)\n", "trainset_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n", @@ -690,7 +708,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "bzgOEAXVsUNZ" + }, "outputs": [], "source": [ "loss_history = mdl.util.LossHistory(smoothing_factor=0.95) # to record the evolution of the loss\n", @@ -702,7 +722,7 @@ "\n", "if hasattr(tqdm, '_instances'): tqdm._instances.clear() # clear if it exists\n", "\n", - "# Training loop\n", + "# Training loop!\n", "cnn_model.train()\n", "\n", "for epoch in range(epochs):\n", @@ -713,7 +733,7 @@ " # First grab a batch of training data which our data loader returns as a tensor\n", " for idx, (images, labels) in enumerate(tqdm(trainset_loader)):\n", " images, labels = images.to(device), labels.to(device)\n", - " \n", + "\n", " # Forward pass\n", " #'''TODO: feed the images into the model and obtain the predictions'''\n", " logits = cnn_model(images)\n", @@ -721,23 +741,23 @@ "\n", " #'''TODO: compute the categorical cross entropy loss\n", " loss = loss_function(logits, labels)\n", + " # loss = # TODO\n", + " # Get the loss and log it to comet and the loss_history record\n", " loss_value = loss.item()\n", " comet_model_2.log_metric(\"loss\", loss_value, step=idx)\n", - " # loss_value = # TODO\n", - "\n", " loss_history.append(loss_value) # append the loss to the loss_history record\n", " plotter.plot(loss_history.get())\n", "\n", " # Backpropagation/backward pass\n", - " '''TODO: Compute gradients for all model parameters using loss.backward().\n", - " and propagate backwads'''\n", + " '''TODO: Compute gradients for all model parameters and propagate backwads\n", + " to update model parameters. remember to reset your optimizer!'''\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", "\n", - " # Calculate accuracy\n", + " # Get the prediction and tally metrics\n", " predicted = torch.argmax(logits, dim=1)\n", - " correct_pred += (predicted == labels).sum().item() \n", + " correct_pred += (predicted == labels).sum().item()\n", " total_pred += labels.size(0)\n", "\n", " # Compute metrics\n", @@ -750,9 +770,13 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "id": "UG3ZXwYOsUNZ" + }, "source": [ - "### Evaluate the CNN Model" + "### Evaluate the CNN Model\n", + "\n", + "Now that we've trained the model, let's evaluate it on the test dataset." 
] }, { @@ -763,10 +787,10 @@ }, "outputs": [], "source": [ - "'''TODO: Evaluate the CNN model'''\n", + "'''TODO: Evaluate the CNN model!'''\n", "\n", "test_loss, test_acc = evaluate(cnn_model, trainset_loader, loss_function)\n", - "# test_loss, test_acc = TODO\n", + "# test_loss, test_acc = # TODO\n", "\n", "print('Test accuracy:', test_acc)" ] @@ -803,6 +827,8 @@ "source": [ "test_image, test_label = test_dataset[0]\n", "test_image = test_image.to(device).unsqueeze(0)\n", + "\n", + "# put the model in evaluation (inference) mode\n", "cnn_model.eval()\n", "predictions_test_image = cnn_model(test_image)" ] @@ -833,9 +859,9 @@ "id": "-hw1hgeSCaXN" }, "source": [ - "As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits.\n", + "As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a distribution over the 10 digit classes. Thus, these numbers describe the model's predicted likelihood that the image corresponds to each of the 10 different digits.\n", "\n", - "Let's look at the digit that has the highest confidence for the first image in the test dataset:" + "Let's look at the digit that has the highest likelihood for the first image in the test dataset:" ] }, { @@ -846,7 +872,7 @@ }, "outputs": [], "source": [ - "'''TODO: identify the digit with the highest confidence prediction for the first\n", + "'''TODO: identify the digit with the highest likelihood prediction for the first\n", " image in the test dataset. '''\n", "predictions_value = predictions_test_image.cpu().detach().numpy() #.cpu() to copy tensor to memory first\n", "prediction = np.argmax(predictions_value)\n", @@ -890,7 +916,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "id": "v6OqZSiAsUNf" + }, "outputs": [], "source": [ "# Initialize variables to store all data\n", @@ -903,13 +931,13 @@ " for images, labels in testset_loader:\n", " outputs = cnn_model(images)\n", "\n", - " # Apply softmax to get probabilities\n", + " # Apply softmax to get probabilities from the predicted logits\n", " probabilities = torch.nn.functional.softmax(outputs, dim=1)\n", "\n", " # Get predicted classes\n", " predicted = torch.argmax(outputs, dim=1)\n", "\n", - " all_predictions.append(probabilities) \n", + " all_predictions.append(probabilities)\n", " all_labels.append(labels)\n", " all_images.append(images)\n", "\n", @@ -1012,4 +1040,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} +} \ No newline at end of file
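
For reference, the new notebook text in this patch leans on two related points: `nn.CrossEntropyLoss` operates on raw logits (it applies LogSoftmax internally), so softmax is deliberately left out of the forward pass during training, and probabilities are only produced at inference time after putting the model in eval mode, with batches supplied by a `DataLoader`. The standalone sketch below is appended here purely as an illustration of that flow and is not part of the patch itself; the random tensors, the two-layer `nn.Sequential` model, and the hyperparameters are placeholders invented for the example rather than values taken from the solution notebook.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Dummy stand-ins for the MNIST images/labels used in the notebook (assumed shapes)
    images = torch.randn(256, 1, 28, 28)
    labels = torch.randint(0, 10, (256,))
    loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

    # Minimal fully connected model: flatten 28x28 images, hidden layer, 10 output logits
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
    criterion = nn.CrossEntropyLoss()   # applies LogSoftmax internally, so the model returns raw logits
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

    # Training: feed logits straight into CrossEntropyLoss, no softmax in the forward pass
    model.train()
    for batch_images, batch_labels in loader:
        logits = model(batch_images)
        loss = criterion(logits, batch_labels)
        optimizer.zero_grad()           # reset accumulated gradients before each backward pass
        loss.backward()
        optimizer.step()

    # Inference: switch to eval mode, disable gradient tracking, and only now take a softmax
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(images), dim=1)
        accuracy = (probs.argmax(dim=1) == labels).float().mean().item()

Deferring softmax to inference is also the numerically safer choice: computing LogSoftmax inside the loss avoids taking the log of an explicitly computed softmax, which is why the notebook's forward passes return logits during training.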