From 6fa521a95c080dd149ec50bde2191055eb089d97 Mon Sep 17 00:00:00 2001 From: Weiqiang Zhu Date: Sat, 23 Nov 2024 21:51:05 -0800 Subject: [PATCH] add exercise --- docs/exercises/09_neural_networks1.ipynb | 741 +++++++++++++++++++++++ 1 file changed, 741 insertions(+) create mode 100644 docs/exercises/09_neural_networks1.ipynb diff --git a/docs/exercises/09_neural_networks1.ipynb b/docs/exercises/09_neural_networks1.ipynb new file mode 100644 index 0000000..b63e4ec --- /dev/null +++ b/docs/exercises/09_neural_networks1.ipynb @@ -0,0 +1,741 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "V920LTuiq40d" + }, + "source": [ + "## Regression and Classification with Neural Networks\n", + "\n", + "\n", + "Run in Google Colab " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "Created by Minh-Chien Trinh, Jeonbuk National University, \n", + "\n", + "## 1.1. A Brief History\n", + "\n", + "In the 1940s, NNs were conceived.\n", + "\n", + "In the 1960s, the concept of backpropagation came, then people know how to train them.\n", + "\n", + "In 2010, NNs started winning competitions and get much attention than before.\n", + "\n", + "Since 2010, NNs have been on a meteoric rise as their magical ability to solve problems previously deemed unsolvable (i.e., image captioning, language translation, audio and video synthesis, and more).\n", + "\n", + "One important milestone is the AlexNet architecture in 2012, which won the ImageNet competition. \n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "The ImageNet competition is a benchmark for image classification, where the goal is to classify images into one of 1,000 categories.\n", + "\n", + "\n", + "\n", + "\n", + "You can find more information about the AlexNet model on [Wikipedia](https://en.wikipedia.org/wiki/AlexNet). We will use the AlexNet model in the next lecture to classify images of rocks.\n", + "\n", + "Currently, NNs are the primary solution to most competitions and technological challenges like self-driving cars, calculating risk, detecting fraud, early cancer detection,…\n", + "\n", + "## 1.2. What is a Neural Network?\n", + "\n", + "ANNs are inspired by the organic brain, translated to the computer.\n", + "\n", + "ANNs have neurons, activations, and interconnectivities.\n", + "\n", + "NNs are considered “black boxes” between inputs and outputs.\n", + "\n", + "
\n", + "\n", + "Each connection between neurons has a weight associated with it. Weights are multiplied by corresponding input values. These multiplications flow into the neuron and are summed before being added with a bias. Weights and biases are trainable or tunable.\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "output & = weight \\cdot input + bias \\\\\n", + "y & = a \\cdot x + b\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "The formula should look very familiar to you. It is similar to the previous linear regression and classification models.\n", + "\n", + "Then, an activation function is applied to the output.\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "output & = \\sum (weight \\cdot input) + bias \\\\\n", + "output & = activation (output)\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "When a step function that mimics a neuron in the brain (i.e., “firing” or not, on-off switch) is used as an activation function:\n", + "- If its output is greater than 0, the neuron fires (it would output 1).\n", + "- If its output is less than 0, the neuron does not fire and would pass along a 0.\n", + "\n", + "The input layer represents the actual input data (i.e., pixel values from an image, temperature, …)\n", + "\n", + "- The data can be “raw”, should be preprocessed like normalization and scaling. \n", + "- The input needs to be in numeric form.\n", + "\n", + "The output layer is whatever the NN returns.\n", + "- In regression, the predicted value is a scalar value, the output layer has a single neuron.\n", + "- In classification, the class of the input is predicted, the output layer has as many neurons as the training dataset has classes. But can also have a single output neuron for binary (two classes) classification.\n", + "\n", + "A typical NN has thousands or even up to millions of adjustable parameters (weights and biases).\n", + "\n", + "NNs act as enormous functions with vast numbers of parameters.\n", + "\n", + "Finding the combination of parameter (weight and bias) values is the challenging part.\n", + "\n", + "The end goal for NNs is to adjust their weights and biases (the parameters), so they produce the desired output for unseen data.\n", + "\n", + "A major issue in supervised learning is overfitting, where the algorithm doesn’t understand underlying input-output dependencies, just basically “memorizes” the training data.\n", + "\n", + "The goal of NN is generalization, that can be obtained when separating the data into training data and validation data.\n", + "\n", + "Weights and biases are adjusted based on the error/loss presenting how “wrong” the algorithm in NN predicting the output.\n", + "\n", + "NNs can be used for regression (predict a scalar, singular, value), clustering (assigned unstructured data into groups), and many other tasks.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this lecture, we will use PyTorch to build and train neural networks. \n", + "The pytorch library is a powerful tool for building and training neural networks. It provides a flexible and efficient library for deep learning. It is also currently the most popular library for deep learning.\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "Let's import the necessary libraries of PyTorch and other libraries." 
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "WZ3Y_KHvq40x"
+ },
+ "outputs": [],
+ "source": [
+ "## First part of this semester\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "## Second part of this semester\n",
+ "from sklearn.impute import SimpleImputer\n",
+ "from sklearn.metrics import r2_score, accuracy_score, confusion_matrix, ConfusionMatrixDisplay\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.preprocessing import LabelEncoder\n",
+ "\n",
+ "## Last part of this semester\n",
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.nn.functional as F\n",
+ "import torch.optim as optim"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "D9d6X0ZZq40z"
+ },
+ "outputs": [],
+ "source": [
+ "## Set random seed for reproducibility\n",
+ "torch.manual_seed(0)\n",
+ "torch.cuda.manual_seed(0)\n",
+ "np.random.seed(0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dAnY8yaDq400"
+ },
+ "source": [
+ "## Applying Neural Networks for Regression\n",
+ "\n",
+ "In today's lecture, we will revisit the Betoule data and apply neural networks for regression.\n",
+ "\n",
+ "If you have forgotten the background of this data, please review the lecture [04 regression](https://ai4eps.github.io/EPS88_PyEarth/lectures/04_regression/#going-even-further-out-into-the-universe).\n",
+ "\n",
+ "Remember that the challenge of the Betoule data is that the velocity is non-linear with respect to the distance.\n",
+ "\n",
+ "In the previous lecture, we used sklearn to fit linear regression models with high polynomial degrees.\n",
+ "\n",
+ "Here we will use PyTorch to fit the Betoule data and compare the results with the linear regression model."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Load the Betoule data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Load the Betoule data\n",
+ "# betoule_data = pd.read_csv('data/mu_z.csv',header=1) ## reading from local file\n",
+ "betoule_data = pd.read_csv('https://raw.githubusercontent.com/AI4EPS/EPS88_PyEarth/refs/heads/main/docs/scripts/data/mu_z.csv',header=1) ## reading from github for running on colab\n",
+ "betoule_data.head()\n",
+ "\n",
+ "## Apply processing to convert to distance and velocity\n",
+ "# speed of light in km/s\n",
+ "c = 2.9979e8 / 1000\n",
+ "\n",
+ "## the formula for v from z (and c)\n",
+ "betoule_data['velocity'] = c * (((betoule_data['z']+1.)**2-1.)/((betoule_data['z']+1.)**2+1.))\n",
+ "\n",
+ "## convert mu (the distance modulus) to distance in Mpc\n",
+ "betoule_data['distance'] = 10000*(10.**((betoule_data['mu'])/5.))*1e-9"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Review the data\n",
+ "plt.figure()\n",
+ "plt.scatter(\n",
+ "plt.xlabel('Distance (Mpc)')\n",
+ "plt.ylabel('Velocity (km s$^{-1}$)')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Prepare the data into features (X) and target (y). This is the same as in the previous lecture.\n",
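+ "\n",
+ "As a reminder, the general pattern from the earlier lectures looks like this (an illustrative sketch with a hypothetical DataFrame `df` and columns `a` and `b`; fill in the real names yourself):\n",
+ "\n",
+ "```python\n",
+ "X = df[['a']].values  # 2D feature array\n",
+ "y = df['b'].values    # 1D target array\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
+ "```"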
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Define features (X) and target (y) variables using the distance as the feature and velocity as the target\n",
+ "X = \n",
+ "y = \n",
+ "\n",
+ "## Split the data into training and test sets using 30% of the data for testing\n",
+ "X_train, X_test, y_train, y_test = "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Let's now build the first neural network model to fit the Betoule data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Convert data to PyTorch tensors\n",
+ "X_train_tensor = torch.tensor(X_train, dtype=torch.float32)\n",
+ "y_train_tensor = torch.tensor(y_train, dtype=torch.float32)\n",
+ "X_test_tensor = torch.tensor(X_test, dtype=torch.float32)\n",
+ "y_test_tensor = torch.tensor(y_test, dtype=torch.float32)\n",
+ "\n",
+ "## Normalize the data to make the training process more efficient\n",
+ "magnitude_X = 10**int(np.log10(X.max()))\n",
+ "magnitude_y = 10**int(np.log10(y.max()))\n",
+ "X_train_tensor = X_train_tensor / magnitude_X\n",
+ "y_train_tensor = y_train_tensor / magnitude_y\n",
+ "X_test_tensor = X_test_tensor / magnitude_X\n",
+ "y_test_tensor = y_test_tensor / magnitude_y\n",
+ "\n",
+ "## Define the neural network model\n",
+ "class SimpleNN(nn.Module):\n",
+ "    def __init__(self, input_size, output_size, hidden_size):\n",
+ "        super(SimpleNN, self).__init__()\n",
+ "        ## Define a linear layer with input size, hidden size\n",
+ "        self.fc1 = \n",
+ "        ## Define an activation function using ReLU\n",
+ "        self.relu = \n",
+ "        ## Define a linear layer with hidden size, output size\n",
+ "        self.fc2 = \n",
+ "    \n",
+ "    def forward(self, x):\n",
+ "        ## Apply the first linear layer\n",
+ "        out = \n",
+ "        ## Apply the activation function\n",
+ "        out = \n",
+ "        ## Apply the second linear layer\n",
+ "        out = \n",
+ "        return out\n",
+ "\n",
+ "## Initialize the model, loss function, and optimizer\n",
+ "input_size = X.shape[-1]\n",
+ "output_size = 1 # Output layer for regression (1 output neuron)\n",
+ "hidden_size = 16\n",
+ "\n",
+ "## Define the model, loss function, and optimizer. 
Hint: use your defined model, MSE loss, and the Adam optimizer\n",
+ "model = \n",
+ "criterion = \n",
+ "optimizer = \n",
+ "\n",
+ "## Define fit function\n",
+ "def fit(model, X, y, epochs=100):\n",
+ "    ## set the model to training mode\n",
+ "    model.\n",
+ "    losses = []\n",
+ "    for epoch in range(epochs):\n",
+ "        ## zero the gradients\n",
+ "        optimizer.zero_grad()\n",
+ "\n",
+ "        ## get the outputs from the model\n",
+ "        outputs = \n",
+ "        ## calculate the loss\n",
+ "        loss = \n",
+ "        loss.backward()\n",
+ "        ## update the weights\n",
+ "        optimizer.\n",
+ "\n",
+ "        losses.append(loss.item())\n",
+ "        if (epoch+1) % 10 == 0:\n",
+ "            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')\n",
+ "    return losses\n",
+ "\n",
+ "## Define predict function\n",
+ "def predict(model, X):\n",
+ "    ## set the model to evaluation mode\n",
+ "    model.\n",
+ "    with torch.no_grad():\n",
+ "        ## get the outputs from the model\n",
+ "        outputs = \n",
+ "    return outputs\n",
+ "\n",
+ "## Train the model\n",
+ "losses = fit(model, X_train_tensor, y_train_tensor, epochs=100)\n",
+ "\n",
+ "## Plot the loss during the training process\n",
+ "plt.figure()\n",
+ "plt.plot(\n",
+ "plt.xlabel('Epochs')\n",
+ "plt.ylabel('Loss')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Evaluate the model on the test set. This is the same as in the previous lecture."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Predict on the test set\n",
+ "y_pred_tensor = predict(\n",
+ "y_pred = y_pred_tensor.numpy() * magnitude_y\n",
+ "\n",
+ "## Calculate R-squared metric\n",
+ "r2 = \n",
+ "print(f'R-squared: {r2:.4f}')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "35aMo_r2q41G",
+ "outputId": "657fa561-a867-49b1-ed51-eea21d183a84",
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "## Predict on the whole dataset for plotting\n",
+ "X_tensor = torch.tensor(X, dtype=torch.float32)\n",
+ "X_tensor = X_tensor / magnitude_X\n",
+ "y_pred_tensor = predict(model, X_tensor)\n",
+ "y_pred = y_pred_tensor.numpy() * magnitude_y\n",
+ "y_pred = y_pred.squeeze() # remove the extra dimension\n",
+ "\n",
+ "## Plot the results\n",
+ "plt.figure(figsize=(6, 6))\n",
+ "plt.subplot(2,1,1)\n",
+ "## plot the data\n",
+ "plt.scatter(\n",
+ "## plot the fitted line\n",
+ "plt.plot(\n",
+ "plt.title('data and the neural network fit')\n",
+ "plt.ylabel('Velocity (km s$^{-1}$)')\n",
+ "plt.xlabel('Distance (Mpc)')\n",
+ "\n",
+ "## plot the residuals\n",
+ "plt.subplot(2,1,2)\n",
+ "plt.scatter(\n",
+ "plt.title('residuals of the neural network fit')\n",
+ "plt.ylabel('Residual velocity (km s$^{-1}$)')\n",
+ "plt.xlabel('Distance (Mpc)')\n",
+ "\n",
+ "plt.tight_layout()\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Compare the results with the previous polynomial regression. How does the neural network perform? (See the sketch below for one way to get a baseline.)\n",
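+ "\n",
+ "If you want a numerical baseline to compare against, one option is to refit a degree-2 polynomial with NumPy (a minimal sketch, assuming `X` and `y` are the arrays defined above):\n",
+ "\n",
+ "```python\n",
+ "## Fit a degree-2 polynomial baseline for comparison (illustrative sketch)\n",
+ "coeffs = np.polyfit(X.squeeze(), y, deg=2)  # least-squares polynomial fit\n",
+ "y_poly = np.polyval(coeffs, X.squeeze())    # evaluate at the same distances\n",
+ "print(f'Polynomial R-squared: {r2_score(y, y_poly):.4f}')\n",
+ "```"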
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jpsnS1HDq403"
+ },
+ "source": [
+ "## Applying Neural Networks for Classification\n",
+ "\n",
+ "Neural networks work well for regression tasks; how about classification tasks?\n",
+ "\n",
+ "Let's continue by applying neural networks to a classification task.\n",
+ "\n",
+ "Again, we will re-use the basalt affinity dataset that we covered in the previous lecture.\n",
+ "\n",
+ "If you have forgotten the background of this data, please review the lecture [05 classification](https://ai4eps.github.io/EPS88_PyEarth/lectures/05_classification/#classifying-volcanic-rocks)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Load the basalt affinity data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "CzJ18ICiq404"
+ },
+ "outputs": [],
+ "source": [
+ "## Load the basalt affinity data\n",
+ "# basalt_data = pd.read_csv('data/Vermeesch2006.csv') ## reading from local file\n",
+ "basalt_data = pd.read_csv('https://raw.githubusercontent.com/AI4EPS/EPS88_PyEarth/refs/heads/main/docs/scripts/data/Vermeesch2006.csv') ## reading from github for running on colab\n",
+ "basalt_data.tail()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Review the data\n",
+ "plt.figure(figsize=(8, 6))\n",
+ "\n",
+ "## plot each affinity as a different color\n",
+ "for affinity in basalt_data['affinity'].unique():\n",
+ "    subset = \n",
+ "    plt.scatter(\n",
+ "\n",
+ "plt.legend()\n",
+ "plt.xlabel('TiO2 (wt%)')\n",
+ "plt.ylabel('V (ppm)')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Prepare the data into features (X) and target (y). This is the same as in the previous lecture."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "2DdtOvQuq406",
+ "outputId": "280ae85a-fd8f-4e82-e877-c7abef6415ee",
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "## Prepare the data into features (X) and target (y)\n",
+ "X = \n",
+ "y = \n",
+ "\n",
+ "## Encode the target variable\n",
+ "le = LabelEncoder()\n",
+ "y = \n",
+ "\n",
+ "## Impute missing values using median imputation\n",
+ "imputer = SimpleImputer(strategy='median')\n",
+ "X = \n",
+ "\n",
+ "## Split the data into training and test sets using 30% of the data for testing\n",
+ "X_train, X_test, y_train, y_test = "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Let's now build the second neural network model to fit the basalt affinity data.\n",
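+ "\n",
+ "One detail worth knowing before you fill in the blanks: `nn.CrossEntropyLoss` expects raw scores (logits) with shape `(n_samples, n_classes)` plus integer class labels, so no softmax layer is needed at the output. A minimal sketch with made-up numbers:\n",
+ "\n",
+ "```python\n",
+ "logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three class scores\n",
+ "labels = torch.tensor([0])                 # the true class index\n",
+ "loss = nn.CrossEntropyLoss()(logits, labels)\n",
+ "print(loss.item())  # ~0.24: small, because class 0 already has the highest score\n",
+ "```"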
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Convert data to PyTorch tensors\n",
+ "X_train_tensor = torch.tensor(X_train, dtype=torch.float32)\n",
+ "y_train_tensor = torch.tensor(y_train, dtype=torch.long)\n",
+ "X_test_tensor = torch.tensor(X_test, dtype=torch.float32)\n",
+ "y_test_tensor = torch.tensor(y_test, dtype=torch.long)\n",
+ "\n",
+ "## Normalize the data to make the training process more efficient\n",
+ "mu = X_train_tensor.mean(dim=0, keepdim=True)\n",
+ "std = X_train_tensor.std(dim=0, keepdim=True)\n",
+ "X_train_tensor = (X_train_tensor - mu) / std\n",
+ "X_test_tensor = (X_test_tensor - mu) / std\n",
+ "\n",
+ "## Define the neural network model\n",
+ "class SimpleNN(nn.Module):\n",
+ "    def __init__(self, input_size, output_size, hidden_size):\n",
+ "        super(SimpleNN, self).__init__()\n",
+ "        ## Define a linear layer with input size, hidden size\n",
+ "        self.fc1 = \n",
+ "        ## Define an activation function using ReLU\n",
+ "        self.relu = \n",
+ "        ## Define a linear layer with hidden size, output size\n",
+ "        self.fc2 = \n",
+ "    \n",
+ "    def forward(self, x):\n",
+ "        ## Apply the first linear layer\n",
+ "        out = \n",
+ "        ## Apply the activation function\n",
+ "        out = \n",
+ "        ## Apply the second linear layer\n",
+ "        out = \n",
+ "        return out\n",
+ "\n",
+ "## Initialize the model, loss function, and optimizer\n",
+ "input_size = X_train.shape[-1]\n",
+ "output_size = len(le.classes_) # Output layer for classification (number of classes)\n",
+ "hidden_size = 16\n",
+ "\n",
+ "## Define the model, loss function, and optimizer. Hint: use your defined model, CrossEntropy loss, and the Adam optimizer\n",
+ "model = \n",
+ "criterion = \n",
+ "optimizer = \n",
+ "\n",
+ "## Define fit function\n",
+ "def fit(model, X_train, y_train, epochs=100):\n",
+ "    ## set the model to training mode\n",
+ "    model.\n",
+ "    losses = []\n",
+ "    for epoch in range(epochs):\n",
+ "        ## zero the gradients\n",
+ "        optimizer.zero_grad()\n",
+ "\n",
+ "        ## get the outputs from the model\n",
+ "        outputs = \n",
+ "        ## calculate the loss\n",
+ "        loss = \n",
+ "        loss.backward()\n",
+ "        ## update the weights\n",
+ "        optimizer.step()\n",
+ "\n",
+ "        losses.append(loss.item())\n",
+ "        if (epoch+1) % 10 == 0:\n",
+ "            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')\n",
+ "    return losses\n",
+ "\n",
+ "## Define predict function\n",
+ "def predict(model, X):\n",
+ "    ## set the model to evaluation mode\n",
+ "    model.\n",
+ "    with torch.no_grad():\n",
+ "        ## get the outputs from the model\n",
+ "        outputs = \n",
+ "        _, predicted = torch.max(outputs, 1)\n",
+ "    return predicted\n",
+ "\n",
+ "## Train the model\n",
+ "losses = fit(model, X_train_tensor, y_train_tensor, epochs=100)\n",
+ "\n",
+ "## Plot the loss\n",
+ "plt.figure()\n",
+ "plt.plot(\n",
+ "plt.xlabel('Epochs')\n",
+ "plt.ylabel('Loss')\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rohwh7Ugq41C"
+ },
+ "source": [
+ "- Evaluate the model on the test set. This is the same as in the previous lecture.\n",
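+ "\n",
+ "As a reminder, the sklearn metrics follow the same `(y_true, y_pred)` pattern as before (a sketch with hypothetical label arrays):\n",
+ "\n",
+ "```python\n",
+ "y_true = np.array([0, 1, 2, 1])  # hypothetical true labels\n",
+ "y_hat = np.array([0, 1, 1, 1])   # hypothetical predictions\n",
+ "print(accuracy_score(y_true, y_hat))   # 0.75 (3 of 4 correct)\n",
+ "print(confusion_matrix(y_true, y_hat))\n",
+ "```"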
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "## Predict on the test set\n",
+ "y_pred_tensor = \n",
+ "y_pred = y_pred_tensor.numpy()\n",
+ "\n",
+ "## Calculate accuracy\n",
+ "accuracy = \n",
+ "print(f'Accuracy: {accuracy:.4f}')\n",
+ "\n",
+ "## Confusion matrix; Hint: use confusion_matrix from sklearn.metrics\n",
+ "conf_matrix = \n",
+ "disp = ConfusionMatrixDisplay(confusion_matrix=conf_matrix, display_labels=le.classes_)\n",
+ "disp.plot(cmap=plt.cm.Blues, values_format='d', colorbar=False);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Compare the results with the previous classification methods. How does the neural network perform?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- Compare the two neural networks built for the regression and classification tasks. Please list the similarities and differences."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- The neural networks we built are very simple, with only one hidden layer. Do you know which variable controls the complexity of the neural networks?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "- If we want to build a more complex neural network, how can we do it? Think about the number of layers and the number of neurons in each layer.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "h2Z7LBzaq41P"
+ },
+ "source": [
+ "If you are interested in building a more complex neural network, you can try the following website.\n",
+ "\n",
+ "The more layers and neurons you add, the more complex the neural network becomes: it can fit more complex data, but it is also more challenging to train.\n",
+ "\n",
+ "There are many hyperparameters you can tune in the online playground. Explore whether you can find parameters that fit all of the data distributions.\n",
+ "\n",
+ "[Train a neural network online](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.43783&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)\n",
+ "\n",
+ "![20241103200049](https://raw.githubusercontent.com/zhuwq0/images/main/20241103200049.png)"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "base",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "bd97b8bffa4d3737e84826bc3d37be3046061822757ce35137ab82ad4c5a2016"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}