From 9ee5a521f15a49e26647de454bc62893d717bdb4 Mon Sep 17 00:00:00 2001 From: 296406598 <296406598@qq.com> Date: Fri, 13 Oct 2023 05:45:28 +0800 Subject: [PATCH 1/7] add gradient-descent.ipynb --- .../ml-fundamentals/gradient-descent.ipynb | 395 ++++++++++++++++++ 1 file changed, 395 insertions(+) create mode 100644 open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb new file mode 100644 index 0000000000..3025cbad27 --- /dev/null +++ b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb @@ -0,0 +1,395 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Gradient descent" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Session Objective\n", + "\n", + "In previous sessions, we've delved into the application of Linear Regression and Logistic Regression models. You may find the code relatively straightforward and intuitive at this point. However, you might be pondering questions like:\n", + "\n", + "- What exactly occurs when we invoke the `.fit()` function?\n", + "- Why does the execution of the `.fit()` function sometimes take a significant amount of time?\n", + "\n", + "This session is designed to provide insight into the functionality of the `.fit()` method, which is responsible for training machine learning models and fine-tuning model parameters. The underlying technique at play here is known as \"Gradient Descent.\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Let's Explore and Gain Intuition\n", + "\n", + "To further enhance your understanding and gain a playful insight into Gradient Descent, you can explore the following resources:\n", + "\n", + "- [Tensorflow Playground](https://playground.tensorflow.org/#activation=sigmoid&batchSize=10&dataset=circle®Dataset=reg-plane&learningRate=0.00001®ularizationRate=0&noise=0&networkShape=&seed=0.71864&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false): This interactive tool allows you to experiment with various machine learning concepts, including activation functions, datasets, and learning rate, providing a visual representation of how models learn.\n", + "\n", + "- [Gradient Descent Visualization](https://github.com/lilipads/gradient_descent_viz): This GitHub repository offers a visualization of the Gradient Descent algorithm, which can be a valuable resource for understanding the optimization process.\n", + "\n", + "- [Optimization Algorithms Visualization](https://bl.ocks.org/EmilienDupont/aaf429be5705b219aaaf8d691e27ca87): Explore this visualization to see how different optimization algorithms, including Gradient Descent, work and how they converge to find optimal solutions.\n", + "\n", + "These resources will help you build an intuitive grasp of Gradient Descent and its role in training machine learning models. Enjoy your exploration!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Abstract\n", + "\n", + "The fundamental concept behind gradient descent is rather straightforward: it involves the gradual adjustment of parameters, such as the slope ($m$) and the intercept ($b$) in our regression equation $y = mx + b$, with the aim of minimizing a cost function. This cost function is typically a metric that quantifies the disparity between our model's predicted results and the actual values. In regression scenarios, the widely employed cost function is the `mean squared error` (MSE). When dealing with classification problems, a different cost function is typically minimized instead.\n", + "\n", + "The MSE (Mean Squared Error) is mathematically expressed as:\n", + "\n", + "$$\n", + "MSE = \\frac{1}{n}\\sum_{i=1}^{n} (y_i - \\hat{y_i})^2\n", + "$$\n", + "\n", + "Here, $y_i$ represents the actual data points, $\\hat{y_i}$ signifies the predictions made by our model ($mx_i + b$), and $n$ denotes the total number of data points.\n", + "\n", + "Our primary challenge is to determine the optimal adjustments to parameters $m$ and $b$ to minimize the MSE effectively." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Partial Derivatives\n", + "\n", + "In our pursuit of minimizing the Mean Squared Error (MSE), we turn to partial derivatives to understand how each individual parameter influences the MSE. The term \"partial\" signifies that we are taking derivatives with respect to individual parameters, in this case, $m$ and $b$, separately.\n", + "\n", + "Consider the following formula, which closely resembles the MSE, but now we've introduced the function $f(m, b)$ into it. The addition of this function doesn't significantly alter the essence of the calculation, but it allows us to input specific values for $m$ and $b$ to compute the result.\n", + "\n", + "$$f(m, b) = \\frac{1}{n}\\sum_{i=1}^{n}(y_i - (mx_i+b))^2$$\n", + "\n", + "For the purposes of calculating partial derivatives, we can temporarily disregard the summation and the terms preceding it, focusing solely on the expression $(y - (mx + b))^2$. This expression serves as a convenient starting point for the subsequent partial derivative calculations." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Partial Derivative with Respect to $m$\n", + "\n", + "When we calculate the partial derivative with respect to the parameter $m$, we isolate $m$ and treat $b$ as a constant (its derivative is zero). To compute this derivative, we utilize the chain rule, which is a fundamental concept in calculus.\n", + "\n", + "The chain rule is expressed as follows:\n", + "\n", + "$$ [f(g(x))]' = f'(g(x)) \\cdot g'(x) \\quad - \\textrm{chain rule} $$\n", + "\n", + "The chain rule is applicable when one function is nested inside another. In this context, the square operation, $()^2$, is the outer function, while $y - (mx + b)$ is the inner function. Following the chain rule, we differentiate the outer function, maintain the inner function as it is, and then multiply it by the derivative of the inner function. Let's break down the steps:\n", + "\n", + "$$ (y - (mx + b))^2 $$\n", + "\n", + "1. The derivative of $()^2$ is $2()$, just like $x^2$ becomes $2x$.\n", + "2. We leave the inner function, $y - (mx + b)$, unaltered.\n", + "3. The derivative of $y - (mx + b)$ with respect to **_m_** is $(0 - x)$, which simplifies to $-x$.
This is because both **_y_** and **_b_** are treated as constants (their derivatives are zero), and the derivative of **_mx_** is simply **_x_**.\n", + "\n", + "Now, let's combine these components:\n", + "\n", + "$$ 2 \\cdot (y - (mx+b)) \\cdot (-x) $$\n", + "\n", + "For clarity, we can rearrange this expression by moving the factor of $-x$ to the left:\n", + "\n", + "$$ -2x \\cdot (y-(mx+b)) $$\n", + "\n", + "This is the final version of our derivative with respect to $m$:\n", + "\n", + "$$ \\frac{\\partial f}{\\partial m} = \\frac{1}{n}\\sum_{i=1}^{n} -2x_i(y_i - (mx_i+b)) $$\n", + "\n", + "Here, $\\frac{\\partial f}{\\partial m}$ signifies the partial derivative of function $f$ (as previously defined) with respect to the parameter $m$. We can now insert this derivative into our summation to complete the calculation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Partial Derivative with Respect to $b$\n", + "\n", + "The process for computing the partial derivative with respect to the parameter $b$ is analogous to our previous derivation with respect to $m$. We still apply the same rules and utilize the chain rule:\n", + "\n", + "1. The derivative of $()^2$ is $2()$, which corresponds to how $x^2$ becomes $2x$.\n", + "2. We leave the inner function, $y - (mx + b)$, unaltered.\n", + "3. For the derivative of $y - (mx + b)$ with respect to **_b_**, it becomes $(0 - 1)$, or simply $-1$. This is because both **_y_** and **_mx_** are treated as constants (their derivatives are zero), and the derivative of **_b_** is 1.\n", + "\n", + "Now, let's consolidate these components:\n", + "\n", + "$$ 2 \\cdot (y - (mx+b)) \\cdot (-1) $$\n", + "\n", + "Simplifying this expression:\n", + "\n", + "$$ -2 \\cdot (y-(mx+b)) $$\n", + "\n", + "This is the final version of our derivative with respect to $b$:\n", + "\n", + "$$ \\frac{\\partial f}{\\partial b} = \\frac{1}{n}\\sum_{i=1}^{n} -2(y_i - (mx_i+b)) $$\n", + "\n", + "Similarly to the previous case, $\\frac{\\partial f}{\\partial b}$ represents the partial derivative of function $f$ with respect to the parameter $b$. Inserting this derivative into our summation concludes the computation." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The Final Function\n", + "\n", + "Before delving into the code, there are a few essential details to address:\n", + "\n", + "1. Gradient descent is an iterative process, and with each iteration (referred to as an \"epoch\"), we incrementally reduce the Mean Squared Error (MSE). At each iteration, we apply our derived functions to update the values of parameters $m$ and $b$.\n", + "\n", + "2. Because gradient descent is iterative, we must determine how many iterations to perform, or devise a mechanism to stop the algorithm when it approaches the minimum of the MSE. In essence, we continue iterating until the algorithm no longer improves the MSE, signifying that it has reached a minimum.\n", + "\n", + "3. An important parameter in gradient descent is the learning rate ($lr$). The learning rate governs the pace at which the algorithm moves toward the minimum of the MSE. A smaller learning rate results in slower but more precise convergence, while a larger learning rate converges faster but risks overshooting the minimum.\n", + "\n", + "In summary, gradient descent primarily involves taking derivatives and applying them iteratively to minimize a function: at each epoch we update $m \\leftarrow m - lr \\cdot \\frac{\\partial f}{\\partial m}$ and $b \\leftarrow b - lr \\cdot \\frac{\\partial f}{\\partial b}$, which steadily drives the Mean Squared Error toward a minimum."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Time to code!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "import sklearn\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Linear Regression With Gradient Descent\n", + "class LinearRegression:\n", + " def __init__(self, learning_rate=0.0003, n_iters=3000):\n", + " self.lr = learning_rate\n", + " self.n_iters = n_iters\n", + " self.weights = None\n", + " self.bias = None\n", + "\n", + " def fit(self, X, y):\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Initialize parameters\n", + " self.weights = np.zeros(n_features)\n", + " self.bias = 0\n", + "\n", + " # Gradient Descent\n", + " for _ in range(self.n_iters):\n", + " # Approximate y with a linear combination of weights and x, plus bias\n", + " y_predicted = np.dot(X, self.weights) + self.bias\n", + "\n", + " # Compute gradients\n", + " dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))\n", + " db = (1 / n_samples) * np.sum(y_predicted - y)\n", + " \n", + " # Update parameters\n", + " self.weights -= self.lr * dw\n", + " self.bias -= self.lr * db\n", + "\n", + " def predict(self, X):\n", + " y_predicted = np.dot(X, self.weights) + self.bias\n", + " return y_predicted\n", + "\n", + "# Load data and perform linear regression\n", + "prostate = pd.read_table(\"../../assets/data/prostate.data\")\n", + "prostate.drop(prostate.columns[0], axis=1, inplace=True)\n", + "\n", + "X = prostate.drop([\"lpsa\", \"train\"], axis=1)\n", + "y = prostate[\"lpsa\"]\n", + "\n", + "regressor = LinearRegression()\n", + "\n", + "regressor.fit(X, y)\n", + "y_pred = regressor.predict(X)\n", + "\n", + "print(regressor.__dict__)\n", + "print(y - y_pred)\n", + "\n", + "plt.scatter(y, y_pred)\n", + "plt.plot([0, 5], [0, 5])\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Linear Regression With Stochastic Gradient Descent\n", + "class LinearRegressionWithSGD:\n", + " def __init__(self, learning_rate=0.0003, n_iters=5000):\n", + " self.lr = learning_rate\n", + " self.n_iters = n_iters\n", + " self.weights = None\n", + " self.bias = None\n", + "\n", + " def fit(self, X, y):\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Initialize parameters\n", + " self.weights = np.zeros(n_features)\n", + " self.bias = 0\n", + "\n", + " batch_size = 5\n", + " # Stochastic Gradient Descent\n", + " for _ in range(self.n_iters):\n", + " # Approximate y with a linear combination of weights and x, plus bias\n", + " y_predicted = np.dot(X, self.weights) + self.bias\n", + " \n", + " indexes = np.random.randint(0, len(X), batch_size) # Random sample\n", + " \n", + " Xs = np.take(X, indexes, axis=0)\n", + " ys = np.take(y, indexes, axis=0)\n", + " y_predicted_s = np.take(y_predicted, indexes)\n", + " \n", + " # Compute gradients\n", + " dw = (1 / batch_size) * np.dot(Xs.T, (y_predicted_s - ys))\n", + " db = (1 / batch_size) * np.sum(y_predicted_s - ys)\n", + " \n", + " # Update parameters\n", + " self.weights -= self.lr * dw\n", + " self.bias -= self.lr * db\n", + "\n", + " def predict(self, X):\n", + " y_predicted = np.dot(X, self.weights) + self.bias\n", + " return y_predicted\n", + 
"\n", + "# Load data and perform linear regression with Stochastic Gradient Descent\n", + "prostate = pd.read_table(\"../../assets/data/prostate.data\")\n", + "prostate.drop(prostate.columns[0], axis=1, inplace=True)\n", + "\n", + "X = prostate.drop([\"lpsa\", \"train\"], axis=1)\n", + "y = prostate[\"lpsa\"]\n", + "\n", + "regressor = LinearRegressionWithSGD()\n", + "\n", + "regressor.fit(X, y)\n", + "y_pred = regressor.predict(X)\n", + "\n", + "print(regressor.__dict__)\n", + "print(y - y_pred)\n", + "\n", + "plt.scatter(y, y_pred)\n", + "plt.plot([0, 5], [0, 5])\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Logistic Regression with Gradient Descent\n", + "class LogisticRegression:\n", + " def __init__(self, learning_rate=0.001, n_iters=1000):\n", + " self.lr = learning_rate\n", + " self.n_iters = n_iters\n", + " self.weights = None\n", + " self.bias = None\n", + "\n", + " def fit(self, X, y):\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Initialize parameters\n", + " self.weights = np.zeros(n_features)\n", + " self.bias = 0\n", + "\n", + " # Gradient Descent\n", + " for _ in range(self.n_iters):\n", + " # Approximate y with a linear combination of weights and x, plus bias\n", + " linear_model = np.dot(X, self.weights) + self.bias\n", + " # Apply the sigmoid function\n", + " y_predicted = self._sigmoid(linear_model)\n", + "\n", + " # Compute gradients\n", + " dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))\n", + " db = (1 / n_samples) * np.sum(y_predicted - y)\n", + " \n", + " # Update parameters\n", + " self.weights -= self.lr * dw\n", + " self.bias -= self.lr * db\n", + "\n", + " def predict(self, X):\n", + " linear_model = np.dot(X, self.weights) + self.bias\n", + " y_predicted = self._sigmoid(linear_model)\n", + " y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]\n", + " return np.array(y_predicted_cls)\n", + "\n", + " def _sigmoid(self, x):\n", + " return 1 / (1 + np.exp(-x))\n", + "\n", + "# Load data and perform logistic regression\n", + "heart = pd.read_csv(\"../../assets/data/SA_heart.csv\")\n", + "heart.famhist.replace(to_replace=['Present', 'Absent'], value=[1, 0], inplace=True)\n", + "heart.drop(['row.names'], axis=1, inplace=True)\n", + "X = heart.iloc[:, :-1]\n", + "y = heart.iloc[:, -1]\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n", + "\n", + "regressor = LogisticRegression(learning_rate=0.0001, n_iters=1000)\n", + "\n", + "regressor.fit(X_train, y_train)\n", + "y_pred = regressor.predict(X_test)\n", + "perf = sklearn.metrics.confusion_matrix(y_test, y_pred)\n", + "print(\"LR classification perf:\\n\", perf)\n", + "\n", + "error_rate = np.mean(y_test != y_pred)\n", + "print(\"LR classification error rate:\\n\", error_rate)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Your turn 🚀\n", + "\n", + "Modify ```LogisticRegression``` so that the training will use SGD instead of GD." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 32d15f2176b95573921b79db88c3e8236f362379 Mon Sep 17 00:00:00 2001 From: 296406598 <296406598@qq.com> Date: Fri, 13 Oct 2023 05:52:23 +0800 Subject: [PATCH 2/7] update gradient-descent.ipynb --- .../ml-fundamentals/gradient-descent.ipynb | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb index 3025cbad27..ebc9f2ab95 100644 --- a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb +++ b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb @@ -173,13 +173,19 @@ "from sklearn.model_selection import train_test_split" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Linear Regression With Gradient Descent" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# Linear Regression With Gradient Descent\n", "class LinearRegression:\n", " def __init__(self, learning_rate=0.0003, n_iters=3000):\n", " self.lr = learning_rate\n", @@ -296,13 +302,19 @@ "plt.show()\n" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Linear Regression With Stochastic Gradient Descent" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# Logistic Regression with Gradient Descent\n", "class LogisticRegression:\n", " def __init__(self, learning_rate=0.001, n_iters=1000):\n", " self.lr = learning_rate\n", From f88a8ee05cf0b57b24609daa25ffc6b156425410 Mon Sep 17 00:00:00 2001 From: 296406598 <296406598@qq.com> Date: Fri, 13 Oct 2023 05:54:40 +0800 Subject: [PATCH 3/7] update gradient-descent.ipynb --- .../assignments/ml-fundamentals/gradient-descent.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb index ebc9f2ab95..ba54598c54 100644 --- a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb +++ b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb @@ -218,7 +218,7 @@ " return y_predicted\n", "\n", "# Load data and perform linear regression\n", - "prostate = pd.read_table(\"../../assets/data/prostate.data\")\n", + "prostate = pd.read_table(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/prostate.data\")\n", "prostate.drop(prostate.columns[0], axis=1, inplace=True)\n", "\n", "X = prostate.drop([\"lpsa\", \"train\"], axis=1)\n", @@ -283,7 +283,7 @@ " return y_predicted\n", "\n", "# Load data and perform linear regression with Stochastic Gradient Descent\n", - "prostate = pd.read_table(\"../../assets/data/prostate.data\")\n", + "prostate = pd.read_table(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/prostate.data\")\n", "prostate.drop(prostate.columns[0], axis=1, inplace=True)\n", "\n", "X = 
prostate.drop([\"lpsa\", \"train\"], axis=1)\n", @@ -354,7 +354,7 @@ " return 1 / (1 + np.exp(-x))\n", "\n", "# Load data and perform logistic regression\n", - "heart = pd.read_csv(\"../../assets/data/SA_heart.csv\")\n", + "heart = pd.read_csv(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/SA_heart.csv\")\n", "heart.famhist.replace(to_replace=['Present', 'Absent'], value=[1, 0], inplace=True)\n", "heart.drop(['row.names'], axis=1, inplace=True)\n", "X = heart.iloc[:, :-1]\n", From 4d189bc94764c9cfc688dab9ef077325414591f2 Mon Sep 17 00:00:00 2001 From: Lunde Chen Date: Sat, 14 Oct 2023 15:20:00 +0800 Subject: [PATCH 4/7] update gradient descent --- .../ml-fundamentals/gradient-descent.ipynb | 146 ------------------ 1 file changed, 146 deletions(-) diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb index ba54598c54..9ccf2a108a 100644 --- a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb +++ b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb @@ -29,7 +29,6 @@ "\n", "To further enhance your understanding and gain a playful insight into Gradient Descent, you can explore the following resources:\n", "\n", - "- [Tensorflow Playground](https://playground.tensorflow.org/#activation=sigmoid&batchSize=10&dataset=circle®Dataset=reg-plane&learningRate=0.00001®ularizationRate=0&noise=0&networkShape=&seed=0.71864&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false): This interactive tool allows you to experiment with various machine learning concepts, including activation functions, datasets, and learning rate, providing a visual representation of how models learn.\n", "\n", "- [Gradient Descent Visualization](https://github.com/lilipads/gradient_descent_viz): This GitHub repository offers a visualization of the Gradient Descent algorithm, which can be a valuable resource for understanding the optimization process.\n", "\n", @@ -236,151 +235,6 @@ "plt.plot([0, 5], [0, 5])\n", "plt.show()\n" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Linear Regression With Stochastic Gradient Descent\n", - "class LinearRegressionWithSGD:\n", - " def __init__(self, learning_rate=0.0003, n_iters=5000):\n", - " self.lr = learning_rate\n", - " self.n_iters = n_iters\n", - " self.weights = None\n", - " self.bias = None\n", - "\n", - " def fit(self, X, y):\n", - " n_samples, n_features = X.shape\n", - "\n", - " # Initialize parameters\n", - " self.weights = np.zeros(n_features)\n", - " self.bias = 0\n", - "\n", - " batch_size = 5\n", - " # Stochastic Gradient Descent\n", - " for _ in range(self.n_iters):\n", - " # Approximate y with a linear combination of weights and x, plus bias\n", - " y_predicted = np.dot(X, self.weights) + self.bias\n", - " \n", - " indexes = np.random.randint(0, len(X), batch_size) # Random sample\n", - " \n", - " Xs = np.take(X, indexes, axis=0)\n", - " ys = np.take(y, indexes, axis=0)\n", - " y_predicted_s = np.take(y_predicted, indexes)\n", - " \n", - " # Compute gradients\n", - " dw = (1 / batch_size) * np.dot(Xs.T, (y_predicted_s - ys))\n", - " db = (1 / batch_size) * np.sum(y_predicted_s - ys)\n", - " \n", - " # Update parameters\n", - " 
self.weights -= self.lr * dw\n", - " self.bias -= self.lr * db\n", - "\n", - " def predict(self, X):\n", - " y_predicted = np.dot(X, self.weights) + self.bias\n", - " return y_predicted\n", - "\n", - "# Load data and perform linear regression with Stochastic Gradient Descent\n", - "prostate = pd.read_table(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/prostate.data\")\n", - "prostate.drop(prostate.columns[0], axis=1, inplace=True)\n", - "\n", - "X = prostate.drop([\"lpsa\", \"train\"], axis=1)\n", - "y = prostate[\"lpsa\"]\n", - "\n", - "regressor = LinearRegressionWithSGD()\n", - "\n", - "regressor.fit(X, y)\n", - "y_pred = regressor.predict(X)\n", - "\n", - "print(regressor.__dict__)\n", - "print(y - y_pred)\n", - "\n", - "plt.scatter(y, y_pred)\n", - "plt.plot([0, 5], [0, 5])\n", - "plt.show()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Linear Regression With Stochastic Gradient Descent" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "class LogisticRegression:\n", - " def __init__(self, learning_rate=0.001, n_iters=1000):\n", - " self.lr = learning_rate\n", - " self.n_iters = n_iters\n", - " self.weights = None\n", - " self.bias = None\n", - "\n", - " def fit(self, X, y):\n", - " n_samples, n_features = X.shape\n", - "\n", - " # Initialize parameters\n", - " self.weights = np.zeros(n_features)\n", - " self.bias = 0\n", - "\n", - " # Gradient Descent\n", - " for _ in range(self.n_iters):\n", - " # Approximate y with a linear combination of weights and x, plus bias\n", - " linear_model = np.dot(X, self.weights) + self.bias\n", - " # Apply the sigmoid function\n", - " y_predicted = self._sigmoid(linear_model)\n", - "\n", - " # Compute gradients\n", - " dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))\n", - " db = (1 / n_samples) * np.sum(y_predicted - y)\n", - " \n", - " # Update parameters\n", - " self.weights -= self.lr * dw\n", - " self.bias -= self.lr * db\n", - "\n", - " def predict(self, X):\n", - " linear_model = np.dot(X, self.weights) + self.bias\n", - " y_predicted = self._sigmoid(linear_model)\n", - " y_predicted_cls = [1 if i > 0.5 else 0 for i in y_predicted]\n", - " return np.array(y_predicted_cls)\n", - "\n", - " def _sigmoid(self, x):\n", - " return 1 / (1 + np.exp(-x))\n", - "\n", - "# Load data and perform logistic regression\n", - "heart = pd.read_csv(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/SA_heart.csv\")\n", - "heart.famhist.replace(to_replace=['Present', 'Absent'], value=[1, 0], inplace=True)\n", - "heart.drop(['row.names'], axis=1, inplace=True)\n", - "X = heart.iloc[:, :-1]\n", - "y = heart.iloc[:, -1]\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n", - "\n", - "regressor = LogisticRegression(learning_rate=0.0001, n_iters=1000)\n", - "\n", - "regressor.fit(X_train, y_train)\n", - "y_pred = regressor.predict(X_test)\n", - "perf = sklearn.metrics.confusion_matrix(y_test, y_pred)\n", - "print(\"LR classification perf:\\n\", perf)\n", - "\n", - "error_rate = np.mean(y_test != y_pred)\n", - "print(\"LR classification error rate:\\n\", error_rate)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Your turn 🚀\n", - "\n", - "Modify ```LogisticRegression``` so that the training will use SGD instead of GD." 
- ] } ], "metadata": { From 9fb2037425395be5663277a43a75de7c01b2714c Mon Sep 17 00:00:00 2001 From: Lunde Chen Date: Sat, 14 Oct 2023 15:26:05 +0800 Subject: [PATCH 5/7] update gradient descent --- .../assignments/ml-fundamentals/gradient-descent.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb index 9ccf2a108a..f7a7816596 100644 --- a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb +++ b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb @@ -60,7 +60,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Partial Derivatives\n", + "### Partial Derivatives (feel free to skip if you find it difficult)\n", "\n", "In our pursuit of minimizing the Mean Squared Error (MSE), we turn to partial derivatives to understand how each individual parameter influences the MSE. The term \"partial\" signifies that we are taking derivatives with respect to individual parameters, in this case, $m$ and $b, separately.\n", "\n", From 2119ff271835d5cc8564dba7175f702c9b13071e Mon Sep 17 00:00:00 2001 From: Lunde Chen Date: Sat, 14 Oct 2023 15:26:43 +0800 Subject: [PATCH 6/7] update gradient descent --- .../assignments/ml-fundamentals/gradient-descent.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb index f7a7816596..35553a25da 100644 --- a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb +++ b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb @@ -41,7 +41,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Abstract\n", + "### Math (feel free to skip if you find it difficult)\n", "\n", "The fundamental concept behind gradient descent is rather straightforward: it involves the gradual adjustment of parameters, such as the slope ($m$) and the intercept ($b$) in our regression equation $y = mx + b, with the aim of minimizing a cost function. This cost function is typically a metric that quantifies the disparity between our model's predicted results and the actual values. In regression scenarios, the widely employed cost function is the `mean squared error` (MSE). When dealing with classification problems, a different set of parameters must be fine-tuned.\n", "\n", @@ -60,7 +60,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Partial Derivatives (feel free to skip if you find it difficult)\n", + "### Partial Derivatives \n", "\n", "In our pursuit of minimizing the Mean Squared Error (MSE), we turn to partial derivatives to understand how each individual parameter influences the MSE. 
The term \"partial\" signifies that we are taking derivatives with respect to individual parameters, in this case, $m$ and $b, separately.\n", "\n", From f6225603e3bebe181e39ec936e30bbe0ed6318ba Mon Sep 17 00:00:00 2001 From: Lunde Chen Date: Tue, 17 Oct 2023 19:13:20 +0800 Subject: [PATCH 7/7] update _toc.yml --- open-machine-learning-jupyter-book/_toc.yml | 1 + .../{ => linear-regression}/gradient-descent.ipynb | 0 2 files changed, 1 insertion(+) rename open-machine-learning-jupyter-book/assignments/ml-fundamentals/{ => linear-regression}/gradient-descent.ipynb (100%) diff --git a/open-machine-learning-jupyter-book/_toc.yml b/open-machine-learning-jupyter-book/_toc.yml index 535d5d0e37..6ac77e0e31 100644 --- a/open-machine-learning-jupyter-book/_toc.yml +++ b/open-machine-learning-jupyter-book/_toc.yml @@ -151,6 +151,7 @@ parts: - file: assignments/ml-fundamentals/ml-linear-regression-1 - file: assignments/ml-fundamentals/ml-linear-regression-2 - file: assignments/ml-fundamentals/linear-regression/linear-regression-metrics.ipynb + - file: assignments/ml-fundamentals/linear-regression/gradient-descent.ipynb - file: assignments/ml-fundamentals/ml-logistic-regression-1 - file: assignments/ml-fundamentals/ml-logistic-regression-2 - file: assignments/ml-fundamentals/ml-neural-network-1 diff --git a/open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb b/open-machine-learning-jupyter-book/assignments/ml-fundamentals/linear-regression/gradient-descent.ipynb similarity index 100% rename from open-machine-learning-jupyter-book/assignments/ml-fundamentals/gradient-descent.ipynb rename to open-machine-learning-jupyter-book/assignments/ml-fundamentals/linear-regression/gradient-descent.ipynb