Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #617 from WHQWHQWHQ/AddEvaluationMetrics
Add gradient-descent.ipynb assignment
Showing 2 changed files with 262 additions and 0 deletions.
261 changes: 261 additions & 0 deletions
...earning-jupyter-book/assignments/ml-fundamentals/linear-regression/gradient-descent.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,261 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Gradient descent" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Session Objective\n", | ||
"\n", | ||
"In previous sessions, we applied Linear Regression and Logistic Regression models. You may find the code relatively straightforward and intuitive at this point. However, you might be wondering:\n", | ||
"\n", | ||
"- What exactly occurs when we invoke the `.fit()` function?\n", | ||
"- Why does the execution of the `.fit()` function sometimes take a significant amount of time?\n", | ||
"\n", | ||
"This session is designed to provide insight into what the `.fit()` method does when it trains a machine learning model and tunes its parameters. A core technique behind this kind of training is known as \"Gradient Descent.\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Let's Explore and Gain Intuition\n", | ||
"\n", | ||
"To further enhance your understanding and gain a playful insight into Gradient Descent, you can explore the following resources:\n", | ||
"\n", | ||
"\n", | ||
"- [Gradient Descent Visualization](https://github.com/lilipads/gradient_descent_viz): This GitHub repository offers a visualization of the Gradient Descent algorithm, which can be a valuable resource for understanding the optimization process.\n", | ||
"\n", | ||
"- [Optimization Algorithms Visualization](https://bl.ocks.org/EmilienDupont/aaf429be5705b219aaaf8d691e27ca87): Explore this visualization to see how different optimization algorithms, including Gradient Descent, work and how they converge to find optimal solutions.\n", | ||
"\n", | ||
"These resources will help you build an intuitive grasp of Gradient Descent and its role in training machine learning models. Enjoy your exploration!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Math (feel free to skip if you find it difficult)\n", | ||
"\n", | ||
"The fundamental concept behind gradient descent is rather straightforward: it involves the gradual adjustment of parameters, such as the slope ($m$) and the intercept ($b$) in our regression equation $y = mx + b$, with the aim of minimizing a cost function. This cost function is typically a metric that quantifies the disparity between our model's predicted results and the actual values. In regression scenarios, the widely employed cost function is the `mean squared error` (MSE). Classification problems typically use a different cost function, such as cross-entropy.\n", | ||
"\n", | ||
"The MSE (Mean Squared Error) is mathematically expressed as:\n", | ||
"\n", | ||
"$$\n", | ||
"MSE = \\frac{1}{n}\\sum_{i=1}^{n} (y_i - \\hat{y_i})^2\n", | ||
"$$\n", | ||
"\n", | ||
"Here, $y_i$ represents the actual data points, $\\hat{y_i}$ signifies the predictions made by our model ($mx_i + b$), and $n$ denotes the total number of data points.\n", | ||
"\n", | ||
"Our primary challenge is to determine how to adjust the parameters $m$ and $b$ so that the MSE is minimized." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Partial Derivatives \n", | ||
"\n", | ||
"In our pursuit of minimizing the Mean Squared Error (MSE), we turn to partial derivatives to understand how each individual parameter influences the MSE. The term \"partial\" signifies that we are taking derivatives with respect to individual parameters, in this case $m$ and $b$, separately.\n", | ||
"\n", | ||
"Consider the following formula, which is the MSE written as a function $f(m, b)$. Naming the function doesn't alter the calculation, but it makes explicit that we can input specific values for $m$ and $b$ to compute the result.\n", | ||
"\n", | ||
"$$f(m, b) = \\frac{1}{n}\\sum_{i=1}^{n}(y_i - (mx_i+b))^2$$\n", | ||
"\n", | ||
"For the purposes of calculating partial derivatives, we can temporarily disregard the summation and the factor preceding it, focusing solely on the expression $(y - (mx + b))^2$. This expression serves as a simpler starting point for the subsequent partial derivative calculations." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Partial Derivative with Respect to $m$\n", | ||
"\n", | ||
"When we calculate the partial derivative with respect to the parameter $m$, we isolate $m$ and treat $b$ as a constant (so its derivative is 0). To compute this derivative, we utilize the chain rule, which is a fundamental concept in calculus.\n", | ||
"\n", | ||
"The chain rule is expressed as follows:\n", | ||
"\n", | ||
"$$ [f(g(x))]' = f'(g(x)) \\cdot g'(x) \\quad - \\textrm{chain rule} $$\n", | ||
"\n", | ||
"The chain rule is applicable when one function is nested inside another. In this context, the square operation, $()^2$, is the outer function, while $y - (mx + b)$ is the inner function. Following the chain rule, we differentiate the outer function, keep the inner function as it is, and then multiply by the derivative of the inner function. Let's break down the steps:\n", | ||
"\n", | ||
"$$ (y - (mx + b))^2 $$\n", | ||
"\n", | ||
"1. The derivative of $()^2$ is $2()$, just like $x^2$ becomes $2x$.\n", | ||
"2. We leave the inner function, $y - (mx + b)$, unaltered.\n", | ||
"3. The derivative of $y - (mx + b)$ with respect to **_m_** is $(0 - x)$, which simplifies to $-x$. This is because both **_y_** and **_b_** are treated as constants (their derivatives are zero), and the derivative of **_mx_** is simply **_x_**.\n", | ||
"\n", | ||
"Now, let's combine these components:\n", | ||
"\n", | ||
"$$ 2 \\cdot (y - (mx+b)) \\cdot (-x) $$\n", | ||
"\n", | ||
"For clarity, we can rearrange this expression by moving the factor of $-x$ to the left:\n", | ||
"\n", | ||
"$$ -2x \\cdot (y-(mx+b)) $$\n", | ||
"\n", | ||
"Placing this single-point derivative back inside the averaged summation gives the final version of our derivative with respect to $m$:\n", | ||
"\n", | ||
"$$ \\frac{\\partial f}{\\partial m} = \\frac{1}{n}\\sum_{i=1}^{n} -2x_i(y_i - (mx_i+b)) $$\n", | ||
"\n", | ||
"Here, $\\frac{\\partial f}{\\partial m}$ signifies the partial derivative of function $f$ (as previously defined) with respect to the parameter $m$." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Partial Derivative with Respect to $b$\n", | ||
"\n", | ||
"The process for computing the partial derivative with respect to the parameter $b$ is analogous to our previous derivation with respect to $m$. We still apply the same rules and utilize the chain rule:\n", | ||
"\n", | ||
"1. The derivative of $()^2$ is $2()$, which corresponds to how $x^2$ becomes $2x$.\n", | ||
"2. We leave the inner function, $y - (mx + b)$, unaltered.\n", | ||
"3. The derivative of $y - (mx + b)$ with respect to **_b_** is $(0 - 1)$, or simply $-1$. This is because both **_y_** and **_mx_** are treated as constants (their derivatives are zero), and the derivative of **_b_** is 1.\n", | ||
"\n", | ||
"Now, let's consolidate these components:\n", | ||
"\n", | ||
"$$ 2 \\cdot (y - (mx+b)) \\cdot (-1) $$\n", | ||
"\n", | ||
"Simplifying this expression:\n", | ||
"\n", | ||
"$$ -2 \\cdot (y-(mx+b)) $$\n", | ||
"\n", | ||
"Placing this single-point derivative back inside the averaged summation gives the final version of our derivative with respect to $b$:\n", | ||
"\n", | ||
"$$ \\frac{\\partial f}{\\partial b} = \\frac{1}{n}\\sum_{i=1}^{n} -2(y_i - (mx_i+b)) $$\n", | ||
"\n", | ||
"Similarly to the previous case, $\\frac{\\partial f}{\\partial b}$ represents the partial derivative of function $f$ with respect to the parameter $b$." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### The Final Function\n", | ||
"\n", | ||
"Before delving into the code, there are a few essential details to address:\n", | ||
"\n", | ||
"1. Gradient descent is an iterative process, and with each iteration (referred to as an \"epoch\"), we incrementally reduce the Mean Squared Error (MSE). At each iteration, we apply our derived functions to update the values of parameters $m$ and $b$.\n", | ||
"\n", | ||
"2. Because gradient descent is iterative, we must determine how many iterations to perform, or devise a mechanism to stop the algorithm when it approaches the minimum of the MSE. In essence, we continue iterations until the algorithm no longer improves the MSE, signifying that it has reached a minimum.\n", | ||
"\n", | ||
"3. An important parameter in gradient descent is the learning rate ($lr$). The learning rate governs the pace at which the algorithm moves toward the minimum of the MSE. A smaller learning rate results in slower but more precise convergence, while a larger learning rate may converge faster but can overshoot the minimum or even diverge.\n", | ||
"\n", | ||
"In summary, gradient descent primarily involves the process of taking derivatives and applying them iteratively to minimize a function. These derivatives guide us toward optimizing the parameters $m$ and $b$ in order to minimize the Mean Squared Error." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Time to code!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%matplotlib inline\n", | ||
"\n", | ||
"import numpy as np\n", | ||
"import pandas as pd\n", | ||
"import sklearn\n", | ||
"import matplotlib.pyplot as plt\n", | ||
"from sklearn.model_selection import train_test_split" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Linear Regression With Gradient Descent" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"class LinearRegression:\n", | ||
"    def __init__(self, learning_rate=0.0003, n_iters=3000):\n", | ||
"        self.lr = learning_rate\n", | ||
"        self.n_iters = n_iters\n", | ||
"        self.weights = None\n", | ||
"        self.bias = None\n", | ||
"\n", | ||
"    def fit(self, X, y):\n", | ||
"        n_samples, n_features = X.shape\n", | ||
"\n", | ||
"        # Initialize parameters\n", | ||
"        self.weights = np.zeros(n_features)\n", | ||
"        self.bias = 0\n", | ||
"\n", | ||
"        # Gradient Descent\n", | ||
"        for _ in range(self.n_iters):\n", | ||
"            # Approximate y with a linear combination of weights and x, plus bias\n", | ||
"            y_predicted = np.dot(X, self.weights) + self.bias\n", | ||
"\n", | ||
"            # Compute gradients of the MSE with respect to weights and bias\n", | ||
"            # (the constant factor 2 from the derivation is folded into the learning rate)\n", | ||
"            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))\n", | ||
"            db = (1 / n_samples) * np.sum(y_predicted - y)\n", | ||
"\n", | ||
"            # Update parameters by stepping against the gradient\n", | ||
"            self.weights -= self.lr * dw\n", | ||
"            self.bias -= self.lr * db\n", | ||
"\n", | ||
"    def predict(self, X):\n", | ||
"        y_predicted = np.dot(X, self.weights) + self.bias\n", | ||
"        return y_predicted\n", | ||
"\n", | ||
"# Load data and perform linear regression\n", | ||
"prostate = pd.read_table(\"https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/prostate.data\")\n", | ||
"prostate.drop(prostate.columns[0], axis=1, inplace=True)\n", | ||
"\n", | ||
"X = prostate.drop([\"lpsa\", \"train\"], axis=1)\n", | ||
"y = prostate[\"lpsa\"]\n", | ||
"\n", | ||
"regressor = LinearRegression()\n", | ||
"\n", | ||
"regressor.fit(X, y)\n", | ||
"y_pred = regressor.predict(X)\n", | ||
"\n", | ||
"print(regressor.__dict__)\n", | ||
"print(y - y_pred)\n", | ||
"\n", | ||
"plt.scatter(y, y_pred)\n", | ||
"plt.plot([0, 5], [0, 5])\n", | ||
"plt.show()\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.16" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
b84dca6
🚀 Deployed on https://652e7493c2e4480482f127b9--frabjous-fairy-a34ccc.netlify.app
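The partial derivatives derived in the notebook can be sanity-checked numerically. The following is a minimal sketch, not part of the commit: the helper names `mse` and `gradients`, the synthetic data (`y = 3x + 2` plus noise), and the values of `lr`, `tol`, and `eps` are all assumptions chosen for illustration. It compares the analytic gradients against central finite differences and then runs gradient descent with the stopping rule described in "The Final Function" (stop once the MSE no longer improves).

```python
import numpy as np


def mse(m, b, x, y):
    """Mean squared error of the line y = m*x + b (f(m, b) in the notebook)."""
    return np.mean((y - (m * x + b)) ** 2)


def gradients(m, b, x, y):
    """Analytic partial derivatives of the MSE with respect to m and b."""
    residual = y - (m * x + b)
    df_dm = np.mean(-2 * x * residual)
    df_db = np.mean(-2 * residual)
    return df_dm, df_db


# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3 * x + 2 + rng.normal(scale=0.1, size=200)

# 1. Compare the analytic gradients with central finite differences
m0, b0, eps = 0.5, -1.0, 1e-6
df_dm, df_db = gradients(m0, b0, x, y)
num_dm = (mse(m0 + eps, b0, x, y) - mse(m0 - eps, b0, x, y)) / (2 * eps)
num_db = (mse(m0, b0 + eps, x, y) - mse(m0, b0 - eps, x, y)) / (2 * eps)
print("df/dm analytic vs numeric:", df_dm, num_dm)
print("df/db analytic vs numeric:", df_db, num_db)

# 2. Gradient descent that stops once the MSE improvement falls below tol
m, b, lr, tol = 0.0, 0.0, 0.1, 1e-12
prev = mse(m, b, x, y)
for epoch in range(10_000):
    g_m, g_b = gradients(m, b, x, y)
    m -= lr * g_m  # step against the gradient
    b -= lr * g_b
    cur = mse(m, b, x, y)
    if prev - cur < tol:
        break
    prev = cur

print("stopped after epoch", epoch, "with m =", m, "b =", b, "MSE =", cur)
```

Unlike the notebook's `LinearRegression` class, this sketch keeps the factor of 2 in the gradients so that it matches the derivation term by term; dropping it, as the class does, only rescales the effective learning rate.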