diff --git a/.gitignore b/.gitignore
index 7eade253..ec81b3ee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,4 +2,4 @@
 *.zip
 data/
 .ipynb_checkpoints
-
+.idea
diff --git a/model.ckpt b/model.ckpt
new file mode 100644
index 00000000..07cf9ee2
Binary files /dev/null and b/model.ckpt differ
diff --git a/tutorials/02-intermediate/basic_gradient_computation_from_tensors/main.ipynb b/tutorials/02-intermediate/basic_gradient_computation_from_tensors/main.ipynb
new file mode 100644
index 00000000..dbbbd6bb
--- /dev/null
+++ b/tutorials/02-intermediate/basic_gradient_computation_from_tensors/main.ipynb
@@ -0,0 +1,178 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "pycharm": {
+     "name": "#%% md\n"
+    }
+   },
+   "source": [
+    "This Jupyter notebook accompanies the Python script (main.py) so that it can be opened directly in Google Colab and run from there.\n",
+    "\n",
+    "#### Compute basic gradients from the sample tensors using PyTorch\n",
+    "\n",
+    "##### First, some basics of PyTorch terminology\n",
+    "\n",
+    "**Autograd**: Autograd is PyTorch's engine for calculating derivatives (vector-Jacobian products, to be more precise). It records all the operations performed on a gradient-enabled tensor and builds a directed acyclic graph called the dynamic computational graph. The leaves of this graph are input tensors and the roots are output tensors. Gradients are calculated by tracing the graph from the root to the leaves and multiplying every gradient along the way using the chain rule.\n",
+    "\n",
+    "The Variable class wraps a tensor. You can access this tensor through the `.data` attribute of a Variable.\n",
+    "\n",
+    "The Variable also stores the gradient of a scalar quantity (say, the loss) with respect to the parameter it holds. This gradient can be accessed through the `.grad` attribute. It is the gradient computed up to this particular node; the gradient of every subsequent node can be computed by multiplying the edge weight by the gradient computed at the node just before it.\n",
+    "\n",
+    "The third attribute a Variable holds is grad_fn, the Function object that created the Variable.\n",
+    "\n",
+    "**Variable**: The Variable, just like a Tensor, is a class that is used to hold data. It differs, however, in the way it’s meant to be used. Variables are specifically tailored to hold values which change during training of a neural network, i.e. the learnable parameters of our network. Tensors, on the other hand, are used to store values that are not to be learned. For example, a Tensor may be used to store the values of the loss generated by each example.\n",
+    "\n",
+    "Every **Variable** object has several members; one of them is **grad**:\n",
+    "\n",
+    "**grad**: grad holds the value of the gradient. If requires_grad is False, it holds None. Even if requires_grad is True, it holds None until .backward() is called from some other node. For example, if you call out.backward() for some variable out that involved x in its calculations, then x.grad will hold ∂out/∂x.\n",
+    "\n",
+    "**Backward() function**\n",
+    "Backward is the function which actually calculates the gradient by passing its argument (a 1x1 unit tensor by default) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in .grad of every leaf node. Remember, the backward graph is already built dynamically during the forward pass. The backward function only calculates the gradients using the already-built graph and stores them in the leaf nodes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {
+    "collapsed": false,
+    "jupyter": {
+     "outputs_hidden": false
+    },
+    "pycharm": {
+     "name": "#%%\n"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\tgrad: 11.0 21.0 tensor(-220.)\n",
+      "progress: 0 tensor(100.)\n",
+      "\tgrad: 22.0 14.0 tensor(2481.6001)\n",
+      "progress: 0 tensor(3180.9602)\n",
+      "\tgrad: 33.0 64.0 tensor(-51303.6484)\n",
+      "progress: 0 tensor(604238.8125)\n",
+      "\tgrad: 11.0 21.0 tensor(118461.7578)\n",
+      "progress: 1 tensor(28994192.)\n",
+      "\tgrad: 22.0 14.0 tensor(-671630.6875)\n",
+      "progress: 1 tensor(2.3300e+08)\n",
+      "\tgrad: 33.0 64.0 tensor(13114108.)\n",
+      "progress: 1 tensor(3.9481e+10)\n",
+      "\tgrad: 11.0 21.0 tensor(-30279010.)\n",
+      "progress: 2 tensor(1.8943e+12)\n",
+      "\tgrad: 22.0 14.0 tensor(1.7199e+08)\n",
+      "progress: 2 tensor(1.5279e+13)\n",
+      "\tgrad: 33.0 64.0 tensor(-3.3589e+09)\n",
+      "progress: 2 tensor(2.5900e+15)\n",
+      "\tgrad: 11.0 21.0 tensor(7.7553e+09)\n",
+      "progress: 3 tensor(1.2427e+17)\n",
+      "\tgrad: 22.0 14.0 tensor(-4.4050e+10)\n",
+      "progress: 3 tensor(1.0023e+18)\n",
+      "\tgrad: 33.0 64.0 tensor(8.6030e+11)\n",
+      "progress: 3 tensor(1.6991e+20)\n",
+      "\tgrad: 11.0 21.0 tensor(-1.9863e+12)\n",
+      "progress: 4 tensor(8.1519e+21)\n",
+      "\tgrad: 22.0 14.0 tensor(1.1282e+13)\n",
+      "progress: 4 tensor(6.5750e+22)\n",
+      "\tgrad: 33.0 64.0 tensor(-2.2034e+14)\n",
+      "progress: 4 tensor(1.1146e+25)\n",
+      "\tgrad: 11.0 21.0 tensor(5.0875e+14)\n",
+      "progress: 5 tensor(5.3477e+26)\n",
+      "\tgrad: 22.0 14.0 tensor(-2.8897e+15)\n",
+      "progress: 5 tensor(4.3132e+27)\n",
+      "\tgrad: 33.0 64.0 tensor(5.6436e+16)\n",
+      "progress: 5 tensor(7.3118e+29)\n",
+      "\tgrad: 11.0 21.0 tensor(-1.3030e+17)\n",
+      "progress: 6 tensor(3.5081e+31)\n",
+      "\tgrad: 22.0 14.0 tensor(7.4013e+17)\n",
+      "progress: 6 tensor(2.8295e+32)\n",
+      "\tgrad: 33.0 64.0 tensor(-1.4455e+19)\n",
+      "progress: 6 tensor(4.7966e+34)\n",
+      "\tgrad: 11.0 21.0 tensor(3.3374e+19)\n",
+      "progress: 7 tensor(2.3013e+36)\n",
+      "\tgrad: 22.0 14.0 tensor(-1.8957e+20)\n",
+      "progress: 7 tensor(1.8562e+37)\n",
+      "\tgrad: 33.0 64.0 tensor(3.7022e+21)\n",
+      "progress: 7 tensor(inf)\n",
+      "\tgrad: 11.0 21.0 tensor(-8.5480e+21)\n",
+      "progress: 8 tensor(inf)\n",
+      "\tgrad: 22.0 14.0 tensor(4.8553e+22)\n",
+      "progress: 8 tensor(inf)\n",
+      "\tgrad: 33.0 64.0 tensor(-9.4824e+23)\n",
+      "progress: 8 tensor(inf)\n",
+      "\tgrad: 11.0 21.0 tensor(2.1894e+24)\n",
+      "progress: 9 tensor(inf)\n",
+      "\tgrad: 22.0 14.0 tensor(-1.2436e+25)\n",
+      "progress: 9 tensor(inf)\n",
+      "\tgrad: 33.0 64.0 tensor(2.4287e+26)\n",
+      "progress: 9 tensor(inf)\n"
+     ]
+    }
+   ],
+   "source": [
+    "import torch\n",
+    "from torch.autograd import Variable\n",
+    "\n",
+    "def forward(x):\n",
+    "    return x * w\n",
+    "\n",
+    "w = Variable(torch.Tensor([1.0]), requires_grad=True)\n",
+    "# On setting requires_grad=True, tensors start forming a backward graph\n",
+    "# that tracks every operation applied to them to calculate the gradients\n",
+    "# using something called a dynamic computation graph (DCG).\n",
+    "# When you finish your computation you can call .backward() and have\n",
+    "# all the gradients computed automatically. The gradient for this tensor\n",
+    "# will be accumulated into its .grad attribute.\n",
+    "\n",
+    "# Now create an array of data.\n",
+    "# By PyTorch’s design, gradients can only be calculated\n",
+    "# for floating point tensors, which is why I’ve created a float type\n",
+    "# array before making it a gradient-enabled PyTorch tensor\n",
+    "x_data = [11.0, 22.0, 33.0]\n",
+    "y_data = [21.0, 14.0, 64.0]\n",
+    "\n",
+    "def loss_function(x, y):\n",
+    "    y_pred = forward(x)\n",
+    "    return (y_pred - y) * (y_pred - y)\n",
+    "\n",
+    "\n",
+    "# Now running the training loop\n",
+    "for epoch in range(10):\n",
+    "    for x_val, y_val in zip(x_data, y_data):\n",
+    "        l = loss_function(x_val, y_val)\n",
+    "        l.backward()\n",
+    "        print(\"\\tgrad: \", x_val, y_val, w.grad.data[0])\n",
+    "        w.data = w.data - 0.01 * w.grad\n",
+    "\n",
+    "        # Manually set the gradient to zero after updating weights\n",
+    "        w.grad.data.zero_()\n",
+    "\n",
+    "        print('progress: ', epoch, l.data[0])\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/tutorials/02-intermediate/basic_gradient_computation_from_tensors/main.py b/tutorials/02-intermediate/basic_gradient_computation_from_tensors/main.py
new file mode 100644
index 00000000..3291bb11
--- /dev/null
+++ b/tutorials/02-intermediate/basic_gradient_computation_from_tensors/main.py
@@ -0,0 +1,61 @@
+import torch
+from torch.autograd import Variable
+
+''' Compute basic gradients from the sample tensors using PyTorch
+
+First, some basics of PyTorch terminology
+
+Autograd: Autograd is PyTorch's engine for calculating derivatives (vector-Jacobian products, to be more precise). It records all the operations performed on a gradient-enabled tensor and builds a directed acyclic graph called the dynamic computational graph. The leaves of this graph are input tensors and the roots are output tensors. Gradients are calculated by tracing the graph from the root to the leaves and multiplying every gradient along the way using the chain rule.
+
+The Variable class wraps a tensor. You can access this tensor through the .data attribute of a Variable.
+
+The Variable also stores the gradient of a scalar quantity (say, the loss) with respect to the parameter it holds. This gradient can be accessed through the .grad attribute. It is the gradient computed up to this particular node; the gradient of every subsequent node can be computed by multiplying the edge weight by the gradient computed at the node just before it.
+
+The third attribute a Variable holds is grad_fn, the Function object that created the Variable.
+
+Variable: The Variable, just like a Tensor, is a class that is used to hold data. It differs, however, in the way it’s meant to be used. Variables are specifically tailored to hold values which change during training of a neural network, i.e. the learnable parameters of our network. Tensors, on the other hand, are used to store values that are not to be learned. For example, a Tensor may be used to store the values of the loss generated by each example.
+
+Every Variable object has several members; one of them is grad:
+
+grad: grad holds the value of the gradient. If requires_grad is False, it holds None. Even if requires_grad is True, it holds None until .backward() is called from some other node. For example, if you call out.backward() for some variable out that involved x in its calculations, then x.grad will hold ∂out/∂x.
+
+Backward() function: Backward is the function which actually calculates the gradient by passing its argument (a 1x1 unit tensor by default) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in .grad of every leaf node. Remember, the backward graph is already built dynamically during the forward pass. The backward function only calculates the gradients using the already-built graph and stores them in the leaf nodes. '''
+
+
+def forward(x):
+    return x * w
+
+
+w = Variable(torch.Tensor([1.0]), requires_grad=True)
+# On setting requires_grad=True, tensors start forming a backward graph
+# that tracks every operation applied to them to calculate the gradients
+# using something called a dynamic computation graph (DCG).
+# When you finish your computation you can call .backward() and have
+# all the gradients computed automatically. The gradient for this tensor
+# will be accumulated into its .grad attribute.
+
+# Now create an array of data.
+# By PyTorch’s design, gradients can only be calculated
+# for floating point tensors, which is why I’ve created a float type
+# array before making it a gradient-enabled PyTorch tensor
+x_data = [11.0, 22.0, 33.0]
+y_data = [21.0, 14.0, 64.0]
+
+
+def loss_function(x, y):
+    y_pred = forward(x)
+    return (y_pred - y) * (y_pred - y)
+
+
+# Now running the training loop
+for epoch in range(10):
+    for x_val, y_val in zip(x_data, y_data):
+        l = loss_function(x_val, y_val)
+        l.backward()
+        print("\tgrad: ", x_val, y_val, w.grad.data[0])
+        w.data = w.data - 0.01 * w.grad
+
+        # Manually set the gradient to zero after updating weights
+        w.grad.data.zero_()
+
+        print('progress: ', epoch, l.data[0])
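A note on the output recorded in the notebook: the loss grows at every step and reaches `inf` in epoch 7, so the loop as committed diverges instead of fitting `w`. The sketch below is a rough illustration in plain Python, not the author's method. It swaps autograd for the analytic derivative 2·x·(x·w − y) of the squared error, and `grad_w`, `train`, and the step size 0.0005 are assumptions introduced here. It reproduces the first two recorded gradients (-220 and about 2481.6) and suggests that a smaller step keeps `w` bounded.

```python
import math

# Same sample data as in main.py.
x_data = [11.0, 22.0, 33.0]
y_data = [21.0, 14.0, 64.0]


def grad_w(x, y, w):
    """Analytic derivative of (x * w - y) ** 2 with respect to w (hypothetical helper)."""
    return 2.0 * x * (x * w - y)


def train(lr, epochs=10, w=1.0):
    """Per-sample gradient descent; returns the final w and every gradient seen."""
    grads = []
    for _ in range(epochs):
        for x, y in zip(x_data, y_data):
            g = grad_w(x, y, w)
            grads.append(g)
            w -= lr * g  # same update rule as the diff, without autograd
    return w, grads


# Step size used in the diff: gradients alternate in sign and blow up.
w_big, grads_big = train(lr=0.01)
# Assumed smaller step size: chosen so that 2 * lr * x**2 < 2 for every x.
w_small, grads_small = train(lr=0.0005)

print("first two gradients, lr=0.01:", grads_big[0], grads_big[1])
print("last gradient, lr=0.01:", grads_big[-1])
print("final w, lr=0.0005:", w_small, "finite:", math.isfinite(w_small))
```

With per-sample updates, each step multiplies the distance between `w` and that sample's target y/x by (1 − 2·lr·x²). For x = 33 and lr = 0.01 this factor is about −20.8, which matches the sign flips and growth in the recorded gradients. Any step size below 1/33² ≈ 0.00092 keeps the factor's magnitude under 1 for all three samples.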