Adding tutorial file for basic gradient calculation #226

Open · wants to merge 1 commit into master
2 changes: 1 addition & 1 deletion .gitignore
@@ -2,4 +2,4 @@
*.zip
data/
.ipynb_checkpoints

.idea
Binary file added model.ckpt
Binary file not shown.
@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Including this Jupyter Notebook for the python script (main.py) as it will help to directly place the link in Google Colab and run the scripts from there.\n",
"\n",
"#### Compute basic gradients from the sample tensors using PyTorch\n",
"\n",
"##### First some basics of Pytorch terminology\n",
"\n",
"**Autograd**: This class is an engine to calculate derivatives (Jacobian-vector product to be more precise). It records a graph of all the operations performed on a gradient enabled tensor and creates an acyclic graph called the dynamic computational graph. The leaves of this graph are input tensors and the roots are output tensors. Gradients are calculated by tracing the graph from the root to the leaf and multiplying every gradient in the way using the chain rule.\n",
"\n",
"A Variable class wraps a tensor. You can access this tensor by calling `.data` attribute of a Variable.\n",
"\n",
"The Variable also stores the gradient of a scalar quantity (say, loss) with respect to the parameter it holds. This gradient can be accessed by calling the `.grad` attribute. This is basically the gradient computed up to this particular node, and the gradient of the every subsequent node, can be computed by multiplying the edge weight with the gradient computed at the node just before it.\n",
"\n",
"The third attribute a Variable holds is a grad_fn, a Function object which created the variable.\n",
"\n",
"**Variable**: The Variable, just like a Tensor is a class that is used to hold data. It differs, however, in the way it’s meant to be used. Variables are specifically tailored to hold values which change during training of a neural network, i.e. the learnable paramaters of our network. Tensors on the other hand are used to store values that are not to be learned. For example, a Tensor maybe used to store the values of the loss generated by each example.\n",
"\n",
"Every **variable** object has several members one of them is **grad**:\n",
"\n",
"**grad**: grad holds the value of gradient. If requires_grad is False it will hold a None value. Even if requires_grad is True, it will hold a None value unless .backward() function is called from some other node. For example, if you call out.backward() for some variable out that involved x in its calculations then x.grad will hold ∂out/∂x.\n",
"\n",
"**Backward() function**\n",
"Backward is the function which actually calculates the gradient by passing it’s argument (1x1 unit tensor by default) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in .grad of every leaf node. Remember, the backward graph is already made dynamically during the forward pass. Backward function only calculates the gradient using the already made graph and stores them in leaf nodes."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\tgrad: 11.0 21.0 tensor(-220.)\n",
"progress: 0 tensor(100.)\n",
"\tgrad: 22.0 14.0 tensor(2481.6001)\n",
"progress: 0 tensor(3180.9602)\n",
"\tgrad: 33.0 64.0 tensor(-51303.6484)\n",
"progress: 0 tensor(604238.8125)\n",
"\tgrad: 11.0 21.0 tensor(118461.7578)\n",
"progress: 1 tensor(28994192.)\n",
"\tgrad: 22.0 14.0 tensor(-671630.6875)\n",
"progress: 1 tensor(2.3300e+08)\n",
"\tgrad: 33.0 64.0 tensor(13114108.)\n",
"progress: 1 tensor(3.9481e+10)\n",
"\tgrad: 11.0 21.0 tensor(-30279010.)\n",
"progress: 2 tensor(1.8943e+12)\n",
"\tgrad: 22.0 14.0 tensor(1.7199e+08)\n",
"progress: 2 tensor(1.5279e+13)\n",
"\tgrad: 33.0 64.0 tensor(-3.3589e+09)\n",
"progress: 2 tensor(2.5900e+15)\n",
"\tgrad: 11.0 21.0 tensor(7.7553e+09)\n",
"progress: 3 tensor(1.2427e+17)\n",
"\tgrad: 22.0 14.0 tensor(-4.4050e+10)\n",
"progress: 3 tensor(1.0023e+18)\n",
"\tgrad: 33.0 64.0 tensor(8.6030e+11)\n",
"progress: 3 tensor(1.6991e+20)\n",
"\tgrad: 11.0 21.0 tensor(-1.9863e+12)\n",
"progress: 4 tensor(8.1519e+21)\n",
"\tgrad: 22.0 14.0 tensor(1.1282e+13)\n",
"progress: 4 tensor(6.5750e+22)\n",
"\tgrad: 33.0 64.0 tensor(-2.2034e+14)\n",
"progress: 4 tensor(1.1146e+25)\n",
"\tgrad: 11.0 21.0 tensor(5.0875e+14)\n",
"progress: 5 tensor(5.3477e+26)\n",
"\tgrad: 22.0 14.0 tensor(-2.8897e+15)\n",
"progress: 5 tensor(4.3132e+27)\n",
"\tgrad: 33.0 64.0 tensor(5.6436e+16)\n",
"progress: 5 tensor(7.3118e+29)\n",
"\tgrad: 11.0 21.0 tensor(-1.3030e+17)\n",
"progress: 6 tensor(3.5081e+31)\n",
"\tgrad: 22.0 14.0 tensor(7.4013e+17)\n",
"progress: 6 tensor(2.8295e+32)\n",
"\tgrad: 33.0 64.0 tensor(-1.4455e+19)\n",
"progress: 6 tensor(4.7966e+34)\n",
"\tgrad: 11.0 21.0 tensor(3.3374e+19)\n",
"progress: 7 tensor(2.3013e+36)\n",
"\tgrad: 22.0 14.0 tensor(-1.8957e+20)\n",
"progress: 7 tensor(1.8562e+37)\n",
"\tgrad: 33.0 64.0 tensor(3.7022e+21)\n",
"progress: 7 tensor(inf)\n",
"\tgrad: 11.0 21.0 tensor(-8.5480e+21)\n",
"progress: 8 tensor(inf)\n",
"\tgrad: 22.0 14.0 tensor(4.8553e+22)\n",
"progress: 8 tensor(inf)\n",
"\tgrad: 33.0 64.0 tensor(-9.4824e+23)\n",
"progress: 8 tensor(inf)\n",
"\tgrad: 11.0 21.0 tensor(2.1894e+24)\n",
"progress: 9 tensor(inf)\n",
"\tgrad: 22.0 14.0 tensor(-1.2436e+25)\n",
"progress: 9 tensor(inf)\n",
"\tgrad: 33.0 64.0 tensor(2.4287e+26)\n",
"progress: 9 tensor(inf)\n"
]
}
],
"source": [
"import torch\n",
"from torch.autograd import Variable\n",
"\n",
"def forward(x):\n",
" return x * w\n",
"\n",
"w = Variable(torch.Tensor([1.0]), requires_grad=True)\n",
"# . On setting .requires_grad = True they start forming a backward graph\n",
"# that tracks every operation applied on them to calculate the gradients\n",
"# using something called a dynamic computation graph (DCG)\n",
"# When you finish your computation you can call .backward() and have\n",
"# all the gradients computed automatically. The gradient for this tensor\n",
"# will be accumulated into .grad attribute.\n",
"\n",
"# Now create an array of data.\n",
"# By PyTorch’s design, gradients can only be calculated\n",
"# for floating point tensors which is why I’ve created a float type\n",
"# array before making it a gradient enabled PyTorch tensor\n",
"x_data = [11.0, 22.0, 33.0]\n",
"y_data = [21.0, 14.0, 64.0]\n",
"\n",
"def loss_function(x, y):\n",
" y_pred = forward(x)\n",
" return (y_pred - y) * (y_pred - y)\n",
"\n",
"\n",
"# Now running the training loop\n",
"for epoch in range(10):\n",
" for x_val, y_val in zip(x_data, y_data):\n",
" l = loss_function(x_val, y_val)\n",
" l.backward()\n",
" print(\"\\tgrad: \", x_val, y_val, w.grad.data[0])\n",
" w.data = w.data - 0.01 * w.grad\n",
"\n",
" # Manually set the gradient to zero after updating weights\n",
" w.grad.data.zero_()\n",
"\n",
" print('progress: ', epoch, l.data[0])\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@@ -0,0 +1,61 @@
import torch
from torch.autograd import Variable

''' Compute basic gradients from the sample tensors using PyTorch

First, some basics of PyTorch terminology

Autograd: This class is an engine to calculate derivatives (Jacobian-vector products, to be more precise). It records all the operations performed on a gradient-enabled tensor and builds an acyclic graph called the dynamic computational graph. The leaves of this graph are the input tensors and the roots are the output tensors. Gradients are calculated by tracing the graph from root to leaf and multiplying every gradient along the way using the chain rule.

A Variable wraps a tensor. You can access this tensor through the .data attribute of the Variable.

The Variable also stores the gradient of a scalar quantity (say, the loss) with respect to the parameter it holds. This gradient can be accessed through the .grad attribute. It is the gradient computed up to this particular node, and the gradient of every subsequent node can be computed by multiplying the edge weight with the gradient computed at the node just before it.

The third attribute a Variable holds is grad_fn, the Function object that created the variable.

Variable: The Variable, just like a Tensor, is a class used to hold data. It differs, however, in the way it is meant to be used. Variables are specifically tailored to hold values that change during training of a neural network, i.e. the learnable parameters of our network. Tensors, on the other hand, are used to store values that are not to be learned. For example, a Tensor may be used to store the loss generated by each example.

Every Variable object has several members; one of them is grad:

grad: grad holds the value of the gradient. If requires_grad is False it holds None. Even if requires_grad is True, it holds None until the .backward() function is called from some other node. For example, if you call out.backward() for some variable out that involved x in its calculations, then x.grad will hold ∂out/∂x.

backward() function: backward() is the function that actually calculates the gradients by passing its argument (a 1x1 unit tensor by default) through the backward graph all the way up to every leaf node traceable from the calling root tensor. The calculated gradients are then stored in the .grad of every leaf node. Remember, the backward graph is already built dynamically during the forward pass; backward() only computes the gradients using this graph and stores them in the leaf nodes. '''
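
# A minimal sketch of the ideas in the docstring above: after calling
# backward() on a scalar output, every gradient-enabled leaf holds
# d(out)/d(leaf) in its .grad attribute. The names x_demo and out_demo are
# purely illustrative and not part of the training example below.
x_demo = Variable(torch.Tensor([2.0, 3.0]), requires_grad=True)
out_demo = (x_demo * x_demo).sum()  # out = x0^2 + x1^2, a scalar
out_demo.backward()                 # populates x_demo.grad with d(out)/dx
print(x_demo.grad)                  # expected gradient: 2*x = [4., 6.]
print(out_demo.grad_fn)             # the Function that created out_demo (a Sum backward node)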


def forward(x):
    return x * w


w = Variable(torch.Tensor([1.0]), requires_grad=True)
# On setting requires_grad=True, tensors start forming a backward graph
# that tracks every operation applied on them, so that gradients can be
# calculated using a dynamic computation graph (DCG).
# When you finish your computation you can call .backward() and have
# all the gradients computed automatically. The gradient for this tensor
# will be accumulated into the .grad attribute.

# Now create an array of data.
# By PyTorch’s design, gradients can only be calculated
# for floating-point tensors, which is why the data is created as
# Python floats before it is used with the gradient-enabled tensor w
x_data = [11.0, 22.0, 33.0]
y_data = [21.0, 14.0, 64.0]


def loss_function(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)


# Now running the training loop
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss_function(x_val, y_val)
        l.backward()
        print("\tgrad: ", x_val, y_val, w.grad.data[0])
        w.data = w.data - 0.01 * w.grad.data

        # Manually set the gradient to zero after updating the weight,
        # otherwise backward() keeps accumulating into w.grad
        w.grad.data.zero_()

        # Note: with inputs this large, a 0.01 learning rate overshoots and
        # the loss diverges (the printed loss eventually reaches inf)
        print('progress: ', epoch, l.data[0])
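
# An optional illustration of the "Jacobian-vector product" wording used in the
# docstring: when the output is not a scalar, backward() needs an explicit
# vector argument v, and each leaf's .grad then holds J^T v. This is a sketch
# separate from the training loop above; the names are illustrative only.
v_in = Variable(torch.Tensor([1.0, 2.0, 3.0]), requires_grad=True)
v_out = v_in * 2.0                  # non-scalar output, so the Jacobian is 2*I
v = torch.Tensor([0.1, 1.0, 10.0])  # the vector passed to backward()
v_out.backward(v)                   # computes J^T v and stores it in v_in.grad
print(v_in.grad)                    # expected: 2*v = [0.2, 2., 20.]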