Initial implementation of autograd #30
Conversation
@botev @jramapuram @itsnarsi This has been a long time coming, but I'd appreciate it if you guys had any feedback as well.
CC @arrayfire/core-devel
@Reithan too
Awesome work @pavanky. Will take a look in more detail when I get to a terminal. Quick question: can you take second derivatives with your implementation?
@jramapuram Not yet, I wanted to get the first order working first :)
@jramapuram went ahead and changed the gradients to be Variables too. This should make it easy to perform higher order derivatives.
@pavanky just tested it on my laptop and it looks pretty neat. Unlike Python, I did not see any initial delay. This might be because there is no JIT, I guess.
@itsnarsi This is still very nascent. I want to incorporate some of the stuff mentioned here to make it more efficient:
Hmm, nice job!
examples/FFNet.cpp

using namespace af;
using namespace afml;
using namespace afml::nn;
using namespace af;
Duplicated line
Do you have a tool for detecting this or a really good eye :D
A tool would be great. Unfortunately, I'm just an irritating nitpicker. 😇
include/af/autograd/Variable.hpp

{
    if (m_grads.size() == 1) return;
    Variable grad = m_grads[0];
    for (int i = 1; i < (int)m_grads.size(); i++) {
I would prefer an unsigned iterable to avoid clang's -Wconversion signedness warnings when indexing into std::vector.
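A sketch of what that loop could look like with an unsigned index (it assumes m_grads is a std::vector<Variable>, as the surrounding code suggests; the loop body is elided):

Variable grad = m_grads[0];
for (std::size_t i = 1; i < m_grads.size(); ++i) {
    // ... same body as before; std::size_t matches std::vector's size_type,
    // so there is no signed/unsigned conversion for clang to warn about
}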
Will do, thanks.
Decreased the scope of the PR to get a minimum viable thing going. The additional functions and operators can be added once this PR gets merged.
- autograd::Variable::Shared is now a thin layer without methods
- Variable::BackwardFunc_t renamed to Variable::GradFunc_t
- Variable::getData renamed to Variable::array
- Variable::getGrad renamed to Variable::grad
- Variable::backward renamed to Variable::calcGradInputs
@jramapuram I think enabling the support for higher order derivatives by default will increase the memory being used. I am going to add a flag to enable it during the backward pass. By default only the values will be stored.
- Disabled by default
- Can be enabled by passing true as the second argument to backward
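A rough usage sketch of that flag (the Variable(af::array, bool) constructor shape and the exact backward signature are assumptions; only the meaning of the boolean comes from the note above):

// y is a Variable produced by the forward pass; dy seeds the backward pass.
auto dy = Variable(af::constant(1.0, y.array().dims()), false);
y.backward(dy, true);   // true retains the gradient graph for higher-order derivatives;
                        // leaving it out (the default) keeps only the values, saving memory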
Minor preliminary comments. Everything looks great. We can refactor it later as long as we have a clean user-facing API.
find_package(ArrayFire REQUIRED)

add_library(afml SHARED "")
If you don't add SHARED, then you can control the type of library you build with the BUILD_SHARED_LIBS variable.
Variable operator +(const Variable &lhs, const Variable &rhs)
{
    auto result = lhs.array() + rhs.array();
    auto grad_func = [](std::vector<Variable> &inputs, const Variable &grad_output) {
Don't we usually have outputs then inputs?
It looks like you know the # of inputs for each function. I would use something like std::array<Variable, N> for something like that.
Both of these are inputs. grad_output is an input coming from a different place.
And using std::array is not an option. All functions need to share the same signature so they can be stored as GradFunc_t inside Variable.
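To make that constraint concrete, here is a sketch of the idea (the exact GradFunc_t alias and the addGrad/negate calls are assumed to match what this PR declares; treat the names as illustrative):

#include <functional>
#include <vector>

// One callable type for every operation, regardless of how many inputs it has.
// A fixed-size std::array<Variable, N> would bake N into the type and prevent
// storing all gradient functions behind the same member of Variable.
using GradFunc_t = std::function<void(std::vector<Variable> &inputs,
                                      const Variable &grad_output)>;

GradFunc_t negate_grad = [](std::vector<Variable> &inputs, const Variable &grad_output) {
    inputs[0].addGrad(negate(grad_output));   // unary op: one input
};

GradFunc_t add_grad = [](std::vector<Variable> &inputs, const Variable &grad_output) {
    inputs[0].addGrad(grad_output);           // binary op: two inputs,
    inputs[1].addGrad(grad_output);           // same signature either way
};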
- Implemented base class nn::Module
- Added basic modules: nn::Linear, nn::Sigmoid, nn::Tanh
- Added container modules: nn::Container, nn::Sequential
- Deleted unnecessary examples, cleaned up perceptron.cpp
- Trying to solve for the entire batch was a bad idea
A couple of minor issues. This is looking great!
examples/perceptron.cpp

// Update parameters
// TODO: Should use optimizer
for (auto param : perceptron.parameters()) {
auto& ?
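i.e. something along these lines (a sketch; it assumes parameters() hands back the stored Variables so binding by reference avoids a per-iteration copy):

for (auto &param : perceptron.parameters()) {
    // ... same update body as before, now operating on a reference
    // instead of a copy of each Variable
}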
examples/perceptron.cpp

/*******************************************************
 * Copyright (c) 2015, ArrayFire
2017
include/af/autograd/Variable.hpp

GradFunc_t m_grad_func;
};

public:
Needs to be aligned with other access qualifiers.
src/nn/Modules/Module.cpp

/*******************************************************
 * Copyright (c) 2015, ArrayFire
2017
src/nn/Modules/Module.cpp

void Module::eval()
{
    for (auto parameter : m_parameters) {
auto&?
include/af/autograd/Variable.hpp

private:
    void evalGrad(bool retain_grad_graph = false);

    std::vector<Variable> getInputs() const;
Does this need to return by value?
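For instance, a sketch of the by-reference alternative (m_inputs is an assumed member name for wherever the inputs are stored):

// Returning a const reference avoids copying the vector of Variables on each call.
const std::vector<Variable>& getInputs() const { return m_inputs; }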
What is done so far:

- Created autograd::Variable and autograd::backward.
- A Variable wraps an af::array from the user.
- When var.backward(grad_var) is invoked, it builds a DAG as a vector starting with the current variable and propagates gradients down the graph to all the Variables in the graph, using the grad function specified at each variable.
- Gradient calculation for a particular Variable can be disabled with var.setCalcGrad(false).

Functions

- Functions take Variable parameters and return a Variable.
- Each returned Variable is constructed using the following arguments as parameters:
  - af::array: the result calculated earlier
  - vector<Variable>: containing the inputs to the function
  - BackwardFunction_t: a function pointer to the backward pass, usually implemented as a lambda function.

Example function:
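A sketch of what such a function can look like, based on the operator+ excerpt reviewed above (the addGrad call and the three-argument Variable constructor follow the description here, but details may differ from the merged code):

Variable operator +(const Variable &lhs, const Variable &rhs)
{
    auto result = lhs.array() + rhs.array();        // forward result, computed eagerly
    auto grad_func = [](std::vector<Variable> &inputs, const Variable &grad_output) {
        // d(lhs + rhs)/dlhs = 1 and d(lhs + rhs)/drhs = 1,
        // so the incoming gradient flows to both inputs unchanged.
        inputs[0].addGrad(grad_output);
        inputs[1].addGrad(grad_output);
    };
    return Variable(result, {lhs, rhs}, grad_func); // result, inputs, backward function
}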
Example:
A simple example showcasing how this can be done currently
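A rough sketch of the kind of usage such an example can demonstrate (the Variable(af::array, bool) constructor and the grad() accessor are assumptions based on the names used elsewhere in this PR):

auto x = Variable(af::randu(5), true);                  // true: calculate gradients for x
auto y = x + x;                                         // the graph is recorded as ops execute
auto dy = Variable(af::constant(1.0, y.array().dims()), false);
y.backward(dy);                                         // walk the DAG, calling each grad function
af::print("dy/dx", x.grad().array());                   // expected: all 2s, since d(x + x)/dx = 2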
TODO: for this PR