Commit: Merge in main

willtebbutt committed Aug 2, 2024
2 parents f898fed + 7a5c03e commit f935f55
Showing 3 changed files with 74 additions and 13 deletions.
19 changes: 16 additions & 3 deletions docs/src/algorithmic_differentiation.md
@@ -104,7 +104,11 @@ given a function ``f`` which is differentiable at a point ``x``, compute ``D f [x] (\dot{x})``.
If ``f : \RR^P \to \RR^Q``, this is equivalent to computing ``J[x] \dot{x}``, where ``J[x]`` is the Jacobian of ``f`` at ``x``.
For the interested reader we provide a high-level explanation of _how_ forwards-mode AD does this in [_How_ does Forwards-Mode AD work?](@ref).

_**Another aside: notation**_

You will have noticed that we typically denote the argument to a derivative with a "dot" over it, e.g. ``\dot{x}``.
This is something that we will do consistently, and we will use the same notation for the outputs of derivatives.
Wherever you see a symbol with a "dot" over it, expect it to be an input or output of a derivative / forwards-mode AD.
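To make the "dot" notation concrete, here is a small hand-worked sketch (plain Julia, not Tapir.jl's API) of a function, its derivative applied to a tangent ``\dot{x}``, and the equivalent Jacobian-vector product:
```julia
using LinearAlgebra

# f maps R^N to R^N elementwise, so its Jacobian at x is Diagonal(cos.(x)).
f(x) = sin.(x)

# The derivative of f at x, applied to a tangent ẋ: D f [x](ẋ) = cos.(x) .* ẋ.
df(x, ẋ) = cos.(x) .* ẋ

x, ẋ = randn(3), randn(3)

# Forwards-mode AD computes exactly this Jacobian-vector product.
df(x, ẋ) ≈ Diagonal(cos.(x)) * ẋ  # true
```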

# Reverse-Mode AD: _what_ does it do?

@@ -140,6 +144,15 @@ We may occasionally write it as ``(D f [x])^\ast`` if there is some risk of confusion.

We will explain _how_ reverse-mode AD goes about computing this after some worked examples.

_**Aside: Notation**_

You will have noticed that arguments to adjoints have thus far always had a "bar" over them, e.g. ``\bar{y}``.
This notation is common in the AD literature and will be used throughout.
Additionally, this "bar" notation will be used for the outputs of adjoints of derivatives.
So wherever you see a symbol with a "bar" over it, think "input or output of adjoint of derivative".
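To make the "bar" notation concrete, recall that the adjoint is characterised by ``\langle \bar{y}, D f [x] (\dot{x}) \rangle = \langle D f [x]^\ast (\bar{y}), \dot{x} \rangle``. A minimal numerical check of this identity (plain Julia, not Tapir.jl code), reusing the elementwise `sin` example from the forwards-mode discussion above:
```julia
using LinearAlgebra

x = randn(4)
J = Diagonal(cos.(x))  # Jacobian of x -> sin.(x) at x

ẋ, ȳ = randn(4), randn(4)

# ⟨ȳ, D f [x](ẋ)⟩ == ⟨D f [x]*(ȳ), ẋ⟩: the adjoint is multiplication by J'.
dot(ȳ, J * ẋ) ≈ dot(J' * ȳ, ẋ)  # true
```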

### Some Worked Examples

We now present some worked examples in order to prime intuition, and to introduce the important classes of problems that will be encountered when doing AD in the Julia language.
@@ -317,7 +330,7 @@ _**Step 2: Compute Derivative**_

The derivative of ``\phi_{\text{f!}}`` is
```math
- D \phi_{\text{f!}} [x](\dot{x}) = (2 x \odot x, 2 \sum_{n=1}^N x_n \dot{x}_n).
+ D \phi_{\text{f!}} [x](\dot{x}) = (2 x \odot \dot{x}, 2 \sum_{n=1}^N x_n \dot{x}_n).
```
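This derivative is easy to sanity-check with finite differences. The sketch below assumes ``\phi_{\text{f!}}(x) = (x \odot x, \sum_{n=1}^N x_n^2)``, which is consistent with the derivative stated above; it is illustrative Julia, not Tapir.jl code:
```julia
using LinearAlgebra

# ϕ_f! and its derivative, written out by hand.
phi(x) = (x .* x, sum(abs2, x))
dphi(x, ẋ) = (2 .* x .* ẋ, 2 * dot(x, ẋ))

# Central-difference approximation of D ϕ [x](ẋ), component by component.
x, ẋ, ε = randn(5), randn(5), 1e-6
y₊, y₋ = phi(x .+ ε .* ẋ), phi(x .- ε .* ẋ)

(y₊[1] .- y₋[1]) ./ 2ε ≈ dphi(x, ẋ)[1]  # true, up to finite-difference error
(y₊[2] .- y₋[2]) ./ 2ε ≈ dphi(x, ẋ)[2]  # true
```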

_**Step 3: Compute Adjoint of Derivative**_
@@ -444,7 +457,7 @@ Subsequent sections will build on these foundations, to provide a more general explanation.

### _How_ does Forwards-Mode AD work?

- Forwards-mode AD achieves this by breaking down ``f`` into the composition ``f = f_N \circ \dots \circ f_1``, # where each ``f_n`` is a simple function whose derivative (function) ``D f_n [x_n]`` we know for any given ``x_n``. By the chain rule, we have that
+ Forwards-mode AD achieves this by breaking down ``f`` into the composition ``f = f_N \circ \dots \circ f_1``, where each ``f_n`` is a simple function whose derivative (function) ``D f_n [x_n]`` we know for any given ``x_n``. By the chain rule, we have that
```math
D f [x] (\dot{x}) = D f_N [x_N] \circ \dots \circ D f_1 [x_1] (\dot{x})
```
@@ -455,7 +468,7 @@ which suggests the following algorithm:
4. let ``n = n + 1``
5. if ``n = N+1`` then return ``\dot{x}_{N+1}``, otherwise go to 2.

- When each function ``f_n`` maps between Euclidean spaces, the applications of derivatives ``D f_n [x_n] (\dot{x}_n)`` are given by ``J_n \dot{x}_n`` where ``J_n`` is the Jacobian of ``f_n`` at ``x_n``.v
+ When each function ``f_n`` maps between Euclidean spaces, the applications of derivatives ``D f_n [x_n] (\dot{x}_n)`` are given by ``J_n \dot{x}_n`` where ``J_n`` is the Jacobian of ``f_n`` at ``x_n``.
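A short sketch of this procedure in plain Julia may help. Here `fs` holds ``f_1, \dots, f_N`` and `dfs[n](x, ẋ)` computes ``D f_n [x] (\dot{x})``; both are illustrative stand-ins, not part of Tapir.jl's API:
```julia
# Propagate (xₙ, ẋₙ) through the composition one function at a time.
function forwards_mode(fs, dfs, x, ẋ)
    for (f, df) in zip(fs, dfs)
        # The right-hand side is evaluated before assignment, so `df` sees
        # the input xₙ, not the output xₙ₊₁.
        x, ẋ = f(x), df(x, ẋ)
    end
    return x, ẋ
end

# Example: f = exp ∘ sin, so D f [x](ẋ) = exp(sin(x)) cos(x) ẋ.
fs  = (sin, exp)
dfs = ((x, ẋ) -> cos(x) * ẋ, (x, ẋ) -> exp(x) * ẋ)
y, ẏ = forwards_mode(fs, dfs, 1.0, 1.0)
y ≈ exp(sin(1.0)) && ẏ ≈ exp(sin(1.0)) * cos(1.0)  # true
```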

```@bibliography
```
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -1,8 +1,8 @@
# Tapir.jl

- Documentation for Tapir.jl is on it's way!
+ Documentation for Tapir.jl is on its way!

- Note 02/07/2024: The first round of documentation has arrived.
+ Note (02/07/2024): The first round of documentation has arrived.
This is largely targeted at those who are interested in contributing to Tapir.jl -- you can find this work in the "Understanding Tapir.jl" section of the docs.
There is more to do, but it should be sufficient to understand how AD works in principle, and the core abstractions underlying Tapir.jl.

64 changes: 56 additions & 8 deletions docs/src/mathematical_interpretation.md
@@ -101,24 +101,28 @@ is perfectly fine.
It is helpful to have a concrete example which uses both of the permissible methods to make results externally visible.
To this end, consider the following `function`:
```julia
- function f(x::Vector{Float64}, y::Vector{Float64}, z::Vector{Float64})
-     z .= y .* x
+ function f(x::Vector{Float64}, y::Vector{Float64}, z::Vector{Float64}, s::Ref{Vector{Float64}})
+     z .*= y .* x
+     s[] = 2z
    return sum(z)
end
```
We draw your attention to three features of this `function`:
- 1. `z` is mutated, and
- 2. we allocate a new value and return it (albeit, it is probably allocated on the stack).
+ 1. `z` is mutated,
+ 2. `s` is mutated to contain freshly allocated memory, and
+ 3. we allocate a new value and return it (albeit, it is probably allocated on the stack).

The model we adopt for `function` `f` is a function ``f : \mathcal{X} \to \mathcal{X} \times \mathcal{A}``, where ``\mathcal{X}`` is the finite-dimensional real Hilbert space associated to the arguments to `f` prior to execution, and ``\mathcal{A}`` is the finite-dimensional real Hilbert space associated to any data newly allocated during execution which is externally visible after execution -- any newly allocated data which is not made visible is of no concern.
- In this example, ``\mathcal{X} = \RR^D \times \RR^D \times \RR^D`` where ``D`` is the length of `x` / `y` / `z`, and ``\mathcal{A} = \RR``.
+ In this example, ``\mathcal{X} = \RR^D \times \RR^D \times \RR^D \times \RR^S``, where ``D`` is the length of `x` / `y` / `z`, and ``S`` is the length of `s[]` prior to running `f`.
+ ``\mathcal{A} = \RR^D \times \RR``, where the ``\RR^D`` component corresponds to the value put in `s`, and ``\RR`` to the return value.
In this example, some of the memory allocated during execution is made externally visible by modifying one of the arguments, not just via the return value.

The argument to ``f`` comprises the arguments to `f` _before_ execution, and the output is the 2-tuple comprising the same arguments _after_ execution and the values associated to any newly allocated / created data.
Crucially, observe that we distinguish between the state of the arguments before and after execution.

For our example, the exact form of ``f`` is
```math
- f((x, y, z)) = ((x, y, x \odot y), \sum_{d=1}^D x \odot y)
+ f((x, y, z, s)) = ((x, y, x \odot y \odot z, 2 x \odot y \odot z), (2 x \odot y \odot z, \sum_{d=1}^D x_d y_d z_d))
```
Observe that ``f`` behaves a little like a transition operator, in that the first element of the tuple returned is the updated state of the arguments.
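A sketch of this model as a pure Julia function may make the transition-operator view concrete. `f_model` below is our model of `f`, not runnable Tapir.jl machinery: it consumes the state of the arguments before execution, and returns their state after execution together with the newly visible allocations:
```julia
# Pure-function model of `f`: before-state of the arguments in; after-state
# plus the newly allocated, externally visible values out.
function f_model(x, y, z, s)
    z′ = z .* y .* x     # state of `z` after `z .*= y .* x`
    a  = 2 .* z′         # the fresh vector that `s[] = 2z` makes visible
    v  = sum(z′)         # the freshly created return value
    return ((x, y, z′, a), (a, v))
end
```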

@@ -202,6 +206,10 @@ The rule returns another `CoDual` (it propagates book-keeping information forwards).
In a little more depth:


_**Notation: primal**_

Throughout the rest of this document, we will refer to the `function` being differentiated as the "primal" computation, and its arguments as the "primal" arguments.

### Forwards Pass

_**Inputs**_
@@ -251,8 +259,8 @@ In order to address these, we need to discuss the types that Tapir.jl uses to represent gradients.

# Representing Gradients

- We call the argument or output of a derivative ``D f [x] : \mathcal{X} \to \mathcal{Y}`` a _tangent_, and will usually denote it with a dot over a symbol, e.g. ``\dot{x}``.
- Conversely, we call an argument or output of the adjoint of this derivative ``D f [x]^\ast : \mathcal{Y} \to \mathcal{X}`` a _gradient_, and will usually denote it with a bar over a symbol, e.g. ``\bar{y}``.
+ We refer to both the inputs and outputs of derivatives ``D f [x] : \mathcal{X} \to \mathcal{Y}`` as _tangents_, e.g. ``\dot{x}`` or ``\dot{y}``.
+ Conversely, we refer to both the inputs and outputs of the adjoint of this derivative ``D f [x]^\ast : \mathcal{Y} \to \mathcal{X}`` as _gradients_, e.g. ``\bar{y}`` and ``\bar{x}``.

Note, however, that the sets involved are the same whether dealing with a derivative or its adjoint.
Consequently, we use the same type to represent both.
@@ -365,6 +373,7 @@ The second is that we always assume that the components of ``\bar{y}_x`` which are

The third is that the components of the arguments of `f` which are identified by their value must have rdata passed back explicitly by a rule, while the components of the arguments to `f` which are identified by their address get their gradients propagated back implicitly (i.e. via the in-place modification of fdata).

_**Reminder**_: the first element of the tuple returned by `dfoo_adjoint` is the rdata associated to `foo` itself, hence it is `NoRData`.

# Testing

@@ -403,3 +412,42 @@ There are a few notable reasons:
This topic, in particular what goes wrong with permissive tangent type systems like those employed by ChainRules, deserves a more thorough treatment -- hopefully someone will write something more expansive on it at some point.


### Why Support Closures But Not Mutable Globals

First consider why closures are straightforward to support.
Look at the type of the closure produced by `foo`:
```jldoctest
function foo(x)
    function bar(y)
        x .+= y
        return nothing
    end
    return bar
end
bar = foo(randn(5))
typeof(bar)
# output
var"#bar#1"{Vector{Float64}}
```
Observe that the `Vector{Float64}` that we passed to `foo`, and closed over in `bar`, is present in the type.
This alludes to the fact that closures are basically just callable `struct`s whose fields are the closed-over variables.
Since the function itself is an argument to its rule, everything enters the rule for `bar` via its arguments, and the rule system developed in this document applies straightforwardly.
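You can see this directly: the closed-over variable is stored as a field of the closure's type, and is accessible like any other field (a small illustrative snippet, reusing `bar` from above):
```julia
fieldnames(typeof(bar))  # (:x,) -- the captured vector is stored as a field
bar.x                    # the very Vector{Float64} that was passed to foo
```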

On the other hand, globals do not appear in the functions that use them.
For example,
```jldoctest
const a = randn(10)
function g(x)
    a .+= x
    return nothing
end
typeof(g)
# output
typeof(g) (singleton type of function g, subtype of Function)
```
Neither the value nor the type of `a` is present in `g`.
Since `a` doesn't enter `g` via its arguments, it is unclear how it should be handled in general.
