Commit: Merge in main

willtebbutt committed Aug 2, 2024
2 parents f898fed + 7a5c03e commit f935f55
Showing 3 changed files with 74 additions and 13 deletions.
19 changes: 16 additions & 3 deletions docs/src/algorithmic_differentiation.md
@@ -104,7 +104,11 @@ given a function ``f`` which is differentiable at a point ``x``, compute ``D f [x] (\dot{x})``.
If ``f : \RR^P \to \RR^Q``, this is equivalent to computing ``J[x] \dot{x}``, where ``J[x]`` is the Jacobian of ``f`` at ``x``.
For the interested reader we provide a high-level explanation of _how_ forwards-mode AD does this in [_How_ does Forwards-Mode AD work?](@ref).

_**Another aside: notation**_

You will have noticed that we typically denote the argument to a derivative with a "dot" over it, e.g. ``\dot{x}``.
This is something that we will do consistently, and we will use the same notation for the outputs of derivatives.
Wherever you see a symbol with a "dot" over it, expect it to be an input or output of a derivative / forwards-mode AD.
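To make the "dot" notation concrete, here is a small hand-worked sketch (plain Julia, not Tapir.jl's API) of a function, its derivative applied to a tangent ``\dot{x}``, and the equivalent Jacobian-vector product:
```julia
using LinearAlgebra

# f maps R^N to R^N elementwise, so its Jacobian at x is Diagonal(cos.(x)).
f(x) = sin.(x)

# The derivative of f at x, applied to a tangent ẋ: D f [x](ẋ) = cos.(x) .* ẋ.
df(x, ẋ) = cos.(x) .* ẋ

x, ẋ = randn(3), randn(3)

# Forwards-mode AD computes exactly this Jacobian-vector product.
df(x, ẋ) ≈ Diagonal(cos.(x)) * ẋ  # true
```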

# Reverse-Mode AD: _what_ does it do?

@@ -140,6 +144,15 @@ We may occasionally write it as ``(D f [x])^\ast`` if there is some risk of confusion.

We will explain _how_ reverse-mode AD goes about computing this after some worked examples.

_**Aside: Notation**_

You will have noticed that arguments to adjoints have thus far always had a "bar" over them, e.g. ``\bar{y}``.
This notation is common in the AD literature and will be used throughout.
Additionally, this "bar" notation will be used for the outputs of adjoints of derivatives.
So wherever you see a symbol with a "bar" over it, think "input or output of adjoint of derivative".
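To make the "bar" notation concrete, recall that the adjoint is characterised by ``\langle \bar{y}, D f [x] (\dot{x}) \rangle = \langle D f [x]^\ast (\bar{y}), \dot{x} \rangle``. A minimal numerical check of this identity (plain Julia, not Tapir.jl code), reusing the elementwise `sin` example from the forwards-mode discussion above:
```julia
using LinearAlgebra

x = randn(4)
J = Diagonal(cos.(x))  # Jacobian of x -> sin.(x) at x

ẋ, ȳ = randn(4), randn(4)

# ⟨ȳ, D f [x](ẋ)⟩ == ⟨D f [x]*(ȳ), ẋ⟩: the adjoint is multiplication by J'.
dot(ȳ, J * ẋ) ≈ dot(J' * ȳ, ẋ)  # true
```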

### Some Worked Examples

We now present some worked examples in order to prime intuition, and to introduce the important classes of problems that will be encountered when doing AD in the Julia language.
@@ -317,7 +330,7 @@ _**Step 2: Compute Derivative**_

The derivative of ``\phi_{\text{f!}}`` is
```math
- D \phi_{\text{f!}} [x](\dot{x}) = (2 x \odot x, 2 \sum_{n=1}^N x_n \dot{x}_n).
+ D \phi_{\text{f!}} [x](\dot{x}) = (2 x \odot \dot{x}, 2 \sum_{n=1}^N x_n \dot{x}_n).
```
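This derivative is easy to sanity-check with finite differences. The sketch below assumes ``\phi_{\text{f!}}(x) = (x \odot x, \sum_{n=1}^N x_n^2)``, which is consistent with the derivative stated above; it is illustrative Julia, not Tapir.jl code:
```julia
using LinearAlgebra

# ϕ_f! and its derivative, written out by hand.
phi(x) = (x .* x, sum(abs2, x))
dphi(x, ẋ) = (2 .* x .* ẋ, 2 * dot(x, ẋ))

# Central-difference approximation of D ϕ [x](ẋ), component by component.
x, ẋ, ε = randn(5), randn(5), 1e-6
y₊, y₋ = phi(x .+ ε .* ẋ), phi(x .- ε .* ẋ)

(y₊[1] .- y₋[1]) ./ 2ε ≈ dphi(x, ẋ)[1]  # true, up to finite-difference error
(y₊[2] .- y₋[2]) ./ 2ε ≈ dphi(x, ẋ)[2]  # true
```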

_**Step 3: Compute Adjoint of Derivative**_
@@ -444,7 +457,7 @@ Subsequent sections will build on these foundations, to provide a more general explanation.

### _How_ does Forwards-Mode AD work?

- Forwards-mode AD achieves this by breaking down ``f`` into the composition ``f = f_N \circ \dots \circ f_1``, # where each ``f_n`` is a simple function whose derivative (function) ``D f_n [x_n]`` we know for any given ``x_n``. By the chain rule, we have that
+ Forwards-mode AD achieves this by breaking down ``f`` into the composition ``f = f_N \circ \dots \circ f_1``, where each ``f_n`` is a simple function whose derivative (function) ``D f_n [x_n]`` we know for any given ``x_n``. By the chain rule, we have that
```math
D f [x] (\dot{x}) = D f_N [x_N] \circ \dots \circ D f_1 [x_1] (\dot{x})
```
@@ -455,7 +468,7 @@ which suggests the following algorithm:
4. let ``n = n + 1``
5. if ``n = N+1`` then return ``\dot{x}_{N+1}``, otherwise go to 2.

- When each function ``f_n`` maps between Euclidean spaces, the applications of derivatives ``D f_n [x_n] (\dot{x}_n)`` are given by ``J_n \dot{x}_n`` where ``J_n`` is the Jacobian of ``f_n`` at ``x_n``.v
+ When each function ``f_n`` maps between Euclidean spaces, the applications of derivatives ``D f_n [x_n] (\dot{x}_n)`` are given by ``J_n \dot{x}_n`` where ``J_n`` is the Jacobian of ``f_n`` at ``x_n``.
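A short sketch of this procedure in plain Julia may help. Here `fs` holds ``f_1, \dots, f_N`` and `dfs[n](x, ẋ)` computes ``D f_n [x] (\dot{x})``; both are illustrative stand-ins, not part of Tapir.jl's API:
```julia
# Propagate (xₙ, ẋₙ) through the composition one function at a time.
function forwards_mode(fs, dfs, x, ẋ)
    for (f, df) in zip(fs, dfs)
        # The right-hand side is evaluated before assignment, so `df` sees
        # the input xₙ, not the output xₙ₊₁.
        x, ẋ = f(x), df(x, ẋ)
    end
    return x, ẋ
end

# Example: f = exp ∘ sin, so D f [x](ẋ) = exp(sin(x)) cos(x) ẋ.
fs  = (sin, exp)
dfs = ((x, ẋ) -> cos(x) * ẋ, (x, ẋ) -> exp(x) * ẋ)
y, ẏ = forwards_mode(fs, dfs, 1.0, 1.0)
y ≈ exp(sin(1.0)) && ẏ ≈ exp(sin(1.0)) * cos(1.0)  # true
```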

```@bibliography
```
4 changes: 2 additions & 2 deletions docs/src/index.md
@@ -1,8 +1,8 @@
# Tapir.jl

- Documentation for Tapir.jl is on it's way!
+ Documentation for Tapir.jl is on its way!

- Note 02/07/2024: The first round of documentation has arrived.
+ Note (02/07/2024): The first round of documentation has arrived.
This is largely targeted at those who are interested in contributing to Tapir.jl -- you can find this work in the "Understanding Tapir.jl" section of the docs.
There is more to do, but it should be sufficient to understand how AD works in principle, and the core abstractions underlying Tapir.jl.

64 changes: 56 additions & 8 deletions docs/src/mathematical_interpretation.md
@@ -101,24 +101,28 @@ is perfectly fine.
It is helpful to have a concrete example which uses both of the permissible methods to make results externally visible.
To this end, consider the following `function`:
```julia
- function f(x::Vector{Float64}, y::Vector{Float64}, z::Vector{Float64})
-     z .= y .* x
+ function f(x::Vector{Float64}, y::Vector{Float64}, z::Vector{Float64}, s::Ref{Vector{Float64}})
+     z .*= y .* x
+     s[] = 2z
    return sum(z)
end
```
We draw your attention to three features of this `function`:
- 1. `z` is mutated, and
- 2. we allocate a new value and return it (albeit, it is probably allocated on the stack).
+ 1. `z` is mutated,
+ 2. `s` is mutated to contain freshly allocated memory, and
+ 3. we allocate a new value and return it (albeit, it is probably allocated on the stack).

The model we adopt for `function` `f` is a function ``f : \mathcal{X} \to \mathcal{X} \times \mathcal{A}``, where ``\mathcal{X}`` is the finite-dimensional real Hilbert space associated to the arguments to `f` prior to execution, and ``\mathcal{A}`` is the finite-dimensional real Hilbert space associated to any data newly allocated during execution which is externally visible after execution -- any newly allocated data which is not made visible is of no concern.
- In this example, ``\mathcal{X} = \RR^D \times \RR^D \times \RR^D`` where ``D`` is the length of `x` / `y` / `z`, and ``\mathcal{A} = \RR``.
+ In this example, ``\mathcal{X} = \RR^D \times \RR^D \times \RR^D \times \RR^S``, where ``D`` is the length of `x` / `y` / `z`, and ``S`` is the length of `s[]` prior to running `f`.
+ ``\mathcal{A} = \RR^D \times \RR``, where the ``\RR^D`` component corresponds to the value put in `s`, and ``\RR`` to the return value.
In this example, some of the memory allocated during execution is made externally visible by modifying one of the arguments, not just via the return value.

The argument to ``f`` comprises the arguments to `f` _before_ execution, and the output is the 2-tuple comprising the same arguments _after_ execution and the values associated to any newly allocated / created data.
Crucially, observe that we distinguish between the state of the arguments before and after execution.

For our example, the exact form of ``f`` is
```math
- f((x, y, z)) = ((x, y, x \odot y), \sum_{d=1}^D x \odot y)
+ f((x, y, z, s)) = ((x, y, x \odot y \odot z, 2 x \odot y \odot z), (2 x \odot y \odot z, \sum_{d=1}^D x_d y_d z_d))
```
Observe that ``f`` behaves a little like a transition operator, in that the first element of the tuple returned is the updated state of the arguments.
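A sketch of this model as a pure Julia function may make the transition-operator view concrete. `f_model` below is our model of `f`, not runnable Tapir.jl machinery: it consumes the state of the arguments before execution, and returns their state after execution together with the newly visible allocations:
```julia
# Pure-function model of `f`: before-state of the arguments in; after-state
# plus the newly allocated, externally visible values out.
function f_model(x, y, z, s)
    z′ = z .* y .* x     # state of `z` after `z .*= y .* x`
    a  = 2 .* z′         # the fresh vector that `s[] = 2z` makes visible
    v  = sum(z′)         # the freshly created return value
    return ((x, y, z′, a), (a, v))
end
```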

@@ -202,6 +206,10 @@ The rule returns another `CoDual` (it propagates book-keeping information forwards).
In a little more depth:


_**Notation: primal**_

Throughout the rest of this document, we will refer to the `function` being differentiated as the "primal" computation, and its arguments as the "primal" arguments.

### Forwards Pass

_**Inputs**_
@@ -251,8 +259,8 @@ In order to address these, we need to discuss the types that Tapir.jl uses to represent gradients.

# Representing Gradients

- We call the argument or output of a derivative ``D f [x] : \mathcal{X} \to \mathcal{Y}`` a _tangent_, and will usually denote it with a dot over a symbol, e.g. ``\dot{x}``.
- Conversely, we call an argument or output of the adjoint of this derivative ``D f [x]^\ast : \mathcal{Y} \to \mathcal{X}`` a _gradient_, and will usually denote it with a bar over a symbol, e.g. ``\bar{y}``.
+ We refer to both the inputs and outputs of derivatives ``D f [x] : \mathcal{X} \to \mathcal{Y}`` as _tangents_, e.g. ``\dot{x}`` or ``\dot{y}``.
+ Conversely, we refer to both the inputs and outputs of the adjoint of this derivative ``D f [x]^\ast : \mathcal{Y} \to \mathcal{X}`` as _gradients_, e.g. ``\bar{y}`` and ``\bar{x}``.

Note, however, that the sets involved are the same whether dealing with a derivative or its adjoint.
Consequently, we use the same type to represent both.
@@ -365,6 +373,7 @@ The second is that we always assume that the components of ``\bar{y}_x`` which are

The third is that the components of the arguments of `f` which are identified by their value must have rdata passed back explicitly by a rule, while the components of the arguments to `f` which are identified by their address get their gradients propagated back implicitly (i.e. via the in-place modification of fdata).

_**Reminder**_: the first element of the tuple returned by `dfoo_adjoint` is the rdata associated to `foo` itself, hence it is `NoRData`.

# Testing

@@ -403,3 +412,42 @@ There are a few notable reasons:
This topic, in particular what goes wrong with permissive tangent type systems like those employed by ChainRules, deserves a more thorough treatment -- hopefully someone will write something more expansive on it at some point.


### Why Support Closures But Not Mutable Globals

First consider why closures are straightforward to support.
Look at the type of the closure produced by `foo`:
```jldoctest
function foo(x)
    function bar(y)
        x .+= y
        return nothing
    end
    return bar
end
bar = foo(randn(5))
typeof(bar)
# output
var"#bar#1"{Vector{Float64}}
```
Observe that the `Vector{Float64}` that we passed to `foo`, and closed over in `bar`, is present in the type.
This alludes to the fact that closures are basically just callable `struct`s whose fields are the closed-over variables.
Since the function itself is an argument to its rule, everything enters the rule for `bar` via its arguments, and the rule system developed in this document applies straightforwardly.
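You can see this directly: the closed-over variable is stored as a field of the closure's type, and is accessible like any other field (a small illustrative snippet, reusing `bar` from above):
```julia
fieldnames(typeof(bar))  # (:x,) -- the captured vector is stored as a field
bar.x                    # the very Vector{Float64} that was passed to foo
```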

On the other hand, globals do not appear in the functions that use them.
For example,
```jldoctest
const a = randn(10)
function g(x)
    a .+= x
    return nothing
end
typeof(g)
# output
typeof(g) (singleton type of function g, subtype of Function)
```
Neither the value nor the type of `a` is present in `g`.
Since `a` doesn't enter `g` via its arguments, it is unclear how it should be handled in general.
