Improve docs #234

Merged · 1 commit · Apr 29, 2024
10 changes: 5 additions & 5 deletions DifferentiationInterface/README.md
@@ -72,15 +72,15 @@ julia> Pkg.add(

```julia
using DifferentiationInterface
import ForwardDiff, Enzyme, Zygote # import automatic differentiation backends you want to use
import ForwardDiff, Enzyme, Zygote # AD backends you want to use

f(x) = sum(abs2, x)

x = [1.0, 2.0, 3.0]
x = [1.0, 2.0]

value_and_gradient(f, AutoForwardDiff(), x) # returns (14.0, [2.0, 4.0, 6.0]) using ForwardDiff.jl
value_and_gradient(f, AutoEnzyme(), x) # returns (14.0, [2.0, 4.0, 6.0]) using Enzyme.jl
value_and_gradient(f, AutoZygote(), x) # returns (14.0, [2.0, 4.0, 6.0]) using Zygote.jl
value_and_gradient(f, AutoForwardDiff(), x) # returns (5.0, [2.0, 4.0]) with ForwardDiff.jl
value_and_gradient(f, AutoEnzyme(), x) # returns (5.0, [2.0, 4.0]) with Enzyme.jl
value_and_gradient(f, AutoZygote(), x) # returns (5.0, [2.0, 4.0]) with Zygote.jl
```

For more performance, take a look at the [DifferentiationInterface tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface/stable/tutorial/).
16 changes: 9 additions & 7 deletions DifferentiationInterface/docs/src/backends.md
@@ -47,34 +47,36 @@ backend_table = Markdown.parse(String(take!(io)))

## Types

We support all dense backend choices from [ADTypes.jl](https://github.com/SciML/ADTypes.jl), as well as their sparse wrapper `AutoSparse`.
We support all dense backend choices from [ADTypes.jl](https://github.com/SciML/ADTypes.jl), as well as their sparse wrapper [`AutoSparse`](@ref).

For sparse backends, only the Jacobian and Hessian operators are implemented differently; the other operators behave the same as for the corresponding dense backend.
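
A minimal construction sketch, assuming `AutoSparse` is re-exported alongside the dense backend types and keeping its keyword options (sparsity detector, coloring algorithm) at their defaults:

```julia
using DifferentiationInterface
import ForwardDiff

dense_backend = AutoForwardDiff()
sparse_backend = AutoSparse(dense_backend)  # Jacobian and Hessian operators can now exploit sparsity
```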

```@example backends
backend_table #hide
```

## Availability
## Checks

### Availability

You can use [`check_available`](@ref) to verify whether a given backend is loaded.

## Support for two-argument functions
### Support for two-argument functions

All backends are compatible with one-argument functions `f(x) = y`.
Only some are compatible with two-argument functions `f!(y, x) = nothing`.
You can check this compatibility using [`check_twoarg`](@ref).

## Hessian support
### Support for Hessian

Only some backends are able to compute Hessians.
You can use [`check_hessian`](@ref) to check this feature.
You can use [`check_hessian`](@ref) to check this feature (beware that it will try to compute a small Hessian, so it is not instantaneous).
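
A minimal sketch of these three checks, assuming ForwardDiff.jl is installed:

```julia
using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()

check_available(backend)  # true once ForwardDiff.jl is imported
check_twoarg(backend)     # whether two-argument functions f!(y, x) are supported
check_hessian(backend)    # whether Hessians are supported (computes a small test Hessian)
```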

## API reference

!!! warning
The following documentation has been re-exported from [ADTypes.jl](https://github.com/SciML/ADTypes.jl).
Refer to the ADTypes documentation for more information.
The following documentation has been borrowed from ADTypes.jl.
Refer to the [ADTypes documentation](https://sciml.github.io/ADTypes.jl/stable/) for more information.

```@docs
ADTypes
37 changes: 20 additions & 17 deletions DifferentiationInterface/docs/src/overview.md
@@ -11,16 +11,16 @@ We provide the following high-level operators:
| [`derivative`](@ref) | 1 | `Number` | `Number` or `AbstractArray` | same as `y` | `size(y)` |
| [`second_derivative`](@ref) | 2 | `Number` | `Number` or `AbstractArray` | same as `y` | `size(y)` |
| [`gradient`](@ref) | 1 | `AbstractArray` | `Number` | same as `x` | `size(x)` |
| [`hvp`](@ref) | 2 | `AbstractArray` | `Number` | same as `x` | `size(x)` |
| [`hessian`](@ref) | 2 | `AbstractArray` | `Number` | `AbstractMatrix` | `(length(x), length(x))` |
| [`jacobian`](@ref) | 1 | `AbstractArray` | `AbstractArray` | `AbstractMatrix` | `(length(y), length(x))` |

They can all be derived from two low-level operators:
They can be derived from lower-level operators:

| operator | order | input `x` | output `y` | result type | result shape |
| :----------------------------- | :---- | :--------- | :----------- | :---------- | :----------- |
| [`pushforward`](@ref) (or JVP) | 1 | `Any` | `Any` | same as `y` | `size(y)` |
| [`pullback`](@ref) (or VJP) | 1 | `Any` | `Any` | same as `x` | `size(x)` |
| operator | order | input `x` | output `y` | seed `v` | result type | result shape |
| :----------------------------- | :---- | :-------------- | :----------- | :------- | :---------- | :----------- |
| [`pushforward`](@ref) (or JVP) | 1 | `Any` | `Any` | `dx` | same as `y` | `size(y)` |
| [`pullback`](@ref) (or VJP) | 1 | `Any` | `Any` | `dy` | same as `x` | `size(x)` |
| [`hvp`](@ref) | 2 | `AbstractArray` | `Number` | `dx` | same as `x` | `size(x)` |

Luckily, most backends have custom implementations, which we reuse if possible instead of relying on fallbacks.
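
A minimal sketch of the seeded operators, following the signatures documented above, with ForwardDiff.jl as an arbitrary backend choice:

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()

x = [1.0, 2.0, 3.0]
dx = [1.0, 0.0, 0.0]  # seed in input space, for pushforward and hvp
dy = 1.0              # seed in output space, for pullback (y is a scalar here)

pushforward(f, backend, x, dx)  # JVP: directional derivative of f along dx
pullback(f, backend, x, dy)     # VJP: here simply the gradient scaled by dy
hvp(f, backend, x, dx)          # Hessian-vector product along dx
```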

@@ -36,26 +36,25 @@ Several variants of each operator are defined:
| [`derivative`](@ref) | [`derivative!`](@ref) | [`value_and_derivative`](@ref) | [`value_and_derivative!`](@ref) |
| [`second_derivative`](@ref) | [`second_derivative!`](@ref) | NA | NA |
| [`gradient`](@ref) | [`gradient!`](@ref) | [`value_and_gradient`](@ref) | [`value_and_gradient!`](@ref) |
| [`hvp`](@ref) | [`hvp!`](@ref) | NA | NA |
| [`hessian`](@ref) | [`hessian!`](@ref) | NA | NA |
| [`jacobian`](@ref) | [`jacobian!`](@ref) | [`value_and_jacobian`](@ref) | [`value_and_jacobian!`](@ref) |
| [`pushforward`](@ref) | [`pushforward!`](@ref) | [`value_and_pushforward`](@ref) | [`value_and_pushforward!`](@ref) |
| [`pullback`](@ref) | [`pullback!`](@ref) | [`value_and_pullback`](@ref) | [`value_and_pullback!`](@ref) |
| [`hvp`](@ref) | [`hvp!`](@ref) | NA | NA |

## Mutation and signatures

In order to ensure symmetry between one-argument functions `f(x) = y` and two-argument functions `f!(y, x) = nothing`, we define the same operators for both cases.
However, they have different signatures:

| signature | out-of-place | in-place |
| :--------- | :--------------------------------- | :------------------------------------------ |
| `f(x)` | `operator(f, backend, x, ...)` | `operator!(f, result, backend, x, ...)` |
| `f!(y, x)` | `operator(f!, y, backend, x, ...)` | `operator!(f!, y, result, backend, x, ...)` |
| signature | out-of-place | in-place |
| :--------- | :------------------------------------------- | :---------------------------------------------------- |
| `f(x)` | `operator(f, backend, x, [v], [extras])` | `operator!(f, result, backend, x, [v], [extras])` |
| `f!(y, x)` | `operator(f!, y, backend, x, [v], [extras])` | `operator!(f!, y, result, backend, x, [v], [extras])` |

!!! warning
Our mutation convention is that all positional arguments between `f`/`f!` and `backend` are mutated (the `extras` as well, see below).
This convention holds regardless of the bang `!` in the operator name, because we assume that a user passing a two-argument function `f!(y, x)` anticipates mutation anyway.

Still, be careful with two-argument functions, because every variant of the operator will mutate `y`, even if it does not have a `!` in its name (see the bottom left cell in the table).
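
A minimal sketch of this convention, assuming a backend for which `check_twoarg` returns `true`:

```julia
using DifferentiationInterface
import ForwardDiff

f!(y, x) = (y .= abs2.(x); nothing)  # two-argument (in-place) function

backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]
y = zeros(3)

J = jacobian(f!, y, backend, x)  # no bang in the name, yet y is mutated to hold f(x)
```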

## Preparation
@@ -78,8 +77,8 @@ Unsurprisingly, preparation syntax depends on the number of arguments:

| signature | preparation signature |
| :--------- | :----------------------------------------- |
| `f(x)` | `prepare_operator(f, backend, x, ...)` |
| `f!(y, x)` | `prepare_operator(f!, y, backend, x, ...)` |
| `f(x)` | `prepare_operator(f, backend, x, [v])` |
| `f!(y, x)` | `prepare_operator(f!, y, backend, x, [v])` |

The preparation `prepare_operator(f, backend, x)` will create an object called `extras` containing the necessary information to speed up `operator` and its variants.
This information is specific to `backend` and `f`, as well as the _type and size_ of the input `x` and the _control flow_ within the function, but it should work with different _values_ of `x`.
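
A minimal sketch of the preparation workflow, using the gradient operator as an example:

```julia
using DifferentiationInterface
import ForwardDiff

f(x) = sum(abs2, x)
backend = AutoForwardDiff()
x = [1.0, 2.0, 3.0]

extras = prepare_gradient(f, backend, x)  # backend- and size-specific setup
grad = similar(x)
gradient!(f, grad, backend, x, extras)    # reuses extras; valid for other values of x with the same type and size
```
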
@@ -102,6 +101,9 @@ We offer two ways to perform second-order differentiation (for [`second_derivati
At the moment, trial and error is your best friend.
Usually, the most efficient approach for Hessians is forward-over-reverse, i.e. a forward-mode outer backend and a reverse-mode inner backend.

!!! warning
Preparation does not yet work for the inner differentiation step of a `SecondOrder`, only the outer differentiation is prepared.
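
A minimal sketch of the forward-over-reverse combination, assuming the `SecondOrder` constructor takes the outer backend first and the inner backend second:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

backend = SecondOrder(AutoForwardDiff(), AutoZygote())  # forward-mode outer, reverse-mode inner
hessian(f, backend, x)
```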

## Experimental

!!! danger
Expand All @@ -125,9 +127,10 @@ We make this available for all backends with the following operators:

### Translation

The wrapper [`DifferentiateWith`](@ref) allows you to take a function and specify that it should be differentiated with the backend of your choice.
In other words, when you try to differentiate `dw = DifferentiateWith(f, backend1)` with `backend2`, then `backend1` steps in and `backend2` does nothing.
At the moment it only works when `backend2` supports [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl).
The wrapper [`DifferentiateWith`](@ref) allows you to translate between AD backends.
It takes a function `f` and specifies that `f` should be differentiated with the backend of your choice, instead of whatever other backend the code is trying to use.
In other words, when someone tries to differentiate `dw = DifferentiateWith(f, backend1)` with `backend2`, then `backend1` steps in and `backend2` does nothing.
At the moment, `DifferentiateWith` only works when `backend2` supports [ChainRules.jl](https://github.com/JuliaDiff/ChainRules.jl).
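
A minimal sketch of this translation, with Zygote.jl assumed in the role of `backend2` because it relies on ChainRules.jl:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x)
dw = DifferentiateWith(f, AutoForwardDiff())  # force ForwardDiff.jl for this function

gradient(dw, AutoZygote(), [1.0, 2.0, 3.0])   # Zygote defers to ForwardDiff under the hood
```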

## Going further

61 changes: 26 additions & 35 deletions DifferentiationInterface/docs/src/tutorial.md
@@ -8,15 +8,8 @@ We present a typical workflow with DifferentiationInterface.jl and showcase its

```@example tuto
using DifferentiationInterface

import ForwardDiff, Enzyme # ⚠️ import the backends you want to use ⚠️
```

!!! tip
Importing backends with `import` instead of `using` avoids name conflicts and makes sure you are using operators from DifferentiationInterface.jl.
This is useful since most backends also export operators like `gradient` and `jacobian`.


## Computing a gradient

A common use case of automatic differentiation (AD) is optimizing real-valued functions with first- or second-order methods.
@@ -25,21 +18,26 @@ Let's define a simple objective and a random input vector
```@example tuto
f(x) = sum(abs2, x)

x = [1.0, 2.0, 3.0]
nothing # hide
x = collect(1.0:5.0)
```

To compute its gradient, we need to choose a "backend", i.e. an AD package that DifferentiationInterface.jl will call under the hood.
To compute its gradient, we need to choose a "backend", i.e. an AD package to call under the hood.
Most backend types are defined by [ADTypes.jl](https://github.com/SciML/ADTypes.jl) and re-exported by DifferentiationInterface.jl.

[ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) is very generic and efficient for low-dimensional inputs, so it's a good starting point:

```@example tuto
import ForwardDiff

backend = AutoForwardDiff()
nothing # hide
```

Now you can use DifferentiationInterface.jl to get the gradient:
!!! tip
To avoid name conflicts, load AD packages with `import` instead of `using`.
Indeed, most AD packages also export operators like `gradient` and `jacobian`, but you only want to use the ones from DifferentiationInterface.jl.

Now you can use the following syntax to compute the gradient:

```@example tuto
gradient(f, backend, x)
@@ -48,15 +46,10 @@ gradient(f, backend, x)
Was that fast?
[BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl) helps you answer that question.

```@repl tuto
```@example tuto
using BenchmarkTools
@btime gradient($f, $backend, $x);
```

More or less what you would get if you just used the API from ForwardDiff.jl:

```@repl tuto
@btime ForwardDiff.gradient($f, $x);
@benchmark gradient($f, $backend, $x)
```

Not bad, but you can do better.
@@ -69,19 +62,18 @@ Some backends get a speed boost from this trick.
```@example tuto
grad = similar(x)
gradient!(f, grad, backend, x)

grad # has been mutated
```

The bang indicates that one of the arguments of `gradient!` might be mutated.
More precisely, our convention is that _every positional argument between the function and the backend is mutated (and the `extras` too, see below)_.

```@repl tuto
@btime gradient!($f, _grad, $backend, $x) evals=1 setup=(_grad=similar($x));
```@example tuto
@benchmark gradient!($f, _grad, $backend, $x) evals=1 setup=(_grad=similar($x))
```

For some reason the in-place version is not much better than your first attempt.
However, it has one less allocation, which corresponds to the gradient vector you provided.
However, it makes fewer allocations, thanks to the gradient vector you provided.
Don't worry, you can get even more performance.

## Preparing for multiple gradients
@@ -100,31 +92,31 @@ You don't need to know what this object is, you just need to pass it to the grad
```@example tuto
grad = similar(x)
gradient!(f, grad, backend, x, extras)

grad # has been mutated
```

Preparation makes the gradient computation much faster, and (in this case) allocation-free.

```@repl tuto
@btime gradient!($f, _grad, $backend, $x, _extras) evals=1 setup=(
```@example tuto
@benchmark gradient!($f, _grad, $backend, $x, _extras) evals=1 setup=(
_grad=similar($x);
_extras=prepare_gradient($f, $backend, $x)
);
)
```

Beware that the `extras` object is nearly always mutated by differentiation operators, even though it is given as the last positional argument.

## Switching backends

The whole point of DifferentiationInterface.jl is that you can easily experiment with different AD solutions.
Typically, for gradients, reverse mode AD might be a better fit.
So let's try the state-of-the-art [Enzyme.jl](https://github.com/EnzymeAD/Enzyme.jl)!
Typically, for gradients, reverse mode AD might be a better fit, so let's try [ReverseDiff.jl](https://github.com/JuliaDiff/ReverseDiff.jl)!

For this one, the backend definition is slightly more involved, because you need to feed the "mode" to the object from ADTypes.jl:
For this one, the backend definition is slightly more involved, because you can specify whether the tape needs to be compiled:

```@example tuto
backend2 = AutoEnzyme(; mode=Enzyme.Reverse)
import ReverseDiff

backend2 = AutoReverseDiff(; compile=true)
nothing # hide
```

@@ -134,16 +126,15 @@ But once it is done, things run smoothly with exactly the same syntax:
gradient(f, backend2, x)
```

And you can run the same benchmarks:
And you can run the same benchmarks to see what you gained (although such a small input may not be realistic):

```@repl tuto
@btime gradient!($f, _grad, $backend2, $x, _extras) evals=1 setup=(
```@example tuto
@benchmark gradient!($f, _grad, $backend2, $x, _extras) evals=1 setup=(
_grad=similar($x);
_extras=prepare_gradient($f, $backend2, $x)
);
)
```

Not only is it blazingly fast, you achieved this speedup without looking at the docs of either ForwardDiff.jl or Enzyme.jl!
In short, DifferentiationInterface.jl allows for easy testing and comparison of AD backends.
If you want to go further, check out the [DifferentiationInterfaceTest.jl tutorial](https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterfaceTest/dev/tutorial/).
It provides benchmarking utilities to compare backends and help you select the one that is best suited for your problem.