
Commit

tweaks
mcabbott committed Nov 27, 2022
1 parent 93a1a96 commit 89074bc
Showing 4 changed files with 35 additions and 8 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -33,7 +33,7 @@ MacroTools = "0.5"
NNlib = "0.8.9"
NNlibCUDA = "0.2.4"
OneHotArrays = "0.1, 0.2"
Optimisers = "0.2.10"
Optimisers = "0.2.11"
ProgressLogging = "0.1"
Reexport = "0.2, 1.0"
SpecialFunctions = "1.8.2, 2.1.2"
18 changes: 16 additions & 2 deletions docs/src/training/train_api.md
@@ -9,14 +9,28 @@ Flux.Optimise.train!(loss, model, data, opt; cb)
To see one in a terminal, you will need to install [TerminalLoggers.jl](https://github.com/JuliaLogging/TerminalLoggers.jl)
and follow its setup instructions.

The new version of Flux's training code was written as an independent package, called Optimisers.jl.
However, at present all Flux models contain parameter arrays (such as `Array`s and `CuArray`s)
The new version of Flux's training code was written as an independent package, [Optimisers.jl](https://github.com/FluxML/Optimisers.jl).
This is designed to allow for immutable objects.
But at present all Flux models contain parameter arrays (such as `Array`s and `CuArray`s)
which can be updated in-place. Thus objects returned by `update!` can be ignored.

```@docs
Optimisers.update!
```
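
For example, a minimal sketch of one explicit-style step, using `Optimisers.update!` as documented above. The model, loss, and data batch here are hypothetical, purely for illustration:

```julia
using Flux, Optimisers

model = Dense(2 => 1)                                # hypothetical model
x, y = rand(Float32, 2, 10), rand(Float32, 1, 10)    # hypothetical data batch
loss(m, x, y) = Flux.mse(m(x), y)

opt_state = Flux.setup(Adam(), model)                # optimiser state, created once per model

grads = Flux.gradient(m -> loss(m, x, y), model)     # explicit gradient with respect to the model
Optimisers.update!(opt_state, model, grads[1])       # mutates the arrays inside model; return value may be ignored
```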

### Modifiers

The state returned by `setup` can be modified to temporarily prevent training of
some parts of the model, or to change the learning rate used.
The functions for doing so may be accessed as `Flux.freeze!`, `Flux.thaw!`, and `Flux.adjust`:

```@docs
Optimisers.adjust
Optimisers.freeze!
Optimisers.thaw!
```
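
For instance, a rough sketch with a hypothetical two-layer model (not part of the docstrings above), lowering the learning rate and temporarily freezing the first layer:

```julia
using Flux

model = Chain(Dense(2 => 3, relu), Dense(3 => 1))   # hypothetical model
opt_state = Flux.setup(Adam(0.01), model)

opt_state = Flux.adjust(opt_state, 0.001)   # returns a new state with a smaller learning rate
Flux.freeze!(opt_state.layers[1])           # exclude the first layer from update!
# ... some calls to update! here ...
Flux.thaw!(opt_state.layers[1])             # let it train again
```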


## Implicit style

Flux used to handle gradients, training, and optimisation rules quite differently.
22 changes: 17 additions & 5 deletions docs/src/training/training.md
@@ -326,15 +326,27 @@ The first, [`WeightDecay`](@ref) adds `0.42` times original parameter to the gra
matching the gradient of the penalty above (with the same, unrealistically large, constant).
After that, in either case, [`Adam`](@ref) computes the final update.

The same mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref ).
The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref ).
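
For instance, a sketch with hypothetical constants, writing the rules with an explicit `Optimisers.` prefix in case they are not re-exported:

```julia
using Flux, Optimisers

model = Dense(2 => 1)   # hypothetical model

# Clip each gradient entry to [-1, 1], then apply weight decay, then Adam:
rule = Optimisers.OptimiserChain(Optimisers.ClipGrad(1.0), Optimisers.WeightDecay(0.42), Optimisers.Adam(0.1))
opt_state = Flux.setup(rule, model)
```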

Besides L2 / weight decay, another common and quite different kind of regularisation is
provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some ... ??

?? do we discuss test/train mode here too?
provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some outputs of the
previous layer during training.
It should switch automatically between training and test modes, but see [`trainmode!`](@ref Flux.trainmode!) / [`testmode!`](@ref Flux.testmode!) to enable or disable this layer manually.
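
As a rough sketch, with a hypothetical model containing a `Dropout` layer:

```julia
using Flux

model = Chain(Dense(2 => 5, relu), Dropout(0.5), Dense(5 => 1))   # hypothetical model

Flux.testmode!(model)    # turn dropout off, e.g. before measuring accuracy
Flux.trainmode!(model)   # turn it back on by hand
```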

## Freezing, Schedules

?? maybe these also fit in here.
Finer control of training is possible by modifying the optimiser state returned by `setup`.
For example, to exclude the encoder half of a model from training:

```julia
model = Chain(enc = encoder, dec = decoder)

opt = Flux.setup(Adam(), model)

Flux.freeze!(opt.layers.enc) # corresponds to model.layers.enc
```

!!! note
This `freeze!` goes with the "explicit" style.
The earlier "implicit" equivalent was to pass to `gradient` an object referencing only
part of the model, such as `Flux.params(model.layers.enc)`.
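
For schedules, one option is to call `adjust` between epochs. A rough sketch, assuming a `model`, a state `opt` from `Flux.setup`, and some iterable `data` of `(x, y)` batches:

```julia
for epoch in 1:100
    opt = Flux.adjust(opt, 0.001 * 0.9^epoch)   # hypothetical exponential decay of the learning rate
    Flux.train!(model, data, opt) do m, x, y
        Flux.mse(m(x), y)
    end
end
```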

1 change: 1 addition & 0 deletions src/Flux.jl
@@ -8,6 +8,7 @@ using MacroTools: @forward
@reexport using NNlib
using MLUtils
import Optimisers: Optimisers, trainable, destructure # before v0.13, Flux owned these functions
using Optimisers: freeze!, thaw!, adjust

using Zygote, ChainRulesCore
using Zygote: Params, @adjoint, gradient, pullback, @nograd
