Add new backends with DifferentiationInterface.jl #302
base: main
Conversation
Thanks @amontoison for the PR. I have mixed feelings about it. On one side it is progress, on the other side we are losing the Hessian backend for Enzyme and Zygote.
How far are we from making it fully compatible?
We can, but only for unconstrained problems. The user will no longer be able to use an incorrect Hessian, which is better for everyone.
f782802 to 8740d3a
@gdalle May I ask you to check what I did wrong in the file
It looks like the problem comes from forgetting to import the function
@dpo could you perhaps give me access to the repo so that I may help with this and future PRs?
My gut feeling is that if you want to switch to DI, you have to switch to ADTypes as well and stop doing this cumbersome translation between symbols and backend objects. What do you think?
@@ -1,23 +1,25 @@
# How to switch backend in ADNLPModels

`ADNLPModels` allows the use of different backends to compute the derivatives required within NLPModel API.
-It uses `ForwardDiff.jl`, `ReverseDiff.jl`, and more via optional depencies.
+It uses `ForwardDiff.jl`, `ReverseDiff.jl`, and more via optional dependencies.

The backend information is in a structure [`ADNLPModels.ADModelBackend`](@ref) in the attribute `adbackend` of a `ADNLPModel`, it can also be accessed with [`get_adbackend`](@ref).

The functions used internally to define the NLPModel API and the possible backends are defined in the following table:
Why not just switch fully to the ADTypes specification? You're gonna run into trouble translating symbols into `AbstractADType` objects.
And the symbols don't allow you to set parameters like
- the number of chunks in ForwardDiff
- the tape compilation in ReverseDiff
- aspects of the mode in Enzyme
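For illustration, a minimal sketch of how those options are carried by ADTypes backend objects (constructor keywords as defined by ADTypes.jl; the Enzyme line is commented out because it needs the Enzyme package loaded):

using ADTypes

fd_backend = AutoForwardDiff(; chunksize = 8)       # chunk size for ForwardDiff
rd_backend = AutoReverseDiff(; compile = true)      # tape compilation for ReverseDiff
# ez_backend = AutoEnzyme(; mode = Enzyme.Reverse)  # mode selection for Enzyme (requires `import Enzyme`)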
$\mathcal{L}(x)$ denotes the Lagrangian $f(x) + \lambda^T c(x)$.
Except for the backends based on `ForwardDiff.jl` and `ReverseDiff.jl`, all other backends require the associated AD package to be manually installed by the user to work.
Note that the Jacobians and Hessians computed by the backends above are dense.
The backends `SparseADJacobian`, `SparseADHessian`, and `SparseReverseADHessian` should be used instead if sparse Jacobians and Hessians are required.
Same remark for sparse AD: using `AutoSparse` seems more flexible?
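For context, a hedged sketch of what such an `AutoSparse` wrapper looks like (assuming SparseConnectivityTracer for sparsity detection and SparseMatrixColorings for coloring, as in the longer example further down):

using ADTypes, SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

# Wrap a dense backend so that Jacobians/Hessians are computed as sparse matrices,
# with automatic sparsity detection and greedy coloring.
sparse_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)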
(:ZygoteADGradient , :AutoZygote ),
# (:ForwardDiffADGradient , :AutoForwardDiff ),
# (:ReverseDiffADGradient , :AutoReverseDiff ),
(:MooncakeADGradient , :AutoMooncake ),
The `AutoMooncake` constructor requires a keyword, like so: `AutoMooncake(; config=nothing)`.
(:DiffractorADGradient , :AutoDiffractor ),
(:TrackerADGradient , :AutoTracker ),
(:SymbolicsADGradient , :AutoSymbolics ),
(:ChainRulesADGradient , :AutoChainRules ),
The `AutoChainRules` constructor requires a keyword, like so: `AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig())`.
(:ChainRulesADGradient , :AutoChainRules ),
(:FastDifferentiationADGradient , :AutoFastDifferentiation ),
(:FiniteDiffADGradient , :AutoFiniteDiff ),
(:FiniteDifferencesADGradient , :AutoFiniteDifferences ),
The `AutoFiniteDifferences` constructor requires a keyword, like so: `AutoFiniteDifferences(; fdm=FiniteDifferences.central_fdm(3, 1))`.
x0::AbstractVector = rand(nvar),
kwargs...,
)
backend = $fbackend()
This will fail for the three backends mentioned above. And for all other backends, this prevents you from setting any options, which was the goal of ADTypes.jl to begin with: see https://github.com/SciML/ADTypes.jl?tab=readme-ov-file#why-should-ad-users-adopt-this-standard
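One hypothetical workaround, sketched below under the assumption that the constructor can take a pre-built backend (the `GenericADGradient` name and its constructor are made up for illustration, not part of the PR), is to let the user pass an already-configured ADTypes object instead of calling `$fbackend()` internally:

using ADTypes, DifferentiationInterface

struct GenericADGradient{B <: ADTypes.AbstractADType, P}
    backend::B  # configured by the caller, keywords and all
    prep::P     # DifferentiationInterface preparation object
end

function GenericADGradient(f, x0::AbstractVector, backend::ADTypes.AbstractADType)
    prep = DifferentiationInterface.prepare_gradient(f, backend, x0)
    return GenericADGradient(backend, prep)
end

# The keyword-requiring constructors then work because the caller builds them, e.g.
# GenericADGradient(f, x0, AutoFiniteDifferences(; fdm = FiniteDifferences.central_fdm(3, 1)))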
for (ADHvprod, fbackend) in ((:EnzymeADHvprod , :AutoEnzyme ),
(:ZygoteADHvprod , :AutoZygote ),
# (:ForwardDiffADHvprod , :AutoForwardDiff ),
# (:ReverseDiffADHvprod , :AutoReverseDiff ),
(:MooncakeADHvprod , :AutoMooncake ),
(:DiffractorADHvprod , :AutoDiffractor ),
(:TrackerADHvprod , :AutoTracker ),
(:SymbolicsADHvprod , :AutoSymbolics ),
(:ChainRulesADHvprod , :AutoChainRules ),
(:FastDifferentiationADHvprod , :AutoFastDifferentiation ),
(:FiniteDiffADHvprod , :AutoFiniteDiff ),
(:FiniteDifferencesADHvprod , :AutoFiniteDifferences ),
(:PolyesterForwardDiffADHvprod, :AutoPolyesterForwardDiff))
Diffractor, Mooncake, Tracker and ChainRules probably don't work in second order.
FiniteDiff and FiniteDifferences might give you inaccurate results depending on their configuration (JuliaDiff/DifferentiationInterface.jl#78)
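For reference, a hedged sketch of a forward-over-reverse combination via DifferentiationInterface's `SecondOrder`, which sidesteps backends without second-order support (same pattern as the larger example below):

using ADTypes, DifferentiationInterface
import ForwardDiff, ReverseDiff

f(x) = sum(abs2, x)
x, v = rand(3), rand(3)
backend = SecondOrder(AutoForwardDiff(), AutoReverseDiff())  # outer forward, inner reverse
Hv = only(hvp(f, backend, x, (v,)))                          # Hessian-vector product H(x) * v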
end
end

for (ADHessian, fbackend) in ((:EnzymeADHessian , :AutoEnzyme ),
Same remark as for HVP about backends incompatible with second order
ForwardDiff_backend = Dict(
  :gradient_backend => ForwardDiffADGradient,
  :jprod_backend => ForwardDiffADJprod,
  :jtprod_backend => ForwardDiffADJtprod,
  :hprod_backend => ForwardDiffADHvprod,
  :jacobian_backend => ForwardDiffADJacobian,
  :hessian_backend => ForwardDiffADHessian,
  :ghjvprod_backend => EmptyADbackend,
  :jprod_residual_backend => ForwardDiffADJprod,
  :jtprod_residual_backend => ForwardDiffADJtprod,
  :hprod_residual_backend => ForwardDiffADHvprod,
  :jacobian_residual_backend => ForwardDiffADJacobian,
  :hessian_residual_backend => ForwardDiffADHessian
)
The goal of DifferentiationInterface is to save a lot of people a lot of code. However, this PR ends up adding more lines than it removes, precisely because of this kind of disjunction.
@VaibhavDixit2 how did you handle the choice of backend for each operator in OptimizationBase.jl?
@@ -0,0 +1,12 @@
using OptimizationProblems
using NLPModels
@gdalle I invited you. Thank you for your work here!!!
@amontoison what do you think about moving away from symbols here?
It depends on the alternatives. Right now, it's useful to specify that we want optimized backends with … It will be easier to provide an …
If I'm not mistaken there are two levels here:
Right now you base all of the internal representations on
Do you have an example of what you suggest?
I could try to show you in an alternative PR
Okay, it is a bit hard to submit a PR since there would be a lot of things to rewrite and I don't understand what each part does. But essentially I was imagining something like this:

using ADTypes
using DifferentiationInterface
using LinearAlgebra
using SparseMatrixColorings
using SparseConnectivityTracer
import ForwardDiff, ReverseDiff
function DefaultAutoSparse(backend::AbstractADType)
return AutoSparse(
backend;
sparsity_detector=TracerSparsityDetector(),
coloring_algorithm=GreedyColoringAlgorithm(),
)
end
struct ADModelBackend
gradient_backend
hprod_backend
jprod_backend
jtprod_backend
jacobian_backend
hessian_backend
end
struct ADModelBackendPrep
gradient_prep
hprod_prep
jprod_prep
jtprod_prep
jacobian_prep
hessian_prep
end
function ADModelBackend(forward_backend::AbstractADType, reverse_backend::AbstractADType)
@assert ADTypes.mode(forward_backend) isa
Union{ADTypes.ForwardMode,ADTypes.ForwardOrReverseMode}
@assert ADTypes.mode(reverse_backend) isa
Union{ADTypes.ReverseMode,ADTypes.ForwardOrReverseMode}
gradient_backend = reverse_backend
hprod_backend = SecondOrder(forward_backend, reverse_backend)
jprod_backend = forward_backend
jtprod_backend = reverse_backend
jacobian_backend = DefaultAutoSparse(forward_backend) # or a size-dependent heuristic
hessian_backend = DefaultAutoSparse(SecondOrder(forward_backend, reverse_backend))
return ADModelBackend(
gradient_backend,
hprod_backend,
jprod_backend,
jtprod_backend,
jacobian_backend,
hessian_backend,
)
end
function ADModelBackendPrep(
admodel_backend::ADModelBackend,
obj::Function,
cons::Function,
lag::Function,
x::AbstractVector,
)
(;
gradient_backend,
hprod_backend,
jprod_backend,
jtprod_backend,
jacobian_backend,
hessian_backend,
) = admodel_backend
c = cons(x)
λ = similar(c)
dx = similar(x)
dc = similar(c)
gradient_prep = prepare_gradient(lag, gradient_backend, x, Constant(λ))
hprod_prep = prepare_hvp(lag, hprod_backend, x, (dx,), Constant(λ))
jprod_prep = prepare_pushforward(cons, jprod_backend, x, (dx,))
jtprod_prep = prepare_pullback(cons, jtprod_backend, x, (dc,))
jacobian_prep = prepare_jacobian(cons, jacobian_backend, x)
hessian_prep = prepare_hessian(lag, hessian_backend, x, Constant(λ))
return ADModelBackendPrep(
gradient_prep, hprod_prep, jprod_prep, jtprod_prep, jacobian_prep, hessian_prep
)
end
admodel_backend = ADModelBackend(AutoForwardDiff(), AutoReverseDiff())
obj(x) = sum(x)
cons(x) = abs.(x)
lag(x, λ) = obj(x) + dot(λ, cons(x))
admodel_backend_prep = ADModelBackendPrep(admodel_backend, obj, cons, lag, rand(3));
Add the following backends: