Add new backends with DifferentiationInterface.jl #302
base: main
Conversation
Thanks @amontoison for the PR. I have mixed feelings about it. On one side it is progress, on the other side we are losing the Hessian backend for Enzyme and Zygote.
How far are we from making it fully compatible?
We can, but only for unconstrained problems. The user will no longer be able to use an incorrect Hessian, which is better for everyone.
f782802 to 8740d3a
@gdalle May I ask you to check what I did wrong in the file
It looks like the problem comes from forgetting to import the function
@dpo could you perhaps give me access to the repo so that I may help with this and future PRs?
My gut feeling is that if you want to switch to DI, you have to switch to ADTypes as well and stop doing this cumbersome translation between symbols and backend objects. What do you think?
@@ -1,23 +1,25 @@
# How to switch backend in ADNLPModels

`ADNLPModels` allows the use of different backends to compute the derivatives required within NLPModel API.
-It uses `ForwardDiff.jl`, `ReverseDiff.jl`, and more via optional depencies.
+It uses `ForwardDiff.jl`, `ReverseDiff.jl`, and more via optional dependencies.

The backend information is in a structure [`ADNLPModels.ADModelBackend`](@ref) in the attribute `adbackend` of a `ADNLPModel`, it can also be accessed with [`get_adbackend`](@ref).

The functions used internally to define the NLPModel API and the possible backends are defined in the following table:
Why not just switch fully to the ADTypes specification? You're gonna run into trouble translating symbols into `AbstractADType` objects.
And the symbols don't allow you to set parameters like
- the number of chunks in ForwardDiff
- the tape compilation in ReverseDiff
- aspects of the mode in Enzyme
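For illustration, a minimal sketch of how those options are carried by ADTypes backend objects (constructor keywords as defined by ADTypes.jl; the Enzyme line is commented out because it needs the Enzyme package loaded):

using ADTypes

fd_backend = AutoForwardDiff(; chunksize = 8)       # chunk size for ForwardDiff
rd_backend = AutoReverseDiff(; compile = true)      # tape compilation for ReverseDiff
# ez_backend = AutoEnzyme(; mode = Enzyme.Reverse)  # mode selection for Enzyme (requires `import Enzyme`)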
$\mathcal{L}(x)$ denotes the Lagrangian $f(x) + \lambda^T c(x)$.
Except for the backends based on `ForwardDiff.jl` and `ReverseDiff.jl`, all other backends require the associated AD package to be manually installed by the user to work.
Note that the Jacobians and Hessians computed by the backends above are dense.
The backends `SparseADJacobian`, `SparseADHessian`, and `SparseReverseADHessian` should be used instead if sparse Jacobians and Hessians are required.
Same remark for sparse AD: using `AutoSparse` seems more flexible?
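For context, a hedged sketch of what such an `AutoSparse` wrapper looks like (assuming SparseConnectivityTracer for sparsity detection and SparseMatrixColorings for coloring, as in the longer example further down):

using ADTypes, SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

# Wrap a dense backend so that Jacobians/Hessians are computed as sparse matrices,
# with automatic sparsity detection and greedy coloring.
sparse_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector = TracerSparsityDetector(),
    coloring_algorithm = GreedyColoringAlgorithm(),
)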
(:ZygoteADGradient , :AutoZygote ),
# (:ForwardDiffADGradient , :AutoForwardDiff ),
# (:ReverseDiffADGradient , :AutoReverseDiff ),
(:MooncakeADGradient , :AutoMooncake ),
The `AutoMooncake` constructor requires a keyword, like so: `AutoMooncake(; config=nothing)`.
(:DiffractorADGradient , :AutoDiffractor ),
(:TrackerADGradient , :AutoTracker ),
(:SymbolicsADGradient , :AutoSymbolics ),
(:ChainRulesADGradient , :AutoChainRules ),
The `AutoChainRules` constructor requires a keyword, like so: `AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig())`.
(:ChainRulesADGradient , :AutoChainRules ),
(:FastDifferentiationADGradient , :AutoFastDifferentiation ),
(:FiniteDiffADGradient , :AutoFiniteDiff ),
(:FiniteDifferencesADGradient , :AutoFiniteDifferences ),
The `AutoFiniteDifferences` constructor requires a keyword, like so: `AutoFiniteDifferences(; fdm=FiniteDifferences.central_fdm(3, 1))`.
x0::AbstractVector = rand(nvar),
kwargs...,
)
backend = $fbackend()
This will fail for the three backends mentioned above. And for all other backends, this prevents you from setting any options, which was the goal of ADTypes.jl to begin with: see https://github.com/SciML/ADTypes.jl?tab=readme-ov-file#why-should-ad-users-adopt-this-standard
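One hypothetical workaround, sketched below under the assumption that the constructor can take a pre-built backend (the `GenericADGradient` name and its constructor are made up for illustration, not part of the PR), is to let the user pass an already-configured ADTypes object instead of calling `$fbackend()` internally:

using ADTypes, DifferentiationInterface

struct GenericADGradient{B <: ADTypes.AbstractADType, P}
    backend::B  # configured by the caller, keywords and all
    prep::P     # DifferentiationInterface preparation object
end

function GenericADGradient(f, x0::AbstractVector, backend::ADTypes.AbstractADType)
    prep = DifferentiationInterface.prepare_gradient(f, backend, x0)
    return GenericADGradient(backend, prep)
end

# The keyword-requiring constructors then work because the caller builds them, e.g.
# GenericADGradient(f, x0, AutoFiniteDifferences(; fdm = FiniteDifferences.central_fdm(3, 1)))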
for (ADHvprod, fbackend) in ((:EnzymeADHvprod , :AutoEnzyme ),
(:ZygoteADHvprod , :AutoZygote ),
# (:ForwardDiffADHvprod , :AutoForwardDiff ),
# (:ReverseDiffADHvprod , :AutoReverseDiff ),
(:MooncakeADHvprod , :AutoMooncake ),
(:DiffractorADHvprod , :AutoDiffractor ),
(:TrackerADHvprod , :AutoTracker ),
(:SymbolicsADHvprod , :AutoSymbolics ),
(:ChainRulesADHvprod , :AutoChainRules ),
(:FastDifferentiationADHvprod , :AutoFastDifferentiation ),
(:FiniteDiffADHvprod , :AutoFiniteDiff ),
(:FiniteDifferencesADHvprod , :AutoFiniteDifferences ),
(:PolyesterForwardDiffADHvprod, :AutoPolyesterForwardDiff))
Diffractor, Mooncake, Tracker and ChainRules probably don't work in second order.
FiniteDiff and FiniteDifferences might give you inaccurate results depending on their configuration (JuliaDiff/DifferentiationInterface.jl#78)
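For reference, a hedged sketch of a forward-over-reverse combination via DifferentiationInterface's `SecondOrder`, which sidesteps backends without second-order support (same pattern as the larger example below):

using ADTypes, DifferentiationInterface
import ForwardDiff, ReverseDiff

f(x) = sum(abs2, x)
x, v = rand(3), rand(3)
backend = SecondOrder(AutoForwardDiff(), AutoReverseDiff())  # outer forward, inner reverse
Hv = only(hvp(f, backend, x, (v,)))                          # Hessian-vector product H(x) * v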
end
end

for (ADHessian, fbackend) in ((:EnzymeADHessian , :AutoEnzyme ),
Same remark as for HVP about backends incompatible with second order
ForwardDiff_backend = Dict(
  :gradient_backend => ForwardDiffADGradient,
  :jprod_backend => ForwardDiffADJprod,
  :jtprod_backend => ForwardDiffADJtprod,
  :hprod_backend => ForwardDiffADHvprod,
  :jacobian_backend => ForwardDiffADJacobian,
  :hessian_backend => ForwardDiffADHessian,
  :ghjvprod_backend => EmptyADbackend,
  :jprod_residual_backend => ForwardDiffADJprod,
  :jtprod_residual_backend => ForwardDiffADJtprod,
  :hprod_residual_backend => ForwardDiffADHvprod,
  :jacobian_residual_backend => ForwardDiffADJacobian,
  :hessian_residual_backend => ForwardDiffADHessian
)
The goal of DifferentiationInterface is to save a lot of people a lot of code. However, this PR ends up adding more lines than it removes, precisely because of this kind of disjunction.
@VaibhavDixit2 how did you handle the choice of backend for each operator in OptimizationBase.jl?
@@ -0,0 +1,12 @@
using OptimizationProblems
using NLPModels
@gdalle I invited you. Thank you for your work here!!!
@amontoison what do you think about moving away from symbols here?
It depends on the alternatives. Right now, it's useful to specify that we want optimized backends with … It will be easier to provide an …
If I'm not mistaken there are two levels here:
Right now you base all of the internal representations on
Do you have an example of what you suggest?
I could try to show you in an alternative PR
Okay, it is a bit hard to submit a PR since there would be a lot of things to rewrite and I don't understand what each part does. But essentially I was imagining something like this:

using ADTypes
using DifferentiationInterface
using LinearAlgebra
using SparseMatrixColorings
using SparseConnectivityTracer
import ForwardDiff, ReverseDiff
function DefaultAutoSparse(backend::AbstractADType)
return AutoSparse(
backend;
sparsity_detector=TracerSparsityDetector(),
coloring_algorithm=GreedyColoringAlgorithm(),
)
end
struct ADModelBackend
gradient_backend
hprod_backend
jprod_backend
jtprod_backend
jacobian_backend
hessian_backend
end
struct ADModelBackendPrep
gradient_prep
hprod_prep
jprod_prep
jtprod_prep
jacobian_prep
hessian_prep
end
function ADModelBackend(forward_backend::AbstractADType, reverse_backend::AbstractADType)
@assert ADTypes.mode(forward_backend) isa
Union{ADTypes.ForwardMode,ADTypes.ForwardOrReverseMode}
@assert ADTypes.mode(reverse_backend) isa
Union{ADTypes.ReverseMode,ADTypes.ForwardOrReverseMode}
gradient_backend = reverse_backend
hprod_backend = SecondOrder(forward_backend, reverse_backend)
jprod_backend = forward_backend
jtprod_backend = reverse_backend
jacobian_backend = DefaultAutoSparse(forward_backend) # or a size-dependent heuristic
hessian_backend = DefaultAutoSparse(SecondOrder(forward_backend, reverse_backend))
return ADModelBackend(
gradient_backend,
hprod_backend,
jprod_backend,
jtprod_backend,
jacobian_backend,
hessian_backend,
)
end
function ADModelBackendPrep(
admodel_backend::ADModelBackend,
obj::Function,
cons::Function,
lag::Function,
x::AbstractVector,
)
(;
gradient_backend,
hprod_backend,
jprod_backend,
jtprod_backend,
jacobian_backend,
hessian_backend,
) = admodel_backend
c = cons(x)
λ = similar(c)
dx = similar(x)
dc = similar(c)
gradient_prep = prepare_gradient(lag, gradient_backend, x, Constant(λ))
hprod_prep = prepare_hvp(lag, hprod_backend, x, (dx,), Constant(λ))
jprod_prep = prepare_pushforward(cons, jprod_backend, x, (dx,))
jtprod_prep = prepare_pullback(cons, jtprod_backend, x, (dc,))
jacobian_prep = prepare_jacobian(cons, jacobian_backend, x)
hessian_prep = prepare_hessian(lag, hessian_backend, x, Constant(λ))
return ADModelBackendPrep(
gradient_prep, hprod_prep, jprod_prep, jtprod_prep, jacobian_prep, hessian_prep
)
end
admodel_backend = ADModelBackend(AutoForwardDiff(), AutoReverseDiff())
obj(x) = sum(x)
cons(x) = abs.(x)
lag(x, λ) = obj(x) + dot(λ, cons(x))
admodel_backend_prep = ADModelBackendPrep(admodel_backend, obj, cons, lag, rand(3));
Add the following backends: