Problems with Zygote and the `PDMat` constructor #159

simsurace · 2022-05-04T11:41:13Z

An error is thrown when differentiating a trace of a matrix division with a PDMat:

using LinearAlgebra
using PDMats
using Zygote

function kernel(x)
    return [1. x; x 1.]
end

# PDMat is basically a wrapper for a cholesky decomposition.
# However, using `cholesky` explicitly does not throw any errors:
f(x) = tr(cholesky(kernel(0.1)) \ kernel(x))
Zygote.gradient(x->f(only(x)), [.2]) # works

# Trying to perform the same operation through the `PDMat` constructor fails:
g(x) = tr(PDMat(kernel(0.1)) \ kernel(x))
Zygote.gradient(x->g(only(x)), [.2]) # ERROR

ERROR: Need an adjoint for constructor PDMat{Float64, Matrix{Float64}}. Gradient is of type Matrix{Float64}
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] (::Zygote.Jnew{PDMat{Float64, Matrix{Float64}}, Nothing, false})(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/lib/lib.jl:324
  [3] (::Zygote.var"#1784#back#228"{Zygote.Jnew{PDMat{Float64, Matrix{Float64}}, Nothing, false}})(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
  [4] Pullback
    @ ~/.julia/packages/PDMats/ovlmf/src/pdmat.jl:9 [inlined]
  [5] (::typeof(∂(PDMat{Float64, Matrix{Float64}})))(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
  [6] Pullback
    @ ~/.julia/packages/PDMats/ovlmf/src/pdmat.jl:16 [inlined]
  [7] (::typeof(∂(PDMat)))(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
  [8] Pullback
    @ ~/.julia/packages/PDMats/ovlmf/src/pdmat.jl:19 [inlined]
  [9] (::typeof(∂(PDMat)))(Δ::Matrix{Float64})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [10] Pullback
    @ ./REPL[7]:1 [inlined]
 [11] (::typeof(∂(f)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [12] Pullback
    @ ./REPL[8]:1 [inlined]
 [13] (::typeof(∂(#3)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [14] (::Zygote.var"#56#57"{typeof(∂(#3))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:41
 [15] gradient(f::Function, args::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:76
 [16] top-level scope
    @ REPL[8]:1

This seems like a strange error. I tried to reproduce with my own type, but couldn't:

# The following is basically copy-pasted from master:
struct MyPDMat{T<:Real,S<:AbstractMatrix}
    dim::Int
    mat::S
    chol::Cholesky{T,S}

    MyPDMat{T,S}(d::Int,m::AbstractMatrix{T},c::Cholesky{T,S}) where {T,S} = new{T,S}(d,m,c)
end

function MyPDMat(mat::AbstractMatrix,chol::Cholesky{T,S}) where {T,S}
    d = size(mat, 1)
    size(chol, 1) == d ||
        throw(DimensionMismatch("Dimensions of mat and chol are inconsistent."))
    MyPDMat{T,S}(d, convert(S, mat), chol)
end

MyPDMat(mat::AbstractMatrix) = MyPDMat(mat, cholesky(mat))

Base.:\(a::MyPDMat, x::AbstractVecOrMat) = cholesky(a) \ x
LinearAlgebra.cholesky(a::MyPDMat) = a.chol

h(x) = tr(MyPDMat(kernel(0.1)) \ kernel(x))
Zygote.gradient(x->h(only(x)), [.2]) # works

BTW, all of these functions can be differentiated with ForwardDiff.

The text was updated successfully, but these errors were encountered:

devmotion · 2022-05-04T11:58:44Z

This seems like a strange error.

Doesn't help but I've seen these quite often.

I tried to reproduce with my own type, but couldn't:

A major difference is that your type is not a subtype of AbstractMatrix, and hence defaults for AbstractMatrix in Zygote and ChainRules do not affect it.

devmotion · 2022-05-04T12:02:35Z

The usual approach for fixing these errors is defining an rrule or a projector with CR, as Will discussed in the linked PR.

simsurace · 2022-05-04T13:27:22Z

This is a sure sign that I don't actually understand how Zygote works. Where in the call stack would it matter whether or not MyPDMat is a subtype of AbstractMatrix? Because the only way this has any bearing on the result of h is through cholesky, which has a method for MyPDMat anyway.

EDIT: In other words, the fact that Zygote can differentiate h, and it suddenly can't just because we add extra information through making MyPDMat a subtype of AbstractMatrix, which seems to be irrelevant for the specific function call, is mysterious to me.

simsurace · 2022-05-04T13:39:49Z

Making it a subtype of AbstractMatrix indeed makes it fail:

struct MyOtherPDMat{T<:Real,S<:AbstractMatrix} <: AbstractMatrix{T}
    dim::Int
    mat::S
    chol::Cholesky{T,S}

    MyOtherPDMat{T,S}(d::Int,m::AbstractMatrix{T},c::Cholesky{T,S}) where {T,S} = new{T,S}(d,m,c)
end

function MyOtherPDMat(mat::AbstractMatrix,chol::Cholesky{T,S}) where {T,S}
    d = size(mat, 1)
    size(chol, 1) == d ||
        throw(DimensionMismatch("Dimensions of mat and chol are inconsistent."))
    MyOtherPDMat{T,S}(d, convert(S, mat), chol)
end

MyOtherPDMat(mat::AbstractMatrix) = MyOtherPDMat(mat, cholesky(mat))

Base.:\(a::MyOtherPDMat, x::AbstractVecOrMat) = cholesky(a) \ x
LinearAlgebra.cholesky(a::MyOtherPDMat) = a.chol

h(x) = tr(MyOtherPDMat(kernel(0.1)) \ kernel(x))
Zygote.gradient(x->h(only(x)), [.2]) # ERROR

ERROR: MethodError: no method matching size(::MyOtherPDMat{Float64, Matrix{Float64}})
Closest candidates are:
  size(::AbstractArray{T, N}, ::Any) where {T, N} at ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/base/abstractarray.jl:42
  size(::Union{QR, LinearAlgebra.QRCompactWY, QRPivoted}) at ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:567
  size(::Union{QR, LinearAlgebra.QRCompactWY, QRPivoted}, ::Integer) at ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/LinearAlgebra/src/qr.jl:566
  ...
Stacktrace:
  [1] axes
    @ ./abstractarray.jl:95 [inlined]
  [2] axes(A::Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/LinearAlgebra/src/adjtrans.jl:175
  [3] has_offset_axes(A::Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}})
    @ Base ./abstractarray.jl:105
  [4] _tuple_any(f::typeof(Base.has_offset_axes), tf::Bool, a::Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}}, b::Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}})
    @ Base ./tuple.jl:516
  [5] _tuple_any(f::Function, t::Tuple{Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}}, Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}}})
    @ Base ./tuple.jl:513
  [6] has_offset_axes(::Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}}, ::Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}})
    @ Base ./abstractarray.jl:107
  [7] require_one_based_indexing(::Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}}, ::Vararg{Any})
    @ Base ./abstractarray.jl:110
  [8] \(A::Adjoint{Float64, MyOtherPDMat{Float64, Matrix{Float64}}}, B::Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.7.2+0~x64/share/julia/stdlib/v1.7/LinearAlgebra/src/generic.jl:1129
  [9] (::Zygote.var"#752#753"{MyOtherPDMat{Float64, Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}})(Z̄::Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/lib/array.jl:494
 [10] (::Zygote.var"#3058#back#754"{Zygote.var"#752#753"{MyOtherPDMat{Float64, Matrix{Float64}}, Matrix{Float64}, Matrix{Float64}}})(Δ::Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}})
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
 [11] Pullback
    @ ./REPL[21]:1 [inlined]
 [12] (::typeof(∂(h)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [13] Pullback
    @ ./REPL[22]:1 [inlined]
 [14] (::typeof(∂(#7)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [15] (::Zygote.var"#56#57"{typeof(∂(#7))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:41
 [16] gradient(f::Function, args::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:76
 [17] top-level scope
    @ REPL[22]:1

simsurace · 2022-05-04T13:53:12Z

The usual approach for fixing these errors is defining an rrule or a projector with CR, as Will discussed in the linked PR.

Which is the function that needs an rrule? Is it

PDMat(mat::AbstractMatrix) = PDMat(mat, cholesky(mat))

?

oxinabox · 2022-05-04T16:56:36Z

If it doesn't happen for your own type that doesn't subtype AbstractMatrix then there should be a solution that involved opting out of some problematic rrule.
https://juliadiff.org/ChainRulesCore.jl/dev/rule_author/superpowers/opt_out.html

simsurace · 2022-05-04T20:14:20Z

Thanks, this sounds great! Any tips on how to find out efficiently which rrule is problematic?

simsurace · 2022-05-05T14:45:48Z

I wasn't able to figure out which rrule to opt out of. However, I came up with an rrule that allows me to differentiate through the rrule interface directly.

I started with det(PDMats(x)) because it is throwing the same error as the more complicated example involving matrix division.

However, Zygote does not seem to recognize my rrule and still complains about a missing adjoint:

using LinearAlgebra
using PDMats
using Zygote

x = [1. 0.2; 0.2 1.]
y = [1. 0.1; 0.1 1.]

Zygote.gradient(logdet ∘ PDMat, x) |> only # works
Zygote.gradient(det ∘ PDMat, x) |> only # ERROR
# ERROR: Need an adjoint for constructor PDMat{Float64, Matrix{Float64}}. Gradient is of type Matrix{Float64}

# Since `logdet` has an overload for `PDMat` and `det` doesn't, 
# and `logdet` above works while `det` doesn't, try to add an overload
# for `det` and see what happens:
LinearAlgebra.det(A::PDMat) = det(A.chol)
Zygote.gradient(det ∘ PDMat, x) |> only
# ERROR: Need an adjoint for constructor PDMat{Float64, Matrix{Float64}}. Gradient is of type Matrix{Float64}

# While the overload may be useful for efficiency, it does not solve the AD issue
# Next, we will try to define an rrule
using ChainRules, ChainRulesCore

# rrule draft for constructor of PDMat
# This is probably not completely correct, but should be a good start
function ChainRulesCore.rrule(::Type{PDMat}, mat)
    chol, chol_pullback = rrule(cholesky, mat)
    y = PDMat(mat, chol)
    function PDMat_pullbackCR(m̄at::AbstractMatrix)
        @info "Using CR for PDMat, AbstractMatrix tangent"
        return NoTangent(), m̄at
    end
    function PDMat_pullbackCR(m̄at::Tangent)
        @info "Using CR for PDMat, Tangent type"
        return NoTangent(), chol_pullback(m̄at.chol)
    end
    return y, PDMat_pullbackCR
end

# Perform the individual forward and backward steps manually:
a, a_pullback = rrule(PDMat, x)
b, b_pullback = rrule(det, a)

b̄ = 1.
_, ā = b_pullback(b̄)
_, x̄ = a_pullback(ā)

# Compare to the result without the `PDMat` wrapper
unthunk(x̄) ≈ Zygote.gradient(det, x) |> only # true
Zygote.gradient(det ∘ PDMat, x) |> only
# ERROR: Need an adjoint for constructor PDMat{Float64, Matrix{Float64}}. Gradient is of type Matrix{Float64}

oxinabox · 2022-05-06T10:13:43Z

You often need to do a Zygote.refresh() to get it to pick up new rrules after it been used once.

theogf · 2022-05-07T20:50:53Z

Just tried this code, and Zygote.gradient(det ∘ PDMat, x) works but then Zygote.gradient(logdet ∘ PDMat, x) fails...

theogf · 2022-05-07T20:59:21Z

Interestingly ForwardDiff also returns the wrong thing if x is not a PDMat already:

julia> ForwardDiff.gradient(det ∘ PDMat, x)
2×2 Matrix{Float64}:
 1.0  -0.4
 0.0   1.0

julia> ForwardDiff.gradient(det, PDMat(x))
2×2 Matrix{Float64}:
  1.0  -0.2
 -0.2   1.0
julia> ForwardDiff.gradient(det, x)
2×2 Matrix{Float64}:
  1.0  -0.2
 -0.2   1.0

devmotion · 2022-05-07T23:16:54Z

I started with det(PDMats(x)) because it is throwing the same error as the more complicated example involving matrix division.

Probably one should restrict rrule(::typeof(det), ...) in the same way as the one for logdet: JuliaDiff/ChainRules.jl#245

Regardless of AD, I think it would be useful to add definitions of det(::AbstractPDMat) = .... since otherwise these calls will fall back to the generic definitions based on the LU decomposition in LinearAlgebra: https://github.com/JuliaLang/julia/blob/6e061322438f13c6548200f115f3c31b20860a30/stdlib/LinearAlgebra/src/generic.jl#L1598-L1604

Probably logdet(::PDMat) does not error because Zygote defines an adjoint for logdet(::Cholesky) (should be moved to ChainRules I guess together with a rule for det(::Cholesky)): https://github.com/FluxML/Zygote.jl/blob/a392eabdc0217f2f34d77ce19d4167c3cd4abbcf/src/lib/array.jl#L744-L748 However, similar to the ForwardDiff example it seems the derivative is wrong since it does not return a Hermitian but a triangular matrix as gradient.

I think the right approach would be to add a projection mechanism for PDMat and rrules for the constructor similar to the one for Hermitian and Symmetric matrices (https://github.com/JuliaDiff/ChainRulesCore.jl/blob/2d75b4be102bb41ba3ac6df6dec8bb9617b20f0f/src/projection.jl#L425-L451 and https://github.com/JuliaDiff/ChainRules.jl/blob/c5dbe030af390599848830ff43a5dffc04be69e2/src/rulesets/LinearAlgebra/symmetric.jl#L5-L92).

simsurace · 2022-05-08T07:59:34Z

Interestingly ForwardDiff also returns the wrong thing if x is not a PDMat already:

I think this would be a great test to do for all the functions overloaded for PDMat. Basically, the wrapper should not change any derivatives wrt. x.

devmotion · 2022-05-08T15:36:57Z

Regardless of AD, I think it would be useful to add definitions of det(::AbstractPDMat) = .... since otherwise these calls will fall back to the generic definitions based on the LU decomposition in LinearAlgebra: https://github.com/JuliaLang/julia/blob/6e061322438f13c6548200f115f3c31b20860a30/stdlib/LinearAlgebra/src/generic.jl#L1598-L1604

I opened #161.

devmotion · 2022-05-09T07:51:55Z

The ForwardDiff issue is not related to PDMats:

julia> using PDMats, ForwardDiff, LinearAlgebra

julia> x = [1. 0.2; 0.2 1.];

julia> ForwardDiff.gradient(det ∘ PDMat, x)
2×2 Matrix{Float64}:
 1.0  -0.4
 0.0   1.0

julia> ForwardDiff.gradient(det ∘ cholesky, x)
2×2 Matrix{Float64}:
 1.0  -0.4
 0.0   1.0

From the perspective of det etc., PDMat does not wrap x but cholesky(x), so the comparison with ForwardDiff.gradient(det, x) is not correct. Thus the incorrect derivative is caused by cholesky and actually expected.

simsurace · 2022-05-09T19:40:23Z

So cholesky is not defined for non-symmetric x, which means that a gradient step will lead to the subspace being left. This is an instance of the gradient being correct as a differential, i.e. for any choice of tangent vector the differential applied to that vector gives the correct result, but it is not a tangent vector itself. Should I open an issue in DiffRules.jl?

simsurace · 2022-05-09T19:58:45Z

Especially because using a Symmetric wrapper fails:

julia> ForwardDiff.gradient(logdet ∘ PDMat, Symmetric(x))
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix

whereas it works for Zygote:

julia> Zygote.gradient(logdet ∘ PDMat, Symmetric(x)) |> only
2×2 Symmetric{Float64, Matrix{Float64}}:
  1.04167   -0.208333
 -0.208333   1.04167

devmotion · 2022-05-09T22:22:44Z

Actually, I don't think there's anything wrong with the derivatives of ForwardDiff and cholesky, the forward-mode sensitivities of cholesky are correct (compare e.g. with https://arxiv.org/abs/1602.07527). It's just that ForwardDiff.gradient(det \circ cholesky, x) is fundamentally different from ForwardDiff.gradient(det, x): the first one assumes that x AND the sensitivities/perturbations/... dx are symmetric matrices whereas the second one does not use any of these assumptions.

devmotion · 2022-05-10T07:48:38Z

Especially because using a Symmetric wrapper fails:

This is a general issue with ForwardDiff.seed!, not limited to Symmetric and not related to cholesky.

simsurace · 2022-05-10T07:58:42Z

Yes, as I said I do believe that the differential is correct because it is only defined for a symmetric tangent. I think we can focus on getting the gradients working in Zygote.

devmotion · 2022-05-10T08:37:16Z

The only things to make it work with Zygote are (copied from above):

Probably one should restrict rrule(::typeof(det), ...) in the same way as the one for logdet: JuliaDiff/ChainRules.jl#245
Zygote defines an adjoint for logdet(::Cholesky) (should be moved to ChainRules I guess together with a rule for det(::Cholesky)): https://github.com/FluxML/Zygote.jl/blob/a392eabdc0217f2f34d77ce19d4167c3cd4abbcf/src/lib/array.jl#L744-L748

I just checked it locally, with these changes also det works. I'll open a PR in ChainRules.

simsurace mentioned this issue May 4, 2022

_prior_kl not differentiable for Centered with Zygote JuliaGaussianProcesses/ApproximateGPs.jl#129

Open

theogf mentioned this issue May 7, 2022

Wrong call on cholesky rrule JuliaDiff/ChainRules.jl#611

Closed

devmotion mentioned this issue May 10, 2022

Add rules for det and logdet of Cholesky JuliaDiff/ChainRules.jl#613

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problems with Zygote and the `PDMat` constructor #159

Problems with Zygote and the `PDMat` constructor #159

simsurace commented May 4, 2022 •

edited

Loading

devmotion commented May 4, 2022

devmotion commented May 4, 2022

simsurace commented May 4, 2022 •

edited

Loading

simsurace commented May 4, 2022

simsurace commented May 4, 2022

oxinabox commented May 4, 2022

simsurace commented May 4, 2022

simsurace commented May 5, 2022

oxinabox commented May 6, 2022

theogf commented May 7, 2022 •

edited

Loading

theogf commented May 7, 2022 •

edited

Loading

devmotion commented May 7, 2022 •

edited

Loading

simsurace commented May 8, 2022

devmotion commented May 8, 2022

devmotion commented May 9, 2022 •

edited

Loading

simsurace commented May 9, 2022

simsurace commented May 9, 2022

devmotion commented May 9, 2022

devmotion commented May 10, 2022

simsurace commented May 10, 2022

devmotion commented May 10, 2022

Problems with Zygote and the PDMat constructor #159

Problems with Zygote and the PDMat constructor #159

Comments

simsurace commented May 4, 2022 • edited Loading

devmotion commented May 4, 2022

devmotion commented May 4, 2022

simsurace commented May 4, 2022 • edited Loading

simsurace commented May 4, 2022

simsurace commented May 4, 2022

oxinabox commented May 4, 2022

simsurace commented May 4, 2022

simsurace commented May 5, 2022

oxinabox commented May 6, 2022

theogf commented May 7, 2022 • edited Loading

theogf commented May 7, 2022 • edited Loading

devmotion commented May 7, 2022 • edited Loading

simsurace commented May 8, 2022

devmotion commented May 8, 2022

devmotion commented May 9, 2022 • edited Loading

simsurace commented May 9, 2022

simsurace commented May 9, 2022

devmotion commented May 9, 2022

devmotion commented May 10, 2022

simsurace commented May 10, 2022

devmotion commented May 10, 2022

Problems with Zygote and the `PDMat` constructor #159

Problems with Zygote and the `PDMat` constructor #159

simsurace commented May 4, 2022 •

edited

Loading

simsurace commented May 4, 2022 •

edited

Loading

theogf commented May 7, 2022 •

edited

Loading

theogf commented May 7, 2022 •

edited

Loading

devmotion commented May 7, 2022 •

edited

Loading

devmotion commented May 9, 2022 •

edited

Loading