Explicitly ignore derivatives of argument checks #1492

devmotion · 2022-01-23T02:25:47Z

@willtebbutt noticed in JuliaGaussianProcesses/AbstractGPs.jl#256 (comment) that the seemingly simple argument checks cause a massive slowdown in AD, specifically with Zygote. Hence this PR ignores derivatives of such checks explicitly.

@willtebbutt's example with Normal shows that AD is very performant with this PR: there is basically no overhead compared with the primal and there are zero allocations, whereas on master the pullback is roughly 60 times slower than the primal (median 454 ns vs. 7.5 ns) and allocates 848 bytes:

On master:

julia> using BenchmarkTools, Distributions, Zygote

julia> @benchmark Normal(randn(), rand() + 1)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  7.032 ns … 42.774 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.482 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.749 ns ±  0.827 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▄▅▇▇█▆▃▁
  ▁▂▂▄▆█████████▇▅▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▃▂▂▂▂▂▂▁▁ ▃
  7.03 ns        Histogram: frequency by time        9.44 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Zygote._pullback($(Zygote.Context()), Normal, randn(), rand() + 1)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
 Range (min … max):  435.172 ns … 85.840 μs  ┊ GC (min … max):  0.00% … 98.94%
 Time  (median):     454.141 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   628.774 ns ±  2.690 μs  ┊ GC (mean ± σ):  25.31% ±  5.91%

  ▃██▇▇▅▄▄▃▃▂▂▂▂▁▂▁▁▂▂▁                                        ▂
  ████████████████████████▇▆▅▆▅▆▆▆▆▆▆▇▄▅▆▆▄▄▁▅▄▆▄▃▅▄▅▆▅▄▃▄▄▄▄▄ █
  435 ns        Histogram: log(frequency) by time       797 ns <

 Memory estimate: 848 bytes, allocs estimate: 17.

This PR:

julia> using BenchmarkTools, Distributions, Zygote

julia> @benchmark Normal(randn(), rand() + 1)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  6.989 ns … 40.634 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.468 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.734 ns ±  0.801 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▃▆▇█▆▄▂
  ▁▁▁▂▄▆█████████▆▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▃▃▂▂▂▂▁▁▁ ▃
  6.99 ns        Histogram: frequency by time         9.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Zygote._pullback($(Zygote.Context()), Normal, randn(), rand() + 1)
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  7.010 ns … 40.456 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.466 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.718 ns ±  0.789 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▃▆███▆▃▁
  ▁▁▂▃▅█████████▆▅▃▃▂▂▂▁▂▁▁▁▂▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁ ▃
  7.01 ns        Histogram: frequency by time        9.45 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

codecov-commenter · 2022-01-23T02:42:44Z

Codecov Report

Merging #1492 (e15030a) into master (02bcbf8) will decrease coverage by 0.20%.
The diff coverage is 76.72%.

@@            Coverage Diff             @@
##           master    #1492      +/-   ##
==========================================
- Coverage   84.44%   84.24%   -0.21%     
==========================================
  Files         124      124              
  Lines        7522     7513       -9     
==========================================
- Hits         6352     6329      -23     
- Misses       1170     1184      +14

Impacted Files	Coverage Δ
src/utils.jl	`64.40% <12.50%> (-35.60%)`	⬇️
src/univariate/continuous/beta.jl	`70.96% <25.00%> (+0.56%)`	⬆️
src/univariate/continuous/erlang.jl	`67.64% <33.33%> (+1.93%)`	⬆️
src/multivariate/multinomial.jl	`85.03% <50.00%> (+1.37%)`	⬆️
src/cholesky/lkjcholesky.jl	`100.00% <100.00%> (ø)`
src/edgeworth.jl	`96.49% <100.00%> (+0.06%)`	⬆️
src/matrix/lkj.jl	`99.17% <100.00%> (-0.03%)`	⬇️
src/multivariate/dirichlet.jl	`72.81% <100.00%> (+0.48%)`	⬆️
src/univariate/continuous/arcsine.jl	`88.88% <100.00%> (ø)`
src/univariate/continuous/betaprime.jl	`93.18% <100.00%> (ø)`
... and 55 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 16d4091...e15030a.

sethaxen · 2022-01-23T08:10:40Z

I'm not a big fan of how much ChainRules-specific code is now being included in these functions. What if the @check_args macro included ChainRulesCore.ignore_derivatives internally? Then you would only need to call it for cases where a constructor doesn't use @check_args. Also, what if @check_args was generalized to support multiple conditions and custom error messages, which would I think allow it to be used everywhere @check_args currently isn't used?

devmotion · 2022-01-23T08:30:44Z

> What if the @check_args macro included ChainRulesCore.ignore_derivatives internally? Then you would only need to call it for cases where a constructor doesn't use @check_args.

This was my initial approach, but unfortunately it is not sufficient: one has to ignore the whole block of checks, including the if check_args.

devmotion · 2022-01-23T08:50:44Z

> Also, what if @check_args was generalized to support multiple conditions and custom error messages, which would I think allow it to be used everywhere @check_args currently isn't used?

I considered this as well but it felt like if it becomes completely general and flexible it would become easier at some point to not use a macro but just implement the checks explicitly. It's also not only used in constructors and IIRC there's at least one case where some variables are unpacked/defined in the check_args block. And it felt a bit weird to include if check_args in the macro which would be necessary as well.

So basically my impression was that the macro should be able to generate something like

ChainRulesCore.ignore_derivatives() do
    if some_var # or only if check_args
        some_expr
        cond1 || throw(ArgumentError(cond1_msg))
        cond2 || ...
    end
end

and it was not immediately clear whether there is a short but descriptive syntax that would provide a major advantage over implementing the block directly, or what it could look like.

Do you have an idea how it could look?

mschauer · 2022-01-23T08:52:41Z

> I'm not a big fan of how much ChainRules-specific code is now being included in these functions.

This looks like fixing a problem in the wrong place.
This would now work for AD, but every other nonstandard evaluation besides AD (particles, uncertainty intervals, things we don’t anticipate) has to deal with ChainRules specific code.

devmotion · 2022-01-23T09:34:10Z

It is exactly designed for such a use case: https://juliadiff.org/ChainRulesCore.jl/dev/rule_author/tips_for_packages.html#Ignoring-gradients-for-certain-expressions And I don't see why/how it should interact with other non-AD packages, it just executes the function in the primal: https://github.com/JuliaDiff/ChainRulesCore.jl/blob/699e61f1539fdba362fff5a1b438fbccf32370f0/src/ignore_derivatives.jl#L26
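To illustrate the point about the primal, here is a minimal sketch of wrapping a check in `ChainRulesCore.ignore_derivatives`. The function `checked_scale` is a hypothetical stand-in, not code from Distributions; outside of AD the wrapped block simply runs as ordinary Julia code.

```julia
using ChainRulesCore

# Hypothetical helper for illustration only (not part of Distributions).
function checked_scale(σ::Real; check_args::Bool=true)
    # The whole block, including `if check_args`, is hidden from
    # ChainRules-based ADs; in the primal it just executes normally.
    ChainRulesCore.ignore_derivatives() do
        if check_args
            σ > zero(σ) || throw(ArgumentError("σ must be positive"))
        end
    end
    return 2 * σ
end
```

Calling `checked_scale(-1.0)` should still throw an `ArgumentError`, while `checked_scale(-1.0; check_args=false)` skips the check, so the wrapper does not change the non-AD behaviour.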

devmotion · 2022-01-23T09:42:25Z

The benchmarks also show that there is no overhead for non-AD use. And the AD performance issues on master are quite problematic for downstream applications, which can't address them properly (i.e., not without type piracy, and in some cases not even that is possible since the check is performed in the inner constructor).

sethaxen · 2022-01-23T15:28:12Z

@devmotion I can't think of a nice syntax for the macro, and your points are all valid.

I guess another approach would be to define a _maybe_check_args(::Type{<:Distribution}, check::Bool, ps...) function where all checks are performed for each distribution (each with their own overload). Then mark it globally as ChainRulesCore.@non_differentiable. But I suspect that would only work for ChainRules, since for operator-overloading ADs, one would need to dispatch on the param values, so there would be unresolved ambiguity.
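A rough sketch of this alternative, under stated assumptions: `ToyDistribution`, `ToyNormal`, and `toy_normal` are hypothetical stand-ins for the real types, and the broad `::Any...` signature is chosen only to keep the example simple.

```julia
using ChainRulesCore

# Hypothetical stand-ins for Distribution and Normal.
abstract type ToyDistribution end
struct ToyNormal <: ToyDistribution
    μ::Float64
    σ::Float64
end

# Fallback: no checks for distributions without their own overload.
_maybe_check_args(::Type{<:ToyDistribution}, check::Bool, ps...) = nothing

# Distribution-specific overload that performs the actual checks.
function _maybe_check_args(::Type{ToyNormal}, check::Bool, μ, σ)
    check || return nothing
    σ > zero(σ) || throw(ArgumentError("ToyNormal: σ must be positive"))
    return nothing
end

# Mark every method as non-differentiable for ChainRules-based ADs.
ChainRulesCore.@non_differentiable _maybe_check_args(::Any...)

function toy_normal(μ::Real, σ::Real; check_args::Bool=true)
    _maybe_check_args(ToyNormal, check_args, μ, σ)
    return ToyNormal(μ, σ)
end
```

As noted above, this only informs ADs that consume ChainRules rules; operator-overloading ADs would still trace through the checks.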

devmotion · 2022-01-23T17:12:01Z

I thought about moving the checks to a separate function but this seemed unnecessarily complicated - for ignore_derivatives one already defines separate distribution specific functions anyway, without adding more methods to the module and hence without potential ambiguity issues.

To summarize, I think it's correct to address these performance issues in Distributions but it would be good to do it in a more developer friendly and "less special" way. I don't think it's an AD backend issue since I don't think it can be expected in general that AD neglects these checks - without us telling it that it should.

I thought about a more convenient macro. Maybe we could use something like @check_args(Beta, a > zero(a), (b > zero(b)) => "b should be positive!") . This would not include any additional expressions but maybe that's fine for now (even though it's a bit annoying that AD won't ignore them if they are outside of the function...).

devmotion · 2022-01-24T00:12:05Z

I modified the @check_args macro, inspired by ChainRulesCore.@scalar_rule:

    @check_args(
        D,
        @setup(statements...),
        (cond₁, message₁),
        (cond₂, message₂),
        ...,
    )

A convenience macro that generates AD-compatible checks of arguments for a distribution of
type `D`.

More concretely, it generates the following Julia code:
```julia
ChainRulesCore.ignore_derivatives() do
    if check_args
        \$(statements...)
        cond₁ || throw(ArgumentError(\$(string(D, ": ", message₁))))
        cond₂ || throw(ArgumentError(\$(string(D, ": ", message₂))))
        ...
    end
end
```

The `@setup` argument can be elided if no setup code is needed. Moreover, error messages
can be omitted. In this case the message `"the condition \$(cond) is not satisfied."` is
used.

I wonder though if it is too surprising that one has to define a boolean variable check_args. Maybe its name should be passed in the macro call, i.e., something like @check_args(check_args, D, ...)? On the other hand, this would feel a bit redundant.
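For readers who want to see how such a macro could be put together, here is a simplified sketch. It is not the implementation in this PR: `@toy_check_args` and `toy_beta` are hypothetical names, `@setup` is not supported, every check must be a `(condition, message)` tuple, and the message is assembled at run time. Like the docstring above, it assumes a boolean variable named `check_args` is in scope at the call site.

```julia
using ChainRulesCore

# Hypothetical simplified macro: no @setup, messages are mandatory.
macro toy_check_args(D, checks...)
    stmts = map(checks) do check
        (check isa Expr && check.head === :tuple && length(check.args) == 2) ||
            error("expected (condition, message) tuples")
        cond, msg = check.args
        # Prefix each message with the distribution name given as `D`.
        :($(esc(cond)) || throw(ArgumentError(string($(string(D)), ": ", $(esc(msg))))))
    end
    return quote
        ChainRulesCore.ignore_derivatives() do
            # `check_args` is looked up in the caller's scope.
            if $(esc(:check_args))
                $(stmts...)
            end
        end
    end
end

# Hypothetical usage in a constructor-like function.
function toy_beta(a::Real, b::Real; check_args::Bool=true)
    @toy_check_args(
        ToyBeta,
        (a > zero(a), "a must be positive"),
        (b > zero(b), "b must be positive"),
    )
    return (a, b)
end
```

Relying on an implicitly named `check_args` variable keeps call sites short, at the cost of the surprise mentioned above; passing the variable name explicitly would be the more transparent alternative.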

mschauer · 2022-01-24T08:51:22Z

Why not just have a dedicated function:

check_arguments(checkargs, f::Function) = checkargs && f()
ChainRulesCore.@non_differentiable check_arguments(checkargs, f)

This says positively what happens ("checking arguments") instead of negatively ("can't differentiate the following code").
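A sketch of how such a package-owned function might be used at a call site. One assumption differs from the snippet above: the callable is taken as the first argument, because Julia's `do`-block syntax passes the closure in first position. `toy_scale` is a hypothetical example function.

```julia
using ChainRulesCore

# Hypothetical package-owned helper; callable first to allow `do` blocks.
check_arguments(f, checkargs::Bool) = (checkargs && f(); nothing)

# Tell ChainRules-based ADs not to differentiate through the checks.
ChainRulesCore.@non_differentiable check_arguments(::Any, ::Bool)

function toy_scale(σ::Real; check_args::Bool=true)
    check_arguments(check_args) do
        σ > zero(σ) || throw(ArgumentError("σ must be positive"))
    end
    return σ
end
```

On the standard evaluation path only `check_arguments` is called; ChainRulesCore is involved solely through the rule generated by `@non_differentiable`.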

devmotion · 2022-01-24T09:03:20Z

Sure, this could be done but it seems equivalent to what ignore_derivatives does. In the current state of the PR ignore_derivatives only shows up in the code of the macro - and similarly the call to a check_args function would only show up there if we don't want to remove the macro and implement all checks explicitly.

mschauer · 2022-01-24T09:32:32Z

We would own check_arguments, so on the path of standard evaluation we would not call ChainRules code at all, and we would be free to add any methods.

mschauer · 2022-01-24T09:53:01Z

We can thus depend on a hypothetical UncertaintyCore.jl and define

UncertaintyCore.@deterministic check_arguments(checkargs, f)

if that is required.

devmotion · 2022-01-24T10:13:11Z

I don't see a problem with calling ChainRules: it's documented and guaranteed not to affect the primal computation. Its sole purpose is to let you mark parts of a function body as non-differentiable without having to define separate functions and mark those as non-differentiable.

Can you explain your example? What is @deterministic supposed to do? In any case, if the function you proposed were owned by Distributions, it would be impossible to extend it in other packages: that would be type piracy, and if the functions were defined locally, e.g. with a do block, they would not even be available for dispatch. I expect similar problems if another package defined such a function and we wanted to extend it.

mschauer · 2022-01-24T11:43:28Z

It is a hypothetical example. There is not only automatic differentiation, but also other things we want to do in an automatic fashion in Julia, for example automatic uncertainty propagation etc.
Now we have seen that declaring the argument check ChainRulesCore.@non_differentiable is necessary for automatic differentiation performance, so we should anticipate the need to declare that the argument check is also to be ignored for other use cases in other automatic frameworks. For example, that it adds no uncertainty in uncertainty propagation, or whatever else might come up later.

devmotion · 2022-01-24T12:04:46Z

> we should anticipate the need to declare the argument check also to be ignored for other use cases in other automatic frameworks. For example that it adds no uncertainty in uncertainty propagation or whatever else might come up later.

Ah, OK, so in your example @deterministic would be similar to @non_differentiable. I thought you wanted to extend the function in some other package or make it an extension of a function in another package, which would both be impossible I think.

I don't have a strong preference, it's simple to switch from one approach to the other since they are only called in the macro. If the consensus is to use a custom function, I can just copy the implementation of ignore_derivatives and use it instead 🤷‍♂️

sethaxen · 2022-01-25T02:55:00Z

I don't have a strong opinion on function vs. macro, but from a design perspective I prefer both options over calling ChainRulesCore.ignore_derivatives directly. This way, those contributing new distributions can just use our internal function/macro without needing to think about ChainRules and how ADs handle argument checking.

devmotion · 2022-01-25T21:04:30Z

I tried to incorporate all comments, the PR should be ready for a proper review.

devmotion · 2022-01-28T21:12:31Z

@mschauer @sethaxen I would appreciate if you could review this PR and check if I managed to address your comments and suggestions.

sethaxen · 2022-01-28T21:19:46Z

test/matrixvariates.jl

@@ -413,7 +413,7 @@ end
 function test_special(dist::Type{LKJ})
    @testset "LKJ mode" begin
        @test mode(LKJ(5, 1.5)) == mean(LKJ(5, 1.5))
-        @test_throws ArgumentError mode( LKJ(5, 0.5) )
+        @test_throws DomainError mode( LKJ(5, 0.5) )


Is this change considered breaking?

No, not according to ColPrac: https://colprac.sciml.ai/#changes-that-are-not-considered-breaking

sethaxen

LGTM!

devmotion · 2022-01-31T09:54:26Z

@mschauer are you happy with the approach in this PR? I added a custom non-differentiable function and generalized the macro.

Explicitly ignore derivatives of argument checks

12a4586

Update @check_args (inspired by @scalar_rule)

bc1dddd

devmotion and others added 2 commits January 24, 2022 01:14

Fix Dirichlet

666666a

Fix error

252f595

devmotion mentioned this pull request Jan 24, 2022

Remove dynamic call in check_args macro #1491

Closed

devmotion added 2 commits January 25, 2022 01:10

Improve macro

b26ba7b

Fix tests

ece524d

devmotion added 3 commits January 25, 2022 15:09

Fix test of Multinomial

16a8f68

Fix LKJ test error

3af2953

Fix LKJCholesky test errors

57fd670

devmotion added 3 commits January 25, 2022 16:16

Fix tests of DiscreteNonParametric

1131fcc

Fix SkewNormal test error

46efd7e

Fix SkewedExponentialPower test

218f4be

sethaxen reviewed Jan 28, 2022

Update docstring according to review

08fc714

sethaxen approved these changes Jan 28, 2022

devmotion added 2 commits January 29, 2022 09:58

Update Project.toml

854fc83

Merge branch 'master' into dw/args_ignore_derivatives

e15030a

mschauer merged commit f6c5a70 into master Jan 31, 2022

devmotion mentioned this pull request Feb 1, 2022

AD Allocations JuliaGaussianProcesses/AbstractGPs.jl#256

Closed

devmotion deleted the dw/args_ignore_derivatives branch February 9, 2022 17:19

devmotion mentioned this pull request Apr 18, 2023

Add check_args and drop 1.3 JuliaGaussianProcesses/KernelFunctions.jl#499

Open

devmotion mentioned this pull request Nov 21, 2023

Unify PDMat and PDSparseMat + move SparseArrays support to an extension JuliaStats/PDMats.jl#188

Open
