
Tools for testing identities #1671

Open
martinmodrak opened this issue Feb 4, 2020 · 11 comments

Comments

@martinmodrak
Contributor

martinmodrak commented Feb 4, 2020

Description

Using known identities is a powerful way to test the precision of our functions. Currently, using identities for derivatives is a bit tedious: you need to explicitly compute the derivative with respect to each variable on both the left and right hand side of the identity and compare them. The proposal is to create a function that does this for you.

Example

The goal is to be able to write something like:

auto lh = [](const var& a, const var& b) {
  return stan::math::lbeta(a, b);
};
// Alternatively: auto lh = static_cast<var(*)(const var&, const var&)>(lbeta);
auto rh = [](const var& a, const var& b) {
  return stan::math::log_sum_exp(lbeta(a + 1, b), lbeta(a, b + 1));
};

for(double x : x_to_test) {
  for(double y : y_to_test) {
    expect_identity("successors", lh, rh, tolerance, x, y);
  }
}

I would not expect expect_identity to check this for all possible instantiations. I believe there should be a separate test that all instantiations are numerically equal; after that, we should just test with all-var arguments to make things easier.

Also note that while I could test that lh - rh == 0 (with zero gradients throughout), this is potentially problematic because the test tolerance might not reflect the magnitude of the actual values.

Expected output

It tests that the value and all derivatives are close up to a relative tolerance, using expect_near_rel.

Current Version:

v3.1.0

@bob-carpenter
Contributor

bob-carpenter commented Feb 4, 2020 via email

@martinmodrak
Contributor Author

martinmodrak commented Feb 5, 2020

@bob-carpenter you are correct, the arguments have to be provided; I have updated the description.

I didn't understand the comment about just testing with all var. I'd like to test all levels of autodiff.

Sorry for being unclear; there are, IMHO, two things to consider:

  1. Limiting ourselves to reverse mode: should we run the same test for lbeta(double, double), lbeta(var, double), lbeta(double, var), and lbeta(var, var)? I think this is not very useful. Instead of having multiple testing procedures (test_ad, this one), each going through complex templating to invoke all the variants, I think we should separately test that all those variants give equal results (this test, IMHO, can and should be really strict, e.g. use EXPECT_FLOAT_EQ) and then perform all the identity / finite-diff tests only on lbeta(var, var). Similarly for forward/mixed mode.
  2. Should we run the same test for lbeta(var, var), lbeta(fvar, fvar), and lbeta(fvar<var>, fvar<var>)? This one I am less sure about, but I feel slightly opposed to it. AFAIK the forward and mixed modes are less developed, and I would suspect that strong tests the reverse mode passes could be problematic for forward/mixed mode. I am also not sure whether all functions that exist in reverse mode work with fwd/mix. So in the current state it would IMHO make sense to have separate expect_identity_rev, expect_identity_fwd, and expect_identity_mix, invoked in separate tests whose stringency is tailored to the individual autodiff levels. But I may be wrong about this; I've never really worked with fwd or mix.

Not sure if this is the best place for this discussion - should I move it to Discourse?

@martinmodrak
Contributor Author

I wrote a prototype implementation for the var-only case, which was reasonably easy to do (see PR #1677).

@bob-carpenter
Contributor

bob-carpenter commented Feb 5, 2020 via email

@martinmodrak
Contributor Author

Sorry for not being clear; what I meant is that I think we should separate individual testing concerns (functional identities vs. consistency between instantiations). A typical test would look like:

for(double x : x_to_test) {
  for(double y : y_to_test) {
    expect_instantiations_equal(lbeta, x, y);
    expect_identity("successors", lh, rh, tolerance, x, y);
  }
}

Limiting ourselves to just reverse mode for now, expect_instantiations_equal would check that values and matching gradients are equal for lbeta(var, var), lbeta(double, var), etc. expect_identity would then run only for the (var, var) case. This can be a much stronger test than making expect_identity check the identity for all instantiations, because expect_identity needs to use expect_near_rel, possibly with quite a large tolerance, while expect_instantiations_equal could use EXPECT_FLOAT_EQ, since the results should be bitwise equal or almost equal. Also, mimicking the template magic that currently lives in expect_ad (or packaging it for reuse) would be a non-trivial endeavour.

I am a bit less sure whether it makes sense for expect_identity to always test all AD modes (but only with all-variable instantiations). This would effectively move all such tests to the mix folder (which is probably not a big deal) and would force everyone writing new tests to worry about all modes, which might be overwhelming. But I also clearly see the benefit of forcing fwd and mix mode tests, so I am not sure.

Building expect_instantiations_equal would be a separate issue, so it is not intended to be done here; I am just trying to justify the choice to simplify expect_identity so that it only cares about the instantiation with all parameters being variables (of either mode).

Hope I am making sense.

@bob-carpenter
Contributor

If it's only the complexity of templates that's leading to the proposal to test only foo(var, var) for identities, and not foo(var, double) or foo(double, var), I'm more than happy to help with the templating.

The problem with just doing foo(var, var) is that it doesn't say anything about how foo(var, double) is going to behave if they have different implementations. So while I agree that testing is good and any new test is better than nothing, it seems to me like we should go all the way and test all instantiations at all levels of autodiff. I'm pretty sure two symbolic implementations with autodiff will be closer to each other than to finite diffs.

@syclik---do you have an opinion on this?

@martinmodrak
Contributor Author

It's that the added complexity buys us almost nothing. In all code paths I have seen in rev, the results and respective gradients of foo(var, var), foo(double, var), and foo(var, double) should be bitwise equal, or at worst should satisfy EXPECT_DOUBLE_EQ (which allows at most a 4 ULP difference).

If you believe this assumption is wrong, then my argument is invalid. But if it is correct, then once we have tested that the instantiations are equal, it follows that any reasonable test satisfied by foo(var, var) will also be satisfied by the other instantiations, and we gain nothing by retesting them. The only way it could fail is if the right hand side of the identity changed between instantiations, but this IMHO should not happen (in reasonable code) if all its constituents also pass the equal-instantiations test.

Also I don't think @syclik was tagged correctly in the previous post so retagging.

@mcol mcol linked a pull request Feb 7, 2020 that will close this issue
@bob-carpenter
Contributor

bob-carpenter commented Feb 10, 2020 via email

@martinmodrak
Contributor Author

I think I see where we are missing each other, so at the risk of being a bit annoying, I'll try to rephrase once more: by "foo(var, var) and foo(double, var) should be bitwise equal" I do not mean that this is guaranteed by the implementation. I mean "we want them to be bitwise equal" AND "a test that fails if they are not bitwise equal would not be overly strict".

In that sense, the current way of testing in expect_ad is very weak, because it tests against a finite-diff value with relative tolerances of 1e-4 and larger. So currently, if, say, the foo(var, double) gradient is 10.001, the first element of the foo(var, var) gradient is 9.999, and the finite diff is 10, all tests pass. This seems undesirable to me. We should IMHO test more strongly than that (either bitwise equality or EXPECT_DOUBLE_EQ for corresponding values across all instantiations); different instantiations should not be allowed to have non-negligible effects on the computation, and if they do, we should IMHO change the way they are implemented.

If such tests are (or can be made) satisfiable, there is IMHO little need for other tests to try all instantiations.

@martinmodrak
Contributor Author

On second thought, I am obviously making this discussion too abstract. It will probably serve all of us better if I polish my thoughts, draw a diagram or two, and put it in a longer Discourse post. Thanks for your time on this so far.

@bob-carpenter
Contributor

bob-carpenter commented Feb 11, 2020 via email
