
Tools for testing identities #1671

Open
martinmodrak opened this issue Feb 4, 2020 · 11 comments

Comments

@martinmodrak
Contributor

martinmodrak commented Feb 4, 2020

Description

Using known identities is a powerful way to test the precision of our functions. Currently, using identities for derivatives is a bit tedious: you need to explicitly compute the derivative with respect to each variable on both the left and right hand side of the identity and compare them. The proposal is to create a function that does this for you.

Example

The goal is to be able to write something like:

auto lh = [](const var& a, const var& b) {
  return stan::math::lbeta(a, b);
};
// Alternatively: auto lh = static_cast<var(*)(const var&, const var&)>(lbeta);
auto rh = [](const var& a, const var& b) {
  return stan::math::log_sum_exp(lbeta(a + 1, b), lbeta(a, b + 1));
};

for(double x : x_to_test) {
  for(double y : y_to_test) {
    expect_identity("successors", lh, rh, tolerance, x, y);
  }
}

I would not expect expect_identity to check this for all possible instantiations. I believe there should be a separate test that all instantiations are numerically equal; after that, we should just test with all-var arguments to make things easier.

Also note that while I could test that lh - rh == 0 (with zero gradients throughout), this is potentially problematic because the test tolerance might not reflect the magnitude of the actual values.

Expected output

It tests that the value and all derivatives are close up to a relative tolerance, using expect_near_rel.

Current Version:

v3.1.0

@bob-carpenter
Contributor

bob-carpenter commented Feb 4, 2020 via email

@martinmodrak
Contributor Author

martinmodrak commented Feb 5, 2020

@bob-carpenter you are correct, the arguments have to be provided; I have updated the description.

I didn't understand the comment about just testing with all var. I'd like to test all levels of autodiff.

Sorry for being unclear; there are, IMHO, two things to consider:

  1. Limiting ourselves to reverse mode: should we run the same test for lbeta(double, double), lbeta(var, double), lbeta(double, var), and lbeta(var, var)? I think this is not very useful. Instead of having multiple testing procedures (test_ad, this one), each going through complex templating to invoke all the variants, I think we should separately test that all those variants give equal results (this test, IMHO, can and should be really strict, e.g. use EXPECT_FLOAT_EQ) and then perform all the identity / finite-diff tests only on lbeta(var, var). Similarly for forward/mixed mode.
  2. Should we run the same test for lbeta(var, var), lbeta(fvar, fvar), and lbeta(fvar<var>, fvar<var>)? This one I am less sure about, but I feel slightly opposed to it. AFAIK the forward and mixed modes are less developed, and I would suspect that strong tests the reverse mode passes could be problematic for forward/mixed mode. I am also not sure whether all functions that exist in reverse mode work with fwd/mix. So in the current state it would IMHO make sense to have separate expect_identity_rev, expect_identity_fwd, and expect_identity_mix, invoked in separate tests whose stringency is tailored to the individual autodiff levels. But I may be wrong about this; I've never really worked with fwd or mix.

Not sure if this is the best place for this discussion - should I move it to Discourse?

@martinmodrak
Contributor Author

I wrote a prototype implementation for the var-only case, which was reasonably easy to do (see PR #1677).

@bob-carpenter
Contributor

bob-carpenter commented Feb 5, 2020 via email

@martinmodrak
Contributor Author

Sorry for not being clear; what I meant is that I think we should separate individual testing concerns (functional identities vs. consistency between instantiations). A typical test would look like:

for(double x : x_to_test) {
  for(double y : y_to_test) {
    expect_instantiations_equal(lbeta, x, y);
    expect_identity("successors", lh, rh, tolerance, x, y);
  }
}

Limiting ourselves to just reverse mode for now, expect_instantiations_equal would check that values and matching gradients are equal for lbeta(var, var), lbeta(double, var), etc. expect_identity would then run only for the (var, var) case. This can be a much stronger test than making expect_identity check the identity for all instantiations, because expect_identity needs to use expect_near_rel, possibly with quite a large tolerance, while expect_instantiations_equal could use EXPECT_FLOAT_EQ, since the results should be bitwise equal or almost equal. Also, mimicking the template magic that currently lives in expect_ad (or packaging it for reuse) would be a non-trivial endeavour.

I am a bit less sure whether it makes sense for expect_identity to always test all AD modes (but only with all-variable instantiations). This would effectively move all such tests to the mix folder (which is probably not a big deal) and would force everyone writing new tests to worry about all modes, which might be overwhelming. But I also clearly see the benefit of forcing fwd and mix mode tests, so I am not sure.

Building expect_instantiations_equal would be a separate issue, so it is not intended to be done here; I am just trying to justify the choice to simplify expect_identity so that it only cares about the instantiation with all parameters being variables (of either mode).

Hope I am making sense.

@bob-carpenter
Contributor

If it's only the complexity of templates that's leading to the proposal to test only foo(var, var) for identities, and not foo(var, double) or foo(double, var), I'm more than happy to help with the templating.

The problem with just doing foo(var, var) is that it doesn't say anything about how foo(var, double) is going to behave if they have different implementations. So while I agree that testing is good and any new test is better than nothing, it seems to me like we should go all the way and test all instantiations at all levels of autodiff. I'm pretty sure two symbolic implementations with autodiff will be closer to each other than to finite diffs.

@syclik---do you have an opinion on this?

@martinmodrak
Contributor Author

It's that the added complexity buys us almost nothing. In all code paths I have seen in rev, the results and respective gradients of foo(var, var), foo(double, var), and foo(var, double) should be bitwise equal, or at worst should satisfy EXPECT_DOUBLE_EQ (which allows at most a 4 ULP difference).

If you believe this assumption is wrong, then my argument is invalid. But if it is correct, then once we have tested that the instantiations are equal, it follows that any reasonable test satisfied by foo(var, var) will also be satisfied by the other instantiations, and we gain nothing by retesting them. The only way it could fail is if the right hand side of the identity changed between instantiations, but this IMHO should not happen (in reasonable code) if all its constituents also pass the equal-instantiations test.

Also I don't think @syclik was tagged correctly in the previous post so retagging.

@mcol mcol linked a pull request Feb 7, 2020 that will close this issue
@bob-carpenter
Contributor

bob-carpenter commented Feb 10, 2020 via email

@martinmodrak
Contributor Author

I think I see where we are missing each other, so at the risk of being a bit annoying, I'll try to rephrase once more: by "foo(var, var) and foo(double, var) should be bitwise equal" I do not mean that this is guaranteed by the implementation. I mean "we want them to be bitwise equal" AND "a test that fails if they are not bitwise equal would not be overly strict".

In that sense, the current way of testing in expect_ad is very weak, because it tests against a finite-diff value with relative tolerances of 1e-4 and larger. So currently, if, say, the foo(var, double) gradient is 10.001, the first element of the foo(var, var) gradient is 9.999, and the finite diff is 10, all tests pass. This seems undesirable to me. We should IMHO test more strongly than that (either bitwise equality or EXPECT_DOUBLE_EQ for corresponding values across all instantiations); different instantiations should not be allowed to have non-negligible effects on the computation, and if they do, we should IMHO change the way they are implemented.

If such tests are (or can be made) satisfiable, there is IMHO little need for other tests to try all instantiations.

@martinmodrak
Contributor Author

On second thought, I am obviously making this discussion too abstract. It will probably serve all of us better if I polish my thoughts, draw a diagram or two, and put it in a longer Discourse post. Thanks for your time on this so far.

@bob-carpenter
Contributor

bob-carpenter commented Feb 11, 2020 via email
