AD Meta Issue for 1.0 #2411

Open · Tracked by #2420

willtebbutt opened this issue Dec 2, 2024 · 2 comments
@willtebbutt (Member) commented Dec 2, 2024

Below is my view on what AD in Turing 1.0 ought to look like. Please feel free to comment / add your own thoughts -- I'll update this statement in light of new items.

This issue should make the context / background for the problem clear, detail the steps needed to make progress, and state what it will take to consider this issue closed. Its content at any given point in time reflects what we currently believe to be true, and is subject to change.

Note: this issue is only half done -- I still need to discuss performance.

Summary

There are two main questions to ask about a given AD on a given Turing.jl model:

  1. does it run (correctly)?
  2. is it performant?

In 1.0, we want to be able to make fairly confident statements about the kinds of models that AD works on -- this must be achieved through testing.

Similarly, we want to be able to make quantitative statements about the performance a user should expect from a given AD, and give them advice for debugging if it appears to be slow.

Testing: does it run?

In order to be confident that we have reasonable support in a large range of cases, we need to

  1. define roughly what it is that we want to support,
  2. know what we currently do / do not test + fill in the gaps, and
  3. ensure that the test cases get run in the right places.

1. Rough Support Requirements:

This is the thing that I have the least strong opinions on. Certainly, we want to test all of the varinfos, every Distributions.Distribution that we care about in at least one model, and all of the various bits of syntax which DynamicPPL.jl exposes to the user.
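
As a loose illustration of one cell of the resulting coverage matrix (the model below is a stand-in, and only the default varinfo is constructed here; the real list should cover every varinfo type DynamicPPL supports):

```julia
using DynamicPPL, Distributions

# A stand-in model touching a few of the things we care about: a scalar
# parameter, a constrained parameter (exercises linking), and an observation.
@model function coverage_demo(y)
    μ ~ Normal()
    σ ~ truncated(Normal(); lower=0)
    y ~ Normal(μ, σ)
end

model = coverage_demo(0.5)

# "All of the varinfos" means repeating every AD test against each metadata
# representation DynamicPPL supports; only the default is constructed here.
varinfos = (DynamicPPL.VarInfo(model),)
```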

2. Existing test cases for AD and where we run them:

  1. We have some of these in DynamicPPL.TestUtils.DemoModels, and (my understanding is that) they're quite good at checking that you can differentiate a very simple Turing.jl model (specifically, one comprising a single distribution, but implemented in a range of ways).
  2. AD backends are tested in DynamicPPL here.
    This loops over the combination of each AD backend, each element of DemoModels, and each varinfo.

So I get the impression that, for each AD tested, we have moderate coverage of DynamicPPL features and good coverage of the various varinfos. This testing happens inside Turing.jl.
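
For orientation, a rough sketch of the shape of that loop follows. The collection names below are stand-ins rather than the actual test code (which is what the link above points to), and the real suite also checks correctness rather than just running the gradient:

```julia
using ADTypes, DifferentiationInterface, DynamicPPL, LogDensityProblems
import ForwardDiff, ReverseDiff

for adtype in (AutoForwardDiff(), AutoReverseDiff()),   # stand-in list of backends
    model in DynamicPPL.TestUtils.DemoModels            # as referenced above; exact name may differ

    # The real suite additionally loops over every varinfo type; the default
    # VarInfo is used here for brevity.
    ldf = DynamicPPL.LogDensityFunction(model)
    x = DynamicPPL.VarInfo(model)[:]                     # a parameter vector for the model
    value_and_gradient(Base.Fix1(LogDensityProblems.logdensity, ldf), adtype, x)
end
```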

3. Ensuring Testing Happens

There are three things which should happen in order to give us a reasonable degree of confidence in AD:

  1. we define a collection of models which we want to be able to differentiate,
  2. we run the tests for these in one of the TuringLang repos, making sure to test the thing that users actually call, and
  3. we derive from this collection of models a collection of (f, args...) pairs (see the sketch after this list), which we can pass to AD backends and say "hey, this is our current best guess at what you need to be able to differentiate if you want to support Turing.jl. If you want to ensure support for Turing.jl, just run these as part of your integration tests in your CI, and make sure that you can differentiate them correctly and quickly".
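
To make item 3 concrete, here is a minimal sketch of what one such (f, args...) pair could look like (assuming a recent DynamicPPL; the toy model, parameter values, and the use of ForwardDiff as a stand-in backend are all just for illustration):

```julia
using DynamicPPL, Distributions, LogDensityProblems
import ForwardDiff

@model function toy(y)
    m ~ Normal()
    y ~ Normal(m, 1)
end

ldf = DynamicPPL.LogDensityFunction(toy(0.3))

# The (f, args...) pair we would hand to an AD backend:
f = Base.Fix1(LogDensityProblems.logdensity, ldf)
args = ([0.1],)    # a single value for the parameter m

# An AD backend "supports Turing.jl" on this case if it can do the following
# correctly and quickly:
ForwardDiff.gradient(f, args...)
```

The nice property of this packaging is that the AD backend needs to know nothing about Turing.jl or DynamicPPL internals -- it just sees a function and an argument vector.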

Note that it is not sufficient to only do one of 2 or 3, as they each serve slightly different purposes.

2 is necessary because ultimately we are the ones who want to be sure that AD works for our users, and to know what currently does not work. In particular, if we change something in Turing.jl which causes AD-related problems, we want to know about them before merging the change. Knowing about them, we can either change our implementation to play nicely with the AD that is having problems, or open an upstream issue if an AD fails to differentiate something that we think it really ought to be able to differentiate.

3 is necessary because AD authors will often change internals in their packages. Hopefully their unit tests will catch most problems before they release changes, but there is really no substitute for having a very large array of real test cases to provide something like fuzzing / property testing for an AD. From Turing.jl's perspective, having our test cases run as part of the CI for the ADs that we care about ensures a better experience for our users.

@penelopeysm has made a start on a more general package, https://github.com/penelopeysm/ModelTests.jl/, which aims to systematise testing a bit more thoroughly and provide test cases for use by external packages (correct me if I'm wrong, Penny). From my perspective, it goes about this in exactly the right way. In particular:

  1. DynamicPPL.jl can just use the ad_ldp or the ad_di function to turn models into test cases, while
  2. AD backends, such as Mooncake, can hook into make_function and make_params (a hypothetical sketch of this follows below).
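
I don't know ModelTests.jl's actual interface beyond the function names above, but I imagine an AD backend's integration test would end up looking something like the following. Everything here -- the collection name, the signatures of make_function / make_params, and the exact Mooncake calls -- is an assumption, not the real ModelTests API:

```julia
# Hypothetical sketch only: consult ModelTests.jl for the real API.
using ModelTests
import Mooncake

for model in ModelTests.MODELS        # assumed name for the collection of test cases
    f = make_function(model)          # assumed: returns a callable params -> log density
    x = make_params(model)            # assumed: returns a valid parameter vector for f
    # Build a Mooncake rule for this case and differentiate it; a backend's
    # integration tests would also check the result against a reference.
    rule = Mooncake.build_rrule(f, x)
    Mooncake.value_and_gradient!!(rule, f, x)
end
```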

Performance

I will finish this section off another day.

Concrete Todo items:

  • decide where we want to keep this testing infrastructure. In particular, do we keep it in ModelTests and move that package into the Turing org, or locate the functionality from ModelTests inside DPPL.jl itself? Discussion here
  • extend testing functionality to permit us to manually flag test cases as "broken" on a particular backend
  • decide what additional test cases we want to add, and add them.
  • detail the "Performance" section of this issue (me)
  • make use of testing infrastructure in the DynamicPPL test suite (if it stays in DPPL, there may be nothing to do here)
  • make use of testing infrastructure in the Mooncake test suite (for me to do)
  • start discussions with other AD backends about incorporating our test suite in their integration tests

Linked Issues / PRs:

Questions:

  1. what is the answer to the first concrete todo item?
  2. what existing places where we test ADs have I missed?
  3. does this plan make sense? Is there anything else that should be added?
  4. is anything unclear? In particular, is it at all unclear what we want to get done for Turing.jl 1.0?
@willtebbutt added this to the Turing v1.0.0 milestone Dec 2, 2024
@penelopeysm (Member) commented:

> • decide where we want to keep this testing infrastructure. In particular, do we keep it in ModelTests and move that package into the Turing org, or locate the functionality from ModelTests inside DPPL.jl itself.
> 1. what is the answer to the first concrete todo item?

I've started a discussion on this single point here: #2412

@wsmoses (Collaborator) commented Dec 6, 2024

Can getting #1887 merged be added to the todos here?
