
[WIP] Fresh attempt at DI integration #54

Merged: 14 commits merged into v2 from Diattempt2 on Jul 20, 2024

Conversation

Vaibhavdixit02 (Member)

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break the public API
  • All documentation related to the code changes was updated
  • The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC
  • Any new documentation only uses the public API


@gdalle (Contributor) commented May 31, 2024

You're running DI v0.4.2 here, something is blocking it from updating. Can you add a v0.5.2 compat bound and see what it says?
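
A minimal sketch of how to bump that bound locally, assuming the repo's Project.toml is the active environment (the version string comes from the comment above):

julia> using Pkg

julia> Pkg.compat("DifferentiationInterface", "0.5.2")  # add/update the [compat] entry in Project.toml

julia> Pkg.update("DifferentiationInterface")           # the resolver will then report whatever is holding it back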

@gdalle (Contributor) commented May 31, 2024

The error sounds like a type instability?
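
If it is a type instability, one quick way to confirm is to check inference on the suspect call; a sketch, where cons_oop and x0 are just stand-ins for whatever the stack trace points at:

julia> using Test

julia> @inferred cons_oop(x0)        # throws if the return type cannot be inferred

julia> @code_warntype cons_oop(x0)   # or inspect the red (Any) entries interactively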

@gdalle (Contributor) commented Jun 3, 2024

@Vaibhavdixit02 when I replace the function inside hessian! with sum it works, so I would assume the current behavior is due to the way you construct cons_oop with closures.
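
For readers following along, the closure pattern under discussion looks roughly like this; a hypothetical reconstruction with made-up stand-ins (cons!, num_cons), not the exact code in the PR:

julia> cons!(res, x) = (res .= [x[1]^2 + x[2]^2, x[2] * sin(x[1]) - x[1]]; res);  # user's in-place constraints

julia> num_cons = 2;

julia> cons_oop = x -> begin                     # out-of-place wrapper built as a closure
           res = zeros(eltype(x), num_cons)      # eltype-dependent allocation, num_cons captured
           cons!(res, x)
           res
       end;

julia> cons_1 = x -> cons_oop(x)[1];             # one more closure per constraint for the Hessians

julia> cons_1(rand(2))                           # works, but AD now differentiates through two closures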

@Vaibhavdixit02 (Member, Author) commented Jun 3, 2024

Yup, I figured that. I spent some time reading up on the vector -> vector Hessian discussion (and its background) that you and Tim Holy were having, hoping to do this more rigorously, but I don't see a natural way: asking users to pass a vector of functions is not ideal and feels unnatural to me. This will show up in multiobjective problems as well, even though those typically use mostly derivative-free algorithms. I have some WIP locally; it's moving slowly because of other things right now, but I will get back to it soon.
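
To make the trade-off concrete, the two API shapes in question look roughly like this (purely illustrative, not a proposal in this PR):

julia> cons(x) = [x[1]^2 + x[2]^2, x[2] * sin(x[1]) - x[1]];   # (a) today: one vector-valued constraint function

julia> cons_split = [x -> cons(x)[i] for i in 1:2];            # per-constraint functions recovered via index closures

julia> cons_vec = [x -> x[1]^2 + x[2]^2,                       # (b) asking users for a vector of scalar functions,
                   x -> x[2] * sin(x[1]) - x[1]];              #     which maps directly onto per-constraint Hessians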

@gdalle (Contributor) commented Jun 4, 2024

So do you think you need the vector Hessian?

@Vaibhavdixit02 (Member, Author)

No, not right now. I can work around it (as I have been doing in the implementations that already exist here), but it would be nice to have in the future.

@wsmoses commented Jun 6, 2024

@Vaibhavdixit02 @ChrisRackauckas please do not drop the Enzyme extension in favor of DI here.

This will lead both to more code erroring and to suboptimal performance, as DI will create unnecessary closures, among other issues. I would prefer that user code not be made to break because of an unnecessary intermediate closure introduced by an interface.

See some comments about this tpapp/LogDensityProblemsAD.jl#29 (review) / tpapp/LogDensityProblemsAD.jl#29 (comment)

Unfortunately, when I commented "Closures were one of the reasons imo that AD.jl didn't take off, so long term I think DI.jl would be well served by both not introducing them itself (my understanding is that it does this successfully), but also not forcing users to create the same closures user-side (which would create similar issues).", the response was "DI is unlikely to support either multiple arguments or activity specification anytime soon".

x/ref gdalle/DifferentiationInterface.jl#307

@gdalle (Contributor) commented Jun 6, 2024

I am aware of this limitation of DifferentiationInterface, but I'm not sure it's obvious to others how much work would be required for multiple arguments or activity analysis. The reason we're able to detect bugs and suboptimalities in Enzyme before anyone else (see the 100x gradient slowdown from a month ago) is that we have a top-tier test suite, which would need to grow by a factor of at least 2 to accommodate this new feature. It's 1k to 2k LOC at least and a profound rethink, so even if I want to do it, it can't happen overnight.

And since it's only essential for Enzyme, another option would be to provide the operators I need for DI on the Enzyme side. That would make it so I don't have to code every operator suboptimally with my imperfect understanding of your docs.

@gdalle (Contributor) commented Jun 6, 2024

Cross referencing for technical details

@gdalle (Contributor) commented Jun 6, 2024

And I also want to push back on the "unnecessary interface intermediate". DI fills a longstanding need in the ecosystem, especially for packages like Enzyme where the API is very hard to grasp. Not everyone is gonna be able or willing to use Enzyme directly, so I do believe there is value in even a suboptimal DI.

Since I aim to support every backend and not just Enzyme, there's always gonna be a compromise on how far I support each one. I'm gonna look into activities, but I just need you to realize that it's easy to get it out quick and dirty, and much harder to do it right.

@wsmoses commented Jun 6, 2024

Ah, I apologize @gdalle, I did not mean to refer to DI as unnecessary (I think it is an amazing package, and the work that you and Adrian are doing to make things easier to use is sorely needed).

I meant only to refer to the intermediate closure that DI would generate for the Hessian as unnecessary -- strictly in the technical sense (e.g. wrapping things into a closure isn't needed when you could call a multi-argument version without the closure).

I realize in retrospect the poor wording and I sincerely apologize.
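
For readers following along, the distinction is roughly the following sketch, using Enzyme's documented activity annotations (the toy loss and variable names are made up):

julia> using Enzyme

julia> loss(u, p) = sum(abs2, u .- p);   # differentiate w.r.t. u, treat p as data

julia> u, p = rand(3), rand(3); du = zero(u);

julia> f_closed = x -> loss(x, p);       # closure form: p is captured, so AD must reason about the closure's fields

julia> Enzyme.autodiff(Reverse, f_closed, Active, Duplicated(u, du));

julia> du .= 0;                          # the shadow accumulates, so reset before the next call

julia> Enzyme.autodiff(Reverse, loss, Active, Duplicated(u, du), Const(p));  # multi-argument form: p marked inactive, no closure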

@wsmoses commented Jun 6, 2024

And I fully agree with you that it is difficult to get multiple arguments and activities correct.

However, replacing existing code that does not introduce these closures will create breakage, so I don't think it wise for DI to replace that existing code until DI provides such support as well.

For the backends without this need -- or for packages that don't generalize over backends -- DI is already a strict improvement, and there's no question that its use is immediately useful (and thus it would be good to replace the other backends here, for example).

@gdalle (Contributor) commented Jun 6, 2024

Thanks for clarifying, and sorry for overreacting on my end 😊 Maybe this PR can replace the other backends and leave Enzyme alone for the time being?

@ChrisRackauckas (Member)

We have no intention of changing to DI until it matches the performance of what we already have. It's on our radar, but we will not sacrifice features or performance. Instead, we will keep on top of it until DI can do exactly this, and then it's ready.

@gdalle (Contributor) commented Jun 7, 2024

Which benchmarks am I expected to pass for this bar to be met?
And what happens if I'm faster in some aspects but slower in others?

@ChrisRackauckas (Member)

If it's anywhere close in performance then we'll take it, as we would greatly enjoy the decreased maintenance burden. But it should be able to do the standard constrained and unconstrained tutorials with Zygote, Enzyme, and ForwardDiff and basically match the performance all of the way through. If I'm not mistaken, the closures are currently a bit of a blocker to this.

@gdalle (Contributor) commented Jun 7, 2024

If you're talking about the tutorials in the docs, some of them are on toy problems (see e.g. https://docs.sciml.ai/Optimization/stable/tutorials/constraints/). Is there an actual realistic benchmark suite that can be run to track and compare performance?

@ChrisRackauckas (Member)

The toy problems will be the hardest because that will measure pure overhead.
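
For concreteness, an overhead-dominated toy benchmark along those lines could look like this minimal sketch, based on the Rosenbrock tutorial from the Optimization.jl docs (the backend and optimizer choices here are illustrative):

julia> using Optimization, OptimizationOptimJL, ForwardDiff, BenchmarkTools

julia> rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2;

julia> optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff());

julia> prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0]);

julia> @btime solve($prob, BFGS());   # on a toy problem the timing mostly measures per-call overhead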

@gdalle (Contributor) commented Jun 7, 2024

> But it should be able to do the standard constrained and unconstrained tutorials with Zygote, Enzyme, and ForwardDiff and basically match the performance all of the way through.

What I'm suggesting above, following the discussion with Billy, is that DI doesn't necessarily have to overtake all backends at once. It seems reasonable to leave Enzyme aside for now, because that's the one where I would struggle most to milk the last drops of performance.

@gdalle (Contributor) commented Jun 7, 2024

> The toy problems will be the hardest because that will measure pure overhead.

Okay then. Ping me when the PR is ready for use, @Vaibhavdixit02, and I'll review and then help benchmark.

@gdalle (Contributor) commented Jul 16, 2024

Beware of the local sparsity detector: you shouldn't have to use it in general.

julia> using ADTypes: jacobian_sparsity

julia> using SparseConnectivityTracer

julia> function con2_c(res, x)
           res .= [x[1]^2 + x[2]^2, x[2] * sin(x[1]) - x[1]]
       end
con2_c (generic function with 1 method)

julia> jacobian_sparsity(con2_c, zeros(2), zeros(2), TracerSparsityDetector()) # input-independent
2×2 SparseArrays.SparseMatrixCSC{Bool, Int64} with 4 stored entries:
 1  1
 1  1

julia> jacobian_sparsity(con2_c, zeros(2), zeros(2), TracerLocalSparsityDetector()) # input-dependent
2×2 SparseArrays.SparseMatrixCSC{Bool, Int64} with 3 stored entries:
 1  1
 1  
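
For reference, the global (input-independent) detector is the one you would normally wire into a sparse backend through ADTypes; a rough sketch, with the dense backend choice purely as an example:

julia> using ADTypes: AutoSparse, AutoForwardDiff

julia> using SparseMatrixColorings: GreedyColoringAlgorithm

julia> sparse_backend = AutoSparse(
           AutoForwardDiff();                              # dense backend that computes the actual derivatives
           sparsity_detector = TracerSparsityDetector(),   # global pattern, valid for every input
           coloring_algorithm = GreedyColoringAlgorithm(), # compresses structurally independent columns
       );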

@Vaibhavdixit02 Vaibhavdixit02 changed the base branch from main to v2 July 20, 2024 13:30
@Vaibhavdixit02 Vaibhavdixit02 merged commit 1325478 into v2 Jul 20, 2024
2 of 5 checks passed
@Vaibhavdixit02 Vaibhavdixit02 deleted the Diattempt2 branch July 20, 2024 13:31