Einsum #299

Closed
wants to merge 1 commit into from

Conversation

MikeInnes
Copy link
Member

@einsum as discussed in #297. The notation is based on the "function of indices" notation --

@einsum [i] -> a[i,j]                  # reduce dim 2
@einsum [i,k] -> a[i,j] * b[j,k]       # matmul
@einsum [i,k,N] -> a[i,j,N] * b[j,k,N] # batch matmul
@einsum [i,j] -> a[i] * b[j]           # outer product

You can enjoy the nice output, too:

julia> @expand @einsum [i,k] -> a[i,j] * b[j,k]
:(a * b)

julia> @expand @einsum [i,j] -> a[i] * b[j]
:(a .* (Flux.reshape)(b, (1, (Flux.size)(b, 1))))

There are various cases this doesn't handle properly yet, and I need to actually implement that batch-matmul primitive somewhere. Will probably throw together a naive CPU implementation, unless anyone knows a trick.
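For what it's worth, a naive CPU fallback for that batch-matmul primitive could look something like the sketch below. This is only an illustration under assumed names; batched_mul is hypothetical and not something this PR defines.

using LinearAlgebra

# Naive batched matmul sketch: multiply each slice along the 3rd dimension.
# `batched_mul` is a hypothetical name, not part of this PR.
function batched_mul(a::AbstractArray{T,3}, b::AbstractArray{S,3}) where {T,S}
    size(a, 2) == size(b, 1) || throw(DimensionMismatch("inner dimensions must match"))
    size(a, 3) == size(b, 3) || throw(DimensionMismatch("batch sizes must match"))
    c = similar(a, promote_type(T, S), size(a, 1), size(b, 2), size(a, 3))
    for n in axes(a, 3)
        @views mul!(c[:, :, n], a[:, :, n], b[:, :, n])  # each slice hits BLAS
    end
    return c
end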

@chengchingwen
Copy link
Member

I found a discussion including a related paper about generating GPU code, and a repo about optimizing einsum. They might be useful.

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

Impressive!

I see

error("Not supported: index $i appears more than twice")

in this PR. Does that mean contractions where an index appears more than twice are not supported, like star contractions?

In fact, @dclu and I are going to add support for general-purpose einsum in our GSoC project, Funny Tensor Networks. We also have a plan to port this to Flux.jl; it is really nice to hear that someone else is also working in this direction. I'd like to hear some advice from you after we have finished our proposal.

@MikeInnes
Copy link
Member Author

Yeah, that's right. This is a fairly straightforward port from TensorFlow's version if anyone wants to pick it up. I can't remember if that's a limitation of the original or just something I didn't get round to. I do know that in some cases the original ends up with very poor space complexity (since it's just doing things pairwise).

It'd be nice to have this working just as a convenience and hopefully develop something more powerful over time.

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

In numpy, we can do something like this

In [12]: einsum("iii->", random.randn(4,4,4))
Out[12]: 0.3968659339477017

It loops over all indices of the input tensors and accumulates the result into the output tensor/scalar. In numpy, tensordot is used to perform efficient pairwise contractions, which ultimately call into BLAS.

The looping strategy trades performance for generality.
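To make the looping strategy concrete, here is a hand-rolled Julia equivalent of the einsum("iii->", x) example above. This is just a sketch; trace3 is a made-up name, not any package's API.

# Sum the "diagonal" entries x[i,i,i], looping and accumulating into a scalar.
function trace3(x::AbstractArray{<:Number,3})
    n = size(x, 1)
    @assert size(x) == (n, n, n)
    s = zero(eltype(x))
    for i in 1:n
        s += x[i, i, i]
    end
    return s
end

trace3(randn(4, 4, 4))  # analogous to einsum("iii->", random.randn(4,4,4))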

@datnamer
Copy link

datnamer commented Mar 26, 2019

Wouldn't it be more fruitful and efficient to make sure Flux can compose well with existing (and future) tensor notation packages? @mcabbott was working on something like this IIRC

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

@datnamer

Wouldn't it be more fruitful to make sure Flux can compose well with existing tensor notation packages?

I think the problem is that Flux is unable to backpropagate through in-place functions like gemm!, which are the backends of most tensor packages. This is why tensor packages like TensorOperations.jl and ITensors.jl cannot use the autodiff defined in Flux.jl.

In @dclu's GSoC project, he is going to propose an intermediate tensor notation and implement autodiff on that representation. To be explicit, einsum is a perfect IR for implementing autodiff; see this discussion.

I find TensorCast.jl (the one you mentioned?) really helpful. @mcabbott also made a PR to TensorOperations to support Flux.jl here. His branch is not merged yet, but it has proved itself in my recent tests.
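As an illustration of the kind of workaround being discussed: wrap the in-place kernel in a pure function and supply the gradient by hand. A minimal sketch in Zygote's custom-adjoint style; mymul is a made-up wrapper, not TensorOperations' actual API.

using LinearAlgebra, Zygote

# Pure wrapper around the in-place BLAS kernel; `mymul` is a hypothetical name.
mymul(A, B) = mul!(similar(A, size(A, 1), size(B, 2)), A, B)

# Hand-written gradient, since AD cannot see through the mutation inside mul!.
Zygote.@adjoint mymul(A, B) = mymul(A, B), Δ -> (Δ * B', A' * Δ)

gradient(A -> sum(mymul(A, ones(3, 4))), randn(2, 3))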

@MikeInnes
Copy link
Member Author

Playing well with other packages is definitely the right long-term goal. Right now, einsum has the advantage that it's familiar to users coming from Python and it's very easy to implement (and AD, GPU support etc. just fall out).

In future it would certainly be nice to do fancier things, like differentiating tensor notation directly. That's something that'd be very easy to hook into Zygote.

@mcabbott
Copy link
Member

TensorOperations uses contract!, which is essentially a gemm! that is less fussy about the order of indices, plus add! and trace!. I wrote gradients for these three basic functions in Jutho/TensorOperations.jl#59, after which it should all be differentiable.

More general contractions are still easy enough to do on paper, so it can't be that hard to teach the computer to do them, and return another tensor contraction. Maybe that's roughly what this GSoC project is? Also cool to hear that ITensors.jl may exist.

TensorCast should be completely differentiable, as it just writes reshapes and broadcasting etc., which are themselves already handled. That's morally the same approach as this PR really (only mine is less functional in style, and ten times as verbose...). I did find some bugs in this PR, which I can probably dig up if someone is interested in working on it.

(If you do, I would like to put in a vote against this arrow notation, which seems backwards. In np.einsum at least it's from input to output, but out := in is closer to paper. I would also vote against using the same name as in Einsum.jl.)

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

More general contractions are still easy enough to do on paper, so it can't be that hard to teach the computer to do them, and return another tensor contraction. Maybe that's roughly what this GSoC project is?

Yes, the simplicity is why einsum is well suited as an IR for implementing back-propagation. Not only contractions, but also star contractions, traces, and so on.

# forward         =>     backward code over the first input matrix
ik := ij,jk       =>     ij := ik, jk
ijk := il,jl,kl   =>     il := ijk, jl, kl
:= ii             =>     ii :=        # feeding shape

Correct me if some of them are incorrect.

This is an elegant way to unify the autodiff of trace, gemm!, and other kinds of einsum. All we need is a dispatchable einsum protocol, with the contraction topology (the key point) as the type used for dispatch, to achieve the best performance.
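A toy sketch of what dispatching on contraction topology could look like (hypothetical names, not an existing package's API): each topology gets its own type, so multiple dispatch picks the fast kernel while a generic loop can remain as the fallback.

abstract type ContractionTopology end
struct MatMul <: ContractionTopology end   # ik := ij, jk
struct Trace  <: ContractionTopology end   # := ii

einsum(::MatMul, a, b) = a * b                           # dispatches to BLAS
einsum(::Trace, a)     = sum(a[i, i] for i in axes(a, 1))  # generic loop

einsum(MatMul(), randn(2, 3), randn(3, 4))
einsum(Trace(), randn(5, 5))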

The second step is then translating tensor network contractions into the above notation. An efficient contraction-sequence algorithm, e.g. one based on treewidth, is needed (especially for quantum computing simulation). We have been trying to persuade the JuliaGraphs folks to provide an implementation (based on arXiv:1704.05286). I wonder if you would be interested in implementing this algorithm, @mcabbott.

Also, backward rules for matrix factorizations are a work in progress... a lot of heavy lifting.

Also cool to hear that ITensors.jl may exist.

Yes, it was released at a Julia meetup in New York. The memory-independent tensor design is really cool; @mfishman will make it open source in the future.

(If you do, I would like to put in a vote against this arrow notation, which seems backwards. In np.einsum at least it's from input to output, but out := in is closer to paper. I would also vote against using the same name as in Einsum.jl.)

+1

@MikeInnes
Copy link
Member Author

FWIW, that syntax is motivated by the fact that [i,j] -> a[i] * b[j] is effectively equivalent to a lambda (e.g. (i, j) -> a[i] * b[j], which would actually work already). It's not backwards if you see the indices as being the inputs. Seeing arrays as functions is admittedly unusual for numerical folks but also pretty foundational to compilers like Halide. := seems like a perfectly reasonable alternative anyhow.
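To illustrate the "function of indices" reading, the lambda can be materialized directly with a comprehension. This is a sketch of the idea, not what the macro actually generates.

a, b = randn(3), randn(4)
f = (i, j) -> a[i] * b[j]               # the "body" of [i,j] -> a[i] * b[j]
out = [f(i, j) for i in eachindex(a), j in eachindex(b)]
out ≈ a * b'                            # same outer product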

@mcabbott
Copy link
Member

OK, I see what you mean: this is the function you map over the LHS indices.

Re rules, those do look correct. Although I'd prefer to say that there's just one rule, ∂M_ab / ∂M_cd = δ_ac δ_bd for matrices, etc. If your output is some T_ab, then for backward mode you always want Δ_ab ∂T_ab / ∂M_cd. Then you can simplify δ_ac Z_xya = Z_xyc etc, but sometimes (as for your trace rule) δ survives.

If you have many factors, then there will be a lot of overlap between the backward mode calculations for them. As you say it seems graph-like, in that each ∂T / ∂M removes just one M and sews in Δ there. I presume that exploiting this would become important. But it would take me a long time to decode this arXiv paper! And perhaps this is getting off-topic for this PR.
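For example, applying that single rule to plain matmul, $T_{ik} = A_{ij} B_{jk}$ with repeated indices summed, reproduces the ij := ik, jk backward rule quoted earlier; a worked check, not taken from the thread:

\frac{\partial T_{ik}}{\partial A_{cd}} = \delta_{ic}\,\delta_{jd}\,B_{jk} = \delta_{ic}\,B_{dk},
\qquad
\bar{A}_{cd} = \Delta_{ik}\,\frac{\partial T_{ik}}{\partial A_{cd}} = \Delta_{ck}\,B_{dk}
\;\Longleftrightarrow\;
\bar{A} = \Delta\,B^{\top}.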

@bionicles
Copy link

Tullio does a decent job at this: https://github.com/mcabbott/Tullio.jl

The syntax for this is much simpler to reason about, and I would love to have this in Flux.

@ToucheSir
Copy link
Member

Yup, shall we close this since Tullio ticks all the boxes (AD, GPU, etc.)?

@CarloLucibello
Copy link
Member

Yes, Tullio works great. I don't think there is a need to re-export it from Flux, but if someone wants that we may consider it.

@DhairyaLGandhi
Copy link
Member

I definitely don't think we would re-export Tullio, but we can keep this one around since it's a lightweight implementation of the same concept. In the future, if we have use cases that call for lighter dependencies, this is a viable route forward.

@darsnack
Copy link
Member

Why keep it open? Closed PRs don't disappear, and we don't need to delete the branch either. To some extent, the issue/PR inbox signals something to external users. Keeping PRs open if we have no current plans to merge is not great IMO.

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Feb 12, 2021

Closing suggests it's not going to be considered, which is somewhat worse. An open PR isn't going to hurt much if it's going to come in handy in the future, if nothing else as a starting point for a future PR.

@ToucheSir
Copy link
Member

I would differentiate between closing an issue ("we don't plan on implementing this") and closing a PR ("we don't plan on using this specific implementation"). Unless we are seriously considering reviving this specific implementation, I don't think it's worth keeping open. #297 may be, as long as it's made clear that Tullio works right now and that something may or may not be created in Flux itself at some point.

@darsnack
Copy link
Member

Yeah I think closing this doesn't imply we will never consider something like @einsum. If anything, it's the opposite. We did consider it, decided that other packages provide this functionality quite well, and there is currently no need for Flux itself to provide it. Closing this makes clear what the decision was.

To be clear, I am not against this particular PR, I just think that closing stale PRs and issues is good practice for any open source project.

@darsnack
Copy link
Member

#297 may be, as long as it's made clear that Tullio works right now and that something may or may not be created in Flux itself at some point.

Yeah, if we want to track this, then keep the issue open. But again, we can just as easily close the issue if we are seriously not considering it, and just say "please re-open this issue if you would like this feature built into Flux."

@DhairyaLGandhi
Copy link
Member

Fair, let's keep the branch around and revisit if needed

@CarloLucibello deleted the einsum branch on April 7, 2022.