Einsum #299

Closed
wants to merge 1 commit into from

Conversation

MikeInnes
Copy link
Member

@einsum as discussed in #297. The notation is based on the "function of indices" notation --

@einsum [i] -> a[i,j]                  # reduce dim 2
@einsum [i,k] -> a[i,j] * b[j,k]       # matmul
@einsum [i,k,N] -> a[i,j,N] * b[j,k,N] # batch matmul
@einsum [i,j] -> a[i] * b[j]           # outer product

You can enjoy the nice output, too:

julia> @expand @einsum [i,k] -> a[i,j] * b[j,k]
:(a * b)

julia> @expand @einsum [i,j] -> a[i] * b[j]
:(a .* (Flux.reshape)(b, (1, (Flux.size)(b, 1))))

There are various cases this doesn't handle properly yet, and I need to actually implement that batch-matmul primitive somewhere. Will probably throw together a naive CPU implementation, unless anyone knows a trick.
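For what it's worth, a naive CPU fallback for that batch-matmul primitive could look something like the sketch below. This is only an illustration under assumed names; batched_mul is hypothetical and not something this PR defines.

using LinearAlgebra

# Naive batched matmul sketch: multiply each slice along the 3rd dimension.
# `batched_mul` is a hypothetical name, not part of this PR.
function batched_mul(a::AbstractArray{T,3}, b::AbstractArray{S,3}) where {T,S}
    size(a, 2) == size(b, 1) || throw(DimensionMismatch("inner dimensions must match"))
    size(a, 3) == size(b, 3) || throw(DimensionMismatch("batch sizes must match"))
    c = similar(a, promote_type(T, S), size(a, 1), size(b, 2), size(a, 3))
    for n in axes(a, 3)
        @views mul!(c[:, :, n], a[:, :, n], b[:, :, n])  # each slice hits BLAS
    end
    return c
end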

@chengchingwen
Copy link
Member

I found a discussion including a related paper about generating GPU code, and a repo about optimizing einsum. They might be useful.

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

Impressive!

I see

error("Not supported: index $i appears more than twice")

in this PR. Does that mean contractions where an index appears more than twice are not supported, like star contractions?

In fact, @dclu and I are going to add support for general-purpose einsum in our GSoC project, Funny Tensor Networks. We also have a plan to port this to Flux.jl; it is really nice to hear that someone else is also working in this direction. I'd like to hear some advice from you after we have finished our proposal.

@MikeInnes
Copy link
Member Author

Yeah, that's right. This is a fairly straightforward port from TensorFlow's version if anyone wants to pick it up. I can't remember if that's a limitation of the original or just something I didn't get round to. I do know that in some cases the original ends up with very poor space complexity (since it's just doing things pairwise).

It'd be nice to have this working just as a convenience and hopefully develop something more powerful over time.

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

In numpy, we can do something like this

In [12]: einsum("iii->", random.randn(4,4,4))
Out[12]: 0.3968659339477017

It loops over all indices of the input tensors and accumulates the result into the output tensor/scalar. In numpy, tensordot is used to perform efficient pairwise contractions, which ultimately call into BLAS.

The looping strategy trades performance for generality.
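To make the looping strategy concrete, here is a hand-rolled Julia equivalent of the einsum("iii->", x) example above. This is just a sketch; trace3 is a made-up name, not any package's API.

# Sum the "diagonal" entries x[i,i,i], looping and accumulating into a scalar.
function trace3(x::AbstractArray{<:Number,3})
    n = size(x, 1)
    @assert size(x) == (n, n, n)
    s = zero(eltype(x))
    for i in 1:n
        s += x[i, i, i]
    end
    return s
end

trace3(randn(4, 4, 4))  # analogous to einsum("iii->", random.randn(4,4,4))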

@datnamer
Copy link

datnamer commented Mar 26, 2019

Wouldn't it be more fruitful and efficient to make sure Flux can compose well with existing (and future) tensor notation packages? @mcabbott was working on something like this IIRC

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

@datnamer

Wouldn't it be more fruitful to make sure Flux can compose well with existing tensor notation packages?

I think the problem is that Flux is unable to backpropagate through in-place functions like gemm!, which are the backends of most tensor packages. This is why tensor packages like TensorOperations.jl and ITensors.jl cannot use the autodiff defined in Flux.jl.

In @dclu's GSoC project, he is going to propose an intermediate tensor notation and implement autodiff on that representation. To be explicit, einsum is a perfect IR for implementing autodiff; see this discussion.

I find TensorCast.jl (the one you mentioned?) really helpful. @mcabbott also made a PR to TensorOperations to support Flux.jl here. His branch is not merged yet, but it has proved itself in my recent tests.
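As an illustration of the kind of workaround being discussed: wrap the in-place kernel in a pure function and supply the gradient by hand. A minimal sketch in Zygote's custom-adjoint style; mymul is a made-up wrapper, not TensorOperations' actual API.

using LinearAlgebra, Zygote

# Pure wrapper around the in-place BLAS kernel; `mymul` is a hypothetical name.
mymul(A, B) = mul!(similar(A, size(A, 1), size(B, 2)), A, B)

# Hand-written gradient, since AD cannot see through the mutation inside mul!.
Zygote.@adjoint mymul(A, B) = mymul(A, B), Δ -> (Δ * B', A' * Δ)

gradient(A -> sum(mymul(A, ones(3, 4))), randn(2, 3))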

@MikeInnes
Copy link
Member Author

Playing well with other packages is definitely the right long-term goal. Right now, einsum has the advantage that it's familiar to users coming from Python and it's very easy to implement (and AD, GPU support etc. just fall out).

In future it would certainly be nice to do fancier things, like differentiating tensor notation directly. That's something that'd be very easy to hook into Zygote.

@mcabbott
Copy link
Member

TensorOperations uses contract!, which is essentially a gemm! that is less fussy about the order of indices, plus add! and trace!. I wrote gradients for these three basic functions in Jutho/TensorOperations.jl#59, after which it should all be differentiable.

More general contractions are still easy enough to do on paper, so it can't be that hard to teach the computer to do them, and return another tensor contraction. Maybe that's roughly what this GSoC project is? Also cool to hear that ITensors.jl may exist.

TensorCast should be completely differentiable, as it just writes reshapes and broadcasting etc., which are themselves already handled. That's morally the same approach as this PR really (only mine is less functional in style, and ten times as verbose...). I did find some bugs in this PR, which I can probably dig up if someone is interested in working on it.

(If you do, I would like to put in a vote against this arrow notation, which seems backwards. In np.einsum at least it's from input to output, but out := in is closer to paper. I would also vote against using the same name as in Einsum.jl.)

@GiggleLiu
Copy link

GiggleLiu commented Mar 26, 2019

More general contractions are still easy enough to do on paper, so it can't be that hard to teach the computer to do them, and return another tensor contraction. Maybe that's roughly what this GSoC project is?

Yes, the simplicity is why einsum is well suited as an IR for implementing back-propagation. Not only contractions, but also star contractions, traces, and so on.

# forward         =>     backward code over the first input matrix
ik := ij,jk       =>     ij := ik, jk
ijk := il,jl,kl   =>     il := ijk, jl, kl
:= ii             =>     ii :=        # feeding shape

Correct me if some of them are incorrect.

This is an elegant way to unify the autodiff of trace, gemm!, and other kinds of einsum. All we need is a dispatchable einsum protocol, with the contraction topology (the key point) as the type used for dispatch, to achieve the best performance.
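A toy sketch of what dispatching on contraction topology could look like (hypothetical names, not an existing package's API): each topology gets its own type, so multiple dispatch picks the fast kernel while a generic loop can remain as the fallback.

abstract type ContractionTopology end
struct MatMul <: ContractionTopology end   # ik := ij, jk
struct Trace  <: ContractionTopology end   # := ii

einsum(::MatMul, a, b) = a * b                           # dispatches to BLAS
einsum(::Trace, a)     = sum(a[i, i] for i in axes(a, 1))  # generic loop

einsum(MatMul(), randn(2, 3), randn(3, 4))
einsum(Trace(), randn(5, 5))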

The second step is then translating tensor network contractions into the above notation. An efficient contraction-sequence algorithm, e.g. one based on treewidth, is needed (especially for quantum computing simulation). We have been trying to persuade the JuliaGraphs folks to provide an implementation (based on arXiv:1704.05286). I wonder if you would be interested in implementing this algorithm, @mcabbott.

Also, backward rules for matrix factorizations are a work in progress... a lot of heavy lifting.

Also cool to hear that ITensors.jl may exist.

Yes, it was released at a Julia meetup in New York. The memory-independent tensor design is really cool; @mfishman will make it open source in the future.

(If you do, I would like to put in a vote against this arrow notation, which seems backwards. In np.einsum at least it's from input to output, but out := in is closer to paper. I would also vote against using the same name as in Einsum.jl.)

+1

@MikeInnes
Copy link
Member Author

FWIW, that syntax is motivated by the fact that [i,j] -> a[i] * b[j] is effectively equivalent to a lambda (e.g. (i, j) -> a[i] * b[j], which would actually work already). It's not backwards if you see the indices as being the inputs. Seeing arrays as functions is admittedly unusual for numerical folks but also pretty foundational to compilers like Halide. := seems like a perfectly reasonable alternative anyhow.
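To illustrate the "function of indices" reading, the lambda can be materialized directly with a comprehension. This is a sketch of the idea, not what the macro actually generates.

a, b = randn(3), randn(4)
f = (i, j) -> a[i] * b[j]               # the "body" of [i,j] -> a[i] * b[j]
out = [f(i, j) for i in eachindex(a), j in eachindex(b)]
out ≈ a * b'                            # same outer product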

@mcabbott
Copy link
Member

OK, I see what you mean: this is the function you map over the LHS indices.

Re rules, those do look correct. Although I'd prefer to say that there's just one rule, ∂M_ab / ∂M_cd = δ_ac δ_bd for matrices, etc. If your output is some T_ab, then for backward mode you always want Δ_ab ∂T_ab / ∂M_cd. Then you can simplify δ_ac Z_xya = Z_xyc etc, but sometimes (as for your trace rule) δ survives.

If you have many factors, then there will be a lot of overlap between the backward mode calculations for them. As you say it seems graph-like, in that each ∂T / ∂M removes just one M and sews in Δ there. I presume that exploiting this would become important. But it would take me a long time to decode this arXiv paper! And perhaps this is getting off-topic for this PR.
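For example, applying that single rule to plain matmul, $T_{ik} = A_{ij} B_{jk}$ with repeated indices summed, reproduces the ij := ik, jk backward rule quoted earlier; a worked check, not taken from the thread:

\frac{\partial T_{ik}}{\partial A_{cd}} = \delta_{ic}\,\delta_{jd}\,B_{jk} = \delta_{ic}\,B_{dk},
\qquad
\bar{A}_{cd} = \Delta_{ik}\,\frac{\partial T_{ik}}{\partial A_{cd}} = \Delta_{ck}\,B_{dk}
\;\Longleftrightarrow\;
\bar{A} = \Delta\,B^{\top}.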

@bionicles
Copy link

Tullio does a decent job at this: https://github.com/mcabbott/Tullio.jl

The syntax for this is much simpler to reason about, and I would love to have this in Flux.

@ToucheSir
Copy link
Member

Yup, shall we close this since Tullio ticks all the boxes (AD, GPU, etc.)?

@CarloLucibello
Copy link
Member

Yes, Tullio works great. I don't think there is a need to re-export it from Flux, but if someone wants that we may consider it.

@DhairyaLGandhi
Copy link
Member

I definitely don't think we would re-export Tullio, but we can keep this one around since it's a lightweight implementation of the same concept. In the future, if we have use cases that call for lighter dependencies, this is a viable route forward.

@darsnack
Copy link
Member

Why keep it open? Closed PRs don't disappear, and we don't need to delete the branch either. To some extent, the issue/PR inbox signals something to external users. Keeping PRs open if we have no current plans to merge is not great IMO.

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Feb 12, 2021

Closing suggests it's not going to be considered, which is somewhat worse. An open PR isn't going to hurt much if it's going to come in handy in the future, if nothing else as a starting point for a future PR.

@ToucheSir
Copy link
Member

I would differentiate between closing an issue ("we don't plan on implementing this") and closing a PR ("we don't plan on using this specific implementation"). Unless we are seriously considering reviving this specific implementation, I don't think it's worth keeping open. #297 may be, as long as it's made clear that Tullio works right now and that something may or may not be created in Flux itself at some point.

@darsnack
Copy link
Member

Yeah I think closing this doesn't imply we will never consider something like @einsum. If anything, it's the opposite. We did consider it, decided that other packages provide this functionality quite well, and there is currently no need for Flux itself to provide it. Closing this makes clear what the decision was.

To be clear, I am not against this particular PR, I just think that closing stale PRs and issues is good practice for any open source project.

@darsnack
Copy link
Member

#297 may be, as long as it's made clear that Tullio works right now and that something may or may not be created in Flux itself at some point.

Yeah, if we want to track this, then keep the issue open. But again, we can just as easily close the issue if we are seriously not considering it, and just say "please re-open this issue if you would like this feature built into Flux."

@DhairyaLGandhi
Copy link
Member

Fair, let's keep the branch around and revisit if needed

@CarloLucibello deleted the einsum branch on April 7, 2022.