Einsum #299
Conversation
I found a discussion, including a related paper about generating GPU code and a repo about optimizing [...].
Impressive! I find [...] in this PR; does it mean contractions with indices appearing more than twice are not supported, like contracting stars? In fact, @dclu and I are going to add support for a general-purpose einsum.
Yeah, that's right. This is a fairly straightforward port from TensorFlow's version if anyone wants to pick it up. I can't remember if that's a limitation of the original or just something I didn't get round to. I do know that in some cases the original ends up with very poor space complexity (since it's just doing things pairwise). It'd be nice to have this working just as a convenience and hopefully develop something more powerful over time.
In numpy, we can do something like this: loop over all indices of the input tensors and accumulate the result into the output tensor/scalar. The looping strategy trades performance for generality.
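A minimal Julia sketch of that looping strategy (the numpy snippet is not shown here; naive_einsum and the Symbol index labels are illustrative names, not an API from this PR):

```julia
# Naive einsum: loop over every assignment of the index labels and
# accumulate the product of the input entries into the output.
function naive_einsum(labels, tensors, out_labels)
    sizes = Dict{Symbol,Int}()
    for (ls, t) in zip(labels, tensors), (l, s) in zip(ls, size(t))
        sizes[l] = s                      # (size consistency checks omitted)
    end
    all_labels = collect(keys(sizes))
    out = zeros(eltype(first(tensors)), (sizes[l] for l in out_labels)...)
    for vals in Iterators.product((1:sizes[l] for l in all_labels)...)
        env = Dict(zip(all_labels, vals))
        term = prod(t[(env[l] for l in ls)...] for (ls, t) in zip(labels, tensors))
        out[(env[l] for l in out_labels)...] += term
    end
    return out
end

A, B = rand(2, 3), rand(3, 4)
C = naive_einsum([[:i, :k], [:k, :j]], (A, B), [:i, :j])
C ≈ A * B  # true: C[i,j] = Σₖ A[i,k] * B[k,j]
```

The cost is proportional to the product of all index ranges, which is exactly the performance-for-generality trade-off described above.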
Wouldn't it be more fruitful and efficient to make sure Flux can compose well with existing (and future) tensor notation packages? @mcabbott was working on something like this IIRC.
I think the problem is that Flux is unable to backpropagate through in-place functions like [...]. In @dclu's GSoC project, he is going to propose an intermediate tensor notation and implement autodiff in this representation. To be explicit, I find the [...]
Playing well with other packages is definitely the right long-term goal. Right now, [...]. In future it would certainly be nice to do fancier things, like differentiating tensor notation directly. That's something that'd be very easy to hook into Zygote.
TensorOperations uses [...]. More general contractions are still easy enough to do on paper, so it can't be that hard to teach the computer to do them, and return another tensor contraction. Maybe that's roughly what this GSoC project is? Also cool to hear that ITensors.jl may exist.

TensorCast should be completely differentiable, as it just writes reshapes and broadcasting etc., which are themselves already handled. That's morally the same approach as this PR really (only mine is less functional in style, and ten times as verbose...).

I did find some bugs in this PR, which I can probably dig up if someone is interested to work on it. (If you do, I would like to put in a vote against this arrow notation, which seems backwards. In [...])
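A minimal sketch of that reshape-and-broadcast approach in plain Julia (not TensorCast's or this PR's actual generated code):

```julia
A, B = rand(2, 3), rand(3, 4)

# Insert singleton dimensions so the shared index k lines up,
# broadcast the product, then sum k away:
Ak = reshape(A, size(A, 1), size(A, 2), 1)        # i × k × 1
Bk = reshape(B, 1, size(B, 1), size(B, 2))        # 1 × k × j
C  = dropdims(sum(Ak .* Bk; dims = 2); dims = 2)  # C[i,j] = Σₖ A[i,k] * B[k,j]

C ≈ A * B  # true
```

Since reshape, broadcasting, and sum already have gradients, anything lowered to them is differentiable for free, which is the point being made above.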
Yes, the simplicity is why einsum is suited as an IR for implementing back-propagation. Not only contractions, but also stars, traces, etc.
Correct me if some of them are incorrect. This is an elegant way to unify the autodiff of trace, gemm!, and other kinds of [...]. Then the second step is translating tensor network contractions using the above notation, with an efficient contraction sequence algorithm like [...]. Also, backwards for matrix factorizations is work in progress... there is a lot of heavy work ahead.
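As a hedged illustration of the point that the backward pass of an einsum contraction is again an einsum, here is the gemm case, reusing the illustrative naive_einsum helper sketched above (not code from this PR):

```julia
# forward:   C[i,j]  = A[i,k]  * B[k,j]    (an einsum)
# backward:  ΔA[i,k] = ΔC[i,j] * B[k,j]    (also an einsum)
#            ΔB[k,j] = A[i,k]  * ΔC[i,j]   (also an einsum)
A, B = rand(2, 3), rand(3, 4)
ΔC = rand(2, 4)                             # incoming gradient for C

ΔA = naive_einsum([[:i, :j], [:k, :j]], (ΔC, B), [:i, :k])
ΔB = naive_einsum([[:i, :k], [:i, :j]], (A, ΔC), [:k, :j])

ΔA ≈ ΔC * B'   # true: the familiar gemm backward rules
ΔB ≈ A' * ΔC   # true
```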
Yes, it was released at a Julia meetup in New York. The memory-independent tensor is really cool; @mfishman will make it open source in the future.
+1
FWIW, that syntax is motivated by the fact that [...]
OK, I see what you mean; this is the function you [...]

Re rules, those do look correct. Although I'd prefer to say that there's just one rule, ∂M_ab / ∂M_cd = δ_ac δ_bd for matrices, etc. If your output is some T_ab, then for backward mode you always want Δ_ab ∂T_ab / ∂M_cd. Then you can simplify δ_ac Z_xya = Z_xyc etc., but sometimes (as for your trace rule) a δ survives.

If you have many factors, then there will be a lot of overlap between the backward-mode calculations for them. As you say it seems graph-like, in that each ∂T / ∂M removes just one M and sews in Δ there. I presume that exploiting this would become important. But it would take me a long time to decode this arxiv paper! And perhaps this is getting off-topic for this PR.
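A worked instance of that simplification, added for illustration in the same index notation. For matmul, T_ab = M_ae N_eb, so

    Δ_ab ∂T_ab/∂M_cd = Δ_ab δ_ac δ_ed N_eb = Δ_cb N_db

and both δ's are contracted away (the usual Δ Nᵀ rule). For the trace, y = M_aa, so

    ∂M_aa/∂M_cd = δ_ac δ_ad = δ_cd

and a δ survives: the gradient is the scalar sensitivity times δ_cd, i.e. a multiple of the identity.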
Tullio does a decent job at this: https://github.com/mcabbott/Tullio.jl. The syntax for this is much simpler to reason about, and I would love to have this in Flux.
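A minimal sketch of Tullio's index notation, assuming Tullio.jl and Zygote are installed (gradient support as advertised in Tullio's README):

```julia
using Tullio, Zygote

# `:=` creates a new array; indices appearing only on the right (here k) are summed
mul(x, y) = @tullio c[i, j] := x[i, k] * y[k, j]

A, B = rand(2, 3), rand(3, 4)
mul(A, B) ≈ A * B  # true

# Tullio supplies gradient definitions, so Zygote can differentiate through it
gA, gB = Zygote.gradient((x, y) -> sum(mul(x, y)), A, B)
gA ≈ ones(2, 4) * B'  # true
gB ≈ A' * ones(2, 4)  # true
```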
Yup, shall we close this since Tullio ticks all the boxes (AD, GPU, etc.)?
Yes, Tullio works great. I don't think there is a need to re-export it from Flux, but if someone wants that we may consider it.
I definitely don't think we would re-export Tullio, but we can keep this one around, since it's a lightweight implementation of the same concept; in the future, if we have use cases for lighter dependencies, this is a viable route forward.
Why keep it open? Closed PRs don't disappear, and we don't need to delete the branch either. To some extent, the issue/PR inbox signals something to external users. Keeping PRs open when we have no current plans to merge is not great IMO.
Closing suggests it's not going to be considered, which is somewhat worse. An open PR isn't going to hurt as much if it's going to come in handy in the future, if nothing else as a starting point for a future PR.
I would differentiate between closing an issue ("we don't plan on implementing this") and closing a PR ("we don't plan on using this specific implementation"). Unless we are seriously considering reviving this specific implementation, I don't think it's worth keeping open. #297 may be worth keeping open as long as it's made clear that Tullio works right now and something may or may not be created in Flux itself at some point.
Yeah, I think closing this doesn't imply we will never consider something like it. To be clear, I am not against this particular PR; I just think that closing stale PRs and issues is good practice for any open source project.
Yeah, if we want to track this issue, then keep the issue open. But again, we can just as easily close the issue, if we are seriously not considering it, and just say "please re-open this issue if you would like this feature built into Flux."
Fair, let's keep the branch around and revisit if needed.
@einsum
as discussed in #297. The notation is based on the "function of indices" notation: [...]. You can enjoy the nice output, too: [...]
There are various cases this doesn't handle properly yet, and I need to actually implement that batch-matmul primitive somewhere. Will probably throw together a naive CPU implementation, unless anyone knows a trick.
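A naive CPU batched matmul along those lines might look like the sketch below; the name batched_mul and the batch-last memory layout are assumptions rather than anything taken from this PR.

```julia
using LinearAlgebra: mul!

# Naive batched matrix multiplication: C[:, :, b] = A[:, :, b] * B[:, :, b]
# for each batch index b, delegating every slice to an ordinary BLAS gemm.
function batched_mul(A::AbstractArray{T,3}, B::AbstractArray{S,3}) where {T,S}
    size(A, 3) == size(B, 3) || throw(DimensionMismatch("batch sizes differ"))
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("inner dimensions differ"))
    C = similar(A, promote_type(T, S), size(A, 1), size(B, 2), size(A, 3))
    @views for b in axes(A, 3)
        mul!(C[:, :, b], A[:, :, b], B[:, :, b])
    end
    return C
end

A, B = rand(2, 3, 5), rand(3, 4, 5)
C = batched_mul(A, B)
C[:, :, 1] ≈ A[:, :, 1] * B[:, :, 1]  # true
```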