Sequential CTMRG is slow compared to Python (with PyTorch) #81
Comments
For χ = 24 (with χ = 12 as initialization) the difference is more significant:
I wonder how I can look for the bottleneck...
This is not entirely surprising. We developed this package with a mindset of "let's make it work first", mostly because Zygote poses a (rather large) number of restrictions on the optimizations that you would typically do to make a Julia algorithm faster. We are now at the stage of thinking about what to optimise, but it is not so straightforward given these restrictions. I'm more than happy to have a look at your implementation to think about more ways to speed it up, but it's hard to give any kind of answer without further information: I don't know what the other implementations are, what the setup is, whether you ran this in a multithreaded environment or not, ... If you don't feel comfortable sharing publicly, we can also continue the discussion via email.
For better control of the setup, here I only attach the PEPSKit CTMRG with all AD stuff removed: The basic idea is still to implement only the left move. The other moves are done by rotating the network by 90 degrees. One more change is that I wrote the functions to update only one column at a time (which will be used in the full update algorithm), instead of handling all columns at once. I haven't figured out whether there are some Python settings that may affect performance.
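To make the "left move plus rotation" structure concrete, here is a minimal sketch in plain Python. It is not the attached code: `rotate90`, `left_move` and `ctmrg_step` are hypothetical names, the "network" is just a nested list of counters, the rotation direction is an assumption, and the placeholder `left_move` only counts how often each site is visited instead of absorbing a column and renormalizing with projectors.

```python
def rotate90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def left_move(grid, col):
    """Placeholder left move acting on a single column.

    A real CTMRG left move would absorb this column into the left
    environment and truncate with projectors; here we only increment a
    counter on every site of the column (assumption for illustration).
    """
    for row in grid:
        row[col] += 1
    return grid


def ctmrg_step(grid):
    """One full step: a left move on every column, in all four directions."""
    grid = [list(row) for row in grid]  # work on a copy
    for _ in range(4):
        for col in range(len(grid[0])):  # one column at a time
            grid = left_move(grid, col)
        grid = rotate90(grid)  # the next direction becomes "left"
    return grid


visits = ctmrg_step([[0, 0, 0], [0, 0, 0]])
```

After four rotations the grid is back in its original orientation, and every site has been touched once per direction, which is the sense in which one step covers all four directions for all rows and columns.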
I don't really have enough information to tell you why the Python implementation should have different performance (since I don't really know what you are using in Python). Did you check whether both implementations use 1 iteration to denote a move in every direction? Otherwise, I would advise running a profiler to see if anything stands out.
Yes, a move includes all four directions for all rows and columns. Actually I haven't fully removed Zygote overhead in the Julia version (in the function |
It might be a large variety of things; I would refrain from changing anything before you have a profiler view. The bottlenecks are very often not where you expect them to be. In this case, I don't think the Zygote buffers really do a lot, and I would expect the impact of not using in-place operations, because of the need to be AD compatible, to be a much larger factor.
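For the Python side of the comparison, a profiler view can be obtained with the standard-library `cProfile` module. The following is only a sketch: `ctmrg_step` is a hypothetical stand-in that does trivial arithmetic and should be replaced by the actual step function, and `profile_steps` is a helper name introduced here.

```python
import cProfile
import io
import pstats


def ctmrg_step(n=200):
    """Hypothetical stand-in for one CTMRG step; replace with the real call."""
    return sum(i * i for i in range(n))


def profile_steps(step, n_steps=5):
    """Profile n_steps calls of `step`; return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_steps):
        step()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries
    return out.getvalue()


report = profile_steps(ctmrg_step)
print(report)
```

On the Julia side, the `Profile` standard library (`@profile` followed by `Profile.print()`) plays the same role.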
I use the CTMRG algorithm to measure the Heisenberg model ground state obtained from simple update. The algorithm settings are
The PEPS bond dimension is D = 6, and the environment bond dimension is χ = 12. Starting from a random CTMRGEnv, it takes about 1.3 s to perform one CTMRG step:
However, using my own Python implementation (using PyTorch; the projectors are also found from the half-infinite environment), it only takes about 0.7s per step, about twice the speed of PEPSKit:
Here svd_diff is the convergence criterion, calculated as follows (a little bit different from the err of PEPSKit):
I tried to use the functions in PEPSKit to write a simpler version without the fancy autodiff stuff; the speed then improves to about 0.9 s per RG step, but it is still slower than PyTorch:
So my concern is that the auto-diff machinery from Zygote etc. may cause too much performance overhead for applications that do not use auto-diff of CTMRG.