CPU optimisation: deriv::Staple and optimising views #11
Initial improvement consists of moving the creation of the … Brief inspection showed that the results are identical, and the profiler shows some noticeable improvements in the 'deriv' runtime (judging from the flame graph). Numbers for the gains will follow shortly. We also need to discuss how to merge changes within our fork so that they stay compatible with upstream (e.g. do we use the version with the patch applied, or the upstream development branch?).
Some initial observations from the profiler:
Looking at the breakdown of time spent in PeekIndex across the code, it is about an order of magnitude higher for the gauge actions (WilsonGaugeActions.h: 10.7 s with 1 OpenMP thread / 1.11 s with 16 threads) than for the fermion actions (TwoFlavour.h: 1.2 s with 1 thread / 0.11 s with 16 threads). This confirms that OpenMP is working around the …
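As a rough sanity check on the timings above, assuming all four figures are wall-clock seconds for the same workload, the implied OpenMP speedup $S = T_1 / T_{16}$ and parallel efficiency $E = S / 16$ work out as follows:

```math
\begin{aligned}
S_{\text{gauge}} &= \frac{10.7\,\text{s}}{1.11\,\text{s}} \approx 9.6, &\qquad E_{\text{gauge}} &= \frac{S_{\text{gauge}}}{16} \approx 0.60,\\
S_{\text{fermion}} &= \frac{1.2\,\text{s}}{0.11\,\text{s}} \approx 10.9, &\qquad E_{\text{fermion}} &= \frac{S_{\text{fermion}}}{16} \approx 0.68,\\
\frac{T^{\text{gauge}}_{1}}{T^{\text{fermion}}_{1}} &= \frac{10.7}{1.2} \approx 8.9, &\qquad \frac{T^{\text{gauge}}_{16}}{T^{\text{fermion}}_{16}} &= \frac{1.11}{0.11} \approx 10.1.
\end{aligned}
```

Both sets of calls therefore scale at roughly 60 to 70% efficiency on 16 threads, and the gauge/fermion ratio stays near 10 at both thread counts. The 0.11 s figure has only two significant figures, so the fermion numbers are approximate.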
From our discussions so far, I'm not sure where we should look to optimise further. In Staple, the only thing I can think of is potentially what Ed discussed: splitting up the GaugeField matrix to exploit its symmetry. Other than that, it's places like …
@qiUip can you open a PR with what you have so far? We can attempt to merge it upstream...
Sure, let me copy it to a new, correctly named branch and create a PR.
Addressed in PR #16
Continuing #7; see that issue for more details.