Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for MW orbit integration #701

Open
jamesgrimmett opened this issue Jan 4, 2025 · 3 comments
Open

Performance improvements for MW orbit integration #701

jamesgrimmett opened this issue Jan 4, 2025 · 3 comments

Comments

@jamesgrimmett
Copy link
Contributor

Hi @jobovy, I have recently been looking into whether I could improve the runtime for integrating orbits when using the MWPotential2014 potential. (I'm trying to model up to 1000 orbits on a small number of cpus, so any speedup is useful).

I've noted two things that could help to reduce runtime;

(i) In the force calculations in PowerSphericalPotentialwCutoff.c, the interior mass is found using

gsl_sf_gamma(x)  - gsl_sf_gamma_inc(x,a)

i.e.,

$$\Gamma(x) - \Gamma(x,a) = \int^\infty_0t^{x-1}\mathrm{exp}(-t)dt - \int^\infty_a t^{x-1}\mathrm{exp}(-t)dt$$

I've found that a fair amount of the runtime is spent in gsl_sf_gamma_inc. A cheaper way to compute the lower integral is with

gsl_sf_gamma(x) * gsl_sf_gamma_inc_P(x, a)

i.e.,

$$\Gamma(x) \times P(x,a) = \Gamma(x) \times 1/\Gamma(x)\int^a_0 t^{x-1}\mathrm{exp}(-t)dt$$

I don't completely understand why gsl_sf_gamma_inc_P is signficantly faster than gsl_sf_gamma_inc, to be honest. I think it is to do with GSL's internal optimisations for the lower integral vs. the upper unbounded integral. The speedup also depends on the value of $x, a$ (i.e., $alpha, R$).
In practice (on 12 cpus) I find that this change reduces Orbit().integrate() runtime by 40%, and Orbit.from_name('MW globular clusters').integrate() runtime by 85% (over a 10Gyr integration period).

(ii) As the Rforce and zforce computations tend to be very similar, there are some redundant calculations made during each call to evalRectForce. E.g., for a PowerSphericalPotentialwCutoff potential, the interior mass is calculated once for the Rforce and again for the zforce (see screenshot below showing a section of the call map, gsl_sf_gamma_inc is proxy for the mass calculation, percentages indicate the amount of time that each function was active during runtime (% relative to calling function total)).

callmap

I am wondering if there is a good way to pull out at least the mass calculation (if not the other components as well) from the individual force calculations, while still retaining the ability to compute each force in a standalone way. E.g., perhaps something along the lines of an additional "gravforce" function (unsure about appropriate function name);

[galpy/orbit/orbit_c_ext/integrateFullOrbit.c]
+ potentialArgs->gravforce= &PowerSphericalPotentialwCutoffgravforce;

[galpy/potential/potential_c_ext/galpy_potentials.c]
+ double (calcgravforce)(...){...}

[galpy/potential/potential_c_ext/PowerSphericalPotentialwCutoff.c]
+ PowerSphericalPotentialwCutoffgravforce(...) {return  amp * mass (r2,alpha,rc) / pow(r2,1.5)}

So that from within evalRectForce we could have gravforce = calcgravForce(); Rforce = R * gravforce; zforce = z * gravforce, saving computing the interior mass twice.

I haven't had a close look at how this restructuring would be applied to the force calculations in other potentials as well, but I imagine the speedup will depend on how expensive+similar the R and z force calculations are for each.

Would you be keen for these changes to be brought into main? If so, I'd be happy to open a PR with the changes for at least (i). I'm also happy to include the changes for (ii), though I may need some advice on the best approach.

@jobovy
Copy link
Owner

jobovy commented Jan 6, 2025

Hi @jamesgrimmett,

Thanks for looking into this, these are great optimizations! Some comments:

  • Mass calculation in PowerSphericalPotentialwCutoff.c: looking at the GSL source code, my sense is that the gsl_sf_gamma_inc_P and gsl_sf_gamma_inc_P are far more optimized than the gsl_sf_gamma_inc function, so that's likely the reason that you get a speed-up. For typical orbits, I actually think internally the GSL gsl_sf_gamma_inc_P code might be using gsl_sf_gamma_inc_Q (and getting the P function as 1-Q). So you might also want to try doing Gamma x (1-gsl_sf_gamma_inc_Q ). The Orbit().integrate( and Orbit.from_name('MW globular clusters').integrate( cases are great test cases. If you want to open a PR for this, that would be great!

  • Caching parts of the force calculation: Yes, this is possible and already done for some forces. The easiest way to do this is to add a few "parameters" to the potential that are passed to C and that can then be used to cache the mass calculation for re-use when the Rforce or zforce is requested for the exact same value of (R,z,phi,t) as before (in the spherical case really only r). For example, this is done for the ellipsoidal potentials such as TriaxialNFWPotential, because for those the forces are naturally calculated in rectangular coordinates, so that's done once and then reused for Rforce, zforce, and phiforce calls (you can do this because they always come in triples during an orbit integration). So you can take a look at, e.g., EllipsoidalPotential.c for an example of this. The passing the arguments to C is done here:

    elif isinstance(p, potential.EllipsoidalPotential.EllipsoidalPotential):
    pot_args.append(p._amp)
    pot_args.extend([0.0, 0.0, 0.0, 0.0, 0.0, 0.0]) # for caching
    # Potential specific parameters
    if isinstance(p, potential.TriaxialHernquistPotential):
    pot_type.append(21)
    pot_args.extend([2, p.a, p.a4]) # for psi, mdens, mdens_deriv
    elif isinstance(p, potential.TriaxialNFWPotential):
    .

    Caching the mass for a given r and then re-using it for all Rforce and zforce calculations would be possible for all spherical potentials, but currently there is no general way to implement this, so given how highly used MWPotential2014 is, it would be worthwhile just implementing this caching mechanism for PowerSphericalPotentialwCutoff`. I think you could just add two caching parameters: massandcached_rso you can calculate them once and then just re-use, similar to how this is done for the three components of the force for a givencached_x, cached_y, and cached_zinEllipsoidalPotential.c``. A PR along these lines would also be great!

@jobovy
Copy link
Owner

jobovy commented Jan 6, 2025

A more general solution along the lines that you suggest for the caching of the force components would also be great, because it would solve all spherical potentials and the ellipsoidal ones, but it would be a lot more work, so just doing the caching for PowerSphericalPotential.c would be a great first step!

@jamesgrimmett
Copy link
Contributor Author

Great, thanks for the comments @jobovy. Interesting, I'll take a look at calling the Q function directly, and compare with my results using the P function. I'll open a PR for this sometime this week, hopefully it should be a quick one.

That's very useful that there is already an example for caching results! That looks to be more straightforward than I initially imagined, I'll also have a go at a PR for the caching machanism for PowerSphericalPotentialwCutoff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants