Performance improvements for MW orbit integration #701

jamesgrimmett · 2025-01-04T06:54:23Z

Hi @jobovy, I have recently been looking into whether I could improve the runtime for integrating orbits when using the MWPotential2014 potential. (I'm trying to model up to 1000 orbits on a small number of cpus, so any speedup is useful).

I've noted two things that could help to reduce runtime;

(i) In the force calculations in PowerSphericalPotentialwCutoff.c, the interior mass is found using

gsl_sf_gamma(x)  - gsl_sf_gamma_inc(x,a)

i.e.,

$$\Gamma(x) - \Gamma(x,a) = \int^\infty_0t^{x-1}\mathrm{exp}(-t)dt - \int^\infty_a t^{x-1}\mathrm{exp}(-t)dt$$

I've found that a fair amount of the runtime is spent in gsl_sf_gamma_inc. A cheaper way to compute the lower integral is with

gsl_sf_gamma(x) * gsl_sf_gamma_inc_P(x, a)

i.e.,

$$\Gamma(x) \times P(x,a) = \Gamma(x) \times 1/\Gamma(x)\int^a_0 t^{x-1}\mathrm{exp}(-t)dt$$

I don't completely understand why gsl_sf_gamma_inc_P is signficantly faster than gsl_sf_gamma_inc, to be honest. I think it is to do with GSL's internal optimisations for the lower integral vs. the upper unbounded integral. The speedup also depends on the value of $x, a$ (i.e., $alpha, R$).
In practice (on 12 cpus) I find that this change reduces Orbit().integrate() runtime by 40%, and Orbit.from_name('MW globular clusters').integrate() runtime by 85% (over a 10Gyr integration period).

(ii) As the Rforce and zforce computations tend to be very similar, there are some redundant calculations made during each call to evalRectForce. E.g., for a PowerSphericalPotentialwCutoff potential, the interior mass is calculated once for the Rforce and again for the zforce (see screenshot below showing a section of the call map, gsl_sf_gamma_inc is proxy for the mass calculation, percentages indicate the amount of time that each function was active during runtime (% relative to calling function total)).

I am wondering if there is a good way to pull out at least the mass calculation (if not the other components as well) from the individual force calculations, while still retaining the ability to compute each force in a standalone way. E.g., perhaps something along the lines of an additional "gravforce" function (unsure about appropriate function name);

[galpy/orbit/orbit_c_ext/integrateFullOrbit.c]
+ potentialArgs->gravforce= &PowerSphericalPotentialwCutoffgravforce;

[galpy/potential/potential_c_ext/galpy_potentials.c]
+ double (calcgravforce)(...){...}

[galpy/potential/potential_c_ext/PowerSphericalPotentialwCutoff.c]
+ PowerSphericalPotentialwCutoffgravforce(...) {return  amp * mass (r2,alpha,rc) / pow(r2,1.5)}

So that from within evalRectForce we could have gravforce = calcgravForce(); Rforce = R * gravforce; zforce = z * gravforce, saving computing the interior mass twice.

I haven't had a close look at how this restructuring would be applied to the force calculations in other potentials as well, but I imagine the speedup will depend on how expensive+similar the R and z force calculations are for each.

Would you be keen for these changes to be brought into main? If so, I'd be happy to open a PR with the changes for at least (i). I'm also happy to include the changes for (ii), though I may need some advice on the best approach.

The text was updated successfully, but these errors were encountered:

jobovy · 2025-01-06T02:29:26Z

Hi @jamesgrimmett,

Thanks for looking into this, these are great optimizations! Some comments:

Mass calculation in PowerSphericalPotentialwCutoff.c: looking at the GSL source code, my sense is that the gsl_sf_gamma_inc_P and gsl_sf_gamma_inc_P are far more optimized than the gsl_sf_gamma_inc function, so that's likely the reason that you get a speed-up. For typical orbits, I actually think internally the GSL gsl_sf_gamma_inc_P code might be using gsl_sf_gamma_inc_Q (and getting the P function as 1-Q). So you might also want to try doing Gamma x (1-gsl_sf_gamma_inc_Q ). The Orbit().integrate( and Orbit.from_name('MW globular clusters').integrate( cases are great test cases. If you want to open a PR for this, that would be great!

Caching parts of the force calculation: Yes, this is possible and already done for some forces. The easiest way to do this is to add a few "parameters" to the potential that are passed to C and that can then be used to cache the mass calculation for re-use when the Rforce or zforce is requested for the exact same value of (R,z,phi,t) as before (in the spherical case really only r). For example, this is done for the ellipsoidal potentials such as TriaxialNFWPotential, because for those the forces are naturally calculated in rectangular coordinates, so that's done once and then reused for Rforce, zforce, and phiforce calls (you can do this because they always come in triples during an orbit integration). So you can take a look at, e.g., EllipsoidalPotential.c for an example of this. The passing the arguments to C is done here:

galpy/galpy/orbit/integrateFullOrbit.py

Lines 173 to 180 in 8e94507

    
           elif isinstance(p, potential.EllipsoidalPotential.EllipsoidalPotential): 
        
               pot_args.append(p._amp) 
        
               pot_args.extend([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # for caching 
        
               # Potential specific parameters 
        
               if isinstance(p, potential.TriaxialHernquistPotential): 
        
                   pot_type.append(21) 
        
                   pot_args.extend([2, p.a, p.a4])  # for psi, mdens, mdens_deriv 
        
               elif isinstance(p, potential.TriaxialNFWPotential):

.

Caching the mass for a given r and then re-using it for all Rforce and zforce calculations would be possible for all spherical potentials, but currently there is no general way to implement this, so given how highly used MWPotential2014 is, it would be worthwhile just implementing this caching mechanism for PowerSphericalPotentialwCutoff`. I think you could just add two caching parameters: massandcached_rso you can calculate them once and then just re-use, similar to how this is done for the three components of the force for a givencached_x, cached_y, and cached_zinEllipsoidalPotential.c``. A PR along these lines would also be great!

jobovy · 2025-01-06T02:32:29Z

A more general solution along the lines that you suggest for the caching of the force components would also be great, because it would solve all spherical potentials and the ellipsoidal ones, but it would be a lot more work, so just doing the caching for PowerSphericalPotential.c would be a great first step!

jamesgrimmett · 2025-01-07T09:15:40Z

Great, thanks for the comments @jobovy. Interesting, I'll take a look at calling the Q function directly, and compare with my results using the P function. I'll open a PR for this sometime this week, hopefully it should be a quick one.

That's very useful that there is already an example for caching results! That looks to be more straightforward than I initially imagined, I'll also have a go at a PR for the caching machanism for PowerSphericalPotentialwCutoff.

jamesgrimmett mentioned this issue Jan 9, 2025

Swap gsl_gamma_inc with gsl_gamma_inc_P for mass calculation #706

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvements for MW orbit integration #701

Performance improvements for MW orbit integration #701

jamesgrimmett commented Jan 4, 2025

jobovy commented Jan 6, 2025

jobovy commented Jan 6, 2025

jamesgrimmett commented Jan 7, 2025

Performance improvements for MW orbit integration #701

Performance improvements for MW orbit integration #701

Comments

jamesgrimmett commented Jan 4, 2025

jobovy commented Jan 6, 2025

jobovy commented Jan 6, 2025

jamesgrimmett commented Jan 7, 2025