[MNT] - Optimizations for curve fit #299

Merged - 13 commits merged from optimize into main on Sep 13, 2023
Conversation

@TomDonoghue (Member) commented Aug 20, 2023

This is a series of small tweaks to the fit algorithm that should result in minor improvements in run speed.

Updates include:

  • slightly more efficient stepping across params in the Gaussian function
  • removing the unneeded 'ys' initialization in the aperiodic fit functions
  • setting check_finite to False in curve_fit (since we do our own data checking)
  • exposing the curve_fit tolerance settings as a private setting, and setting a new default value that is a bit more liberal
  • providing a precomputed Jacobian matrix for the Gaussian curve_fit call*

*I did add the Jacobians for the aperiodic functions as well, but the results were more mixed as to whether this provided any speedup - notably these calls are already pretty fast, and based on this post, I think for the aperiodic functions we are often in a case in which the checks run on a passed-in Jacobian outweigh any benefit over letting curve_fit estimate it. Because of this, I did not actually set the Jacobian functions to be used in the aperiodic component curve_fit calls.
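As a rough illustration of what these tweaks amount to (a minimal sketch, not the actual PR code; the data, guess, bounds, and tolerance value below are made up for the example, and the Jacobian piece is sketched separately under Notes):

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_function(xs, *params):
    """Sum-of-Gaussians model; params are stepped through in triples of (ctr, hgt, wid)."""
    ys = np.zeros_like(xs)
    for ctr, hgt, wid in zip(*[iter(params)] * 3):
        ys += hgt * np.exp(-(xs - ctr)**2 / (2 * wid**2))
    return ys

# Synthetic flattened spectrum with one peak (purely illustrative)
freqs = np.linspace(1, 50, 200)
flat_spectrum = gaussian_function(freqs, 10, 0.5, 2) + np.random.normal(0, 0.01, len(freqs))

_tol = 0.00001   # more liberal than scipy's default of 1e-8
guess = [9, 0.4, 1.5]
bounds = ([1, 0, 0.25], [50, 1.5, 8])

params, _ = curve_fit(gaussian_function, freqs, flat_spectrum, p0=guess, bounds=bounds,
                      check_finite=False,            # skip scipy's finiteness check; data validated upstream
                      ftol=_tol, xtol=_tol, gtol=_tol)
```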

Results!?

Without being particularly rigorous about profiling, pretty quick / simple tests suggest this update provides something in the range of at least a 5-20% speed up even in the simplest fitting cases (which is what I was expecting), but it has an unexpectedly outsized effect on more complex cases - in cases with knees and/or more peaks, noise, etc., I'm seeing what look like 2-3X speed increases! I didn't profile in much detail exactly where this comes from, but I think a big part is that once we have multiple Gaussians, estimating the Jacobian is slow, whereas it's actually quick and easy to pass in; the tolerance shift can also have a notable impact (the other changes are less impactful).

In terms of outputs, everything other than the tolerance change should have zero effect (meaning you should get numerically exactly the same results out). The tolerance change can (though does not necessarily) change results - I've only seen it change them below what should be a notable resolution (e.g. changing a peak bandwidth value by ~0.005 or so). If it's indeed limited to this magnitude of impact, I think it's fine / worth the trade-off. Note that the actual value I set (0.00001, replacing the otherwise default of 1e-8) is arbitrary - the higher this gets, the bigger the speed increase, but also the greater the chance of more notable impacts. Also, this was added as a private setting, so one can set it back to the previous default (fm._tol = 1e-8) to get exactly the same behavior as 1.0, or set a different value as needed.
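For example, to restore the previous behavior (a usage sketch; the simulated spectrum is just to have something to fit):

```python
from fooof import FOOOF
from fooof.sim import gen_power_spectrum

# Simulate a simple power spectrum to fit
freqs, powers = gen_power_spectrum([3, 40], [1, 1], [10, 0.4, 1])

fm = FOOOF(verbose=False)
fm._tol = 1e-8   # restore the previous (scipy default) tolerance, matching 1.0 behavior
fm.fit(freqs, powers)
```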

Notes

  • I followed some pointers from this Stack Overflow post to try and optimize things.
  • The formulas for the Jacobian matrix calculations all come from Wolfram Alpha. As far as I can see, they give numerically identical answers to not passing in the Jacobian and letting it be estimated, so the outputs are exactly the same (the single-peak case is sketched below).
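For the single-Gaussian case, those derivatives reduce to the sketch below (illustrative; the function name here is made up, and the module version in fooof/core/jacobians.py handles multiple peaks):

```python
import numpy as np

def jacobian_gauss_single(xs, ctr, hgt, wid):
    """Analytic Jacobian of a single Gaussian, hgt * exp(-(xs - ctr)**2 / (2 * wid**2)).
    Returns an (n_points, 3) array with columns (d/d_ctr, d/d_hgt, d/d_wid)."""
    diff = xs - ctr
    core = np.exp(-diff**2 / (2 * wid**2))
    return np.stack([hgt * diff / wid**2 * core,      # d/d_ctr
                     core,                            # d/d_hgt
                     hgt * diff**2 / wid**3 * core],  # d/d_wid
                    axis=-1)
```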

Reviewing

There's not too much changed code to check - what would be good is if someone (@ryanhammonds perhaps) could do a quick check to replicate that a) this branch is at least a bit quicker than main, and b) there is no real change in output parameters. I'm also tagging @rdgao because you might be interested in these changes, and/or have some more knowledgeable insight over and above my heuristic updates here.

Addendum: what I learned about curve_fit

Doing some quick profiling, we spend basically all of our time in curve_fit, meaning it's the only real target for meaningful optimization, and within that, we spend the vast majority of our time doing the Gaussian fit (not so much the guess procedure, but the curve_fit call itself). Why here? Well, it turns out that with no bounds, curve_fit defaults to the Levenberg-Marquardt algorithm, which is very fast - but does not support bounds. For peaks, we need bounds - adding those pushes the default method to Trust Region Reflective ('trf'), which is way slower. Notably, there is another option for bounded estimation ('dogbox'), which is sometimes a bit faster, but not consistently enough to justify changing to it (both methods give exactly the same results as far as I can see).
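A minimal sketch of that method selection on a toy Gaussian fit (illustrative only, not code from the PR):

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(xs, ctr, hgt, wid):
    return hgt * np.exp(-(xs - ctr)**2 / (2 * wid**2))

xs = np.linspace(1, 50, 200)
ys = gauss(xs, 10, 0.5, 2) + np.random.normal(0, 0.01, len(xs))

# No bounds: scipy silently uses Levenberg-Marquardt ('lm') - fast, but bounds unsupported
popt_lm, _ = curve_fit(gauss, xs, ys, p0=[9, 0.4, 1.5])

# With bounds: the default switches to Trust Region Reflective ('trf') - slower
bounds = ([1, 0, 0.25], [50, 1.5, 8])
popt_trf, _ = curve_fit(gauss, xs, ys, p0=[9, 0.4, 1.5], bounds=bounds)

# 'dogbox' is the other bounded option; per the notes above it was not consistently faster
popt_dog, _ = curve_fit(gauss, xs, ys, p0=[9, 0.4, 1.5], bounds=bounds, method='dogbox')
```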

Implications of this:

  • adding bounds, in general, slows things down a lot. We can't avoid this for peaks. In general, we do not provide bounds for AP, and this is (somewhat counter-intuitively) much quicker than if we did, which is useful to know.
  • we spend almost all our fitting time within the curve_fit function, with no other meaningful targets for optimization. This PR does what I could figure out to optimize how we call curve_fit - but my sense beyond this is that we are hitting a limit of what we can speed up without looking for different / faster implementations of the fitting procedure itself (in particular with bounds).

@TomDonoghue added the 1.1 label (Targetted for a fooof 1.1 release) on Aug 20, 2023
@TomDonoghue changed the title from "[MNT] - Small optimizations" to "[MNT] - Optimizations for curve fit" on Aug 21, 2023
@ryanhammonds (Contributor) commented

This is awesome! Really cool to see nice optimizations here. In terms of:

a) this branch is at least a bit quicker than main, and b) you see no real change in output parameters

It may be nice to agree on a set of simulations (I attempted something similar here) that can be used to benchmark and validate the performance of various optimizations, so we have some objective measure for choosing one setting over another. I can rerun these simulations to see how this PR compares to main, or we can update/improve the set of simulations first (maybe something like drawing each param from a uniform distribution over [lower_bound, upper_bound]? - see the sketch below).
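One possible shape for such a benchmark (a sketch assuming fooof's simulation utilities; the parameter ranges and settings are illustrative, not agreed-upon values):

```python
import numpy as np
from fooof import FOOOF
from fooof.sim import gen_power_spectrum

rng = np.random.default_rng(0)

def sample_params():
    """Draw one set of aperiodic and peak parameters from uniform ranges."""
    aperiodic = [rng.uniform(0, 2), rng.uniform(0.5, 2.5)]                 # offset, exponent
    peak = [rng.uniform(4, 40), rng.uniform(0.2, 1.5), rng.uniform(1, 4)]  # CF, PW, BW
    return aperiodic, peak

for _ in range(100):
    aperiodic, peak = sample_params()
    freqs, powers = gen_power_spectrum([1, 50], aperiodic, peak, nlv=0.01)
    FOOOF(max_n_peaks=6, verbose=False).fit(freqs, powers)
```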

For bounded optimization, I've had some luck using scipy's L-BFGS-B. I wonder how it compares to TRF.
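For context, a minimal sketch of bounded fitting with scipy's L-BFGS-B on a sum-of-squares objective (illustrative only; this is not part of this PR, nor how the fitting currently works):

```python
import numpy as np
from scipy.optimize import minimize

def gauss(xs, ctr, hgt, wid):
    return hgt * np.exp(-(xs - ctr)**2 / (2 * wid**2))

def sse(params, xs, ys):
    """Sum of squared errors for a single-Gaussian model."""
    return np.sum((ys - gauss(xs, *params))**2)

xs = np.linspace(1, 50, 200)
ys = gauss(xs, 10, 0.5, 2) + np.random.normal(0, 0.01, len(xs))

result = minimize(sse, x0=[9, 0.4, 1.5], args=(xs, ys), method='L-BFGS-B',
                  bounds=[(1, 50), (0, 1.5), (0.25, 8)])   # (ctr, hgt, wid) bounds
```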

@TomDonoghue (Member, Author) commented Aug 23, 2023

@ryanhammonds - your work with Rust and the comparisons there are what got me thinking more about why we are so slow, and about trying this out!

In terms of having a basic set of comparisons for profiling, I totally agree!
I've added a Github gist with the basic checks I was using here:
https://gist.github.com/TomDonoghue/44d09697fe6b7f9a5b2eadf5fb695148

The basic idea is to check {fixed, knee} fits across {one-peak, multi-peak, high noise} examples. Having cleaned this notebook up a bit, the one / multi peak cases show what seem like useful but fairly modest improvements (on the order of a 5-50% speed up), whereas the high noise case can be several times faster - this is, I think, because without setting a limit on the number of peaks we can end up with many peaks, and estimating the Jacobian for such a large function gets really slow (note that this probably exaggerates the benefit in real cases, since in practice one would want to bound the number of peaks anyway).

What you have for comparing is more detailed / better than my super simple things - so perhaps we should go with something more like that!

@voytek (Contributor) commented Aug 25, 2023

First of all this is great work, Tom!

But ugh, what I think this means is that we should not only do what Ryan suggests (having some good simulations for benchmarking), but take it even further:

  • Create a set of simulations for benchmarking
  • Note all of the decision points in our curve-fitting (bounds, minimization methods, etc.)
  • Do a full parameter search across all approaches to see which sets of parameters give the fastest fit times under real conditions without compromising fit accuracy

Maybe not for this implementation, but if you two agree, please convert this comment to an issue and I'll save it for when we hire a full-time dev (hopefully this year!).

@TomDonoghue (Member, Author) commented

@voytek & @ryanhammonds - I do agree with the broader points here, though I'd like to suggest we don't miss the trees for the forest in terms of this PR as compared to bigger & broader plans :).

For this PR, if we can move forward with fairly standard code review, then as long as we are confident that it offers some speedup and is consistently accurate (the only notable consideration being that everyone is on board with the tolerance changes noted above), I hope we can merge this in, and I can merge it through to 2.0 and continue updates there.

For the broader perspective, I think much if not all of this is best organized around 2.0, since various things will change by then anyway (for example, benchmarking the broader set of fit functions, rather than the limited set in 1.X, and using the updated organization in relation to fully describing algorithm decision points). I think these two things (common benchmarking & a more detailed algorithm description) are within scope to happen as we work towards 2.0. In terms of Brad's third point (a full audit + exploring new approaches), and Ryan's comment about L-BFGS-B, this is something we can keep in mind and consider after 2.0 if/when someone has time (and can be described in a new issue).

Does this all sound good? If so, the practical outcomes are to check the details of this PR, I'll keep track of some notes and plans here for anything at the 2.0 level, and we can open issues for longer term plans.

@voytek (Contributor) commented Aug 28, 2023

Oh for sure. I'm happy with this as is now, I just think we should—for 2.0—actually implement some of the above suggestions.

@ryanhammonds (Contributor) left a review comment

Looks good! I left a comment on the Gaussian Jacobian function that reduces redundant computation.

Also, I noticed curve_fit has weird behavior for me when using timeit in a notebook (running %%timeit -n 1 -r 1 multiple times vs %%timeit). A similar thing occurs when using a loop to track runtime with time.time() values across random seeds: the first iteration is slow relative to the subsequent iterations, similar to using %%timeit -n 1 -r 1 vs multiple runs with %%timeit. If a time.sleep(1.) is inserted at the end of the loop, after timing for the iteration is complete, the effect goes away (all iterations are slowish). Below shows the same 10 simulations and fitting parameters, the only difference being a 1 s sleep at the end of the loop after the runtime has been logged into an array.

[Screenshot: per-iteration fit runtimes for the same 10 simulations, with vs. without a 1 s sleep at the end of the loop]
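A sketch of the kind of timing loop being described (the simulation settings and seeds are illustrative, not the exact ones behind the screenshot):

```python
import time
import numpy as np
from fooof import FOOOF
from fooof.sim import gen_power_spectrum

times = np.zeros(10)
for seed in range(10):
    np.random.seed(seed)
    freqs, powers = gen_power_spectrum([1, 50], [1, 1], [10, 0.5, 2], nlv=0.05)
    start = time.time()
    FOOOF(verbose=False).fit(freqs, powers)
    times[seed] = time.time() - start
    # Uncommenting the sleep below makes all iterations comparably slow,
    # rather than only the first iteration being the slow outlier:
    # time.sleep(1.)
```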

I'm not sure if this is just my computer/hardware or not. The instability in timing has made me question its accuracy - maybe %%timeit is giving underestimated results.

(Review comments on fooof/core/jacobians.py - outdated, resolved)
@TomDonoghue (Member, Author) commented Sep 7, 2023

Thanks for the deep dive, and for making things even faster @ryanhammonds! In terms of the refactoring, it looks great - and I left replies to the comments above!

In terms of the timeit quirks - that's a great catch, and I have basically no idea what's going on here lol. My first thought was that maybe curve_fit specifically does some kind of caching (?). To poke at this, using %%timeit -n X -r X and exploring iterations/repeats of 1 or more with other functions (numpy / standard library), as far as I can tell every timed code block is a lot slower on the first iteration? I'm not sure if this is a timeit thing or a Jupyter thing, and it's also not clear how a 1 second sleep would reset things... (I also timed a 1 second time.sleep call, and it has +/- 1-3 ms variance, which is annoying). Despite this seeming like a pretty general property / quirk, I couldn't find anything useful about it. All in all, I don't know what's happening and I don't know what to believe lol (though I would guess the first iteration is an overestimate, rather than the subsequent iterations necessarily being underestimates...?).

FWIW - I was using %lprun as a diagnostic check for some of the work here (e.g. %lprun -f FOOOF.fit fm.fit(freqs, powers)), which times and profiles a single iteration, giving the time / percent per internal call. Though I didn't check total times extensively, it is consistent in showing speed-ups in its total time estimate, though this number seems to be a bit different from the timeit values (perhaps because of profiling overhead).

That aside, I think that regardless of some fuzziness about exactly which timeit reports to believe, in relative terms it's consistent that the current version with Jacobians is faster than main, and your update of the Jacobians is faster than what's currently on this branch, right? If so, I think we can move forward with this PR, and just accept that profiling is confusing and difficult lol.


@ryanhammonds (Contributor) commented Sep 8, 2023

I think the reason it's slow on the first iteration is the use of memoization to reduce redundant computation, particularly at the guess params and bounds, since the curve needs to be evaluated at those points every time curve_fit is called with the same settings, and it seems those results are stored until garbage collection frees them (or maybe until the minpack or openblas process drops). Adding noise to guess and bounds removes the effect. This may be a consideration for future benchmarks, since related reductions in runtime may not be present with real data, where guess and bounds are expected to change.

Everything else looks good to go here though!

(Review comment on fooof/core/jacobians.py - outdated, resolved)
@rdgao (Contributor) commented Sep 11, 2023

Had a quick scan - I have nothing insightful to add. I didn't even know you could speed up curve_fit by providing it with the Jacobian, nor about the different optimization algorithms under the hood (dogbox is a great name though). But all of this makes sense to me post hoc.

Somewhat unrelated, so not sure if this is the best place to discuss it, but since we're on the topic of many peaks slowing things down: I think the default max_n_peaks can be much smaller, like 3. Having talked to some people about the nitty-gritty of hyperparameter choices recently, it seems like people sometimes set max_n_peaks to be much bigger than reasonable (and the default is inf?).

With a lower default, the tangential benefit would be a speedup in the common use cases, since fewer resources would be dedicated to fitting ghost peaks (which of course wouldn't manifest in the speed tests here when actually setting max_n_peaks), but it might also lead to "better usage" of FOOOF overall.

@voytek (Contributor) commented Sep 11, 2023

@rdgao's comment goes back to my data-driven refactoring idea: let's just try all combos and choose the fastest among the subset that returns the correct fit.

But anyway, I agree about max_n_peaks being a sticking point. I understand the desire to force a default, but that's motivated by our a priori knowledge about neural data. Now that other fields (nuclear fusion, materials science) are adapting specparam for their needs, forcing n=3 or whatever probably doesn't make sense.

@TomDonoghue (Member, Author) commented Sep 12, 2023

In terms of the code updates here - following @ryanhammonds' work on optimizing the Gaussian Jacobian, I updated the implementation of the aperiodic function Jacobians. They do get ~2x faster, but as far as I can tell, using them still does not seem to give any speedup (curve_fit must be able to estimate these very well), so I still have not set them to be used (though they are available in the module).
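For reference, the fixed (no-knee) aperiodic Jacobian is simple enough to sketch inline (the function name here is illustrative; the versions in fooof/core/jacobians.py cover both forms):

```python
import numpy as np

def jacobian_expo_nk_sketch(xs, offset, exp):
    """Analytic Jacobian of the fixed aperiodic form, offset - np.log10(xs**exp).
    Returns an (n_points, 2) array with columns (d/d_offset, d/d_exp)."""
    return np.stack([np.ones_like(xs),   # d/d_offset
                     -np.log10(xs)],     # d/d_exp
                    axis=-1)
```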

In terms of @rdgao's comment on tweaking default parameters, I've opened an issue for discussion of any default parameter changes (#303). My suggestion is that we should consider this for the 2.0 release, as it's a key time to change any defaults (I don't want to change any defaults here, in 1.1, as the idea is that running 1.0 vs 1.1 shouldn't change anything). Let's move / continue any discussion there (I've added some comments on max_n_peaks there).

In terms of the previous discussion of developing a more formal / systematic approach for benchmarking and trying out some of the other ideas that have come up, I've started this issue for that: #304

Otherwise, back to this PR, I think we can call this done on code changes, and merge in from here - sound good @ryanhammonds, @voytek, @rdgao?

@rdgao (Contributor) commented Sep 12, 2023

sounds good to me!

@voytek (Contributor) commented Sep 12, 2023

Yep, I'm good with this as well.

@TomDonoghue merged commit d1e8d8c into main on Sep 13, 2023
6 checks passed
@TomDonoghue deleted the optimize branch on September 13, 2023