
Consider using FFT's for all convolutions. #11

Open
chinandrew opened this issue May 22, 2023 · 0 comments

In #9, I mentioned that the FFT convolution is a bit slower than a direct for loop at the template lengths being used. Below is a comparison of the convolution times (I made this with Python, so it won't be identical to the R functions, but it's the same idea):

[convolution_times plot]

At first I thought this was the end of the story and we'd just use the for loop, but then I realized that adept runs the convolutions on windows of length 6000 (by default), and we can reuse each FFT across many convolutions.

For example, for a week of 80Hz data, adept chops it up into ~8000 windows of length 6000. The naive implementation of the FFT convolution would just run convolve(data_window, template) for all 30 scalings of the template for all 8000 windows, giving 240000 total calls, requiring 240000 FFTs of the data windows, 240000 FFTs of the templates, and 240000 iFFTs at the end, for a total of 720000 [i]FFTs. However, we can share the FFT of each template across all data windows and the FFT of each data window across all templates, i.e. we can do this as 8000 FFTs (one per data window), 30 FFTs (one per template), and 240000 iFFTs back, for a total of 248030 [i]FFTs.
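To make the sharing concrete, here is a minimal numpy sketch of the idea (function name is hypothetical, not adept's API). It mirrors the R idiom `fft(fft(x) * Conj(fft(y)), inverse = TRUE)` used below: each forward FFT is computed exactly once and only the inverse FFT runs per (window, template) pair.

```python
import numpy as np

def shared_fft_convolve(windows, templates):
    """Circular cross-correlation of every window against every template,
    computing each forward FFT exactly once and reusing it.

    windows: 2D array of shape (n_windows, window_len)
    templates: list of 1D arrays, each no longer than window_len
    """
    n = windows.shape[1]
    # One forward FFT per data window (shared across all templates).
    w_ffts = np.fft.rfft(windows, axis=1)
    results = {}
    for t_idx, template in enumerate(templates):
        # One forward FFT per template (shared across all windows),
        # zero-padded to the window length.
        t_fft = np.fft.rfft(template, n)
        # One inverse FFT per (window, template) pair.
        results[t_idx] = np.fft.irfft(w_ffts * np.conj(t_fft), n, axis=1)
    return results
```

For W windows and T templates this does W + T forward FFTs and W·T inverse FFTs, versus 3·W·T for the naive version.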

In addition, we can tune the window length to be as fast as possible. For example, perhaps using 3000 longer windows of length 16000, for a total of 93030 (3000 + 30 + 90000) [i]FFTs, would be faster. Or maybe a shorter window is faster.
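The counting above can be written down as a small helper (names are illustrative), which reproduces both totals from the examples:

```python
def fft_counts(n_windows, n_templates=30):
    """[i]FFT counts for naive vs shared-FFT convolution.

    Naive: one window FFT, one template FFT, and one iFFT per call.
    Shared: one FFT per window, one per template, one iFFT per pair.
    """
    naive = 3 * n_windows * n_templates
    shared = n_windows + n_templates + n_windows * n_templates
    return naive, shared

# 8000 windows of 6000 samples: naive 720000 vs shared 248030
# 3000 windows of 16000 samples: naive 270000 vs shared 93030
```

Note the shared count is dominated by the W·T inverse FFTs, so fewer, longer windows reduce the FFT count but raise the cost of each FFT; hence the tuning question.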

Some crude benchmarking shows an approximately 2x speedup (you could also tune the naive FFT window lengths a bit, but they all seemed to be >1.9s):

```r
# Naive approach: one full convolve() call per (window, template) pair.
set.seed(1)
n = 6000
x = 1:n

ptm <- proc.time()
for (i in 1:(30*100)){      # 30 templates x 100 windows
  convolve(x, x/n/(n-1))
}
proc.time() - ptm
# 1.9s
```


```r
# Shared-FFT approach over the same data: forward FFTs run once per
# template and once per window; only the iFFT runs for every pair.
n = 100 # 6000 slices
x = 1:n
ptm <- proc.time()
for (i in 1:30){            # one forward FFT per template
  x_fft = fft(x)
}
for (i in 1:6000){          # one forward FFT per window
  x_fft2 = Conj(fft(x))
}
for (i in 1:(30*6000)){     # one inverse FFT per pair
  Re(fft(x_fft*x_fft2/n/(n-1), inverse = TRUE))[1]
}
proc.time() - ptm
# 1.16s

# Re-running with other window lengths:
n = 600   # 1000 slices -> 0.99s
n = 6000  # 100 slices  -> 1.05s
n = 10000 # 60 slices   -> 1.06s
n = 60000 # 10 slices   -> 1.4s
```

If I remember correctly, the naive FFT convolution was maybe 30% slower than the for loop on some test runs (I say this with low confidence, but it definitely wasn't half as slow), so if we can make it ~2x faster we would see a net speedup.

Two caveats:

  1. Only the covariance can be written as a single convolution, as far as I can tell. The correlation requires 3 due to the computation of the standard deviations, and in that case I'm not sure this would help (since we'd need something like a >3x speedup).
  2. It would probably require an overhaul of how the code thinks about its variables. Essentially, it might be easier to start passing data/templates around in the frequency domain rather than the time domain.
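On caveat 1, a sketch of why the correlation costs three sliding convolutions/correlations (this is my own illustration with numpy's direct `np.correlate`, not adept's implementation; each of the three sums could equally be done via FFT):

```python
import numpy as np

def sliding_corr(x, t):
    """Pearson correlation of template t against every length-len(t)
    window of x, using exactly three sliding-window correlations."""
    m = len(t)
    ones = np.ones(m)
    s_xt = np.correlate(x, t, mode="valid")         # 1: sum of x*t per window
    s_x  = np.correlate(x, ones, mode="valid")      # 2: sum of x per window
    s_xx = np.correlate(x * x, ones, mode="valid")  # 3: sum of x^2 per window
    # corr = (sum(x*t) - m*mean(x)*mean(t)) / (sd(x)*sd(t) terms)
    num = s_xt - s_x * t.mean()
    den = np.sqrt((s_xx - s_x**2 / m) * ((t - t.mean()) ** 2).sum())
    return num / den
```

The covariance needs only the first two sums (and the second is cheap), whereas the window standard deviation forces the third, which is why the shared-FFT trick has less headroom for the correlation.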

For now I think if the current state of optimizations is sufficient, it's not worth spending too much time on this (especially with caveat 2). I do think it would be a pretty interesting path to explore though, so wanted to bring it up as a discussion point.
