Skip to content

Latest commit

 

History

History
148 lines (119 loc) · 6.14 KB

README.md

File metadata and controls

148 lines (119 loc) · 6.14 KB

fftr

The fastest Fourier transform in the Rhein computes the complex discrete Fourier transform (DFT) of a signal and back again with O(N log N) performance as usual.

Faster ones exist, but hey, it has decent speed, is less than 400loc and is written in pure Nim for minimal hassle.

It's also likely the only one developed from a viewpoint overlooking the Rhein river.

Usage

In your library:

requires "https://github.com/arnetheduck/nim-fftr"

Compute frequencies in a signal:

import fftr, std/[math, sequtils]

let
  signal = (0..1023).mapIt(complex64(sin(TAU * 0.1 * float64(it))))

  # abs gets us back into real space
  # false for forward FFT, true for inverse (TODO flip it? separate names?)
  frequencies = fft(signal, false).mapIt(abs(it))
  # Scaling must be applied manually
  signalAgain = fft(fft(signal, false), true).mapIt(abs(it)/signal.len.float64)

For performance, it's important to compile the application with LTO enabled so that the complex module gets inlined properly - see nim.cfg for an example of how to do this.

Benchmarks

Beats FFTW on at least one input size :)

fftr:

nim c -d:release -r benches/bench_fftr

   min time    avg time  std dv   runs name
   0.001 ms    0.001 ms  ±0.000  x1000 pow2 - 64
   0.001 ms    0.001 ms  ±0.001  x1000 pow2 - 128
   0.003 ms    0.004 ms  ±0.002  x1000 pow2 - 256
   0.006 ms    0.007 ms  ±0.001  x1000 pow2 - 512
   0.013 ms    0.014 ms  ±0.001  x1000 pow2 - 1024
   0.029 ms    0.032 ms  ±0.001  x1000 pow2 - 2048
   0.067 ms    0.070 ms  ±0.002  x1000 pow2 - 4096
   0.309 ms    0.321 ms  ±0.004  x1000 pow2 - 16384
   1.423 ms    1.461 ms  ±0.017  x1000 pow2 - 65536
   0.001 ms    0.001 ms  ±0.000  x1000 prime - 5
   0.002 ms    0.002 ms  ±0.000  x1000 prime - 17
   0.023 ms    0.025 ms  ±0.001  x1000 prime - 149
   0.023 ms    0.025 ms  ±0.001  x1000 prime - 151
   0.025 ms    0.027 ms  ±0.001  x1000 prime - 251
   0.111 ms    0.116 ms  ±0.003  x1000 prime - 1009
   0.238 ms    0.246 ms  ±0.006  x1000 prime - 2017
   0.488 ms    0.505 ms  ±0.009  x1000 prime - 2879
   4.934 ms    5.051 ms  ±0.057   x989 prime - 32767
  10.866 ms   11.053 ms  ±0.102   x452 prime - 65521
  21.629 ms   21.957 ms  ±0.165   x228 prime - 65537
 244.610 ms  249.728 ms  ±6.073    x20 prime - 746483
 244.258 ms  246.235 ms  ±1.584    x21 prime - 746497
  10.468 ms   10.688 ms  ±0.091   x468 prime-power - 44521
  47.279 ms   47.748 ms  ±0.308   x105 prime-power - 160801
   1.309 ms    1.361 ms  ±0.021  x1000 mult-of-power-of-2 - 24576
   1.915 ms    1.959 ms  ±0.023  x1000 mult-of-power-of-2 - 20736
   4.171 ms    4.251 ms  ±0.037  x1000 small-comp-large-prime - 30270
   0.002 ms    0.002 ms  ±0.000  x1000 small-comp - 18
   0.028 ms    0.029 ms  ±0.001  x1000 small-comp - 360
   6.936 ms    7.046 ms  ±0.046   x710 small-comp - 44100
   4.653 ms    4.759 ms  ±0.041  x1000 small-comp - 48000
   4.919 ms    5.049 ms  ±0.045   x991 small-comp - 46656
  11.567 ms   11.741 ms  ±0.067   x426 small-comp - 100000

fftw:

nim c -d:release -r benches/bench_fftw

   min time    avg time  std dv   runs name
   0.003 ms    0.005 ms  ±0.002  x1000 pow2 - 64
   0.003 ms    0.005 ms  ±0.001  x1000 pow2 - 128
   0.004 ms    0.006 ms  ±0.002  x1000 pow2 - 256
   0.005 ms    0.006 ms  ±0.002  x1000 pow2 - 512
   0.006 ms    0.007 ms  ±0.001  x1000 pow2 - 1024
   0.008 ms    0.010 ms  ±0.002  x1000 pow2 - 2048
   0.016 ms    0.024 ms  ±0.008  x1000 pow2 - 4096
   0.060 ms    0.086 ms  ±0.037  x1000 pow2 - 16384
   0.285 ms    0.471 ms  ±0.348  x1000 pow2 - 65536
   0.002 ms    0.002 ms  ±0.001  x1000 prime - 5
   0.001 ms    0.002 ms  ±0.001  x1000 prime - 17
   0.029 ms    0.040 ms  ±0.011  x1000 prime - 149
   0.038 ms    0.045 ms  ±0.008  x1000 prime - 151
   0.039 ms    0.048 ms  ±0.009  x1000 prime - 251
   0.073 ms    0.108 ms  ±0.035  x1000 prime - 1009
   0.139 ms    0.205 ms  ±0.080  x1000 prime - 2017
   0.200 ms    0.312 ms  ±0.140  x1000 prime - 2879
   0.858 ms    0.955 ms  ±0.103  x1000 prime - 32767
   4.152 ms    4.952 ms  ±1.324   x936 prime - 65521
   1.920 ms    2.050 ms  ±0.119  x1000 prime - 65537
  88.072 ms  101.897 ms ±12.072    x50 prime - 746483
  55.563 ms   63.118 ms  ±8.796    x79 prime - 746497
   1.701 ms    2.138 ms  ±0.660  x1000 prime-power - 44521
   5.052 ms    5.345 ms  ±0.471   x904 prime-power - 160801
   0.105 ms    0.129 ms  ±0.037  x1000 mult-of-power-of-2 - 24576
   0.114 ms    0.137 ms  ±0.033  x1000 mult-of-power-of-2 - 20736
   0.711 ms    0.754 ms  ±0.025  x1000 small-comp-large-prime - 30270
   0.004 ms    0.005 ms  ±0.001  x1000 small-comp - 18
   0.009 ms    0.011 ms  ±0.004  x1000 small-comp - 360
   0.230 ms    0.251 ms  ±0.028  x1000 small-comp - 44100
   0.222 ms    0.241 ms  ±0.029  x1000 small-comp - 48000
   0.270 ms    0.292 ms  ±0.022  x1000 small-comp - 46656
   0.529 ms    0.560 ms  ±0.033  x1000 small-comp - 100000

Guts and references

Stuff that seemed useful while writing the code:

  • Stockham
  • Radix2 FFT for power-of-2-length transforms
  • Bluestein for prime length transforms
  • Mixed radix of Radix-2 and Bluestein for the rest
  • Butterflies for 1, 2 and 4-length transforms

More can be done of course - in particular, Radix-3,5,7 are high on the wishlist.

Missing stuff

  • planning
  • multi-dimensional FFT:s
  • more butterflies
  • more algorithms (Rader, Good-Thomas etc)
  • iterative / in-place variants
  • SIMD
  • etc...

Resources

  • nimfftw3 wrapper for Nim - faster, less convenient to use

Music of choice

Dooz Kawa - Etioles de Sol