Replies: 4 comments
-
In the first case, the Halide pipeline itself runs the FFT for some number of reps internally, so we call the lambda once, time it, and divide by the number of reps we asked bench_c2c to do internally. In the second case, reps is passed as an argument to benchmark, which internally calls the lambda that many times and then does the division itself.
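To make the two timing styles concrete, here is a minimal sketch. `benchmark_sketch`, `time_with_internal_reps`, and `time_with_external_reps` are hypothetical names introduced for illustration; `benchmark_sketch` is a simplified stand-in, not Halide's actual benchmark helper, and it assumes the helper returns the best per-iteration time in seconds.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <limits>

// Hypothetical stand-in for a benchmark helper (an assumption, not Halide's code):
// runs `op` `iterations` times per sample and returns the best per-iteration
// time in seconds across all samples.
double benchmark_sketch(uint64_t samples, uint64_t iterations,
                        const std::function<void()> &op) {
    assert(iterations > 0);
    double best = std::numeric_limits<double>::infinity();
    for (uint64_t s = 0; s < samples; s++) {
        auto start = std::chrono::steady_clock::now();
        for (uint64_t i = 0; i < iterations; i++) {
            op();
        }
        auto stop = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(stop - start).count();
        // The helper does the division by the iteration count itself.
        best = std::min(best, elapsed / static_cast<double>(iterations));
    }
    return best;
}

// Style 1 (the Halide line): the operation loops `reps` times internally,
// so it is called once per sample and the caller divides by `reps`.
double time_with_internal_reps(uint64_t samples, uint64_t reps,
                               const std::function<void(uint64_t)> &op_with_reps) {
    return benchmark_sketch(samples, 1, [&]() { op_with_reps(reps); }) /
           static_cast<double>(reps);
}

// Style 2 (the FFTW line): the operation does one unit of work per call,
// so `reps` is handed to the helper, which loops and divides.
double time_with_external_reps(uint64_t samples, uint64_t reps,
                               const std::function<void()> &op_once) {
    return benchmark_sketch(samples, reps, op_once);
}
```

Both functions report time per single unit of work, and both execute that unit `samples * reps` times in total. The difference is only where the loop lives: inside the operation (style 1) or inside the helper (style 2), which also means style 2 pays any per-call overhead on every rep.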
-
@abadams thanks for your feedback. Why not use the same method for both timings? I found that if we change both to the same method, the Halide version's performance score becomes worse.
-
You might be able to recover most of the loss in performance by changing the Halide target to include The reasoning behind the benchmarking is:
-
In other words, this benchmark is representative of how the Halide FFT is used in production, and how FFTW would be used in production if it were to be used instead.
-
In apps/fft/main.cpp, I am confused by the code below. Why does the Halide running time need to be divided by reps, while the FFTW time does not?
double halide_t = benchmark(samples, 1, [&]() { bench_c2c.realize(R_c2c); }) * 1e6 / reps;
double fftw_t = benchmark(samples, reps, [&]() { fftwf_execute(c2c_plan); }) * 1e6;