You can fuse sqrt, cos, and sin into a single pass over the data:
import nimpy
import times
import arraymancer

var
  tic, toc: float # for math

let np = pyImport("numpy")

# NumPy called through nimpy
tic = epochTime()
for i in 0..<200:
  discard np.sqrt(np.cos(np.sin(np.linspace(0, 10, 1000))))
toc = epochTime()
echo "np time: ", toc - tic

# Arraymancer, unfused: each call makes its own pass over the tensor
tic = epochTime()
for i in 0..<200:
  discard sqrt(cos(sin(arraymancer.linspace(0, 10, 1000))))
toc = epochTime()
echo "arraymancer time: ", toc - tic

# Arraymancer, fused: one in-place pass applying sin, cos, and sqrt per element
tic = epochTime()
for i in 0..<200:
  var t = arraymancer.linspace(0, 10, 1000)
  t.apply_inline():
    x.sin().cos().sqrt()
toc = epochTime()
echo "arraymancer fused time: ", toc - tic
$ nim c -d:danger --hints:off --warnings:off -d:danger -r --outdir:build build/speedtest.nim
np time: 0.009390830993652344
arraymancer time: 0.005604982376098633
arraymancer fused time: 0.004479646682739258
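To show where the fused version saves work, here is a minimal sketch in plain Nim, without Arraymancer. The helpers `linspaceSeq`, `unfused`, and `fused` are hypothetical names made up for this illustration; they are not part of any library, and this is not how Arraymancer implements `apply_inline` internally. The unfused variant walks the data three times and allocates two temporaries, while the fused variant computes `sqrt(cos(sin(x)))` per element in one loop.

```nim
import std/math

proc linspaceSeq(start, stop: float, n: int): seq[float] =
  ## Hypothetical helper: n evenly spaced points from start to stop (inclusive).
  result = newSeq[float](n)
  let step = (stop - start) / float(n - 1)
  for i in 0 ..< n:
    result[i] = start + step * float(i)

proc unfused(xs: seq[float]): seq[float] =
  ## Three passes over the data, with two temporary sequences.
  var a = newSeq[float](xs.len)
  for i in 0 ..< xs.len:
    a[i] = sin(xs[i])
  var b = newSeq[float](xs.len)
  for i in 0 ..< a.len:
    b[i] = cos(a[i])
  result = newSeq[float](xs.len)
  for i in 0 ..< b.len:
    result[i] = sqrt(b[i])

proc fused(xs: seq[float]): seq[float] =
  ## One pass over the data, no temporaries besides the result.
  result = newSeq[float](xs.len)
  for i in 0 ..< xs.len:
    result[i] = sqrt(cos(sin(xs[i])))

let xs = linspaceSeq(0.0, 10.0, 1000)
let u = unfused(xs)
let f = fused(xs)
echo "same length: ", u.len == f.len
```

Both variants apply the same operations to each element, so they should agree; the difference is only in how many times memory is traversed.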
Depending on how many cores you have, compiling with -d:openmp might also speed things up. Unfortunately, I have 36 cores, and OpenMP doesn't handle contention well with the unfused code (there is not enough work per item).
$ nim c -d:openmp --hints:off --warnings:off -d:danger -r --outdir:build build/speedtest.nim
np time: 0.009420156478881836
arraymancer time: 0.04207587242126465
arraymancer fused time: 0.005712270736694336
Note: when benchmarking, CPU time might give you the wrong figures for parallel code that runs on multiple CPUs.
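As a rough sketch of that point, the snippet below measures the same piece of work with both a wall-clock timer (`getMonoTime` from `std/monotimes`) and `cpuTime` from `std/times`. The `work` proc is a hypothetical stand-in workload, not part of the benchmark above. In single-threaded code the two figures are usually close; with multithreaded code, CPU time can differ substantially from elapsed time, depending on the platform, so wall-clock time is generally the safer figure to report.

```nim
import std/[times, monotimes, math]

proc work(n: int): float =
  ## Hypothetical stand-in workload: sums sqrt(cos(sin(x))) for n points in [0, 1).
  for i in 0 ..< n:
    let x = float(i) / float(n)
    result += sqrt(cos(sin(x)))

let cpuStart = cpuTime()          # CPU time used by the process so far
let wallStart = getMonoTime()     # monotonic wall-clock timestamp

let total = work(1_000_000)

let wallElapsed = getMonoTime() - wallStart   # a Duration
let cpuElapsed = cpuTime() - cpuStart         # seconds as float

echo "result: ", total
echo "wall time (s): ", float(wallElapsed.inNanoseconds) / 1e9
echo "cpu time (s): ", cpuElapsed
```

The benchmark above already uses `epochTime`, which is a wall-clock measurement, so its figures are not affected by this caveat.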
Shell and output:
If it is compiled with release
I get time:
Could I improve the speed further?