poor performance of `exp()` on 32 bit #10425

pao · 2015-03-06T23:56:27Z

Besides being slow on its own, note that it gets worse for large-valued inputs on Win32; on Win64, performance is faster for large-valued inputs. In all cases, `r0 = rand(5000, 5000)`.

Win32:

julia> r = r0; @time exp(r);
elapsed time: 1.138791707 seconds (190 MB allocated)

julia> r = 10000*r0; @time exp(r);
elapsed time: 3.855262381 seconds (190 MB allocated)

Win64:

julia> @time exp(r0);
elapsed time: 0.463... seconds # what I read before Julia crashes due to #10249

julia> r=10000*r0; @time exp(r);
elapsed time: 0.235... seconds # what I read before Julia crashes due to #10249

(Note that my Win64 installation is affected by #10249, so it's hard to test there.)
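For anyone re-running this comparison on a current Julia (1.x), here is a minimal sketch. It assumes that `exp` applied to a square matrix is the matrix exponential there, so the elementwise call is written with broadcasting. `time_exp` is a hypothetical helper, and the 1000x1000 size is chosen only to keep the run short.

```julia
# Hypothetical helper: time elementwise exp over an array.
# The first call triggers compilation so it is not part of the measurement.
function time_exp(r::AbstractArray{Float64})
    exp.(r)
    return @elapsed exp.(r)
end

r0 = rand(1000, 1000)            # smaller than the 5000x5000 used above
t_small = time_exp(r0)           # inputs in [0, 1)
t_large = time_exp(10000 .* r0)  # inputs in [0, 10000)
println("small inputs: ", t_small, " s, large inputs: ", t_large, " s")
```

Comparing `t_small` and `t_large` on a 32-bit and a 64-bit build of the same Julia version should show whether the trend described here still holds.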


tkelman · 2015-03-07T02:16:14Z

I can confirm very similar numbers on both Windows and Linux. This is probably an openlibm issue on 32 bit, wouldn't be the first one.

Linux numbers:

julia> versioninfo()
Julia Version 0.4.0-dev+3666
Commit 400fa31* (2015-03-03 22:51 UTC)
Platform Info:
  System: Linux (i686-linux-gnu)
  CPU: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Penryn)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> r0 = rand(5000,5000);

julia> r=r0; @time exp(r);
elapsed time: 1.534604899 seconds (190 MB allocated)

julia> r=10000*r0; @time exp(r);
elapsed time: 3.664688047 seconds (190 MB allocated)

julia> exit()
tkelman@ygdesk:~/Julia/julia-linux32$ cd ../julia
tkelman@ygdesk:~/Julia/julia$ ./julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+3690 (2015-03-06 17:50 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 753390b* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> r0 = rand(5000,5000);

julia> r=r0; @time exp(r);
elapsed time: 0.952396283 seconds (191 MB allocated)

julia> r=10000*r0; @time exp(r);
elapsed time: 0.350571556 seconds (190 MB allocated)

nalimilan · 2015-03-07T09:02:53Z

Comparing with the system libm would be interesting.

tkelman · 2015-03-08T14:33:10Z

I think that'll need someone with a real 32 bit Linux system to try out. I don't think system libm is usable on Windows (not even positive where it lives - inside msvcrt I think?), and my 32 bit Linux builds are multilib compiled from a 64 bit OS.

rickhg12hs · 2015-03-09T01:28:42Z

Machine is old & slow, but here you go ...

$ ./julia 
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+3727 (2015-03-08 23:04 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 768401c* (0 days old master)
|__/                   |  i686-redhat-linux

julia> versioninfo()
Julia Version 0.4.0-dev+3727
Commit 768401c* (2015-03-08 23:04 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Banias)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> mysysexp(x::Float64) = ccall((:exp, "libm"), Float64, (Float64,), x)
mysysexp (generic function with 1 method)

julia> @vectorize_1arg Float64 mysysexp
mysysexp (generic function with 4 methods)

julia> r0 = rand(Float64,(5000, 5000));

julia> r = r0; @time exp(r);
elapsed time: 2.727944529 seconds (190 MB allocated)

julia> r = r0; @time exp(r);
elapsed time: 2.676581093 seconds (190 MB allocated)

julia> r = r0; @time mysysexp(r);
elapsed time: 3.142749421 seconds (190 MB allocated)

julia> r = r0; @time mysysexp(r);
elapsed time: 3.012173802 seconds (190 MB allocated)
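A rough equivalent of the `mysysexp` wrapper for Julia 1.x, where `@vectorize_1arg` is no longer available, might look like the sketch below. The library names passed to `Libdl.find_library` are assumptions about where the system math library lives (glibc Linux, generic Unix, macOS); `SYSLIBM` and `sysexp` are hypothetical names.

```julia
using Libdl

# Assumption: one of these names resolves to the platform's C math library.
const SYSLIBM = Libdl.find_library(["libm.so.6", "libm", "libSystem"])

# Call the C library's exp directly, bypassing Julia's own implementation.
sysexp(x::Float64) = ccall((:exp, SYSLIBM), Cdouble, (Cdouble,), x)

r = rand(1000, 1000)
sysexp.(r)                     # warm up; broadcasting replaces @vectorize_1arg
t_sys = @elapsed sysexp.(r)
t_julia = @elapsed exp.(r)
println("system libm: ", t_sys, " s, Julia exp: ", t_julia, " s")
```

If `find_library` returns an empty string, none of the assumed names matched and the `ccall` will fail; the list would need adjusting for that platform.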

tkelman · 2015-03-09T01:58:14Z

Interesting, thanks. What about for the large values?

rickhg12hs · 2015-03-09T02:09:12Z

Oh yeah, ...

julia> r = 10000 * r0; @time exp(r);
elapsed time: 6.087031901 seconds (190 MB allocated)

julia> r = 10000 * r0; @time exp(r);
elapsed time: 6.066508487 seconds (190 MB allocated)

julia> r = 10000 * r0; @time mysysexp(r);
elapsed time: 10.517670321 seconds (190 MB allocated)

julia> r = 10000 * r0; @time mysysexp(r);
elapsed time: 10.469830777 seconds (190 MB allocated)

tkelman · 2015-03-09T02:16:37Z

Thanks! I don't know much about the internals of how different libm's implement exp (paging @simonbyrne?) but it sounds like the timing trends for openlibm are consistent with glibc, and actually a little better. Something is allowing the 64 bit version to be quite a bit faster, and show the opposite timing trend versus input values. Different allowed instruction sets, I suppose?

ViralBShah · 2015-03-09T04:45:34Z

I don't feel terribly worried about 32-bit. Is there a particular application motivating this - or just an observation, in case we can do something better?

tkelman · 2015-03-09T04:54:02Z

Probably because win64 Julia is completely broken for @pao at this time - #10249

Can we at least look into this? There are still 32-bit bugs in openlibm.

ViralBShah · 2015-03-09T04:57:23Z

We surely should. I was just curious. Should we move this issue to openlibm?

pao · 2015-03-09T06:04:52Z

Sounds like that's the next step on its journey. (FWIW, I just moved the program over to a 64-bit linux machine--I have the luxury of SSH--but unfortunately for Reasons Windows is definitely more convenient.)

simonbyrne · 2015-03-09T08:35:01Z

One point to keep in mind is that Float64 arguments greater than 710 will overflow, so this test is mostly just detecting how fast that branch occurs.
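To put a number on that, here is a small sketch estimating how much of the `10000*r0` test lands in the overflow branch. It assumes the inputs are uniform on [0, 10000), as `10000 * rand(...)` produces; the variable names are illustrative.

```julia
# Approximate largest Float64 argument for which exp stays finite.
overflow_threshold = log(floatmax(Float64))   # about 709.78

# Expected fraction of a uniform sample on [0, 10000) above the threshold.
overflow_fraction = 1 - overflow_threshold / 10000

println(isfinite(exp(709.0)))   # below the threshold
println(isinf(exp(710.0)))      # above the threshold
println(overflow_fraction)
```

Under that assumption roughly 93% of the entries overflow, so the large-value timing mostly reflects how quickly an implementation returns `Inf`.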

One thing I don't understand with openlibm is what determines whether src/e_exp.c or i387/e_exp.S is used?

ViralBShah · 2015-03-09T09:00:45Z

Both are getting linked.

pao · 2015-03-09T13:27:53Z

> One point to keep in mind is that Float64 arguments greater than 710 will overflow, so this test is mostly just detecting how fast that branch occurs.

Going back to the original issue, where we are computing PDFs, the same is true for the underflow branch (where the exponential evaluates to floating-point 0).
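As an illustration of the underflow case, here is a sketch using the unnormalized standard normal density `exp(-x^2/2)`. This is an assumed stand-in for the PDF computation, not the actual Distributions.jl code, and `normal_kernel` is a hypothetical name.

```julia
# Unnormalized standard normal density (assumed stand-in for the real PDF code).
normal_kernel(x::Float64) = exp(-x^2 / 2)

println(normal_kernel(38.0))   # exp(-722.0): tiny but still nonzero
println(normal_kernel(39.0))   # exp(-760.5): underflows to 0.0
```

Any tail evaluation further out than about 38.6 standard deviations hits the underflow branch, so the cost of that branch matters for tail-heavy workloads.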

simonbyrne · 2015-03-09T14:17:09Z

Having looked again, my guess is that on 686 we're calling the x87 assembly code, which doesn't have an early branch for under or overflow.
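For readers unfamiliar with the term, an "early branch" here means checking the argument against the overflow and underflow thresholds before doing any real work. A minimal sketch follows; the wrapper name `exp_early_exit` is hypothetical, and the two constants are the thresholds used in fdlibm-style `e_exp.c` sources. It is not the openlibm code itself.

```julia
# Thresholds as found in fdlibm-style exp implementations.
const O_THRESHOLD = 7.09782712893383973096e+02    # above this, exp overflows
const U_THRESHOLD = -7.45133219101941108420e+02   # below this, exp underflows

# Hypothetical wrapper: return immediately for out-of-range arguments.
function exp_early_exit(x::Float64)
    x > O_THRESHOLD && return Inf
    x < U_THRESHOLD && return 0.0
    return exp(x)   # NaN fails both comparisons and is handled here
end

println(exp_early_exit(1000.0))
println(exp_early_exit(-1000.0))
println(exp_early_exit(1.0))
```

A code path without such checks runs the full computation even when the result is known to be `Inf` or `0.0`, which would be consistent with the slower large-value timings on 32 bit.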

KristofferC · 2017-01-26T14:38:43Z

We are using a native Julia exp function now, and we already have extensive benchmarks for exp. If we want to compare benchmarks on 32 and 64 bit, that seems like a different issue.

tkelman · 2017-01-26T14:46:32Z

was there any benchmarking on 32 bit?

KristofferC · 2017-01-26T15:37:53Z

No, only with the two different julia versions.

tkelman · 2017-01-26T15:41:38Z

then there's no evidence this is fixed. the new implementation could easily still be slow on 32 bit

KristofferC · 2017-01-26T15:48:35Z

Do you have access to a 32 bit machine so you can test it?

yuyichao · 2017-01-26T16:03:45Z

On the same hardware, comparing an x64 build with AVX2 against a generic i686 build, the performance appears to scale similarly on the two builds. The x64 build is faster, but I think that's likely because of the use of fma. AVX2 on i686 is a pretty weird combination (though supported by the hardware), so I think the difference is OK and this can be closed.

yuyichao · 2017-01-26T16:07:52Z

Oh, and I should say that it's ~2x faster with AVX2.
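To illustrate why fma availability can matter, here is a sketch of Horner's scheme written with `muladd`, which lets the compiler emit a fused multiply-add when the target supports one. The Taylor polynomial and the `horner` helper are assumptions chosen for illustration; this is not the polynomial Julia's `exp` actually uses.

```julia
# Horner evaluation: each step is one multiply-add, which can map to
# a single fused instruction on hardware that provides FMA.
function horner(x::Float64, coeffs::Vector{Float64})
    acc = coeffs[end]
    for k in length(coeffs)-1:-1:1
        acc = muladd(acc, x, coeffs[k])
    end
    return acc
end

# Illustrative coefficients: Taylor series of exp around 0, degree 12.
taylor = [1 / factorial(k) for k in 0:12]

println(horner(0.5, taylor))
println(exp(0.5))
```

On a build without FMA instructions, each `muladd` becomes a separate multiply and add, which roughly doubles the floating-point work in loops like this one.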

pao added the performance, system:windows, and system:32-bit labels Mar 6, 2015

pao mentioned this issue Mar 6, 2015

Windows performance regression for normal PDF, especially near tails, Julia v0.3 vs. v0.4 JuliaStats/Distributions.jl#349

Closed

tkelman removed the system:windows label Mar 7, 2015

tkelman changed the title ~~poor performance of exp() on Win32~~ poor performance of exp() on 32 bit Mar 7, 2015

pao mentioned this issue Apr 28, 2015

Performance of exp #11048

Closed

KristofferC closed this as completed Jan 26, 2017

tkelman reopened this Jan 26, 2017

KristofferC closed this as completed Jan 26, 2017
