Investigate why upsampling tests are so slow #442

Open
ToucheSir opened this issue Nov 2, 2022 · 5 comments
Labels
ci (CI related issues or enhancements), help wanted, performance

Comments

@ToucheSir
Member

On master, on my machine:

Test Summary:            | Pass  Broken  Total      Time
NNlib.jl                 | 7370       5   7375  26m21.1s
  Doctests               |    1              1     21.8s
  Activation Functions   | 1683           1683   4m24.9s
  Batched Multiplication | 3327           3327   2m09.5s
  Convolution            |  559       1    560   1m20.2s
  CTC Loss               |    5              5      1.2s
  Inference              |   28             28      2.3s
  Pooling                |  240       4    244     34.0s
  Padding                |   37             37     15.7s
  Softmax                |  128            128     39.1s
  Upsampling             |   78             78  14m27.6s
  Gather                 |   30             30      4.7s
  Scatter                | 1223           1223   1m55.0s
  Utilities              |   10             10      0.8s
  Grid Sampling          |   17             17      1.2s
  Functions              |    4              4      2.8s
     Testing NNlib tests passed 
ToucheSir added the help wanted, performance, and ci labels Nov 2, 2022
@maxfreu
Contributor

maxfreu commented Sep 22, 2023

Is there a way to force the complete summary printout?

@ToucheSir
Member Author

I don't know off the top of my head, but there might be.

@maxfreu
Contributor

maxfreu commented Sep 22, 2023

Here are the timings on my machine (12-core CPU + 1080 Ti). Setting @testset verbose=true did the trick :)

Test Summary:                                     | Pass  Total     Time
NNlib.jl                                          |  164    164  2m30.9s
  CPU                                             |   82     82  1m18.5s
    Upsample                                      |   82     82    59.5s
      upsample_nearest, integer scale via reshape |   11     11    12.0s
      Linear upsampling (1D)                      |    4      4     8.8s
      Bilinear upsampling (2D)                    |   10     10     3.9s
      Trilinear upsampling (3D)                   |    6      6     6.3s
      pixel_shuffle                               |   15     15    21.0s
      Complex-valued upsample                     |   36     36     2.9s
  CUDA                                            |   82     82  1m10.1s
    Upsample                                      |   82     82  1m10.0s
      upsample_nearest, integer scale via reshape |   11     11    20.8s
      Linear upsampling (1D)                      |    4      4     7.0s
      Bilinear upsampling (2D)                    |   10     10     2.6s
      Trilinear upsampling (3D)                   |    6      6     7.0s
      pixel_shuffle                               |   15     15    23.1s
      Complex-valued upsample                     |   36     36     6.2s
     Testing NNlib tests passed

The nearest neighbour and pixel shuffle tests take longer than the others because they include several tests against forward-mode gradients, which take significantly more time. For example, removing the gradient tests from the pixel shuffle tests reduces its test time to about 3 s. So I think everything is OK, since this is probably mostly precompilation time.
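For anyone wanting to reproduce this kind of breakdown, here is a minimal sketch of the verbose=true setup mentioned above, using only Julia's Test standard library (the verbose option needs Julia 1.6 or later). The test set names mirror the output above, but the @test bodies are trivial placeholders, not NNlib's actual tests. verbose=true is set on every set that has nested children, so each level gets its own row in the summary even when everything passes.

```julia
using Test

# Placeholder test sets mirroring the layout of the summary above.
# The @test bodies are trivial stand-ins, not NNlib's real tests.
# verbose=true is set on every set with nested children so that the
# final summary prints a row per nested set, even when all tests pass.
ts = @testset "NNlib.jl" verbose=true begin
    @testset "CPU" verbose=true begin
        @testset "Upsample" verbose=true begin
            @testset "pixel_shuffle" begin
                @test size(zeros(4, 4)) == (4, 4)
            end
            @testset "Bilinear upsampling (2D)" begin
                @test sum(ones(2, 2)) == 4
            end
        end
    end
end
```

Each nested row then carries its own Time column, which is what makes it possible to see how a total like the 59.5 s CPU upsample figure above splits across individual test sets.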

@ToucheSir
Member Author

That's still a good amount of time given how small the inputs are, especially since the GHA runners are significantly lower-powered. I would've expected pixel shuffle in particular to be fast, since most of the functions it calls should already be precompiled. Were you able to determine a rough estimate of how much of that time is precompilation?

@maxfreu
Contributor

maxfreu commented Sep 27, 2023

I slammed a couple of @time macros in front of the tests; here are the results. I deemed everything else insignificant.

upsample (14s in total):

gradtest 1
 11.290905 seconds (22.42 M allocations: 1.313 GiB, 3.56% gc time, 99.95% compilation time)
gradtest 2
  1.612946 seconds (5.71 M allocations: 312.538 MiB, 5.93% gc time, 99.95% compilation time)

pixel shuffle (38.9s in total):

cat test
  2.284393 seconds (8.25 M allocations: 489.369 MiB, 4.39% gc time, 99.98% compilation time)
gradtest for d=1
 19.207987 seconds (45.40 M allocations: 2.612 GiB, 4.04% gc time, 99.97% compilation time)
gradtest for d=2
  4.439341 seconds (14.02 M allocations: 822.682 MiB, 5.81% gc time, 99.40% compilation time)
gradtest for d=3
 11.875151 seconds (19.70 M allocations: 20.107 GiB, 7.31% gc time, 39.40% compilation time)

So the question is how to make it compile faster.
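One way to put a rough number on the compilation share asked about above is to time the same call twice: the first call includes JIT compilation for the given argument types, while the second reuses the cached method, so the difference approximates the compilation overhead. This is a minimal sketch; `loss` is a hypothetical stand-in for a gradtest body, not an NNlib function, and the timings will vary by machine.

```julia
# `loss` is a hypothetical stand-in for a gradtest body, not an NNlib
# function. It is freshly defined, so its first call must be compiled.
loss(x) = sum(abs2, x .* 2f0)

x = rand(Float32, 8, 8)

# First call: includes JIT compilation of `loss` for Matrix{Float32}.
t_first = @elapsed loss(x)

# Second call, same argument types: the compiled method is cached,
# so this is (approximately) pure run time.
t_second = @elapsed loss(x)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
println("rough compilation overhead: ", t_first - t_second, " s")
```

Applied to a real gradtest, a large gap between the two calls points to compilation, while a small gap points to actual work. The d=3 pixel shuffle gradtest above, at only 39.40% compilation time and 20 GiB allocated, looks like it falls in the second category.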

Projects
None yet
Development

No branches or pull requests

2 participants