
Compilation performance degrades rapidly while increasing record length #12

dredozubov opened this issue Nov 21, 2017 · 6 comments


@dredozubov
Contributor

The compilation performance degradation is evident on larger records: a single record of 30-50 fields is enough to make GHC swap. There's a branch and a script I've used to dig up more info on this:
https://github.com/dredozubov/superrecord/tree/ghc-test-case
https://github.com/dredozubov/superrecord/blob/ghc-test-case/build-all.sh

I did a small investigation. I tried it with GHC 8.0.2 and 8.2.1 (the latter can be done with allow-newer). Building it with GHC HEAD is not currently possible due to broken dependencies. Here we go:
https://github.com/dredozubov/superrecord/blob/ghc-test-case/test/Spec.hs.in#L451 construction of a record with 35 fields takes up to 2 minutes

!!! Chasing dependencies: finished in 11.20 milliseconds, allocated 15.786 megabytes
!!! Parser [Spec]: finished in 1.62 milliseconds, allocated 2.921 megabytes
!!! Renamer/typechecker [Spec]: finished in 5053.19 milliseconds, allocated 4453.209 megabytes
!!! Desugar [Spec]: finished in 8061.64 milliseconds, allocated 12993.168 megabytes
!!! Simplifier [Spec]: finished in 9113.55 milliseconds, allocated 10165.327 megabytes
!!! Specialise [Spec]: finished in 12565.69 milliseconds, allocated 8482.198 megabytes
                   OverSatApps = False}) [Spec]: finished in 12262.97 milliseconds, allocated 13919.365 megabytes
!!! Simplifier [Spec]: finished in 42090.28 milliseconds, allocated 42521.308 megabytes
!!! Simplifier [Spec]: finished in 17468.49 milliseconds, allocated 16998.313 megabytes
!!! Simplifier [Spec]: finished in 21349.30 milliseconds, allocated 32972.807 megabytes
!!! Float inwards [Spec]: finished in 804.30 milliseconds, allocated 1908.825 megabytes
!!! Called arity analysis [Spec]: finished in 1986.35 milliseconds, allocated 1895.883 megabytes
!!! Simplifier [Spec]: finished in 1644.02 milliseconds, allocated 2605.471 megabytes
!!! Demand analysis [Spec]: finished in 870.99 milliseconds, allocated 1954.719 megabytes
!!! Worker Wrapper binds [Spec]: finished in 1924.12 milliseconds, allocated 2049.162 megabytes
!!! Simplifier [Spec]: finished in 3616.14 milliseconds, allocated 3650.656 megabytes
                   OverSatApps = True}) [Spec]: finished in 1670.04 milliseconds, allocated 3925.652 megabytes
!!! Common sub-expression [Spec]: finished in 857.66 milliseconds, allocated 1020.030 megabytes
!!! Float inwards [Spec]: finished in 634.74 milliseconds, allocated 1013.631 megabytes
!!! Simplifier [Spec]: finished in 2484.37 milliseconds, allocated 2723.176 megabytes
!!! CoreTidy [Spec]: finished in 254.93 milliseconds, allocated 405.708 megabytes
!!! CorePrep [Spec]: finished in 163.65 milliseconds, allocated 463.810 megabytes
!!! CodeGen [Spec]: finished in 425.96 milliseconds, allocated 474.554 megabytes
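
For context, the record in that Spec is built with superrecord's label / & / rnil construction style; here is a minimal sketch of that style with made-up field names (the real 35-field version is in the linked Spec.hs.in):

{-# LANGUAGE DataKinds, OverloadedLabels, TypeOperators #-}
import SuperRecord

-- Minimal sketch with invented field names; the failing case in Spec.hs.in
-- extends this pattern to 35 fields, which is what blows up compile times.
type Small = Rec '["f1" := Int, "f2" := Bool, "f3" := String]

small :: Small
small = #f1 := (1 :: Int) & #f2 := True & #f3 := "x" & rnil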

RecCopy produces a huge amount of coercions: https://gist.github.com/dredozubov/6ce629d1ec16ac32e9a987ffefda2a2a
There's a lot of rewriting going on (there is a lot of heavy inlining in the library). Here are some core2core dumps with their respective sizes:

% ls -ls dump-35/test/Spec.verbose-core2core.split/ | tail -14 | awk '{print $1, $10}'
8 S-00
13264 S.01-simplifier
49272 S.02-levels-added
5976 S.03-float-out
42720 S.04-simplifier
41784 S.05-simplifier
8416 S.06-simplifier
8416 S.07-float-inwards
8416 S.08-simplifier
8472 S.09-simplifier
12072 S.10-levels-added
1936 S.11-float-out
1944 S.12-float-inwards
1832 S.13-simplifier

These dumps are too big to add as gists, so I've uploaded this tar archive instead: https://www.dropbox.com/s/pql4qgm5lplbe38/dump-35.tar.gz?dl=0
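
(A note on reproducing these dumps: timing lines in the "!!! …: finished in … milliseconds, allocated … megabytes" format are what GHC emits at higher verbosity, e.g. with -dshow-passes, and a per-module Spec.verbose-core2core dump file comes from -dverbose-core2core together with -ddump-to-file; the .split directory above is presumably that file split per pass by the script. build-all.sh in the linked branch has the exact invocation, so treat this command as an approximation:)

ghc -O2 -dshow-passes -dverbose-core2core -ddump-to-file test/Spec.hs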

It's possible to rebuild all of this with nix-shell -p python3 --run "ghc=/Users/dr/.stack/programs/x86_64-osx/ghc-8.0.2/bin/ghc m=25 n=35 ./build-all.sh -package-db ~/.stack/snapshots/x86_64-osx/lts-8.20/8.0.2/pkgdb/". You can skip the nix-shell bit if python3 is already installed on your system; substitute the ghc variable and the -package-db path with the correct values for your system. The m and n variables mean it will rebuild the module for each record length from 25 to 35.

@vagarenko

I've encountered this too. Compilation time for large records makes this library unusable :(

@dredozubov changed the title from "Complilation performance degrades rapidly with increasing record size" to "Compilation performance degrades rapidly while increasing record length" on Nov 21, 2017
@Wizek

Wizek commented Feb 16, 2018

Maybe the slowdown has something to do with SuperRecord.SortInsert?

@agrafix
Owner

agrafix commented Feb 16, 2018

No, the slowdown also exists w/o the SortInsert :-(

@reactormonk

reactormonk commented Apr 2, 2018

With optimizations disabled (stack build --fast), a record of 20 elements takes about 8 seconds to compile on my laptop, as opposed to roughly 100 seconds when fully optimizing.
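
(For what it's worth, stack's --fast flag simply turns optimizations off, i.e. it builds with -O0; outside of stack the equivalent is setting it yourself, for example via a ghc-options field in the .cabal file. Illustrative snippet, not taken from this repo:)

  ghc-options: -O0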

@jvanbruegge

jvanbruegge commented Nov 8, 2018

I am currently writing a similar library, and I benchmarked it against superrecord (compile time only).
The file that gets compiled is here (it just creates a record with 40 entries): https://gist.github.com/jvanbruegge/e2297f8e57e783f845f56f0627afc7ba
Results on my laptop for superrecord:

stack build  1054,20s user 15,07s system 98% cpu 18:06,09 total

and my library:

stack build  37,06s user 2,83s system 88% cpu 45,272 total

I am not sure where this difference comes from; my rows are also sorted, and the runtime representation of records is also a SmallArray#.

@jvanbruegge

jvanbruegge commented Nov 8, 2018

I figured that the order of labels was the best case for my RowAppend type family, as it would be an O(1) cons each time (making the whole record O(n)). But even after reversing the order of labels, which is the O(n²) worst case, it was still much faster:

stack build  123,89s user 9,02s system 95% cpu 2:19,09 total
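
To make the asymptotics concrete, here is a generic sketch of an ordered type-level insert (illustrative only, not the actual RowAppend definition from either library): when the new label compares LT against the head of the row it is a single cons, otherwise the family recurses down the row, which is why inserting labels in the wrong order degenerates to O(n²) overall.

{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
import GHC.TypeLits (CmpSymbol, Symbol)

-- Illustrative only: insert a label into an already-sorted row of labels.
type family Insert (l :: Symbol) (row :: [Symbol]) :: [Symbol] where
  Insert l '[]       = '[l]
  Insert l (x ': xs) = InsertCmp (CmpSymbol l x) l x xs

-- Best case ('LT): one comparison plus a cons, O(1).
-- Worst case ('GT): keep the head and recurse over the rest, O(n),
-- so building an n-field row one label at a time is O(n) best / O(n²) worst.
type family InsertCmp (o :: Ordering) (l :: Symbol) (x :: Symbol) (xs :: [Symbol]) :: [Symbol] where
  InsertCmp 'LT l x xs = l ': x ': xs
  InsertCmp 'EQ l x xs = l ': x ': xs
  InsertCmp 'GT l x xs = x ': Insert l xs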
