Instance methods without INLINE pragma blow up multiplexer usage #657

Open
gergoerdi opened this issue Jun 30, 2019 · 6 comments

@gergoerdi
Contributor

It has finally happened. I guess it was inevitable, but today I finally got the following error message from my FPGA vendor's synthesis tool:

ERROR:Place:375 - The design does not fit in device.
    Total LUT Utilization      : 14340 out of 11440
    LUTs used as Logic         : 14340
    LUTs used as Memory        : 0
    FF Utilization             : 304 out of 11440

This is with a version of Space Invaders so "morally" it should be tiny: an Intel 8080, a VGA signal generator backed by a frame buffer, a shift register to buffer video reads, and a barrel shifter as used on the original arcade machine.

So I guess my question is, what approaches can one take to figure out where all the resource usage is coming from? The generated Verilog is quite inscrutable for me, so I wouldn't even know where to start; but from the ten-thousand foot view my hunch is that some Haskell abstractions are not properly compiled away by Clash.

The full list of feature resources for Space Invaders is as follows. Note the huge number of (especially 1-bit) 2-to-1 multiplexers.

# RAMs                                                 : 3
 1024x8-bit dual-port block RAM                        : 1
 7168x8-bit dual-port block RAM                        : 1
 8192x8-bit single-port block Read Only RAM            : 1
# Adders/Subtractors                                   : 96
 1-bit adder                                           : 18
 1-bit adder carry in                                  : 12
 1-bit subtractor                                      : 1
 10-bit subtractor                                     : 1
 13-bit subtractor                                     : 2
 16-bit adder                                          : 18
 16-bit subtractor                                     : 7
 17-bit adder                                          : 2
 2-bit adder                                           : 3
 3-bit adder                                           : 1
 4-bit adder                                           : 2
 5-bit adder                                           : 8
 5-bit subtractor                                      : 8
 8-bit adder                                           : 4
 8-bit subtractor                                      : 2
 9-bit adder                                           : 4
 9-bit subtractor                                      : 3
# Counters                                             : 8
 10-bit up counter                                     : 8
# Registers                                            : 288
 Flip-Flops                                            : 288
# Comparators                                          : 38
 1-bit comparator not equal                            : 3
 10-bit comparator greater                             : 12
 10-bit comparator lessequal                           : 6
 16-bit comparator greater                             : 3
 16-bit comparator lessequal                           : 5
 2-bit comparator greater                              : 2
 32-bit comparator greater                             : 1
 4-bit comparator greater                              : 6
# Multiplexers                                         : 2106
 1-bit 2-to-1 multiplexer                              : 1642
 1-bit 8-to-1 multiplexer                              : 2
 10-bit 2-to-1 multiplexer                             : 71
 10-bit 7-to-1 multiplexer                             : 2
 11-bit 2-to-1 multiplexer                             : 6
 12-bit 2-to-1 multiplexer                             : 3
 13-bit 2-to-1 multiplexer                             : 1
 14-bit 2-to-1 multiplexer                             : 9
 15-bit 2-to-1 multiplexer                             : 2
 16-bit 2-to-1 multiplexer                             : 1
 17-bit 2-to-1 multiplexer                             : 63
 19-bit 2-to-1 multiplexer                             : 4
 2-bit 2-to-1 multiplexer                              : 127
 212-bit 42-to-1 multiplexer                           : 2
 212-bit 8-to-1 multiplexer                            : 1
 22-bit 2-to-1 multiplexer                             : 5
 22-bit 4-to-1 multiplexer                             : 1
 25-bit 2-to-1 multiplexer                             : 1
 3-bit 2-to-1 multiplexer                              : 4
 31-bit 2-to-1 multiplexer                             : 2
 32-bit 2-to-1 multiplexer                             : 20
 33-bit 2-to-1 multiplexer                             : 20
 4-bit 2-to-1 multiplexer                              : 8
 5-bit 2-to-1 multiplexer                              : 10
 7-bit 2-to-1 multiplexer                              : 7
 8-bit 2-to-1 multiplexer                              : 57
 8-bit 3-to-1 multiplexer                              : 2
 8-bit 4-to-1 multiplexer                              : 1
 8-bit 8-to-1 multiplexer                              : 9
 9-bit 2-to-1 multiplexer                              : 23
# Logic shifters                                       : 1
 32-bit shifter logical left                           : 1
# Xors                                                 : 3
 1-bit xor8                                            : 1
 8-bit xor2                                            : 2
@leonschoorl
Member

If you can get your synthesis software to tell you where in the HDL these resources are being used:
Christiaan has been working on a pull request that makes Clash annotate its generated HDL with Haskell source locations.
That should help you map HDL code locations to Haskell code locations.

You could also try the flag -fclash-compile-ultra, it enables some extra (expensive) optimizations.

@gergoerdi
Contributor Author

I have applied grit where my smarts have failed me, and started bisecting my code across various directions, and recording the number of multiplexers. I have already found something interesting: in this commit the only change is replacing some typeclass methods with direct function calls, and that change alone removes 70% of the multiplexers!
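Recording the multiplexer count at each bisection step can be scripted. The following is a minimal sketch, assuming the synthesis report uses the `# Category : count` layout shown in the listings in this thread; the function names (`parseLine`, `categoryTotals`, `multiplexerCount`) are hypothetical helpers and not part of Clash or of any vendor tool.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)
import Text.Read (readMaybe)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Split a report line such as "# Multiplexers   : 2106" into a
-- trimmed label and a count. Lines whose text after the first ':'
-- is not a plain integer yield Nothing.
parseLine :: String -> Maybe (String, Int)
parseLine line =
  case break (== ':') line of
    (label, ':' : rest) -> do
      n <- readMaybe (trim rest)
      pure (trim label, n)
    _ -> Nothing

-- | Totals of the top-level categories, i.e. the lines starting with '#'.
-- The leading "# " is removed from each label.
categoryTotals :: String -> [(String, Int)]
categoryTotals report =
  [ (dropWhile (\c -> c == '#' || isSpace c) label, n)
  | line <- lines report
  , "#" `isPrefixOf` line
  , Just (label, n) <- [parseLine line]
  ]

-- | The total from the "# Multiplexers" line, if the report has one.
multiplexerCount :: String -> Maybe Int
multiplexerCount = lookup "Multiplexers" . categoryTotals
```

Applying `multiplexerCount` to the report text produced for each commit gives one number per bisection step, which makes a change like the one described here easy to spot.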

In fact, going back to my original code and adding INLINE pragmas on these instance methods, the resource usage goes down to the following:

Macro Statistics
# RAMs                                                 : 3
 1024x8-bit dual-port RAM                              : 1
 7168x8-bit dual-port RAM                              : 1
 8192x8-bit single-port Read Only RAM                  : 1
# Adders/Subtractors                                   : 114
 10-bit adder                                          : 8
 10-bit subtractor                                     : 4
 16-bit adder                                          : 16
 16-bit subtractor                                     : 10
 17-bit adder                                          : 2
 2-bit adder                                           : 3
 3-bit adder                                           : 1
 4-bit adder                                           : 44
 5-bit adder                                           : 8
 5-bit subtractor                                      : 8
 8-bit adder                                           : 2
 8-bit subtractor                                      : 2
 9-bit adder                                           : 6
# Registers                                            : 223
 1-bit register                                        : 202
 10-bit register                                       : 8
 16-bit register                                       : 2
 18-bit register                                       : 1
 3-bit register                                        : 2
 5-bit register                                        : 2
 8-bit register                                        : 6
# Comparators                                          : 38
 1-bit comparator not equal                            : 3
 10-bit comparator greater                             : 12
 10-bit comparator lessequal                           : 6
 16-bit comparator greater                             : 3
 16-bit comparator lessequal                           : 5
 2-bit comparator greater                              : 2
 32-bit comparator greater                             : 1
 4-bit comparator greater                              : 6
# Multiplexers                                         : 773
 1-bit 2-to-1 multiplexer                              : 181
 1-bit 8-to-1 multiplexer                              : 2
 10-bit 2-to-1 multiplexer                             : 79
 10-bit 7-to-1 multiplexer                             : 2
 11-bit 2-to-1 multiplexer                             : 6
 12-bit 2-to-1 multiplexer                             : 3
 13-bit 2-to-1 multiplexer                             : 2
 14-bit 2-to-1 multiplexer                             : 9
 15-bit 2-to-1 multiplexer                             : 2
 16-bit 2-to-1 multiplexer                             : 1
 17-bit 2-to-1 multiplexer                             : 63
 19-bit 2-to-1 multiplexer                             : 4
 2-bit 2-to-1 multiplexer                              : 132
 212-bit 42-to-1 multiplexer                           : 2
 212-bit 8-to-1 multiplexer                            : 1
 22-bit 2-to-1 multiplexer                             : 5
 22-bit 4-to-1 multiplexer                             : 1
 25-bit 2-to-1 multiplexer                             : 1
 3-bit 2-to-1 multiplexer                              : 4
 31-bit 2-to-1 multiplexer                             : 2
 32-bit 2-to-1 multiplexer                             : 20
 33-bit 2-to-1 multiplexer                             : 20
 4-bit 2-to-1 multiplexer                              : 10
 5-bit 2-to-1 multiplexer                              : 10
 7-bit 2-to-1 multiplexer                              : 7
 8-bit 2-to-1 multiplexer                              : 169
 8-bit 3-to-1 multiplexer                              : 2
 8-bit 4-to-1 multiplexer                              : 1
 8-bit 8-to-1 multiplexer                              : 9
 9-bit 2-to-1 multiplexer                              : 23
# Logic shifters                                       : 1
 32-bit shifter logical left                           : 1
# Xors                                                 : 3
 1-bit xor8                                            : 1
 8-bit xor2                                            : 2
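For readers unfamiliar with the syntax, this is roughly what an INLINE pragma on an instance method looks like. It is a sketch only: the `Shifter` class and `Buffer` type are hypothetical and are not taken from the Space Invaders code. The pragma placement inside the instance body is standard GHC syntax; whether it changes the synthesized result depends on the design.

```haskell
import Data.Word (Word8)

-- Hypothetical class, for illustration only.
class Shifter a where
  -- | Shift one bit in at the least significant end.
  shiftIn :: Bool -> a -> a

-- Hypothetical 8-bit buffer type.
newtype Buffer = Buffer Word8
  deriving (Eq, Show)

instance Shifter Buffer where
  -- The pragma sits inside the instance body and names the method.
  {-# INLINE shiftIn #-}
  shiftIn b (Buffer w) = Buffer (w * 2 + (if b then 1 else 0))
```

The alternative described in this thread, replacing the typeclass methods with direct function calls, avoids the class dictionary altogether.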

@gergoerdi changed the title from "Suggestions for getting a grasp on component usage?" to "Instance methods without INLINE pragma blow up multiplexer usage" on Jul 2, 2019
@gergoerdi
Contributor Author

Unfortunately, it is still not small enough:

Total LUT Utilization      : 13392 out of 11440

@gergoerdi
Contributor Author

I have tried -fclash-compile-ultra but it doesn't seem to change anything.

gergoerdi added a commit to gergoerdi/clash-spaceinvaders that referenced this issue Jul 6, 2019
gergoerdi added a commit to gergoerdi/clash-spaceinvaders that referenced this issue Jul 6, 2019
gergoerdi added a commit to gergoerdi/clash-intel8080 that referenced this issue Jul 20, 2019
@gergoerdi
Contributor Author

Is the original problem here resolved by #681?

@christiaanb
Member

christiaanb commented Jul 30, 2019

@gergoerdi If it does, that was not the intention behind #681; its purpose was to stop Clash from complaining that the inline limit needs to be raised when it has to inline, e.g., a Num dictionary for the 21st time.
