-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix post-synthesis simulation in yosys #5
base: master
Are you sure you want to change the base?
Conversation
k is never used. Remove it.
Variable declaration assignments such as reg foo = 1; are interpreted by some tools as reg foo; always @(*) foo = 1; and not as reg foo; initial foo = 1; This can cause simulation-synthesis mismatches, as the resultant value of foo depends on how the tool is feeling. Variable declaration assignments are used in four places. However, the variables in question are subsequently assigned in their entirety, Remove these assignments. Closes: alexforencich#4
Alright, I think I just got a rather significant rewrite of the lfsr module finished up to remove the initial block in favor of constant functions. As part of this rewrite, those initializers were removed, and I think I have cleaned up any unused variables. Please test it out in yosys and let me know if it works any better than the old one. |
For reference, I'm instantiating this like lfsr #(
.LFSR_WIDTH(32),
.LFSR_POLY(32'h04c11db7),
.LFSR_CONFIG("GALOIS"),
.REVERSE(1),
.DATA_WIDTH(8),
.STYLE("REDUCTION")
) crc (
.data_in(data),
.state_in(crc_state),
.state_out(crc_state_out)
); Your changes work, but now the module takes far longer to compile. It's also much noisier. E.g.
vs
Most of the delay follows the above output, but it seems like there is some O(n) (or worse) algorithm and n has increased with this new style. |
Well, to some extent taking longer is expected, since the LFSR computation is run once per output bit. Unfortunately, I don't think there is a way around that since you can't return a 2D array in Verilog, and I think packing the whole matrix into a single dimension would significantly limit the scalability since then you can run into wire size limits and what not. It sounds like maybe yosys isn't handling the constant evaluation very well. Like it's building a netlist for the whole function, and then optimizing it all away, instead of simply evaluating the function and using the result. |
For reference, I go from
To
That sounds about right.
|
TBH, I don't care all that much about build times, mine are usually on the order of an hour or so, so an extra 30 seconds is irrelevant. The quality of the result is more important, as well as portability (tool and device support) in general. However, simulation time is an important concern, and the new LFSR is a bit slower in that respect (I think I timed the axis_xgmii_tx_64 module to 15 seconds with the new version, and 13 seconds with the old version). But I think that's a reasonable trade-off if the tool support is better - specifically, the old one doesn't work with synplify as it ignores the initial block completely; I think the new version should work but I do not have access to synplify to test it out. A rewrite in SV may improve the performance because it has better support for multidimensional arrays, but then it won't work in ISE. |
I care very much about build times. I regularly run post-synthesis tests on modules to make sure I don't have simulation/synthesis mismatches (and in fact that's how I caught this bug). A tenfold increase in compilation time is unacceptable for me.
In fact, this new version uses 6 additional LUTs: 76 vs 70 for "REDUCTION". In both cases, "LOOP" uses 80 LUTs. You can try this yourself with the following yosys commands: read_verilog rtl/lfsr.v
chparam -set LFSR_WIDTH 32 -set LFSR_POLY 32'h04c11db7 -set LFSR_CONFIG "GALOIS" -set REVERSE 1 -set DATA_WIDTH 8 -set STYLE "REDUCTION" lfsr
synth_ice40
I see no major difference in simulation time with iverilog. |
Well, if you can think of a better way to implement this functionality in a way that works in all of the major tools (particularly Vivado, ISE, Quartus, Quartus Pro, and Synplify) and in all of the major simulators, then I can certainly rework it again. It's strange that it uses more LUTs in yosys, I didn't see a significant difference in the testing that I was doing with other tools. Unfortunately, the place and route process is quite "chaotic" and as a result a design with trivial changes often results in significant differences in resource utilization, placement, etc. Even different timing constraints can change the resource consumption. It's a bit unclear what is going on in this case, but I think there is a good chance that it has something to do with some of the optimization passes in yosys producing significantly different results based on some seemingly insignificant structural difference in the input. I will say that some tools seem to handle the "reduction" style better, and some tools seem to do better with the "loop" style, and it really isn't clear to me why there would be any difference at all - either way, it's just a bunch of XOR gates, and those can be reorganized in all sorts of ways without changing the functionality. I would think that the reduction style should be more clear to the tools what the intent is and produce a more compact result, but several tools seem to produce a smaller result with the loop style. Maybe the tools need more direction in terms of explicitly laying out some sort of cascade of stages, that seems to be the approach taken by these guys: https://github.com/QianfengClarkShen/Tbps_CRC . I just discovered that repo recently, there are probably some ideas there that can be incorporated here. In terms of sim time, it looks like in your case you're using it to compute a CRC 8 bits at a time, which is much simpler than the axis_xgmii_tx_64 module I was testing in sim. That module has 8 LFSR instances, one each for 1 through 8 bytes. Initializing all of that takes a decent chunk of time, and definitely seems to be causing the simulation to take an extra second or two to start up. The initial rewrite I did was much slower, but some reworking and simplifying of the bit-reverse code sped it up quite a bit. |
As noted in #4, there are problems with synthesis in yosys. Whether or not this is a bug in yosys or not, these assignments are unnecessary, so we can remove them.
I have also included a small cleanup commit as well.