Skip to content

ecp5: Make -abc9 the default #1544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

daveshah1
Copy link

abc9 gives substantially better area on almost any non-trivial design by virtue of using wide LUTs in a vaguely sensible way. Several third party frameworks like LiteX have been using it as default for a while, so it seems time for it to be the default for everyone - and give them a better impression of the capabilities of the open source tools!

Fixes #1323

@daveshah1 daveshah1 requested a review from eddiehung December 3, 2019 10:21
@whitequark
Copy link
Member

What about iCE40?

@daveshah1
Copy link
Author

I don't consider myself the "owner" of synth_ice40 enough to make such a decision, but if there is a consensus for that I'm happy to do that too.

@whitequark
Copy link
Member

I'm very much in favor of doing the switch for iCE40 as well, since there is no downside that I know of, and the QoS improves massively.

@whitequark
Copy link
Member

Regarding third party users: nMigen currently does not add -abc9 on its own, both as a matter of policy (the policy being "nMigen does not add any toolchain options essential for correct synthesis"), and because it breaks nMigen used together e.g. with Yosys in Debian stable, which I got an actual report for IIRC.

So this PR (and any follow-up for iCE40) would benefit all developers using nMigen who do not already use -abc9, where my experience is that few people are aware that it exists and makes a huge difference.

Copy link
Collaborator

@eddiehung eddiehung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a number of concerns with making this default:

@daveshah1
Copy link
Author

I have a number of concerns with making this default:

* `abc9` is still currently in flux and is far from stable in terms of features, attributes, etc.

Quite frankly this applies to the whole of the Yosys, nextpnr and trellis flow....

* `synth_ecp5 -abc9` has timing numbers (and thus optimises) only for the 5G variant

In practice the lack of any absolute timing constraints and the fact each variant is proportionately similar means this is a small issue, I can add exact per-device numbers. This also doesn't change abc9 being a strict improvement over abc for any device.

* Not all arrival timings are present, e.g.:
  
  * `TRELLIS_DPR16X4` https://github.com/YosysHQ/yosys/blob/2ec6d832dc07a68157721715785d469feabbb6ed/techlibs/ecp5/cells_sim.v#L118
  * `TRELLIS_RAM16X2` https://github.com/YosysHQ/yosys/blob/2ec6d832dc07a68157721715785d469feabbb6ed/techlibs/ecp5/cells_sim.v#L71
  * https://github.com/YosysHQ/yosys/blob/2ec6d832dc07a68157721715785d469feabbb6ed/techlibs/ecp5/cells_sim.v#L220
  * etc.?

Noted, will add these - TRELLIS_RAM16X2 is a sim-only primitive that isn't directly instantiable, but the EBR primitive probably needs them too.

* Are all box timings present?

Other than the missing arrival times above; yes

@eddiehung
Copy link
Collaborator

eddiehung commented Dec 4, 2019

Quite frankly this applies to the whole of the Yosys, nextpnr and trellis flow....

Indeed. But why add more chaos on top? I'm well aware that abc9 is currently a bit of a mess and I accept that's entirely on me (a refactoring is on the roadmap once it's feature complete), so I'm careful about taking on the load of supporting users that expect it to work out of the box, versus developers who appreciate and accept the risk of using this experimental feature. The resolution to this would be (a) other maintainers (to whom I would apologise profusely for the state of the code), or (b) more time for me to mature it further.

Off the top of my head, abc9 still needs:

  • Sequential synthesis
  • Required time support
  • Method to override abc9 script without breaking apart synth_* (currently thinking of using the scratchpad)
  • More upstream fixes

which will also likely bring some more short-term breakage.

In practice the lack of any absolute timing constraints and the fact each variant is proportionately similar means this is a small issue, I can add exact per-device numbers. This also doesn't change abc9 being a strict improvement over abc for any device.

I understand that this is difficult, and there are many factors why it won't provide a big benefit, but users should be made aware of that.

TRELLIS_RAM16X2 is a sim-only primitive that isn't directly instantiable

Can you add a comment to this effect please?

Regarding arrival times and box timings, what about these guys?

module DPR16X4C (
input [3:0] DI,
input WCK, WRE,
input [3:0] RAD,
input [3:0] WAD,
output [3:0] DO
);

module TRELLIS_SLICE(
input A0, B0, C0, D0,
input A1, B1, C1, D1,
input M0, M1,
input FCI, FXA, FXB,
input CLK, LSR, CE,
input DI0, DI1,
input WD0, WD1,
input WAD0, WAD1, WAD2, WAD3,
input WRE, WCK,
output F0, Q0,
output F1, Q1,
output FCO, OFX0, OFX1,
output WDO0, WDO1, WDO2, WDO3,
output WADO0, WADO1, WADO2, WADO3
);

@whitequark
Copy link
Member

This also doesn't change abc9 being a strict improvement over abc for any device.

@eddiehung I would like to emphasize this part as much as possible. The QoS with plain abc is so low, that (IIRC) I got only slightly worse results by straightforwardly mapping fine gates to LUTs and then greedily merging LUTs, and slightly better results with my essentially proof of concept flowmap pass. I think the QoS in the default workflow makes Yosys look bad for no particularly good reason.

If your view is that abc9 requires major work to be usable as a default that will not happen soon, and abc isn't going to be improved at all, then I propose making flowmap the new default flow in the interim, and incrementally adding features currently exclusive to abc9 to it (non-unit delay model, whiteboxes, ...). The core pass is in a good shape right now, and any additions can be vetted and enabled one by one instead of in a single major change.

@eddiehung
Copy link
Collaborator

This also doesn't change abc9 being a strict improvement over abc for any device.

@eddiehung I would like to emphasize this part as much as possible.

Indeed. My argument is that robustness is something that should also be taken into account, and I'm personally not fully convinced that it is mature enough. If others are and are willing to help maintain it as I continue to add new features then I would feel a lot more comfortable. There are some areas which abc9 will continue to underperform with respect to abc, namely retiming.

The QoS with plain abc is so low, that (IIRC) I got only slightly worse results by straightforwardly mapping fine gates to LUTs and then greedily merging LUTs, and slightly better results with my essentially proof of concept flowmap pass. I think the QoS in the default workflow makes Yosys look bad for no particularly good reason.

If your view is that abc9 requires major work to be usable as a default that will not happen soon

I wouldn't go as far as to say not soon. It is probably my top item but it is slow progress and I am being stretched thin so I don't want to make any promises I can't keep.

and abc isn't going to be improved at all, then I propose making flowmap the new default flow in the interim, and incrementally adding features currently exclusive to abc9 to it (non-unit delay model, whiteboxes, ...). The core pass is in a good shape right now, and any additions can be vetted and enabled one by one instead of in a single major change.

I have no objection to that, nor am I involved in flowmap's development.

@daveshah1
Copy link
Author

The main reason flowmap beats abc is it maps no wide LUTs, only LUT4s whereas abc1 is obscenely wasteful at using wide LUTs. Adding -nowidelut would probably get a similar area gain but at the expense of a 10-15% Fmax drop.

I do agree, if abc9 is unlikely to be stable for the foreseeable future, I'd rather put time into flowmap and use that too.

@whitequark
Copy link
Member

The main reason flowmap beats abc is it maps no wide LUTs, only LUT4s whereas abc1 is obscenely wasteful at using wide LUTs.

I'm a bit confused here--flowmap has a -maxlut k option so it should map LUTs up to that many inputs. Is there something else to that?

@daveshah1
Copy link
Author

flowmap would also need to know that a LUT5 costs twice the area as a a LUT4, for example (this is the limit of abc1), and also ideally know that a LUT5 is slower than a LUT4 (this is what abc9 adds)

@daveshah1
Copy link
Author

In order to see a flowmap based flow as the default for ECP5, I'd like to see a feature set roughly similar to abc9 - I don't think any of this is too much work though, I may well even be able to get much of it done before abc9 is stable.

  • trivial whitebox aware logic optimisation (freduce would probably be the basis of this)
  • LUT library support with per-input delays per LUT type rather than a single unit delay
  • whitebox delay aware mapping
  • some kind of area recovery (I think this is a WIP right now?)

Nice to haves would be

  • more sensible whitebox mapping
  • use of LUT4+mux structure in a better way than just a LUT5 (eg mapping some functions of up to 9 inputs to it)
  • retiming

@whitequark
Copy link
Member

In order to see a flowmap based flow as the default for ECP5, I'd like to see a feature set roughly similar to abc9

We're in agreement here. I'm happy to discuss any improvements to it whever it's convenient. I'll be able to work on flowmap myself once I finish my current simulator work--that is, cxxsim and improved concurrency semantics for nMigen pysim.

@eddiehung
Copy link
Collaborator

flowmap would also need to know that a LUT5 costs twice the area as a a LUT4, for example (this is the limit of abc1), and also ideally know that a LUT5 is slower than a LUT4 (this is what abc9 adds)

LUT library support with per-input delays per LUT type rather than a single unit delay

For the record, abc[1] supports the same *.lut file as abc9 -- it's just not used right now. There's no public information on how to use the same *.box files (delay propagation through white/black-boxes) though, if indeed it does support doing so.

@daveshah1
Copy link
Author

Closing this for now while we consider the best way forward.

@daveshah1 daveshah1 closed this Dec 7, 2019
@daveshah1 daveshah1 deleted the dave/ecp5-abc9-default branch January 18, 2020 09:49
@Ravenslofty
Copy link
Collaborator

@eddiehung

Off the top of my head, abc9 still needs:

  • Sequential synthesis

This was added in #1181.

  • Required time support

This was added in #1661.

  • Method to override abc9 script without breaking apart synth_* (currently thinking of using the scratchpad)

This was added in #1582.

  • More upstream fixes

Not sure about those, but they should surely be added by now (even considering the glacial pace of ABC development).

With all this in mind, would you still be opposed to making ABC9 the default for ECP5 (and possibly Xilinx, but not iCE40, considering #1792)?

@eddiehung
Copy link
Collaborator

eddiehung commented Apr 6, 2020

  • Sequential synthesis

This was added in #1181.

I would correct that statement and say that the raw capability for architectures to support sequential synthesis was added by #1181, but so far that capability is only realised for Xilinx.
I'm well aware that it is a bit of a manual PITA to add this for other arches, so on my roadmap is to make the whole thing automatic but that involves adding some other Yosys enhancements.
While this isn't critical, I'd like to try and achieve feature parity with the existing abc flow before superseding it with abc9.

  • More upstream fixes

Not sure about those, but they should surely be added by now (even considering the glacial pace of ABC development).

Specifically, there are two known upstream issues.
One which prunes away the wrong flops under sequential synthesis with multi clock domains, to which I proposed a fix here: berkeley-abc/abc#65
Another is that ABC's &mfs still occasionally fires an ABC assertion, and this has been reported upstream too.
Of course, both of these can be worked-around, but equally, if things aren't fully working to the best of my knowledge why push it?

With all this in mind, would you still be opposed to making ABC9 the default for ECP5 (and possibly Xilinx, but not iCE40, considering #1792)?

Without intending to sound rude: please help me understand why you're pushing for it to become default? My concern is that making it default will only cause a wider set of users to hit some of the known issues above, to which the rest of us will have to suggest the workaround is to go back to abc.

@Ravenslofty
Copy link
Collaborator

  • Sequential synthesis

This was added in #1181.

I would correct that statement and say that the raw capability for architectures to support sequential synthesis was added by #1181, but so far that capability is only realised for Xilinx.

Okay, but my general impression is that most people don't know -retime even exists. Sure, it's nice to have, and useful, but if people don't know it exists they are no worse off for it. Even when it's used it's generally not as effective as just running nextpnr with a new seed.

I'm well aware that it is a bit of a manual PITA to add this for other arches, so on my roadmap is to make the whole thing automatic but that involves adding some other Yosys enhancements.

That'd be nice.

While this isn't critical, I'd like to try and achieve feature parity with the existing abc flow before superseding it with abc9.

I really don't think it's critical at all to be honest. My general experience with ABC versus ABC9 is that ABC is better (in Fmax/area) when targeting LUT4s, while ABC9 handles larger LUTs more efficiently; that makes them complimentary, rather than competitive.

  • More upstream fixes

Not sure about those, but they should surely be added by now (even considering the glacial pace of ABC development).

Specifically, there are two known upstream issues.
One which prunes away the wrong flops under sequential synthesis with multi clock domains, to which I proposed a fix here: berkeley-abc/abc#65
Another is that ABC's &mfs still occasionally fires an ABC assertion, and this has been reported upstream too.
Of course, both of these can be worked-around, but equally, if things aren't fully working to the best of my knowledge why push it?

Because at the rate of ABC's development, they won't be fully working this time next year either >.>

With all this in mind, would you still be opposed to making ABC9 the default for ECP5 (and possibly Xilinx, but not iCE40, considering #1792)?

Without intending to sound rude: please help me understand why you're pushing for it to become default? My concern is that making it default will only cause a wider set of users to hit some of the known issues above, to which the rest of us will have to suggest the workaround is to go back to abc.

Sure, and there's the risk of unknown bugs lurking in there, too. But for ECP5 (at least in the default wide-LUT mode) there is a very distinct advantage to using ABC9 over ABC in Fmax and area, and for a lot of the users that will be the entirety of their Yosys experience. It'd be nice to give ABC9 a "fallback to ABC" mode for the rare case of ABC9 bugs, but my general opinion is that the present number of bugs is in no way an indicator of future bugs, and if really necessary there's the option of maintaining a small ABC downstream.

I'm not arguing this for the sake of the power users, I'm arguing it for the majority who don't even know what ABC9 does.

@eddiehung
Copy link
Collaborator

  • Sequential synthesis

This was added in #1181.

I would correct that statement and say that the raw capability for architectures to support sequential synthesis was added by #1181, but so far that capability is only realised for Xilinx.

Okay, but my general impression is that most people don't know -retime even exists. Sure, it's nice to have, and useful, but if people don't know it exists they are no worse off for it. Even when it's used it's generally not as effective as just running nextpnr with a new seed.

There's -dff (which gives flops to ABC and asks it to use information gleaned from reachable states to compute and merge identical signals) and there's -retime which actually moves the flops.

  • More upstream fixes

Not sure about those, but they should surely be added by now (even considering the glacial pace of ABC development).

Specifically, there are two known upstream issues.
One which prunes away the wrong flops under sequential synthesis with multi clock domains, to which I proposed a fix here: berkeley-abc/abc#65
Another is that ABC's &mfs still occasionally fires an ABC assertion, and this has been reported upstream too.
Of course, both of these can be worked-around, but equally, if things aren't fully working to the best of my knowledge why push it?

Because at the rate of ABC's development, they won't be fully working this time next year either

And unfortunately that's out of my hands. I can't in good faith bless something that has known bugs, particularly one that can give a incorrect result (and which took me a significant amount of time to debug). Others can pull the trigger if they so wish, but there's a limit to what I can support.

With all this in mind, would you still be opposed to making ABC9 the default for ECP5 (and possibly Xilinx, but not iCE40, considering #1792)?

Without intending to sound rude: please help me understand why you're pushing for it to become default? My concern is that making it default will only cause a wider set of users to hit some of the known issues above, to which the rest of us will have to suggest the workaround is to go back to abc.

Sure, and there's the risk of unknown bugs lurking in there, too. But for ECP5 (at least in the default wide-LUT mode) there is a very distinct advantage to using ABC9 over ABC in Fmax and area, and for a lot of the users that will be the entirety of their Yosys experience.

I'm afraid I disagree here. I would say a lot of users are new to Yosys, and it's those users -- non power users -- that I want to provide something that works. That blinks. That works out of the box, for hobbyists and teachers and students. They aren't trying to push their designs to minimise delay or area. Those who are ready to climb outside of the sandbox can then start playing with more advanced and experimental features.

It'd be nice to give ABC9 a "fallback to ABC" mode for the rare case of ABC9 bugs, but my general opinion is that the present number of bugs is in no way an indicator of future bugs, and if really necessary there's the option of maintaining a small ABC downstream.

I'm not arguing this for the sake of the power users, I'm arguing it for the majority who don't even know what ABC9 does.

I think we agree that no-one needs to know what ABC9 does, and they will never need to if it works out of the box. That's where I'm trying to drive this towards.

@whitequark
Copy link
Member

whitequark commented Apr 7, 2020

@eddiehung I see your arguments, but the absolute minimum that I believe should be done is indicating somewhere that for advanced use, -abc9 should be considered. When I was working on my PCIe stack, the ECP5 synthesis results without ABC9 were so abysmal (for example #1323; the results with non-synthetic code were similarly discouraging) that I did almost all of that work using Lattice Diamond. If I was aware that using ABC9 was an option, I would not have ended up telling a few people that for now, they should avoid the FOSS stack for this kind of work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On ECP5, a 26-bit counter synthesizes to 271 LUTs (without -abc9)
5 participants