Avoidable stalling in the RISC-V example #15
Comments
Good stuff! :) Looping in @threonorm, who wrote the code; I wonder if the same pattern existed in the Bluespec version of the code.
I like this change ^^
Yes, we currently stall conservatively for source registers (but we do not add dependencies for destination registers that are not real destinations), mainly because the design we started from, which comes from a class, did that. I don't have a strong opinion; we can merge this change, as it is no longer important to keep the comparison with the baseline.

Here are the things I think we should look at:

Maybe we can do a little experimentation for 1) and 2). I think the critical path is currently in decode, on the equivalent pattern for the destination register and the scoreboard. At least that's what I remember, so it may move to the source instead; I'm not sure. For 3), hopefully it should not have much influence, but we have seen weirdness before, so it would be good to confirm it. For 4), I think this could impact the proof we did for the enclave system, but we probably don't want to keep the code in sync anyway (and it is probably out of sync already), so I don't mind ignoring that dimension.

I can look at 1, 2, 3 but sadly not before August.
`valid_rs1` and `valid_rs2` are never read in the RISC-V example. Making sure that the values really correspond to `rs1` or `rs2`, instead of stalling whenever at least one of them is associated with something other than 0 in the scoreboard, yields better performance:

I kept only the tests that run for more than 2 ms to keep this short. This results in tests taking roughly 25% less time with Cuttlesim and 5% less time with Verilator. Of course, your mileage may vary. I did not check the effects on synthesis.
See related commit 66b59e9.
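To make the difference concrete, here is a minimal Python sketch of the two stall conditions described above. It is not the example's actual code: the `Decoded` record, the list-based scoreboard, and both function names are hypothetical stand-ins, and the sketch assumes a scoreboard that stores a count of pending writes per register.

```python
from dataclasses import dataclass

NUM_REGS = 32  # RV32I has 32 integer registers


@dataclass
class Decoded:
    """Hypothetical decoded-instruction record (names are assumptions)."""
    rs1: int          # raw bits of the rs1 field
    rs2: int          # raw bits of the rs2 field
    valid_rs1: bool   # True if the instruction format really reads rs1
    valid_rs2: bool   # True if the instruction format really reads rs2


def stall_conservative(scoreboard, d):
    """Stall whenever either raw field maps to a nonzero scoreboard entry."""
    return scoreboard[d.rs1] != 0 or scoreboard[d.rs2] != 0


def stall_precise(scoreboard, d):
    """Stall only on fields that the instruction actually uses as sources."""
    return bool(
        (d.valid_rs1 and scoreboard[d.rs1] != 0)
        or (d.valid_rs2 and scoreboard[d.rs2] != 0)
    )


# One pending write to x5.
scoreboard = [0] * NUM_REGS
scoreboard[5] = 1

# A U-type-like instruction: the bits in the rs1 position happen to equal 5,
# but the instruction reads no source register.
no_sources = Decoded(rs1=5, rs2=0, valid_rs1=False, valid_rs2=False)

# An R-type-like instruction that really reads x5 and x6.
two_sources = Decoded(rs1=5, rs2=6, valid_rs1=True, valid_rs2=True)

print(stall_conservative(scoreboard, no_sources))  # stalls needlessly
print(stall_precise(scoreboard, no_sources))       # does not stall
print(stall_precise(scoreboard, two_sources))      # real hazard, still stalls
```

The precise check only removes stalls for fields that are not real sources; true read-after-write hazards are still caught.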