Potential optimization in the RISC-V example #16

mbty · 2021-08-23T15:02:36Z

I noticed that the getFields function of the RISC-V example often gets called just to get the value of a single field (with calls along the line of let funct3 := get(getFields(inst), funct3)). Since fetching the values of all the fields just to extract the value of one of them looked a bit overkill, I tried replacing getFields with individual getXxx functions (e.g. getFunct3). This provides a decent boost to simulation performance:

-- Running tests with Cuttlesim --
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_qsort.rv32 -1
[before] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
[after ] real: 0m0.013s	user: 0m0.013s	sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_median.rv32 -1
[before] real: 0m0.012s	user: 0m0.012s	sys: 0m0.000s
[after ] real: 0m0.008s	user: 0m0.007s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/img.rv32 -1
[before] real: 0m0.216s	user: 0m0.195s	sys: 0m0.020s
[after ] real: 0m0.160s	user: 0m0.142s	sys: 0m0.018s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/primes.rv32 -1
[before] real: 0m4.421s	user: 0m4.417s	sys: 0m0.001s
[after ] real: 0m3.090s	user: 0m3.084s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/morse.rv32 -1
[before] real: 0m1.584s	user: 0m1.583s	sys: 0m0.000s
[after ] real: 0m1.086s	user: 0m1.085s	sys: 0m0.000s
-- Running tests with Verilator --
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_qsort.vmh -1
[before] real: 0m0.033s	user: 0m0.032s	sys: 0m0.001s
[after ] real: 0m0.030s	user: 0m0.030s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/morse.vmh -1
[before] real: 0m2.689s	user: 0m2.687s	sys: 0m0.000s
[after ] real: 0m2.186s	user: 0m2.184s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_median.vmh -1
[before] real: 0m0.022s	user: 0m0.022s	sys: 0m0.000s
[after ] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/primes.vmh -1
[before] real: 0m8.358s	user: 0m8.349s	sys: 0m0.001s
[after ] real: 0m6.778s	user: 0m6.767s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/img.vmh -1
[before] real: 0m0.399s	user: 0m0.385s	sys: 0m0.014s
[after ] real: 0m0.327s	user: 0m0.310s	sys: 0m0.017s

This results in a speedup of roughly 1.43 for Cuttlesim and of 1.23 for Verilator compared to master. I did not yet check the effects on synthesis but I might do it soon. I assume that this kind of optimization is applied automatically when synthesizing (isn't it?), so I don't expect changes on this side (alas). See commit 04cfdf8 for a demonstration.

The text was updated successfully, but these errors were encountered:

cpitclaudel · 2021-08-23T15:25:30Z

Nice sleuthing!

Indeed, this won't speed up actual circuits, because constructing a structure in Verilog doesn't cost anything (it just means remembering that a bunch of wires are bundled together).

On the simulation side it's a nice speedup, but we don't really want to "cheat" by rewriting code to make the simulator happier (especially because these sort of code changes then make it harder to say anything sinceere about Cuttlesim vs. other simulators, since we would have modified the code to fit Cuttlesim better). Instead, we should improve the simulator so that it handles this pattern better (It's a common pattern in the bluespec world).

I thought I did some experiments with this in the past, but I can't find my notes about them. What happens to performance if you force inlining of that function call at the C++ level (IIRC __attribute__((always_inline)) should do it)? Ideally inlining + dead code elimination should get rid of the unnecessary computation.

If that doesn't work, or if it's unreliable, then this would be a really interesting optimization pass to implement in Cuttlesim! We would check whether a function is pure and returns a struct, and if it does we would inline just the relevant computations. Sounds fun, actually :)

mbty · 2021-08-24T13:52:08Z

I just took a glance at how the C++ code for getFields is generated. As it is not a rule but a function, it returns by modifying its ret argument (initially passed by reference). This argument refers to a temporary variable created by the CALL_FN macro in the surrounding statement expression. Just a quick heads up, I guess that naming something tmp in a Kôika function (which should not be too rare) could lead to frustration since it would be hidden by this variable when used as an argument. This is especially true in the case where an element of type T called tmp is passed to a function returning T, since then the compiler wouldn't even complain about types not matching.

@mbty

Part of GH-16. Reported-by: @mbty

cpitclaudel · 2021-08-24T15:13:22Z

Great catch, thanks! Fortunately, there's at least an inkling of the issue thanks to the compiler printing a warning (cuttlesim.hpp:253:47: warning: ‘tmp’ may be used uninitialized in this function [-Wmaybe-uninitialized]).

I pushed a fix just now.

mbty · 2021-09-30T08:39:23Z

An occurrence of tmp was left unchanged in the READ macro, which leads to errors when compiling anything that uses it (e.g. examples/conflicts_modular.v: cuttlesim.hpp:1403:27: error: ‘tmp’ was not declared in this scope; did you mean ‘tm’?).

@mbty

#16 Reported-by: @mbty

cpitclaudel · 2021-09-30T14:32:00Z

Ugh, thanks a lot!

cpitclaudel added a commit that referenced this issue Aug 24, 2021

backends/cpp: Don't hide local variables named "tmp" in calls to intfuns

4bd7ed2

Part of GH-16. Reported-by: @mbty

cpitclaudel added a commit that referenced this issue Sep 30, 2021

backends/cpp: Fix up a typo in the last commit

5b4982e

#16 Reported-by: @mbty

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential optimization in the RISC-V example #16

Potential optimization in the RISC-V example #16

mbty commented Aug 23, 2021

cpitclaudel commented Aug 23, 2021

mbty commented Aug 24, 2021 •

edited

Loading

cpitclaudel commented Aug 24, 2021

mbty commented Sep 30, 2021

cpitclaudel commented Sep 30, 2021

Potential optimization in the RISC-V example #16

Potential optimization in the RISC-V example #16

Comments

mbty commented Aug 23, 2021

cpitclaudel commented Aug 23, 2021

mbty commented Aug 24, 2021 • edited Loading

cpitclaudel commented Aug 24, 2021

mbty commented Sep 30, 2021

cpitclaudel commented Sep 30, 2021

mbty commented Aug 24, 2021 •

edited

Loading