Implement micro op fusion in decode stage. #113

zxc12523 · 2023-11-07T18:15:09Z

In the decode stage, we might find several pairs of uops that can be merged into one instruction to increase performance. Since this optimization is common in modern high-performance CPUs, we can add this feature for users to model the performance gain.

klingaard · 2023-11-07T19:29:20Z

Oh, absolutely!

The challenge here is -- can you build a small fusion framework in Olympia that allows a user of the model to experiment with configurable combinations? In other words -- do not hard-code the pairings in the simulator, set up a framework that is runtime programmable via YAML or JSON to identify pairings. That'd be really cool and very powerful.

danbone · 2023-11-08T12:02:11Z

@klingaard Is there any support for this in mavis? I saw a morph instruction function.

zxc12523 · 2023-11-09T06:31:27Z

@klingaard Maybe we can add that configuration to small_core.yaml?

klingaard · 2023-11-09T15:50:58Z

> Is there any support for this in mavis? I saw a morph instruction function.

Yes, and you're correct, it's related to the morph function call. I'm not a Mavis expert (@dbmurrell is the original author), but if you look at https://github.com/sparcians/mavis/blob/4f3fef891f9ddc5c371c27500d02596f21ea6fc8/test/main.cpp#L446 you can see an example of how you can morph an existing instruction into a fused one. I think the process is:

1. Identify a pairing (within a decode group or across [that's tricky])
2. Morph the first instruction into the fused "new" operation
3. No-op the second (force it to go directly to the ROB)
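
The three steps above can be sketched as a pass over one decode group. This is a toy Python illustration, not Olympia or Mavis code: the `Uop` type, the `FUSION_TABLE` contents, the `add_beq_fused` mnemonic, and the `is_noop` flag are all hypothetical names, and only pairs within a single group are considered.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Uop:
    """Hypothetical stand-in for a decoded instruction."""
    mnemonic: str
    rd: Optional[int] = None
    rs1: Optional[int] = None
    rs2: Optional[int] = None
    is_noop: bool = False  # a no-op'd uop would skip execution and go to the ROB


# Hypothetical pairing table: (first, second) -> fused mnemonic.
FUSION_TABLE: Dict[Tuple[str, str], str] = {("add", "beq"): "add_beq_fused"}


def fuse_decode_group(group: List[Uop]) -> List[Uop]:
    """Return a copy of the group with adjacent fusible pairs rewritten."""
    out = list(group)
    i = 0
    while i + 1 < len(out):
        first, second = out[i], out[i + 1]
        # Step 1: identify a pairing among adjacent uops in this group.
        fused = FUSION_TABLE.get((first.mnemonic, second.mnemonic))
        if fused is not None:
            # Step 2: morph the first uop into the fused operation.
            out[i] = replace(first, mnemonic=fused)
            # Step 3: no-op the second; it keeps its slot in program order.
            out[i + 1] = replace(second, is_noop=True)
            i += 2  # skip past the consumed pair
        else:
            i += 1
    return out
```

The second uop is kept rather than deleted so the group length and program order are unchanged, which matches the idea of sending it straight to the ROB.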

> Maybe we can add that configuration to small_core.yaml?

I think that's reasonable, but you might run into limitations with YAML to properly identify pairings. Dunno until there's a design in place for how you want to do it. Suggestion: Might want to specify a different language (an XML derivative with a DOM) and reference that:

top.cpu.core0.extension.core_extensions:
    decode_fusions: "fusion_pairs.xml"

My suggestion for this entire effort: move this to a discussion and create a design document. Start with a use case, specifically, which pairs will you initially be fusing? For those pairs, what are the constraints?

For example, the first instruction must be an add followed by a branch AND the add's RD field must be the same as the branch's RS2 field... etc.
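
Written as a predicate, that example constraint might look like the sketch below. It assumes RISC-V style conditional branches and a hypothetical `Inst` record with `rd`/`rs1`/`rs2` register numbers; none of these names come from Olympia or Mavis.

```python
from typing import NamedTuple, Optional


class Inst(NamedTuple):
    """Hypothetical decoded instruction with register-number fields."""
    mnemonic: str
    rd: Optional[int] = None
    rs1: Optional[int] = None
    rs2: Optional[int] = None


# Assumption: "branch" means the RISC-V conditional branches.
BRANCHES = {"beq", "bne", "blt", "bge", "bltu", "bgeu"}


def add_branch_fusible(first: Inst, second: Inst) -> bool:
    """True when an add feeds the RS2 operand of the following branch."""
    return (
        first.mnemonic == "add"
        and second.mnemonic in BRANCHES
        and first.rd is not None
        and first.rd == second.rs2
    )
```

A real constraint list would likely be longer (the "etc." above), for example checking whether the add's result is still needed after the branch.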

From there, you can determine the "language" you want to build to specify the pairings -- and how a generic fuser will convert that into runtime code...
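
One possible shape for such a "language" is a small declarative spec that a generic fuser compiles into predicates at load time. The sketch below uses JSON only because it parses with the Python standard library; the schema (`name`, `first`, `second`, `equal_fields`) is invented for illustration and is not the format proposed in the thread or used by any merged PR.

```python
import json
from typing import Callable, Dict

# Hypothetical spec: one rule describing the add + branch example.
SPEC_JSON = """
[
  {"name": "add_branch",
   "first": ["add"],
   "second": ["beq", "bne"],
   "equal_fields": [["rd", "rs2"]]}
]
"""

Inst = Dict[str, object]            # e.g. {"mnemonic": "add", "rd": 5}
Rule = Callable[[Inst, Inst], bool]


def compile_rule(spec: dict) -> Rule:
    """Turn one declarative rule into a predicate over (first, second)."""
    firsts = set(spec["first"])
    seconds = set(spec["second"])
    field_pairs = [tuple(p) for p in spec.get("equal_fields", [])]

    def matches(a: Inst, b: Inst) -> bool:
        if a.get("mnemonic") not in firsts or b.get("mnemonic") not in seconds:
            return False
        # Every listed field of the first must equal the paired field of the second.
        return all(f in a and g in b and a[f] == b[g] for f, g in field_pairs)

    return matches


def load_rules(text: str) -> Dict[str, Rule]:
    """Parse the spec text and compile every rule, keyed by rule name."""
    return {spec["name"]: compile_rule(spec) for spec in json.loads(text)}
```

Usage: `rules = load_rules(SPEC_JSON)` followed by `rules["add_branch"](first, second)`. Changing the pairings then only means editing the spec file, not the simulator.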

klingaard · 2024-02-08T20:48:25Z

So @jeffnye-gh has been looking at this. Discussion: #121, as well as a first PR: #135

ghost changed the title ~~Implement micro op fussion in decode stage.~~ Implement micro op fusion in decode stage. Nov 27, 2023

klingaard linked a pull request Feb 8, 2024 that will close this issue

fusion PR, removed DSL #146

Merged
