Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XIF] Coprocessor XIF Issue response not sampled by ID-EX pipeline registers #941

Open
StMiky opened this issue Sep 7, 2023 · 3 comments
Assignees
Labels
Component:RTL For issues in the RTL (e.g. for files in the rtl directory) Type:Bug For bugs in any content (RTL, Documentation, etc.)

Comments

@StMiky
Copy link

StMiky commented Sep 7, 2023

Coprocessor XIF Issue response not sampled by ID-EX pipeline registers

The cv32e40x CPU attempts to offload an instruction to a coprocessor connected through the CORE-V eXtension Interface (XIF issue_req) whenever its instruction decoder fails to recognize it (or when the instruction is a CSR instruction).

Currently, instruction offloading in the decode stage (ID) does not check for backpressure from the CPU execution stage (EX) when sending the request (xif_issue.issue_valid = 1'b1), possibly resulting in the feedback information from the coprocessor response (e.g., the intention to write back the offloaded instruction result to one of the CPU scalar GPRs x0-x31) not being sampled by the ID-EX pipeline register and, therefore, not being propagated to the following stages.

This isssue is potentially already known, as it seems to be related to the comment at cv32e40x_id_stage.sv:752, but the example provided below may help in debugging and solving it.

The very simple workaround we implemented (described below) is suboptimal, as it delays XIF instruction offloading until EX is ready to accept a new instruction (0 or 1 cycles in most cases). Therefore, no pull request was opened, although I am available to provide further support if needed.

Component

RTL: rtl/cv32e40x_id_stage.sv

Steps to Reproduce

The example below was simulated on revision f17028f (v0.9.0).

Assembly code

The CPU execution stage applies backpressure to the instruction decoding stage (i.e., ex_ready_i = 0) whenever the instruction execution takes more than one cycle, which is the case, for example, for memory accesses (load/store instructions). We triggered this situation with the following code snippet:

    ...
    lw t0, CTL_REG_VL_REG_ADDR(zero)
    lw t1, CTL_REG_VTYPE_REG_ADDR(zero)
    sw zero, CTL_REG_OP_CTL_REG_ADDR(zero)
    vsetvl t0, t0, t1 # instruction offloaded to the coprocessor through XIF
    sw t0, CTL_REG_VL_REG_ADDR(zero) # write offloaded instr. result to `t0`
    lw a0, CTL_REG_ARG0_REG_ADDR(zero) # load arg0 from control registers
    ...

Where vsetvl (from the RISC-V "V" vector extension) is an instruction to be offloaded to a vector coprocessor using the CORE-V eXtension Interface (XIF). This is a setup instruction that attemps to write the value in the source GPR (t0 here) to a CSR in the coprocessor and write the legal value actually accepted by the coprocessor to the rd GPR (t0 again here).

Issue summary

In our case, the coprocessors instruction decoding stage can answer an offloading request in the same cycle it issued by the CPU, and the response data is retired the cycle after. In the example proposed here, the CPU execution stage is not ready to accept a new instruction when the vsetvl instruction is offloaded, becasue it is still processing the previous lw. Therefore, the coprocessor response data does not get sampled in the ID-EX pipeline register until the next cycle, when it is no longer valid.

The following timing diagram shows a simplified representation of what is happening with the current RTL:

plain

While the following timing diagram shows the expected behaviour, obtained by delaying the offload request until EX is ready (further details below):

expected

Waveforms and details

The waveforms dumped during simulation and the GTKWave view configuration files can be found in the attached archive:

cv32e40x-issue.zip

The XIF issue transaction from the example above begins at 33144ns in the attached VCD file plain.vcd, that can be opened with:

gtkwave -a plain.gtkw plain.vcd

As shown, the issue_valid signal from the CPU and the issue_ready and accept signals from the coprocessor are active in the same cycle (marker A), indicating that the offloading transaction succeded. The coprocessor also communicate that the instruction can trigger an exception (xif_exception) and that it will write back the instruction result in one of the CPU GPRs (t0 in particular). In this cycle, the CPU EX stage is still busy executing the previous lw instruction (ex_ready_i = 0 in ID stage).

In the next cycle (marker B), the CPU EX stage becomes ready agains, so the information about the offloaded instruction that comes from the CPU IF-ID pipeline register gets sampled correctly in the ID-EX pipeline register, and the instruction is actually propagated through the CPU stages. However, the information in the coprocessor issue response, that was valid in the previous cycle when the offloading transaction took place, is missed. One of the consequences is that the result data from the coprocessor (result transaction at marker C) does not get written back to the CPU GPRs (x5 doesn't change to 0x100 at marker D). More sever consequences may appear when the coprocessor uses the CPU load-store unit or the bus, possibly triggering errors or exceptions that would be ignored by the CPU.

Workaround

A quick and dirty solution to get the information from the coprocessor correctly sampled into the ID-EX stage is to attempt instruction offloading through the XIF only when the execution stage is ready to accept a new instruction (i.e., ex_ready_i = 1 in ID stage). We implemented this by making the issue_valid XIF signal depend on the ex_ready_i signal from the execution stage, as shown in the following patch:

diff --git a/rtl/cv32e40x_id_stage.sv b/rtl/cv32e40x_id_stage.sv
index 1385dfb..d760e55 100644
--- a/rtl/cv32e40x_id_stage.sv
+++ b/rtl/cv32e40x_id_stage.sv
@@ -749,7 +749,9 @@ module cv32e40x_id_stage import cv32e40x_pkg::*;
       // Also attempt to offload any CSR instruction. The validity of such instructions are only
       // checked in the EX stage.
       // Instructions with deassert_we set to 1 from the controller bypass logic will not be attempted offloaded.
-      assign xif_issue_if.issue_valid     = instr_valid && (illegal_insn || csr_en) &&
+      // Only offload instructions if the EX stage is ready not to miss data from xif_issue.issue_resp
+      assign xif_issue_if.issue_valid     = instr_valid && ex_ready_i &&
+                                            (illegal_insn || csr_en) &&
                                             !(xif_accepted_q || xif_rejected_q || ctrl_byp_i.deassert_we);
 
       // Keep xif_offloading_o high after an offloaded instruction was accepted or rejected to get

This produced the expected result included in the patched.vcd dump, that can be opened with:

gtkwave -a patched.gtkw patched.vcd

Here, the XIF offload transaction is delayed by one cycle and takes place at 33146ns (marker A), when the EX stage is ready. This way, the response data from the coprocessor get sampled correctly in the ID-EX pipeline register (marker B). As a consequence, the result provided by the coprocessor (at marker C) gets correctly written back to the target GPR (x5 value changes from 0x101 to 0x100 at marker D).

⚠️ NOTE : this is not an optimal solution, as it effectively stalls the coprocessor for one cycle (possibly more if the previous instruction has a longer execution latency). A better way to fix this issue may be to offload an instruction to the coprocessor as soon as it is fetched and decoded (as it is now without the patch), and sample the coprocessor response data in a dedicated register (different from the ID-EX pipeline register) while waiting for the execution stage to be ready. Still, I'm not sure whether there are implications of such solution on the synchronization between the coprocessor and the CPU, so I chose the safer, unoptimized way.

Please let me know if further details are needed.

Thank you for your attention and collaboration,
Michele

@Silabs-ArjanB Silabs-ArjanB added Component:RTL For issues in the RTL (e.g. for files in the rtl directory) Type:Bug For bugs in any content (RTL, Documentation, etc.) labels Sep 7, 2023
@Silabs-ArjanB
Copy link
Contributor

Hi @StMiky Thank you for your excellent issue report! We are indeed aware that there are back pressure related issue involving the XIF interface (but I would need to dive into this one deeper to see if this is a new issue or one we already know). On our side the XIF related work is unfortunately postponed in favor of other topics, so I cannot promise yet when we will get around to fixing any XIF related issue. Best regards, Arjan

@Silabs-ArjanB Silabs-ArjanB changed the title Coprocessor XIF Issue response not sampled by ID-EX pipeline registers [XIF] Coprocessor XIF Issue response not sampled by ID-EX pipeline registers Sep 7, 2023
@DBees
Copy link

DBees commented Sep 8, 2023

@christian-herber-nxp FYI

@davideschiavone
Copy link
Contributor

hi @Silabs-ArjanB - @StMiky also pushed a tentative fix in our fork here esl-epfl@5b8a08d

If this fix works for you, we can open a PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:RTL For issues in the RTL (e.g. for files in the rtl directory) Type:Bug For bugs in any content (RTL, Documentation, etc.)
Projects
None yet
Development

No branches or pull requests

4 participants