[XIF] Coprocessor XIF Issue response not sampled by ID-EX pipeline registers #941
Labels
Component:RTL
Type:Bug
The `cv32e40x` CPU attempts to offload an instruction to a coprocessor connected through the CORE-V eXtension Interface (XIF `issue_req`) whenever its instruction decoder fails to recognize it (or when the instruction is a CSR instruction).

Currently, instruction offloading in the decode stage (ID) does not check for backpressure from the CPU execution stage (EX) when sending the request (`xif_issue.issue_valid = 1'b1`). As a result, the feedback information in the coprocessor response (e.g., the intention to write back the offloaded instruction result to one of the CPU scalar GPRs `x0`-`x31`) may not be sampled by the ID-EX pipeline register and, therefore, may not be propagated to the following stages.

This issue is potentially already known, as it seems to be related to the comment at `cv32e40x_id_stage.sv:752`, but the example provided below may help in debugging and solving it.

The very simple workaround we implemented (described below) is suboptimal, as it delays XIF instruction offloading until EX is ready to accept a new instruction (0 or 1 cycles in most cases). Therefore, no pull request was opened, although I am available to provide further support if needed.
Component
RTL: `rtl/cv32e40x_id_stage.sv`
Steps to Reproduce
The example below was simulated on revision `f17028f` (`v0.9.0`).
Assembly code
The CPU execution stage applies backpressure to the instruction decoding stage (i.e., `ex_ready_i = 0`) whenever the instruction execution takes more than one cycle, which is the case, for example, for memory accesses (`load`/`store` instructions). We triggered this situation with the following code snippet:

Here, `vsetvl` (from the RISC-V "V" vector extension) is an instruction to be offloaded to a vector coprocessor using the CORE-V eXtension Interface (XIF). This is a setup instruction that attempts to write the value in the source GPR (`t0` here) to a CSR in the coprocessor and to write the legal value actually accepted by the coprocessor to the `rd` GPR (`t0` again here).
Issue summary
In our case, the coprocessor's instruction decoding stage can answer an offloading request in the same cycle it is issued by the CPU, and the response data is retired the cycle after. In the example proposed here, the CPU execution stage is not ready to accept a new instruction when the `vsetvl` instruction is offloaded, because it is still processing the previous `lw`. Therefore, the coprocessor response data does not get sampled in the ID-EX pipeline register until the next cycle, when it is no longer valid.

The following timing diagram shows a simplified representation of what is happening with the current RTL:
The following timing diagram instead shows the expected behaviour, obtained by delaying the offload request until EX is ready (further details below):
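
Separately from the timing diagrams, the scenario can be approximated with a small cycle-level model. This is only a sketch under stated assumptions, not the cv32e40x RTL: the names `IdExReg`, `simulate`, `xif_writeback` and `gate_issue_on_ex_ready` are hypothetical, the coprocessor is assumed to be always ready and to request a writeback, and its issue response is assumed to be meaningful only in the handshake cycle.

```python
from dataclasses import dataclass


@dataclass
class IdExReg:
    """Hypothetical, heavily simplified ID-EX pipeline register."""
    valid: bool = False
    xif_writeback: bool = False  # writeback flag taken from the issue response


def simulate(ex_ready_trace, gate_issue_on_ex_ready):
    """Model one offloaded instruction waiting in ID.

    ex_ready_trace: per-cycle value of ex_ready_i seen by the ID stage.
    gate_issue_on_ex_ready: if True, issue_valid is only raised when EX is
    ready (the workaround); if False, it is raised as soon as possible.
    Returns (cycle of the issue handshake, ID-EX register contents).
    """
    id_ex = IdExReg()
    offloaded = False
    handshake_cycle = None
    for cycle, ex_ready in enumerate(ex_ready_trace):
        # CPU side: request offloading once, optionally gated on EX readiness.
        issue_valid = (not offloaded) and (ex_ready or not gate_issue_on_ex_ready)
        # Assumed coprocessor: always ready, answers in the same cycle.
        issue_ready = True
        handshake = issue_valid and issue_ready
        # Assumption: the response is only valid during the handshake cycle.
        resp_writeback = handshake
        if handshake:
            offloaded = True
            handshake_cycle = cycle
        # The ID-EX register samples only when EX accepts a new instruction.
        if ex_ready:
            id_ex = IdExReg(valid=True, xif_writeback=resp_writeback)
            break
    return handshake_cycle, id_ex


# EX is busy in cycle 0 (e.g. a preceding lw) and ready in cycle 1.
print(simulate([False, True], gate_issue_on_ex_ready=False))
print(simulate([False, True], gate_issue_on_ex_ready=True))
```

In this model, the ungated run performs the handshake in cycle 0 but the ID-EX register samples in cycle 1 with the writeback flag cleared, while the gated run performs both in cycle 1 and keeps the flag. When EX is ready from the start, the two runs behave identically.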
Waveforms and details
The waveforms dumped during simulation and the GTKWave view configuration files can be found in the attached archive: cv32e40x-issue.zip
The XIF issue transaction from the example above begins at `33144ns` in the attached VCD file `plain.vcd`, which can be opened with:

As shown, the `issue_valid` signal from the CPU and the `issue_ready` and `accept` signals from the coprocessor are active in the same cycle (marker `A`), indicating that the offloading transaction succeeded. The coprocessor also communicates that the instruction can trigger an exception (`xif_exception`) and that it will write back the instruction result to one of the CPU GPRs (`t0` in particular). In this cycle, the CPU EX stage is still busy executing the previous `lw` instruction (`ex_ready_i = 0` in ID stage).

In the next cycle (marker `B`), the CPU EX stage becomes ready again, so the information about the offloaded instruction that comes from the CPU IF-ID pipeline register gets sampled correctly in the ID-EX pipeline register, and the instruction is actually propagated through the CPU stages. However, the information in the coprocessor issue response, which was valid in the previous cycle when the offloading transaction took place, is missed. One of the consequences is that the result data from the coprocessor (result transaction at marker `C`) does not get written back to the CPU GPRs (`x5` doesn't change to `0x100` at marker `D`). More severe consequences may appear when the coprocessor uses the CPU load-store unit or the bus, possibly triggering errors or exceptions that would be ignored by the CPU.
Workaround
A quick and dirty solution to get the information from the coprocessor correctly sampled into the ID-EX stage is to attempt instruction offloading through the XIF only when the execution stage is ready to accept a new instruction (i.e., `ex_ready_i = 1` in ID stage). We implemented this by making the `issue_valid` XIF signal depend on the `ex_ready_i` signal from the execution stage, as shown in the following patch:

This produced the expected result included in the `patched.vcd` dump, which can be opened with:

Here, the XIF offload transaction is delayed by one cycle and takes place at `33146ns` (marker `A`), when the EX stage is ready. This way, the response data from the coprocessor gets sampled correctly in the ID-EX pipeline register (marker `B`). As a consequence, the result provided by the coprocessor (at marker `C`) gets correctly written back to the target GPR (`x5` value changes from `0x101` to `0x100` at marker `D`).

Please let me know if further details are needed.
Thank you for your attention and collaboration,
Michele