-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define hardware error exception cause code #1105
Conversation
dc6d44e
to
88fbee7
Compare
A Proposal for Non-Normative Description of Hardware Error: A hardware error is a synchronous exception triggered when corrupted or uncorrectable data is accessed. This can occur during the fetch of an instruction, through a load instruction (whether it's an explicit or implicit access), a store/AMO instruction (again, either through explicit or implicit access), or by other instructions that attempt to access corrupted data for their operation. In this context, "data" encompasses all types of information used within a RISC-V hart. The -- |
I'll let you, @gfavor, and @jhauser-us sort out which cause code is used for which exception, then I'll update the PR accordingly. I'm just the messenger. |
As far as cause code assignment, I believe there is no great reason for which exception gets which cause code. So the naturual thing to do is what Andrew is doing, i.e. Hardware Error is now being assigned, so it gets 18, and then when Software Integrity Error gets assigned, it can take the next sequential code 19. Btw, Andrew, why the conscious skip over reserved cause codes 16 and 17? |
Here's a reduced version of Ved's proposal for the Hardware Error exception description. This drops the second sentence that tries to describe all the kinds of instructions that may do explicit and implicit access, but doesn't truly add anything on top of the existing arch specs, especially since it isn't a complete explicit list of such types of instructions. (To be fair, Ved copied this from an earlier stab I had suggested a month ago.) This also drops the last sentence about uncertain priority since I then propose additions to Table 3.7 that establish clear priorities for Hardware Errors. A Hardware Error is a synchronous exception triggered when corrupted or uncorrectable data is accessed explicitly or implicitly by an instruction. In this context, "data" encompasses all types of information used within a RISC-V hart. The *epc CSR on delivery of the hardware error exception holds the address of the instruction that attempted to access corrupted data, while the *tval CSR is either set to 0 or holds the virtual address of an instruction fetch, load, or store that attempted to access corrupted data. I would argue that there is a clear assignment of Hardware Error priority among cause codes that is consistent with existing cause code prioritization, e.g. instruction fetch-related exceptions take priority over decode/execution-related exceptions. Hence, in Table 3.7 in the Priv spec:
|
The description sounds good to me. About the priority, I agree that there is a priority implied by fetch, decode, and execution. But with the expanded scope of hardware errors, we could expect a hardware error during the phase that detects instruction address breakpoint (e.g., parity error on trigger registers) or as part of execution of a integer/FP/vector instruction (e.g., a register file parity error detected by an SLTI instruction). That led to my observation that it could be detected in any phase. Adding "first encountered ..." to other boxes could also address this expanded scope. |
Makes sense. I think this results in just these two further additions:
|
Per discussions in yesterday's Architecture Review Committee call, please move the hardware error exception to cause code 19, and change code 18 to be a software check exception. (Speaking only for myself, I hope a better name can be found than software check, but this is what the ARC settled on during the call.) |
Fixed. Need some text for the software check exception, too. |
88fbee7
to
47fc021
Compare
Andrew, what are you thinking as far as the exception priority issue (and old Table 3.7, now Table 15). Re Ved's last priority comment, I'll agree. In which case it seems overly wordy and messy to suitably incorporate Hardware Error into each and every category in the table. How about instead adding to the preceding paragraph that refers to the table (addition in bold): If an instruction may raise multiple synchronous exceptions, the decreasing priority order of Table 15 indicates which exception is taken and reported in mcause. The priority of Hardware Error exception is implementation-defined, but any given occurrence is generally expected to be recognized at the point in the overall priority order at which the hardware error is discovered. The priority of any custom synchronous exceptions is implementation-defined. |
I'm with y'all on the problem statement, but I'm unsure as to the best solution. My hot take is that it would simplify the broader exception-priority story to identify the phases of instruction execution and prioritize exceptions by execution phase. Then we'd define a priority of exceptions within a phase. This might seem like overkill, but in the long run, I think it will simplify the description when the same cause code can show up at multiple points in the pipeline. (And it wouldn't be a dense matrix of phases and causes, as most causes can't occur in most phases.) Perhaps I should try my hand at seeing what this would look like. If it doesn't pan out, so be it; I imagine we all find Greg's suggestion to be unobjectionable. |
I feel Software Protection Check will more descriptive of the exception. A Proposal for Non-Normative Description of Software Protection Check Error: The Software Protection Check is a synchronous exception that is triggered when there are violations of checks and assertions defined by ISA extensions that aim to safeguard the integrity of software assets, encompassing control flow, memory among others. When the Software Protection Check exception is raised, the |
There was substantial ARC discussion about naming this exception that covered a range of proposed names, resulting in a non-unanimous concensus (reflecting how arguable a number of reasonable possible names are). Personally I would be concerned about the adjective "Protection" being a bit narrowing to what this exception can be used for over time (as more extensions make use of it), and also has connotations that to some may be different that what we actually have in mind. My own leaning was "Software Error Check", but that had its own objections from some others. |
How about this tweaked and trimmed down version of Ved's description: The Software Check exception is a synchronous exception that is triggered when there are violations of checks and assertions defined by ISA extensions that aim to safeguard the integrity of software assets, encompassing control flow and memory access constraints among others. When this exception is raised, the *tval register captures details about the failing check. The specifics of this information are outlined in the ISA extensions that can trigger this exception. If no information is provided, then a value of 0 is captured. The priority of this exception, in relation to other synchronous exceptions, depends the particular cause of this exception and is explicitly specified by the corresponding ISA extensions. |
looks good to me! |
This seems to match the general spirit o machine check, which might
otherwise be called Hardware Check Exception if machine check weren't
already accepted by the industry.
…On Fri, Sep 8, 2023 at 5:43 PM Ved Shanbhogue ***@***.***> wrote:
looks good to me!
—
Reply to this email directly, view it on GitHub
<#1105 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AHPXVJT6ASEGF4SSQTN4A5DXZO3SRANCNFSM6AAAAAA4GZQQNU>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
I'm also fine with this, but I'll let it marinate for a few days before updating the PR. |
I guess its marinated enough to bake now! |
47fc021
to
c0f33bf
Compare
Indeed, thanks for reminding me, @ved-rivos! |
@aswaterman - I think this description for software check exception did not make it into the PR |
Fixed (with minor tweaks). |
This PR is incomplete, as @ved-rivos and @gfavor are supposed to provide some non-normative text that describes the intended function of this exception.
cc @jhauser-us