chore: get instruction appendix ready for release #708

Merged
merged 13 commits into from
May 9, 2025
1 change: 1 addition & 0 deletions .devcontainer/Dockerfile
@@ -29,6 +29,7 @@ RUN \
libyaml-dev \
nodejs \
npm \
parallel \
python3 \
python3-pip \
python3.12-venv \
6 changes: 4 additions & 2 deletions .github/workflows/regress.yml
@@ -62,8 +62,10 @@ jobs:
uses: actions/checkout@v4
- name: singularity setup
uses: ./.github/actions/singularity-setup
- name: Generate instruction appendix
run: ./do gen:instruction_appendix
- name: Generate instruction appendix asciidoc
run: ./do gen:instruction_appendix_adoc
- name: Check instruction appendix result
run: ./do test:instruction_appendix
regress-cfg-manual:
runs-on: ubuntu-latest
env:
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -19,6 +19,7 @@ repos:
stages: [pre-commit]
- id: end-of-file-fixer
stages: [pre-commit]
exclude: \.golden.adoc$
- id: trailing-whitespace
stages: [pre-commit]
args: [--markdown-linebreak-ext=md]
53 changes: 25 additions & 28 deletions arch/inst/Zawrs/wrs.nto.yaml
@@ -5,40 +5,37 @@ kind: instruction
name: wrs.nto
long_name: Wait-on-Reservation-Set-with-No-Timeout
description: |
-id: inst-wrs.nto-behaviour
-normative: false
-text: |
To mitigate the wasteful looping in such usages, a `wrs.nto` (WRS-with-no-timeout) instruction is provided.
Instead of polling for a store to a specific memory location, software registers a reservation set that
includes all the bytes of the memory location using the LR instruction. Then a subsequent `wrs.nto`
instruction would cause the hart to temporarily stall execution in a low-power state until a store
occurs to the reservation set or an interrupt is observed.
To mitigate the wasteful looping in such usages, a `wrs.nto` (WRS-with-no-timeout) instruction is provided.
Instead of polling for a store to a specific memory location, software registers a reservation set that
includes all the bytes of the memory location using the LR instruction. Then a subsequent `wrs.nto`
instruction would cause the hart to temporarily stall execution in a low-power state until a store
occurs to the reservation set or an interrupt is observed.

This instruction is not supported in a constrained LR/SC loop.
While stalled, an implementation is permitted to occasionally terminate the stall and complete
execution for any reason.
This instruction is not supported in a constrained LR/SC loop.
While stalled, an implementation is permitted to occasionally terminate the stall and complete
execution for any reason.

`wrs.nto` follows the rules of the WFI instruction for resuming execution
on a pending interrupt.
`wrs.nto` follows the rules of the WFI instruction for resuming execution
on a pending interrupt.

When the TW (Timeout Wait) bit in `mstatus` is set and `wrs.nto` is executed
in any privilege mode other than M mode, and it does not complete within an implementation-specific
bounded time limit, the `wrs.nto` instruction will cause an illegal instruction exception.
When the TW (Timeout Wait) bit in `mstatus` is set and `wrs.nto` is executed
in any privilege mode other than M mode, and it does not complete within an implementation-specific
bounded time limit, the `wrs.nto` instruction will cause an illegal instruction exception.

When executing in VS or VU mode, if the VTW bit is set in `hstatus`, the TW bit in `mstatus` is clear,
and the `wrs.nto` does not complete within an implementation-specific bounded time limit,
the `wrs.nto` instruction will cause a virtual instruction exception.
When executing in VS or VU mode, if the VTW bit is set in `hstatus`, the TW bit in `mstatus` is clear,
and the `wrs.nto` does not complete within an implementation-specific bounded time limit,
the `wrs.nto` instruction will cause a virtual instruction exception.

[Note]
Since `wrs.nto` can complete execution for reasons other than stores to the reservation set,
software will likely need a means of looping until the required stores have occurred.
[Note]
Since `wrs.nto` can complete execution for reasons other than stores to the reservation set,
software will likely need a means of looping until the required stores have occurred.

[Note]
`wrs.nto`, unlike WFI, is not specified to cause an illegal instruction exception if executed in U-mode
when the governing TW bit is 0. WFI is typically not expected to be used in U-mode and on many systems
may promptly cause an illegal instruction exception if used at U-mode.
Unlike WFI, `wrs.nto` is expected to be used by software in U-mode when waiting on memory but without
a deadline for that wait.
[Note]
`wrs.nto`, unlike WFI, is not specified to cause an illegal instruction exception if executed in U-mode
when the governing TW bit is 0. WFI is typically not expected to be used in U-mode and on many systems
may promptly cause an illegal instruction exception if used at U-mode.
Unlike WFI, `wrs.nto` is expected to be used by software in U-mode when waiting on memory but without
a deadline for that wait.
definedBy: Zawrs
assembly: ""
encoding:
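
The note about looping in the `wrs.nto` description can be illustrated with a small Python model. This is only a sketch under stated assumptions, not RISC-V code: `load_reserved` and `wrs_nto` are hypothetical callables standing in for the LR and `wrs.nto` instructions, and `wait_for_store` is an illustrative name that does not appear in the repository.

```python
def wait_for_store(load_reserved, wrs_nto, expected):
    """Loop until the watched location holds `expected`.

    load_reserved: hypothetical stand-in for LR (reads the location and
                   registers the reservation set).
    wrs_nto:       hypothetical stand-in for wrs.nto (may return early
                   for any reason, so the value is always re-checked).
    Returns the number of times the wait was entered.
    """
    spins = 0
    while True:
        value = load_reserved()
        if value == expected:
            return spins
        wrs_nto()
        spins += 1


# Simulated memory: the awaited store becomes visible on the third read.
reads = iter([0, 0, 1])
spins = wait_for_store(lambda: next(reads), lambda: None, expected=1)
print(spins)  # prints 2
```

The re-check after every wait is the point of the sketch: because the stall may end without a store to the reservation set, returning from the wait says nothing by itself about the memory location.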
53 changes: 25 additions & 28 deletions arch/inst/Zawrs/wrs.sto.yaml
@@ -5,36 +5,33 @@ kind: instruction
name: wrs.sto
long_name: Wait-on-Reservation-Set-with-Short-Timeout
description: |
-id: inst-wrs.sto-behaviour
-normative: false
-text: |
Instead of polling for a store to a specific memory location, software registers a
reservation set that includes all the bytes of the memory location using the LR instruction.
A subsequent `wrs.sto` instruction would cause the hart to temporarily stall execution in a
low-power state until a store occurs to the reservation set or an interrupt is observed.
Sometimes the program waiting on a memory update may also need to carry out a task at a future time
or otherwise place an upper bound on the wait. To support such use cases, `wrs.sto` bounds the
stall duration to an implementation-defined short timeout such that the stall is terminated on the
timeout if no other conditions have occurred to terminate the stall. The program using this instruction
may then determine if its deadline has been reached.
`wrs.sto` causes the hart to temporarily stall execution in a low-power state as long as the reservation
set is valid and no pending interrupts, even if disabled, are observed. For `wrs.sto` the stall duration
is bounded by an implementation defined short timeout. These instructions are not supported in a
constrained LR/SC loop.
Hart execution may be stalled while the following conditions are all satisfied:
a. The reservation set is valid
b. If `wrs.sto`, a "short" duration since start of stall has not elapsed
c. No pending interrupt is observed (see the rules below)
Instead of polling for a store to a specific memory location, software registers a
reservation set that includes all the bytes of the memory location using the LR instruction.
A subsequent `wrs.sto` instruction would cause the hart to temporarily stall execution in a
low-power state until a store occurs to the reservation set or an interrupt is observed.
Sometimes the program waiting on a memory update may also need to carry out a task at a future time
or otherwise place an upper bound on the wait. To support such use cases, `wrs.sto` bounds the
stall duration to an implementation-defined short timeout such that the stall is terminated on the
timeout if no other conditions have occurred to terminate the stall. The program using this instruction
may then determine if its deadline has been reached.
`wrs.sto` causes the hart to temporarily stall execution in a low-power state as long as the reservation
set is valid and no pending interrupts, even if disabled, are observed. For `wrs.sto` the stall duration
is bounded by an implementation defined short timeout. These instructions are not supported in a
constrained LR/SC loop.
Hart execution may be stalled while the following conditions are all satisfied:
a. The reservation set is valid
b. If `wrs.sto`, a "short" duration since start of stall has not elapsed
c. No pending interrupt is observed (see the rules below)

While stalled, an implementation is permitted to occasionally terminate the stall and complete
execution for any reason. `wrs.sto` follows the rules of the WFI instruction for resuming execution
on a pending interrupt. Since `wrs.sto` can complete execution for reasons other than stores to
the reservation set, software will likely need a means of looping until the required stores have occurred.
While stalled, an implementation is permitted to occasionally terminate the stall and complete
execution for any reason. `wrs.sto` follows the rules of the WFI instruction for resuming execution
on a pending interrupt. Since `wrs.sto` can complete execution for reasons other than stores to
the reservation set, software will likely need a means of looping until the required stores have occurred.

[Note]
The duration of a `wrs.sto` instruction's timeout may vary significantly within and among implementations.
In typical implementations this duration should be roughly in the range of 10 to 100 times an on-chip
cache miss latency or a cacheless access to main memory.
[Note]
The duration of a `wrs.sto` instruction's timeout may vary significantly within and among implementations.
In typical implementations this duration should be roughly in the range of 10 to 100 times an on-chip
cache miss latency or a cacheless access to main memory.
definedBy: Zawrs
assembly: ""
encoding:
25 changes: 11 additions & 14 deletions arch/inst/Zicsr/csrrc.yaml
@@ -5,21 +5,18 @@ kind: instruction
name: csrrc
long_name: Atomic Read and Clear Bits in CSR
description: |
-id: inst-csrrc-behavior
-normative: false
-text: |
The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends
the value to XLEN bits, and writes it to integer register `rd`. The initial value in integer register `rs1` is
treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in `rs1` will
cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable.
The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends
the value to XLEN bits, and writes it to integer register `rd`. The initial value in integer register `rs1` is
treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in `rs1` will
cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable.

For CSRRC, if `rs1=x0`, then the instruction will not write to the CSR at all, and so shall
not cause any of the side effects that might otherwise occur on a CSR write, nor raise
illegal-instruction exceptions on accesses to read-only CSRs. CSRRC always reads the addressed CSR and
causes any read side effects regardless of the `rs1` and `rd` fields.
Note that if `rs1` specifies a register other than `x0`, and that register holds a zero value,
the instruction will not action any attendant per-field side effects, but will action any
side effects caused by writing to the entire CSR.
For CSRRC, if `rs1=x0`, then the instruction will not write to the CSR at all, and so shall
not cause any of the side effects that might otherwise occur on a CSR write, nor raise
illegal-instruction exceptions on accesses to read-only CSRs. CSRRC always reads the addressed CSR and
causes any read side effects regardless of the `rs1` and `rd` fields.
Note that if `rs1` specifies a register other than `x0`, and that register holds a zero value,
the instruction will not action any attendant per-field side effects, but will action any
side effects caused by writing to the entire CSR.
definedBy: Zicsr
assembly: xd, csr, xs1
encoding:
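
The CSRRC behaviour described above can be modelled in a few lines of Python. This is a minimal sketch, assuming the CSR's writable bits can be summarised by a single `writable_mask`; the function name `csrrc` and all of its parameters are illustrative and are not taken from the repository. Side effects and exceptions are not modelled, only whether a write happens.

```python
def csrrc(csr_value, rs1_index, rs1_value, writable_mask, xlen=64):
    """Model of CSRRC: return (rd_value, new_csr_value, wrote_csr).

    rs1_index is the register number encoded in the instruction;
    rs1_value is the value held in that register.
    """
    xmask = (1 << xlen) - 1
    rd_value = csr_value & xmask  # the CSR is always read, zero-extended to XLEN

    if rs1_index == 0:
        # rs1 = x0: no CSR write at all, hence no write side effects.
        return rd_value, csr_value, False

    # Bits set in rs1 are cleared in the CSR, but only where writable.
    clear_bits = rs1_value & writable_mask
    return rd_value, csr_value & ~clear_bits, True
```

For example, `csrrc(0b1111, 5, 0b0101, 0b0011)` clears only bit 0 (bit 2 is not writable in this made-up mask), while `csrrc(0b1111, 5, 0, 0b1111)` leaves the value unchanged but still counts as a write, matching the note about a non-`x0` register that holds zero.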
15 changes: 6 additions & 9 deletions arch/inst/Zicsr/csrrci.yaml
@@ -5,15 +5,12 @@ kind: instruction
name: csrrci
long_name: Atomic Read and Clear Bits in CSR with Immediate
description: |
-id: inst-csrrci-behavior
-normative: false
-text: |
The CSRRCI variant is similar to CSRRC, except this updates the CSR using an XLEN-bit value obtained
by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the `rs1` field instead of a
value from an integer register. For CSRRCI, if the `uimm[4:0]` field is zero, then this instruction
will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on
a CSR write, nor raise illegal-instruction exceptions on accesses to read-only CSRs. The CSRRCI will
always read the CSR and cause any read side effects regardless of `rd` and `rs1` fields.
The CSRRCI variant is similar to CSRRC, except this updates the CSR using an XLEN-bit value obtained
by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the `rs1` field instead of a
value from an integer register. For CSRRCI, if the `uimm[4:0]` field is zero, then this instruction
will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on
a CSR write, nor raise illegal-instruction exceptions on accesses to read-only CSRs. The CSRRCI will
always read the CSR and cause any read side effects regardless of `rd` and `rs1` fields.
definedBy: Zicsr
assembly: xd, csr, uimm
encoding:
15 changes: 6 additions & 9 deletions arch/inst/Zicsr/csrrsi.yaml
@@ -5,15 +5,12 @@ kind: instruction
name: csrrsi
long_name: Atomic Read and Set Bits in CSR with Immediate
description: |
-id: inst-csrrsi-behavior
-normative: false
-text: |
The CSRRSI variant is similar to CSRRS, except this updates the CSR using an XLEN-bit value obtained
by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the `rs1` field instead of a
value from an integer register. For CSRRSI, if the `uimm[4:0]` field is zero, then this instruction
will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on
a CSR write, nor raise illegal-instruction exceptions on accesses to read-only CSRs. The CSRRSI will
always read the CSR and cause any read side effects regardless of `rd` and `rs1` fields.
The CSRRSI variant is similar to CSRRS, except this updates the CSR using an XLEN-bit value obtained
by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the `rs1` field instead of a
value from an integer register. For CSRRSI, if the `uimm[4:0]` field is zero, then this instruction
will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on
a CSR write, nor raise illegal-instruction exceptions on accesses to read-only CSRs. The CSRRSI will
always read the CSR and cause any read side effects regardless of `rd` and `rs1` fields.
definedBy: Zicsr
assembly: xd, csr, uimm
encoding:
18 changes: 10 additions & 8 deletions arch/inst/Zkn/aes64ks1i.yaml
@@ -4,14 +4,16 @@ $schema: "inst_schema.json#"
kind: instruction
name: aes64ks1i
long_name: AES Key Schedule Instruction 1
description: |
-id: inst-aes64ks1i-behavior
-normative: true
-text: |
This instruction implements the rotation, SubBytes and Round Constant addition steps of the AES
block cipher Key Schedule. This instruction must _always_ be implemented such that its execution
latency does not depend on the data being operated on. Note that `rnum` must be in the range
`0x0..0xA`. The values `0xB..0xF` are reserved.
description:
- id: inst-aes64ks1i-behavior
normative: true
text: |
This instruction implements the rotation, SubBytes and Round Constant addition steps of the AES
block cipher Key Schedule.
- id: inst-aes64ks1i-range
normative: true
text: |
`rnum` must be in the range `0x0..0xA`. The values `0xB..0xF` are reserved.
definedBy:
anyOf: [Zknd, Zkne]
base: 64
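
The new `inst-aes64ks1i-range` statement amounts to a simple range check, sketched below in Python. The helper name `rnum_is_valid` is hypothetical, and treating `rnum` as a 4-bit field is an assumption inferred from the reserved range `0xB..0xF` quoted in the description.

```python
def rnum_is_valid(rnum):
    """Return True when rnum is in the defined range 0x0..0xA.

    Values 0xB..0xF are reserved; anything outside 0x0..0xF cannot be
    encoded in the (assumed) 4-bit field and is rejected outright.
    """
    if not 0x0 <= rnum <= 0xF:
        raise ValueError("rnum does not fit in a 4-bit field")
    return rnum <= 0xA
```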
13 changes: 6 additions & 7 deletions arch/inst/Zkn/aes64ks2.yaml
@@ -4,13 +4,12 @@ $schema: "inst_schema.json#"
kind: instruction
name: aes64ks2
long_name: AES Key Schedule Instruction 2
description: |
-id: instr-aes64ks2-behavior
-normative: true
-text: |
This instruction implements the additional XOR'ing of key words as part of the AES block cipher
Key Schedule. This instruction must _always_ be implemented such that its execution latency does
not depend on the data being operated on.
description:
- id: instr-aes64ks2-behavior
normative: true
text: |
This instruction implements the additional XOR'ing of key words as part of the AES block cipher
Key Schedule.
definedBy:
anyOf: [Zknd, Zkne]
base: 64
13 changes: 6 additions & 7 deletions arch/inst/Zknd/aes64ds.yaml
@@ -4,13 +4,12 @@ $schema: "inst_schema.json#"
kind: instruction
name: aes64ds
long_name: AES decrypt final round
description: |
-id: inst-aes64ds-behavior
-normative: true
-text: |
Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next
round output, applying the Inverse ShiftRows and SubBytes steps. This instruction must _always_ be
implemented such that its execution latency does not depend on the data being operated on.
description:
- id: inst-aes64ds-behavior
normative: true
text: |
Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next
round output, applying the Inverse ShiftRows and SubBytes steps.
definedBy: Zknd
base: 64
assembly: xd, xs1, xs2
13 changes: 6 additions & 7 deletions arch/inst/Zknd/aes64dsm.yaml
@@ -4,13 +4,12 @@ $schema: "inst_schema.json#"
kind: instruction
name: aes64dsm
long_name: AES decrypt middle round
description: |
-id: inst-aes64dsm-behavior
-normative: true
-text: |
Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next
round output, applying the Inverse ShiftRows, SubBytes and MixColumns steps. This instruction must _always_
be implemented such that its execution latency does not depend on the data being operated on.
description:
- id: inst-aes64dsm-behavior
normative: true
text: |
Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next
round output, applying the Inverse ShiftRows, SubBytes and MixColumns steps.
definedBy: Zknd
base: 64
assembly: xd, xs1, xs2
16 changes: 7 additions & 9 deletions arch/inst/Zknd/aes64im.yaml
@@ -4,15 +4,13 @@ $schema: "inst_schema.json#"
kind: instruction
name: aes64im
long_name: AES Decrypt KeySchedule MixColumns
description: |
-id: inst-aes64im-behavior
-normative: true
-text: |
The instruction applies the inverse MixColumns transformation to two columns of the state array,
packed into a single 64-bit register. It is used to create the inverse cipher KeySchedule, according to
the equivalent inverse cipher construction in (NIST, 2001) (Page 23, Section 5.3.5). This
instruction must always be implemented such that its execution latency does not depend on the
data being operated on.
description:
- id: inst-aes64im-behavior
normative: true
text: |
The instruction applies the inverse MixColumns transformation to two columns of the state array,
packed into a single 64-bit register. It is used to create the inverse cipher KeySchedule, according to
the equivalent inverse cipher construction in (NIST, 2001) (Page 23, Section 5.3.5).
definedBy: Zknd
base: 64
assembly: xd, xs1
13 changes: 6 additions & 7 deletions arch/inst/Zkne/aes64es.yaml
@@ -4,13 +4,12 @@ $schema: "inst_schema.json#"
kind: instruction
name: aes64es
long_name: AES encrypt final round
description: |
-id: inst-aes64es-behavior
-normative: true
-text: |
Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next
round output, applying the ShiftRows and SubBytes steps. This instruction must _always_ be
implemented such that its execution latency does not depend on the data being operated on.
description:
- id: inst-aes64es-behavior
normative: true
text: |
Uses the two 64-bit source registers to represent the entire AES state, and produces _half_ of the next
round output, applying the ShiftRows and SubBytes steps.
definedBy: Zkne
base: 64
assembly: xd, xs1, xs2