diff --git a/intro.adoc b/intro.adoc
index 6dee094..24b39bb 100644
--- a/intro.adoc
+++ b/intro.adoc
@@ -1,6 +1,6 @@
 [[Introduction]]
 == Introduction
-The Zawrs extension defines a single instruction to be used in polling loops
+The Zawrs extension defines a pair of instructions to be used in polling loops
 that allows a core to enter a low-power state and wait on either a write to
 a memory location or other asynchronous system events. It addresses common
 use-cases in operating systems when waiting for contended locks or for
@@ -24,13 +24,20 @@ such as:
 
 Such usages involve polling on memory locations, and such busy loops can be a
 wasteful expenditure of energy. To mitigate the wasteful looping in such usages,
-a `WRS` instruction is proposed. Instead of polling for a write at a specific
-memory location, software would add that memory location to the reservation set
-using the existing `LR` instruction - a subsequent `WRS` instruction would
-cause the hart to stall until a write occurs to the reservation set. On RV64
-systems, this proposal allows software to specify (in `rs1` operand) a deadline
-as a time value in the future when execution would resume in absence of a store
-to the reservation set.
+a `WRS.NTO` (WRS-with-no-timeout) instruction is provided. Instead of polling
+for a write at a specific memory location, software would add that memory
+location to the reservation set using the existing `LR` instruction - a
+subsequent `WRS.NTO` instruction would cause the hart to stall until a write
+occurs to the reservation set.
+
+Sometimes the program waiting on a memory update may also need to carry out a
+task at a future time (e.g., generating a heartbeat) or otherwise place
+an upper bound on the wait. To support such usages, a second instruction
+`WRS.STO` (WRS-with-a-short-timeout) is provided that works like `WRS.NTO` but
+bounds the stall duration to a short timeout such that the stall is removed
+on the timeout if no other conditions to remove the stall have occurred. The
+program using this instruction may then determine if its next deadline
+has been reached.
 
 Such mechanisms have been demonstrated to help achieve increased efficiency both
 in terms of instruction count as well as cycle count and thereby lead to
@@ -39,13 +46,3 @@ cite:[osti_1376625], cite:[dpdk_2019] and cite:[usenix_196292].
 In such applications, optionally bounding the wait for the event to occur with a
 timeout is commonly used cite:[sigplan_2001], cite:[Franke2005FussF],
 cite:[pthread_man], and cite:[futex_man].
-
-=== Suitability for Fast Track Extension Process
-This proposed extension meets the Fast Track criteria: it consists of a single
-instruction, it addresses the need to be able to efficiently wait on memory
-updates which is common across the whole gamut of use cases, it fits in well
-with other instructions, and is not expected to be contentious. This extension
-helps RISC-V catch up with established capabilities in other architectures
-(UMONITOR/UMWAIT - x86-Intel; WFET - ARM, MWAITX- x86-AMD) to address existing
-software use cases and is not inventing new use cases.
-
diff --git a/zawrs.adoc b/zawrs.adoc
index 9e6b8a5..1d55ee9 100644
--- a/zawrs.adoc
+++ b/zawrs.adoc
@@ -1,18 +1,14 @@
 [[Zawrs]]
 == Zawrs
 
-The `WRS` instruction is available in all privilege modes and is uses the
-`SYSTEM` major opcode. On RV64 systems, if `rs1` is not `x0` then `rs1` holds
-a 64-bit deadline, compared against the `time` CSR when `V=0` and against the
-sum of the `time` and `htimedelta` CSRs when `V=1` (similar to `stimecmp` and
-`vstimecmp` respectively), for the wait. On RV32 systems, a deadline cannot be
-specified as the `rs1` is limited to 32-bit.
+The `WRS.NTO` and `WRS.STO` instructions are available in all privilege modes
+and use the `SYSTEM` major opcode. When these instructions are invoked, the
+hart stalls until one of the following events occurs:
 
-*Mnemonic:*
-wrs _rs1_
-
-*Pseudoinstructions:*
-wrs -> wrs zero
+. The reservation set is invalid
+. If `WRS.STO`, an implementation-defined short time has passed since the
+  hart was stalled.
+. An interrupt was observed - even if disabled
 
 *Encoding:*
 [wavedrom, , ]
@@ -20,100 +16,56 @@ wrs -> wrs zero
 {reg: [
   {bits: 7, name: 'opcode', attr: ['SYSTEM(0x73)'] },
   {bits: 5, name: 'rd', attr: ['0'] },
-  {bits: 3, name: 'func3', attr: ['0'] },
-  {bits: 5, name: 'rs1', attr: ['src'] },
-  {bits: 12, name: 'func12', attr:['WRS(0x010)'] },
+  {bits: 3, name: 'funct3', attr: ['0'] },
+  {bits: 5, name: 'rs1', attr: ['0'] },
+  {bits: 12, name: 'funct12', attr:['WRS.NTO(0x0d)', 'WRS.STO(0x1d)'] },
 ], config:{lanes: 1, hspace:1024}}
 ....
-The value of the field `opcode` is `0x73`, the value of the field `func3` is `0x0`,
-and the value of the fields `funct12` and `rd` is `0x010`.
-The field `rs1` encodes the register that holds the 64-bit deadline, or `0x0`
-(encoding for `x0`) if no timeout is specified.
-In RV32, the `rs1` field must be `0x0` else an illegal instruction exception occurs.
 
 *Operation:*
 [source,asciidoc, linenums]
 ....
-1. if reservation-set is valid
-2.    Stall hart execution until one of following events occur:
-        a) reservation set is invalid
-        b) rs1 != 0 and X(rs1) < time CSR
-        c) interrupt observed
-3. Invalidate reservation-set
+if reservation-set is valid
+    Stall hart execution until one of the following events occurs:
+      a) reservation set is invalid
+      b) if WRS.STO, a short time since start of stall has elapsed
+      c) interrupt observed
 ....
-When the instruction is invoked, the hart stalls until one of following
-events occur:
-
-. The reservation set is invalid
-. The time deadline (if specified) is reached, i.e. `X(rs1)` is less than the
-  value in `time` CSR.
-. An interrupt was observed - even if disabled
 
 While stalled, an implementation is permitted to remove the stall and complete
-execution occasionally for any reason. `WRS` is allowed to complete in a bounded
-amount of time from when the condition to remove the stall occurs. `WRS` is not
-supported in a constrained `LR`/`SC` loop. When `WRS` completes, the
-reservation set of the hart is no longer valid.
+execution occasionally for any reason. `WRS.NTO` and `WRS.STO` are allowed to
+complete in a bounded amount of time from when the condition to remove the
+stall occurs. These instructions are not supported in a constrained `LR`/`SC` loop.
 
 [NOTE]
 ====
-Architecture Comment: Specifying the maximum wait as an absolute time deadline
-as compared to a relative timeout delay simplifies the programmer usage model.
-If the application gets interrupted during the wait, the WRS may be re-executed
-with the previously specified deadline without the risk of sleeping too long -
-if the deadline has already passed, the `WRS` will release its stall.
+Architecture Comment: `WRS.STO` and `WRS.NTO` are not defined as hints but
+as having defined behavior. Implementing them as hints that can be ignored
+(i.e., executed as the underlying nop) may lead to degradation in the system
+and/or application performance.
 
-Architecture Comment: The deadline, if specified, may be used by the hart as a
-hint to determine if a lower power state may be achieved during the wait.
-Entering a power-saving state may affect how fast the stalled hart may resume
-execution when an event occurs.
+Architecture Comment: Since the `WRS.STO` and `WRS.NTO` instructions can complete
+execution for reasons other than writes to the reservation set, software will
+likely need a means of looping until the required writes have occurred.
 
-Architecture Comment: `WRS` is not defined as a hint but as having a defined
-behavior. Implementing as a hint that can be ignored (i.e., executed as the
-underlying nop) may lead to degradation in the system and/or application
-performance. `WRS` is not suitable as a hint NOP due to a) long wait times
-being possible b) `WRS` is coupled with a `LR` and if `WRS` were to be
-implemented as a NOP then a valid hanging reservation would be left around
-c) `WRS` explicitly makes the reservation set invalid and this behavior would
-not occur if the instruction were implemented as a NOP. Combining `WRS`,
-`PAUSE`, optional timeout, and HINT functionality all into one
-jack-of-all-trades instruction, for differing software use cases, that may or
-may not perform different actions based on different combinations of input
-operands, is generally not encouraged within RISC-V.
+Recommendation: An implementation should try to bound the short timeout to
+be long enough to allow meaningful power reduction but short enough to keep
+a program using the timeout to meet a deadline from missing it significantly.
+Bounding the short timeout to not more than 10 microseconds is recommended.
 
 Recommendation: An implementation should try to bound the latency to remove the
 stall to latency incurred on access to an on-chip cache furthest from the hart
 or in case of a cache-less system the access to main memory from the hart
 ====
+The `WRS.NTO` and `WRS.STO` instructions follow the rules of the existing `WFI`
+instruction for resuming execution on a pending interrupt.
 
-`WRS` instruction follows rules of the existing `WFI` instruction for resuming
-execution on a pending interrupt.
-
-When not executing in M-mode, if the existing `TM` bit in `mcounteren` register
-is clear (disabling timer access), `WRS` with `rs1` not `x0` will cause an
-illegal instruction exception.
- -When the existing `TW` (Timeout Wait) bit in `mstatus` is set and `WRS` is +When the existing `TW` (Timeout Wait) bit in `mstatus` is set and `WRS.NTO` is executed in S or U mode, and it does not complete within an -implementation-specific bounded time limit, the `WRS` instruction will cause an -illegal instruction exception. - -When executing in VS or VU mode, if the existing `TM` bit in the `hcounteren` -register is clear and the `TM` bit in the `mcounteren` register is set, the -`WRS` with `rs1` not `x0` will cause a virtual instruction exception. +implementation-specific bounded time limit, the `WRS.NTO` instruction will cause +an illegal instruction exception. When executing in VS or VU mode, if the existing `VTW` bit is set in `hstatus`, -the `mstatus` `TW` bit is clear, and the `WRS` does not complete within an -implementation-specific bounded time limit, the `WRS` instruction will cause a -virtual instruction exception. - -When `WRS` completes, reservations held by the executing hart are invalidated. -It is otherwise legal for an implementation to simply execute `WRS` without -stalling. - -[NOTE] -==== -Architecture Comment: Since the `WRS` instruction can complete execution for -reasons other than writes to the reservation set, software will likely need a -means of looping until the required writes have occured. -==== +the `mstatus` `TW` bit is clear, and the `WRS.NTO` does not complete within an +implementation-specific bounded time limit, the `WRS.NTO` instruction will +cause a virtual instruction exception.
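
Review note: the patch states that software registers the reservation set with `LR`, stalls with `WRS.NTO`, and must loop because the stall can end for reasons other than a write. The C sketch below illustrates that usage and checks the encodings from the wavedrom diagram. It is a minimal sketch, not part of the specification: the names `wrs_encode`, `wrs_nto_while_equal`, `wait_for_change`, and the build flag `USE_ZAWRS` are hypothetical, the inline assembly assumes a GCC- or Clang-style compiler, and on targets without the flag the stall step falls back to a plain busy-poll.

```c
#include <stdint.h>

/* Field values taken from the encoding diagram in this patch. */
#define OPCODE_SYSTEM   0x73u
#define FUNCT12_WRS_NTO 0x00du
#define FUNCT12_WRS_STO 0x01du

/* Assemble the 32-bit SYSTEM encoding with rd = 0, funct3 = 0, rs1 = 0. */
static inline uint32_t wrs_encode(uint32_t funct12)
{
    return (funct12 << 20) | OPCODE_SYSTEM;
}

/*
 * One wait step (hypothetical helper): load-reserve *addr to register the
 * reservation set, then stall with WRS.NTO only if the value is still `old`.
 * USE_ZAWRS is an assumed, user-defined build flag; without it this is a no-op.
 */
static inline void wrs_nto_while_equal(uint32_t *addr, uint32_t old)
{
#if defined(__riscv) && defined(USE_ZAWRS)
    uint32_t tmp;
    __asm__ volatile(
        "lr.w   %0, %1\n\t"
        "bne    %0, %2, 1f\n\t"
        ".4byte 0x00d00073\n"      /* wrs.nto, emitted as a raw encoding */
        "1:"
        : "=&r"(tmp), "+A"(*addr)
        : "r"(old)
        : "memory");
#else
    (void)addr;
    (void)old;
#endif
}

/*
 * Loop until *addr differs from `old` and return the new value. The re-check
 * is needed because the stall may be removed by an interrupt or for any
 * other implementation reason, not only by a write.
 */
static inline uint32_t wait_for_change(uint32_t *addr, uint32_t old)
{
    uint32_t cur;
    while ((cur = __atomic_load_n(addr, __ATOMIC_ACQUIRE)) == old)
        wrs_nto_while_equal(addr, old);
    return cur;
}
```

A `WRS.STO` variant would follow the same shape with the encoding `0x01d00073`, with the caller checking its own deadline each time the loop body returns.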