This repository has been archived by the owner on Feb 20, 2024. It is now read-only.

updating following ARC review feedback
ved-rivos committed Jun 2, 2022
1 parent 1c84087 commit 2989492
Showing 2 changed files with 52 additions and 103 deletions.
33 changes: 15 additions & 18 deletions intro.adoc
@@ -1,6 +1,6 @@
[[Introduction]]
== Introduction
The Zawrs extension defines a single instruction to be used in polling loops
The Zawrs extension defines a pair of instructions to be used in polling loops
that allows a core to enter a low-power state and wait on either a write to a
memory location or other asynchronous system events. It addresses common
use-cases in operating systems when waiting for contended locks or for
@@ -24,13 +24,20 @@ such as:

Such usages involve polling on memory locations, and such busy loops can be a
wasteful expenditure of energy. To mitigate the wasteful looping in such usages,
a `WRS` instruction is proposed. Instead of polling for a write at a specific
memory location, software would add that memory location to the reservation set
using the existing `LR` instruction - a subsequent `WRS` instruction would
cause the hart to stall until a write occurs to the reservation set. On RV64
systems, this proposal allows software to specify (in `rs1` operand) a deadline
as a time value in the future when execution would resume in absence of a store
to the reservation set.
a `WRS.NTO` (WRS-with-no-timeout) instruction is provided. Instead of polling
for a write at a specific memory location, software would add that memory
location to the reservation set using the existing `LR` instruction - a
subsequent `WRS.NTO` instruction would cause the hart to stall until a write
occurs to the reservation set.
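
The `LR` followed by `WRS.NTO` pattern described above might look roughly like the following C sketch. It is an illustration under stated assumptions, not text from the specification: the function names are hypothetical, the instruction is emitted as a raw word (`0x00d00073`, derived from the `funct12` value `0x0d` in this commit's encoding diagram) in case the assembler does not know the mnemonic, and on non-RISC-V targets the helpers fall back to a plain polling loop so that the code still compiles.

```c
#include <stdint.h>

/* Assumption: native path only when targeting RISC-V with the A extension. */
#if defined(__riscv) && defined(__riscv_atomic)
#define ZAWRS_SKETCH_NATIVE 1
#else
#define ZAWRS_SKETCH_NATIVE 0
#endif

/* Read *addr; on RISC-V this uses LR.W so the address joins the
 * reservation set. Elsewhere it is an ordinary volatile load. */
static inline uint32_t zawrs_load_reserved(volatile uint32_t *addr)
{
#if ZAWRS_SKETCH_NATIVE
    uint32_t value;
    __asm__ volatile("lr.w %0, (%1)" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    return *addr;
#endif
}

/* WRS.NTO as a raw instruction word; a no-op in the portable fallback. */
static inline void zawrs_wrs_nto(void)
{
#if ZAWRS_SKETCH_NATIVE
    __asm__ volatile(".4byte 0x00d00073" ::: "memory");
#endif
}

/* Wait until *addr becomes non-zero and return the observed value.
 * The loop re-checks the value after every wake-up because the stall
 * may end for reasons other than a write to the reservation set. */
uint32_t zawrs_wait_nonzero(volatile uint32_t *addr)
{
    uint32_t value;
    while ((value = zawrs_load_reserved(addr)) == 0) {
        zawrs_wrs_nto();
    }
    return value;
}
```

A caller would pass the address of a flag that another hart eventually writes, for example `zawrs_wait_nonzero(&ready_flag)`, where `ready_flag` is a hypothetical shared variable.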

Sometimes the program waiting on a memory update may also need to carry out a
task at a future time (e.g., generating a heartbeat, etc.) or otherwise place
an upper bound on the wait. To support such usages a second instruction
`WRS.STO` (WRS-with-a-short-timeout) is provided that works like `WRS.NTO` but
bounds the stall duration to a short timeout such that the stall is removed
on the timeout if no other conditions to remove the stall have occurred. The
program using this instruction may then determine if its next deadline has
been reached.
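
A bounded wait built on `WRS.STO` could be sketched as below. This is a minimal illustration under assumptions, not part of the specification: all function names, the `zawrs_clock_fn` time source, and the counting `zawrs_demo_clock` stand-in are hypothetical, the instruction is emitted as the raw word `0x01d00073` (from the `funct12` value `0x1d` in this commit's encoding diagram), and non-RISC-V builds fall back to plain polling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical time source; on real hardware this might read the time CSR. */
typedef uint64_t (*zawrs_clock_fn)(void);

/* Stand-in clock that advances by one per call, so the sketch can be
 * exercised off-target. Not a real timer. */
uint64_t zawrs_demo_clock(void)
{
    static uint64_t ticks;
    return ticks++;
}

static inline uint32_t zawrs_sto_load_reserved(volatile uint32_t *addr)
{
#if defined(__riscv) && defined(__riscv_atomic)
    uint32_t value;
    __asm__ volatile("lr.w %0, (%1)" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    return *addr; /* portable fallback: ordinary load */
#endif
}

static inline void zawrs_wrs_sto(void)
{
#if defined(__riscv) && defined(__riscv_atomic)
    __asm__ volatile(".4byte 0x01d00073" ::: "memory");
#endif
}

/* Returns true if *addr became non-zero, false if the deadline passed first.
 * WRS.STO only bounds each individual stall; the deadline itself is
 * checked by software after every wake-up. */
bool zawrs_wait_nonzero_until(volatile uint32_t *addr, uint64_t deadline,
                              zawrs_clock_fn now)
{
    for (;;) {
        if (zawrs_sto_load_reserved(addr) != 0) {
            return true;
        }
        if (now() >= deadline) {
            return false;
        }
        zawrs_wrs_sto();
    }
}
```

When the function returns false, the caller can perform its periodic task (such as a heartbeat) and then call it again with a later deadline.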

Such mechanisms have been demonstrated to help achieve increased efficiency
both in terms of instruction count as well as cycle count and thereby lead to
@@ -39,13 +46,3 @@ cite:[osti_1376625], cite:[dpdk_2019] and cite:[usenix_196292]. In such
applications, optionally bounding the wait for the event to occur with a
timeout is commonly used cite:[sigplan_2001], cite:[Franke2005FussF],
cite:[pthread_man], and cite:[futex_man].

=== Suitability for Fast Track Extension Process
This proposed extension meets the Fast Track criteria: it consists of a single
instruction, it addresses the need to be able to efficiently wait on memory
updates which is common across the whole gamut of use cases, it fits in well
with other instructions, and is not expected to be contentious. This extension
helps RISC-V catch up with established capabilities in other architectures
(UMONITOR/UMWAIT - x86-Intel; WFET - ARM, MWAITX- x86-AMD) to address existing
software use cases and is not inventing new use cases.

122 changes: 37 additions & 85 deletions zawrs.adoc
@@ -1,119 +1,71 @@
[[Zawrs]]
== Zawrs

The `WRS` instruction is available in all privilege modes and uses the
`SYSTEM` major opcode. On RV64 systems, if `rs1` is not `x0` then `rs1` holds
a 64-bit deadline, compared against the `time` CSR when `V=0` and against the
sum of the `time` and `htimedelta` CSRs when `V=1` (similar to `stimecmp` and
`vstimecmp` respectively), for the wait. On RV32 systems, a deadline cannot be
specified as the `rs1` is limited to 32-bit.
The `WRS.NTO` and `WRS.STO` instructions are available in all privilege modes
and use the `SYSTEM` major opcode. When these instructions are invoked, the
hart stalls until one of the following events occurs:

*Mnemonic:*
wrs _rs1_

*Pseudoinstructions:*
wrs -> wrs zero
. The reservation set is invalid
. If `WRS.STO`, an implementation defined short time has passed since the
hart was stalled.
. An interrupt was observed - even if disabled
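
The three stall-removal events listed above can be modelled as a simple predicate. The C sketch below is illustrative only: the type and function names are hypothetical, the timeout is expressed in abstract ticks because the specification leaves the short timeout implementation defined, and the permission for an implementation to end the stall occasionally for other reasons is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WRS_NTO, WRS_STO } wrs_kind;

/* Hypothetical snapshot of the state that matters to a stalled hart. */
typedef struct {
    bool reservation_valid;   /* false once the reservation set is invalid */
    bool interrupt_observed;  /* true even if the interrupt is disabled */
    uint64_t ticks_stalled;   /* time since the stall began, abstract units */
} wrs_hart_state;

/* Returns true when one of the listed events requires the stall to end. */
bool wrs_stall_ends(wrs_kind kind, const wrs_hart_state *hart,
                    uint64_t short_timeout_ticks)
{
    if (!hart->reservation_valid) {
        return true;                      /* event 1 */
    }
    if (kind == WRS_STO && hart->ticks_stalled >= short_timeout_ticks) {
        return true;                      /* event 2, WRS.STO only */
    }
    if (hart->interrupt_observed) {
        return true;                      /* event 3 */
    }
    return false;
}
```

Note that the timeout check applies only to `WRS.STO`; with `WRS.NTO` the model keeps stalling no matter how large `ticks_stalled` grows.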

*Encoding:*
[wavedrom, , ]
....
{reg: [
{bits: 7, name: 'opcode', attr: ['SYSTEM(0x73)'] },
{bits: 5, name: 'rd', attr: ['0'] },
{bits: 3, name: 'func3', attr: ['0'] },
{bits: 5, name: 'rs1', attr: ['src'] },
{bits: 12, name: 'func12', attr:['WRS(0x010)'] },
{bits: 3, name: 'funct3', attr: ['0'] },
{bits: 5, name: 'rs1', attr: ['0'] },
{bits: 12, name: 'funct12', attr:['WRS.NTO(0x0d)', 'WRS.STO(0x1d)'] },
], config:{lanes: 1, hspace:1024}}
....
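
As a worked example of the new encoding, the 32-bit instruction words can be assembled from the fields in the diagram (`opcode` in bits 6:0, `rd` in 11:7, `funct3` in 14:12, `rs1` in 19:15, `funct12` in 31:20). The helper name `wrs_encode` is hypothetical and the sketch only restates the field values shown above.

```c
#include <stdint.h>

/* Build a WRS instruction word from its funct12 value.
 * rd, funct3 and rs1 are all zero in the new encoding. */
uint32_t wrs_encode(uint32_t funct12)
{
    const uint32_t opcode = 0x73; /* SYSTEM */
    const uint32_t rd = 0;
    const uint32_t funct3 = 0;
    const uint32_t rs1 = 0;

    return ((funct12 & 0xfffu) << 20) | (rs1 << 15) | (funct3 << 12) |
           (rd << 7) | opcode;
}

/* wrs_encode(0x00d) yields 0x00d00073 (WRS.NTO);
 * wrs_encode(0x01d) yields 0x01d00073 (WRS.STO). */
```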
The value of the field `opcode` is `0x73`, the value of the field `func3` is `0x0`,
the value of the field `funct12` is `0x010`, and the value of the field `rd` is `0x0`.
The field `rs1` encodes the register that holds the 64-bit deadline, or `0x0`
(encoding for `x0`) if no timeout is specified.
In RV32, the `rs1` field must be `0x0` else an illegal instruction exception occurs.

*Operation:*
[source,asciidoc, linenums]
....
1. if reservation-set is valid
2. Stall hart execution until one of following events occur:
a) reservation set is invalid
b) rs1 != 0 and X(rs1) < time CSR
c) interrupt observed
3. Invalidate reservation-set
if reservation-set is valid
Stall hart execution until one of following events occur:
a) reservation set is invalid
b) if WRS.STO, a short time since start of stall has elapsed
c) interrupt observed
....
When the instruction is invoked, the hart stalls until one of following
events occur:

. The reservation set is invalid
. The time deadline (if specified) is reached, i.e. `X(rs1)` is less than the
value in `time` CSR.
. An interrupt was observed - even if disabled

While stalled, an implementation is permitted to remove the stall and complete
execution occasionally for any reason. `WRS` is allowed to complete in a bounded
amount of time from when the condition to remove the stall occurs. `WRS` is not
supported in a constrained `LR`/`SC` loop. When `WRS` completes, the
reservation set of the hart is no longer valid.
execution occasionally for any reason. `WRS.NTO` and `WRS.STO` are allowed to
complete in a bounded amount of time from when the condition to remove the
stall occurs. These instructions are not supported in a constrained `LR`/`SC` loop.

[NOTE]
====
Architecture Comment: Specifying the maximum wait as an absolute time deadline
as compared to a relative timeout delay simplifies the programmer usage model.
If the application gets interrupted during the wait, the WRS may be re-executed
with the previously specified deadline without the risk of sleeping too long -
if the deadline has already passed, the `WRS` will release its stall.
Architecture Comment: `WRS.STO` and `WRS.NTO` are not defined as a hint but
as having a defined behavior. Implementing as a hint that can be ignored
(i.e., executed as the underlying nop) may lead to degradation in the system
and/or application performance.
Architecture Comment: The deadline, if specified, may be used by the hart as a
hint to determine if a lower power state may be achieved during the wait.
Entering a power-saving state may affect how fast the stalled hart may resume
execution when an event occurs.
Architecture Comment: Since the `WRS.STO` and `WRS.NTO` instructions can complete
execution for reasons other than writes to the reservation set, software will
likely need a means of looping until the required writes have occurred.
Architecture Comment: `WRS` is not defined as a hint but as having a defined
behavior. Implementing as a hint that can be ignored (i.e., executed as the
underlying nop) may lead to degradation in the system and/or application
performance. `WRS` is not suitable as a hint NOP due to a) long wait times
being possible b) `WRS` is coupled with a `LR` and if `WRS` were to be
implemented as a NOP then a valid hanging reservation would be left around
c) `WRS` explicitly makes the reservation set invalid and this behavior would
not occur if the instruction were implemented as a NOP. Combining `WRS`,
`PAUSE`, optional timeout, and HINT functionality all into one
jack-of-all-trades instruction, for differing software use cases, that may or
may not perform different actions based on different combinations of input
operands, is generally not encouraged within RISC-V.
Recommendation: An implementation should try to bound the short timeout to
be long enough to allow meaningful power reduction but short enough to avoid
a program using the timeout to meet a deadline from missing it significantly.
Bounding the short timeout to not more than 10 microseconds is recommended.
Recommendation: An implementation should try to bound the latency to remove the
stall to latency incurred on access to an on-chip cache furthest from the hart
or in case of a cache-less system the access to main memory from the hart
====
The `WRS.NTO` and `WRS.STO` instructions follow the rules of the existing `WFI`
instruction for resuming execution on a pending interrupt.

`WRS` instruction follows rules of the existing `WFI` instruction for resuming
execution on a pending interrupt.

When not executing in M-mode, if the existing `TM` bit in `mcounteren` register
is clear (disabling timer access), `WRS` with `rs1` not `x0` will cause an
illegal instruction exception.

When the existing `TW` (Timeout Wait) bit in `mstatus` is set and `WRS` is
When the existing `TW` (Timeout Wait) bit in `mstatus` is set and `WRS.NTO` is
executed in S or U mode, and it does not complete within an
implementation-specific bounded time limit, the `WRS` instruction will cause an
illegal instruction exception.

When executing in VS or VU mode, if the existing `TM` bit in the `hcounteren`
register is clear and the `TM` bit in the `mcounteren` register is set, the
`WRS` with `rs1` not `x0` will cause a virtual instruction exception.
implementation-specific bounded time limit, the `WRS.NTO` instruction will cause
an illegal instruction exception.

When executing in VS or VU mode, if the existing `VTW` bit is set in `hstatus`,
the `mstatus` `TW` bit is clear, and the `WRS` does not complete within an
implementation-specific bounded time limit, the `WRS` instruction will cause a
virtual instruction exception.

When `WRS` completes, reservations held by the executing hart are invalidated.
It is otherwise legal for an implementation to simply execute `WRS` without
stalling.

[NOTE]
====
Architecture Comment: Since the `WRS` instruction can complete execution for
reasons other than writes to the reservation set, software will likely need a
means of looping until the required writes have occurred.
====
the `mstatus` `TW` bit is clear, and the `WRS.NTO` does not complete within an
implementation-specific bounded time limit, the `WRS.NTO` instruction will
cause a virtual instruction exception.
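
The `TW` and `VTW` rules for a `WRS.NTO` that does not complete within the implementation-specific time limit can be summarised as a small decision function. The C sketch below is a reading of the text above, not normative behaviour: the names are hypothetical, and treating VS/VU mode with `mstatus.TW` set as an illegal instruction exception is an assumption borrowed from how `WFI` behaves, since the text only names S and U mode for that case.

```c
#include <stdbool.h>

typedef enum {
    WRS_MODE_M, WRS_MODE_S, WRS_MODE_U, WRS_MODE_VS, WRS_MODE_VU
} wrs_mode;

typedef enum {
    WRS_NO_TRAP, WRS_ILLEGAL_INSTRUCTION, WRS_VIRTUAL_INSTRUCTION
} wrs_trap;

/* Which exception, if any, results when WRS.NTO exceeds the bounded
 * time limit in the given mode. */
wrs_trap wrs_nto_timeout_trap(wrs_mode mode, bool mstatus_tw, bool hstatus_vtw)
{
    if (mode == WRS_MODE_M) {
        return WRS_NO_TRAP;
    }
    if (mstatus_tw) {
        /* Stated for S and U mode; applying it to VS/VU is an assumption. */
        return WRS_ILLEGAL_INSTRUCTION;
    }
    if ((mode == WRS_MODE_VS || mode == WRS_MODE_VU) && hstatus_vtw) {
        return WRS_VIRTUAL_INSTRUCTION;
    }
    return WRS_NO_TRAP;
}
```

The function covers only `WRS.NTO`; in this revision the timeout-wait text no longer mentions `WRS.STO`, whose stall is already bounded by the short timeout.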
