Push guarantees in throughout
anshumanmohan committed Jan 30, 2024
1 parent 58bdeae commit 49fc54c
Showing 1 changed file with 40 additions and 28 deletions.
68 changes: 40 additions & 28 deletions docs/lang/static.md
@@ -5,7 +5,7 @@ This means that the compiler does not know, track, or guarantee the number of cy
This is in contrast to a *latency-sensitive*, or *static*, model of computation, where the number of cycles a component needs is known to, and honored by, the compiler.

In general, latency-insensitivity makes it easier to compose programs.
It grants the compiler freedom to schedule operators however it wants, as long as it meets the program's dataflow constraints.
It grants the compiler freedom to schedule operators however it wants, as long as the schedule meets the program's dataflow constraints.
It also prevents code from implicitly depending on the state of other code running in parallel.

However, there are two drawbacks to this approach.
@@ -15,27 +15,38 @@ this means that the use of black-box hardware designs requires costly handshakin

To address these issues, Calyx provides a `static` qualifier that modifies components and groups, along with static variants of other control operators.

Broadly, the `static` qualifier is a promise to the compiler that the component or group will take exactly the specified number of cycles to execute.
The compiler is free to take advantage of this promise to generate more efficient hardware.
In return, the compiler must access out-ports of static components only after the specified number of cycles have passed, or risk receiving incorrect results.

## Static Constructs in the Calyx IL

We will now discuss the static constructs available in the Calyx IL, along with the guarantees they come with.

### Static Components

Say we have a multiplier component, `std_mult`, which multiplies the values `left` and `right` and puts the result in `out`.
Its latency is 3 cycles.
We can declare it as follows:
Briefly consider a divider component, `std_div`, which divides the value `left` by the value `right` and puts the result in `out`.
This component is dynamic; its latency is unknown.
```
static<3> primitive std_mult[W](go: 1, left: W, right: W) -> (out: W);
primitive std_div[W](go: 1, left: W, right: W) -> (out: W, done: 1);
```
Compare this to the divider component `std_div`, whose latency is unknown:
A client of the divider must pass two inputs, raise the `go` signal, and wait for the component itself to raise its `done` signal.
The client can then read the result from the `out` port.

Compare this to a multiplier component, `std_mult`, which has a similar signature but whose latency is known to be three cycles.
We declare it as follows:
```
primitive std_div[W](go: 1, left: W, right: W) -> (out: W, done: 1);
static<3> primitive std_mult[W](go: 1, left: W, right: W) -> (out: W);
```

The key differences are:
- The `static` qualifier is used to declare the component as static and to specify its latency.
- The `done` port is not present in the static component.

A client of the divider must pass two inputs, raise the `go` signal, and wait for the component itself to raise its `done` signal.
In contrast, a client of the multiplier must pass two inputs and raise the `go` signal, but it does not need to wait for the component to raise a `done` signal.
A client of the multiplier must pass two inputs and raise the `go` signal as before.
However, the client need not then wait for the component to indicate completion.
It can simply and safely assume that the result will be available after 3 cycles.
This is a guarantee that the author of the component has made to the client, and the compiler is free to take advantage of it.
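
To make the contrast between the two client protocols concrete, here is a minimal Python sketch. It is a toy cycle-by-cycle model, not Calyx code and not a description of what the compiler generates; the class `ToyUnit` and the helpers `run_dynamic` and `run_static` are hypothetical names introduced only for illustration, and the divider's latency of 5 is an arbitrary assumption.

```python
class ToyUnit:
    """Hypothetical cycle-level model of a compute unit (not a Calyx API)."""

    def __init__(self, op, latency):
        assert latency >= 1
        self.op = op              # function applied to (left, right)
        self.latency = latency    # cycles between `go` and a valid `out`
        self.remaining = None
        self.args = None
        self.out = None
        self.done = False         # only a dynamic client reads this

    def go(self, left, right):
        """Latch the inputs and start the computation."""
        self.args = (left, right)
        self.remaining = self.latency
        self.out, self.done = None, False

    def tick(self):
        """Advance the model by one clock cycle."""
        if self.remaining is None:
            return
        self.remaining -= 1
        if self.remaining == 0:
            self.out = self.op(*self.args)
            self.done = True
            self.remaining = None


def run_dynamic(unit, left, right):
    """Dynamic protocol: raise `go`, then poll `done` every cycle."""
    unit.go(left, right)
    cycles = 0
    while not unit.done:
        unit.tick()
        cycles += 1
    return unit.out, cycles


def run_static(unit, left, right, promised_latency):
    """Static protocol: raise `go`, wait the promised cycles, never read `done`."""
    unit.go(left, right)
    for _ in range(promised_latency):
        unit.tick()
    return unit.out


divider = ToyUnit(lambda l, r: l // r, latency=5)     # latency unknown to the client
multiplier = ToyUnit(lambda l, r: l * r, latency=3)   # latency promised as 3

div_result = run_dynamic(divider, 42, 6)
mult_result = run_static(multiplier, 6, 7, promised_latency=3)
```

In this model the static client is only correct because the promised latency matches the unit's real latency; if the promise were wrong, `run_static` would read `out` too early and return `None`, which mirrors the risk of incorrect results described above.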


### Static Groups and Relative Timing Guards
@@ -44,7 +55,8 @@ Much like components, groups can be declared as static.
Since groups are just unordered sets of assignments, it pays to have a little more control over the scheduling of the assignments within a group.
To this end, static groups have a unique feature that ordinary dynamic groups do not: *relative timing guards*.

Consider this group, which performs `ans := 6 * 7`:
Consider this group, which multiplies `6` and `7` and stores the result in `ans`.

```
static<4> group mult_and_store {
mult.left = %[0:3] ? 6;
@@ -54,36 +66,36 @@ static<4> group mult_and_store {
ans.write_en = %3 ? 1;
}
```
The `static<4>` keyword specifies that the group should take 4 cycles to execute.
The `static<4>` qualifier specifies that the group should take 4 cycles to execute.

The first three assignments are guarded (using the standard `?` separator) by the relative timing guard `%[0:3]`.
In general, a relative timing guard `%[i:j]` is *true* in the half-open interval from cycle `i` to
cycle `j` of the group’s execution and *false* otherwise.

In our case, the first three assignments execute only in the first three cycles of the group's execution.
The guard `%3`, which we see thereafter, is syntactic sugar for `%[3:4]`.
The guard `%3`, which we see immediately afterwards, is syntactic sugar for `%[3:4]`.
We have used it in this case to ensure that the last two assignments execute only in the last cycle of the group's execution.
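
The half-open semantics of relative timing guards can be checked with a small Python sketch. This is only a model of the guard arithmetic, not Calyx syntax; the helpers `guard`, `at`, and `active_per_cycle` are hypothetical names, and the model includes only the two assignments that are visible in the excerpt above.

```python
def guard(start, end):
    """Model %[start:end]: true for cycles in the half-open interval [start, end)."""
    return lambda cycle: start <= cycle < end


def at(cycle_index):
    """Model %i, described above as syntactic sugar for %[i:i+1]."""
    return guard(cycle_index, cycle_index + 1)


GROUP_LATENCY = 4  # from the static<4> qualifier

# Only the assignments shown in the excerpt; the others are omitted here.
assignments = {
    "mult.left = 6": guard(0, 3),
    "ans.write_en = 1": at(3),
}


def active_per_cycle():
    """List which modelled assignments are active in each cycle of the group."""
    return [
        [name for name, is_active in assignments.items() if is_active(cycle)]
        for cycle in range(GROUP_LATENCY)
    ]


schedule = active_per_cycle()
# Cycles 0, 1, 2 drive the multiplier input; cycle 3 enables the write.
```

Because the interval is half-open, `%[0:3]` covers cycles 0, 1, and 2 but not cycle 3, so the two guards in this group never overlap.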


### Static Control Operators

Calyx provides static variants of each of its control operators.
While dynamic commands may contain both static and dynamic children, static commands must only have static children.

- `static seq` is a static version of `seq`; its latency is the sum of the latencies of its children.
- `static par` is a static version of `par`; its latency is the maximum of the latencies of its children.
In the examples below, assume that `A5`, `B6`, `C7`, and `D8` are static groups with latencies 5, 6, 7, and 8, respectively.

- `static seq` is a static version of `seq`.
If we have `static seq { A5; B6; C7; D8; }`, we can guarantee that the latency of the entire operation is the sum of the latencies of its children: 5 + 6 + 7 + 8 = 26 cycles in this case.
We can also guarantee that each child will begin executing exactly one cycle after the previous child has finished.
In our case, for example, `B6` will begin executing exactly one cycle after `A5` has finished.
- `static par` is a static version of `par`.
If we have `static par { A5; B6; C7; D8; }`, we can guarantee that the latency of the entire operation is the maximum of the latencies of its children: 8 cycles in this case.
Further, all the children of a `static par` are guaranteed to begin executing at the same time.
The children can rely on this "lockstep" behavior and can communicate with each other.
Such communication is undefined behavior in a dynamic `par`.
- `static if` is a static version of `if`; its latency is the maximum of the latencies of its children.
- Calyx's `while` loop is unbounded, so it does not have a static variant.
For example, `static if { A5; B6; }` has a latency of 6 cycles.
- Calyx's `while` loop is unbounded, and so it does not have a static variant.
- `static repeat` is a static version of `repeat`; its latency is the product of the number of iterations and the latency of its child.
- `static invoke` is a static version of `invoke`; its latency is the latency of the invoked cell.

## Guarantees

The `static` keyword is a promise to the compiler that the component or group will take exactly the specified number of cycles to execute.
The compiler is free to take advantage of this promise to generate more efficient hardware.
In return, the compiler must access out-ports of static components only after the specified number of cycles have passed, or risk receiving incorrect results.

There are other guarantees associated with individual static constructs:
- A child of `static seq` is guaranteed to begin executing exactly one cycle after the previous child has finished.
- All the children of a `static par` are guaranteed to begin executing at the same time.
- The body of a `static repeat` is guaranteed to begin executing exactly one cycle after the previous iteration has finished.
For example, `static repeat 7 { B6; }` has a latency of 42 cycles.
The body of a `static repeat` is guaranteed to begin executing exactly one cycle after the previous iteration has finished.
- `static invoke` is a static version of `invoke`; its latency is the latency of the invoked cell.
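
The latency rules in the list above can be summarized as a short recursive calculation. The following Python sketch models them under the stated rules only; the classes `Group`, `Seq`, `Par`, `If`, and `Repeat` and the function `latency` are hypothetical names for illustration and are not part of the Calyx compiler. `static invoke` is not modelled separately, since its latency is simply that of the invoked cell.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    latency: int


@dataclass
class Seq:
    children: list


@dataclass
class Par:
    children: list


@dataclass
class If:
    then_branch: object
    else_branch: object


@dataclass
class Repeat:
    count: int
    body: object


def latency(node):
    """Compute the latency of a static control tree using the rules above."""
    if isinstance(node, Group):
        return node.latency
    if isinstance(node, Seq):
        return sum(latency(child) for child in node.children)
    if isinstance(node, Par):
        return max(latency(child) for child in node.children)
    if isinstance(node, If):
        return max(latency(node.then_branch), latency(node.else_branch))
    if isinstance(node, Repeat):
        return node.count * latency(node.body)
    raise TypeError(f"unsupported node: {node!r}")


A5, B6, C7, D8 = Group("A5", 5), Group("B6", 6), Group("C7", 7), Group("D8", 8)

seq_latency = latency(Seq([A5, B6, C7, D8]))     # 5 + 6 + 7 + 8 = 26
par_latency = latency(Par([A5, B6, C7, D8]))     # max(5, 6, 7, 8) = 8
if_latency = latency(If(A5, B6))                 # max(5, 6) = 6
repeat_latency = latency(Repeat(7, B6))          # 7 * 6 = 42
```

Since static commands may only have static children, every node in such a tree has a known latency, which is why the recursion always bottoms out at a `Group`.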
