Add FP8alt, low and mixed-precision SDOTP with stochastic rounding support, and compressed vector cmp (pulp-platform#3)

Added support for:
- FP8alt (1, 4, 3)
- low and mixed-precision SDOTP with stochastic rounding support
- compressed vector compare results (one bit per comparison in the LSBs)

---------

Co-authored-by: Gianna Paulin <[email protected]>
lucabertaccini and GiannaP authored May 4, 2023
1 parent 16c1d2f commit 3b1f7af
Showing 18 changed files with 2,464 additions and 114 deletions.
3 changes: 3 additions & 0 deletions Bender.yml
@@ -29,9 +29,12 @@ sources:
- src/fpnew_divsqrt_multi.sv
- src/fpnew_fma.sv
- src/fpnew_fma_multi.sv
- src/fpnew_sdotp_multi.sv
- src/fpnew_sdotp_multi_wrapper.sv
- src/fpnew_noncomp.sv
- src/fpnew_opgroup_block.sv
- src/fpnew_opgroup_fmt_slice.sv
- src/fpnew_opgroup_multifmt_slice.sv
- src/fpnew_rounding.sv
- src/lfsr_sr.sv
- src/fpnew_top.sv
17 changes: 17 additions & 0 deletions README.md
@@ -165,6 +165,23 @@ If you use FPnew in your work, you can cite us:
}
```

If you use FPnew SDOTP in your work, you can cite us:

<details>
<summary>SDOTP Publication</summary>
<p>

```
@inproceedings{bertaccini2022minifloat,
title={MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V Cores},
author={Bertaccini, Luca and Paulin, Gianna and Fischer, Tim and Mach, Stefan and Benini, Luca},
booktitle={2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)},
pages={1--8},
year={2022},
organization={IEEE}
}
```

</p>
</details>

15 changes: 15 additions & 0 deletions docs/CHANGELOG-PULP.md
@@ -0,0 +1,15 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

In this sense, we interpret the "Public API" of a hardware module as its port/parameter list.
Versions of the IP in the same major release are "pin-compatible" with each other. Minor releases are permitted to add new parameters as long as their default bindings ensure backwards compatibility.

## [0.1.0] - 2023-05-04

### Added
- Add low and mixed-precision SDOTP with support for stochastic rounding
- Add `FP8alt (1,4,3)` format
- Add support for compressed vector compare results (one bit per comparison in the LSBs)
5 changes: 5 additions & 0 deletions docs/CHANGELOG.md
@@ -10,6 +10,11 @@ Versions of the IP in the same major relase are "pin-compatible" with each other

## [Unreleased]

### Added
- Add support for alternative FP32-only DivSqrt unit

## [0.7.0] - 2023-03-20

### Added
- Citation file `CITATION.cff`
- Add support for RISC-V compliant classify in vectorial mode when the vector element width is at least 10 bits
71 changes: 53 additions & 18 deletions docs/README.md
@@ -40,6 +40,8 @@ For more in-depth explanations on how to configure the unit and the layout of th
| `TagType` | The SystemVerilog data type of the operation tag |
| `TrueSIMDClass` | If enabled, the result of a classify operation in vectorial mode will be RISC-V compliant if each output has at least 10 bits|
| `EnableSIMDMask` | Enable the RISC-V floating-point status flags masking of inactive vectorial lanes. When disabled, `simd_mask_i` is inactive |
| `StochasticRndImplementation` | Enable stochastic rounding support for SDOTP, define LFSR bitwidth and number of trailing bits considered for the SR decision |
| `CompressedVecCmpResult` | Compress the result of a vector compare in the LSBs, conceived for RV32FD cores |

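As a rough illustration of what "compress the result of a vector compare in the LSBs" can look like, the sketch below packs one compare flag per lane into the low bits of the result word and zeroes the rest. The module name `compressed_cmp_sketch`, its ports, and the `NumLanes` parameter are hypothetical and not taken from the FPnew sources; the actual slice logic may differ.

```SystemVerilog
// Illustrative sketch only, not the FPnew implementation.
// Assumes NumLanes <= Width.
module compressed_cmp_sketch #(
  parameter int unsigned Width    = 32, // datapath width
  parameter int unsigned NumLanes = 4   // number of vector lanes compared
) (
  input  logic [NumLanes-1:0] lane_cmp_i, // one compare flag per lane
  output logic [Width-1:0]    result_o    // compressed result word
);
  always_comb begin
    result_o               = '0;         // upper bits are cleared
    result_o[NumLanes-1:0] = lane_cmp_i; // lane i lands in bit i
  end
endmodule
```

With this layout, a core can test all lane results with a single integer mask instead of inspecting one bit per element-wide field.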
### Ports

@@ -50,6 +52,7 @@ As the width of some input/output signals is defined by the configuration, it is
|------------------|-----------|----------------------|----------------------------------------------------------------|
| `clk_i` | in | `logic` | Clock, synchronous, rising-edge triggered |
| `rst_ni` | in | `logic` | Asynchronous reset, active low |
| `hart_id_i` | in | `logic [31:0]` | Core ID, used only when stochastic rounding is enabled |
| `operands_i` | in | `logic [2:0][W-1:0]` | Operands, henceforth referred to as `op[`*i*`]` |
| `rnd_mode_i` | in | `roundmode_e` | Floating-point rounding mode |
| `op_i` | in | `operation_e` | Operation select |
@@ -79,15 +82,16 @@ Default values from the package are listed.

Enumeration of type `logic [2:0]` holding available rounding modes, encoded for use in RISC-V cores:

| Enumerator | Value | Rounding Mode |
|------------|----------|------------------------------------------------------|
| `RNE` | `3'b000` | To nearest, tie to even (default) |
| `RTZ` | `3'b001` | Toward zero |
| `RDN` | `3'b010` | Toward negative infinity |
| `RUP` | `3'b011` | Toward positive infinity |
| `RMM` | `3'b100` | To nearest, tie away from zero |
| `ROD` | `3'b101` | To odd |
| `DYN` | `3'b111` | *RISC-V Dynamic RM, invalid if passed to operations* |
| Enumerator | Value | Rounding Mode |
|------------|----------|----------------------------------------------------------|
| `RNE` | `3'b000` | To nearest, tie to even (default) |
| `RTZ` | `3'b001` | Toward zero |
| `RDN` | `3'b010` | Toward negative infinity |
| `RUP` | `3'b011` | Toward positive infinity |
| `RMM` | `3'b100` | To nearest, tie away from zero |
| `ROD` | `3'b101` | To odd |
| `RSR` | `3'b110` | Stochastic Rounding (available only on SDOTP operations) |
| `DYN` | `3'b111` | *RISC-V Dynamic RM, invalid if passed to operations* |

##### `operation_e` - FP Operation

@@ -104,6 +108,8 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.
| `ADD` | `0` | Addition (`op[1] + op[2]`) *note the operand indices* |
| `ADD` | `1` | Subtraction (`op[1] - op[2]`) *note the operand indices* |
| `MUL` | `0` | Multiplication (`op[0] * op[1]`) |
| `SDOTP` | `0` | Sum of dot product |
| `VSUM` | `0` | Vector Inner Sum |
| `DIV` | `0` | Division (`op[0] / op[1]`) |
| `SQRT` | `0` | Square root |
| `SGNJ` | `0` | Sign injection, operation encoded in rounding mode<br>`RNE`: `op[0]` with `sign(op[1])`<br>`RTZ`: `op[0]` with `~sign(op[1])`<br>`RDN`: `op[0]` with `sign(op[0]) ^ sign(op[1])`<br>`RUP`: `op[0]` (passthrough) |
@@ -132,10 +138,11 @@ Enumeration of type `logic [2:0]` holding the supported FP formats.
| `FP16` | IEEE binary16 | 16 bit | 5 | 10 |
| `FP8` | binary8 | 8 bit | 5 | 2 |
| `FP16ALT` | binary16alt | 16 bit | 8 | 7 |
| `FP8ALT` | binary8alt | 8 bit | 4 | 3 |

The following global parameters associated with FP formats are set in `fpnew_pkg`:
```SystemVerilog
localparam int unsigned NUM_FP_FORMATS = 5;
localparam int unsigned NUM_FP_FORMATS = 6;
localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);
```
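
A quick standalone check of these parameters (the module name `fp_format_bits_check` is hypothetical): `$clog2` of both 5 and 6 evaluates to 3, so raising `NUM_FP_FORMATS` to 6 should leave the format-select field at 3 bits, consistent with the `logic [2:0]` enumeration type mentioned above.

```SystemVerilog
// Illustrative sketch only: mirrors the two localparams shown above.
module fp_format_bits_check;
  localparam int unsigned NUM_FP_FORMATS = 6;
  localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);

  initial begin
    // $clog2(5) == $clog2(6) == 3, so the field width does not grow
    $display("FP_FORMAT_BITS = %0d", FP_FORMAT_BITS);
  end
endmodule
```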

@@ -230,7 +237,7 @@ typedef struct packed {
```
The fields of this struct behave as follows:

##### `Width` - Datapath Wdith
##### `Width` - Datapath Width

Specifies the width of the FPU datapath and of the input and output data ports (`operands_i`/`result_o`).
It must be larger or equal to the width of the widest enabled FP and integer format.
@@ -278,7 +285,7 @@ Otherwise, synthesis tools can optimize away any logic associated with this form

#### `Implementation` - Implementation Options

The FPU is divided into four operation groups, `ADDMUL`, `DIVSQRT`, `NONCOMP`, and `CONV` (see [Architecture: Top-Level](#top-level)).
The FPU is divided into five operation groups, `ADDMUL`, `DIVSQRT`, `NONCOMP`, `CONV`, and `DOTP` (see [Architecture: Top-Level](#top-level)).
The `Implementation` parameter controls the implementation of these operation groups.
It is of type `fpu_implementation_t` which is defined as:
```SystemVerilog
@@ -320,17 +327,18 @@ The unit type `unit_type_t` is an enumeration of type `logic [1:0]` holding the
The `UnitTypes` parameter allows to control resources used for the FPU by either removing operation units for certain formats and operations, or merging multiple formats into one.
Currently, the following unit types are available for the FPU operation groups:

| | `ADDMUL` | `DIVSQRT` | `NONCOMP` | `CONV` |
|------------|--------------------|--------------------|--------------------|--------------------|
| `PARALLEL` | :heavy_check_mark: | | :heavy_check_mark: | |
| `MERGED` | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: |
| | `ADDMUL` | `DIVSQRT` | `NONCOMP` | `CONV` | `DOTP` |
|------------|--------------------|--------------------|--------------------|--------------------|--------------------|
| `PARALLEL` | :heavy_check_mark: | | :heavy_check_mark: | | |
| `MERGED` | :heavy_check_mark: | :heavy_check_mark: | | :heavy_check_mark: | :heavy_check_mark: |

*Default*:
```SystemVerilog
'{'{default: PARALLEL}, // ADDMUL
'{default: MERGED}, // DIVSQRT
'{default: PARALLEL}, // NONCOMP
'{default: MERGED}} // CONV
'{default: MERGED}, // CONV
'{default: DISABLED}} // DOTP
```
(all formats within operation group use same type)
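
Because `DOTP` defaults to `DISABLED`, a configuration that wants SDOTP presumably has to override the last entry. The sketch below re-declares minimal stand-ins for the package types so that it is self-contained. The package name, the array typedefs, and `EXAMPLE_UNIT_TYPES` are assumptions for illustration; the enumerator names and the `MERGED`-only availability for `DOTP` come from the table above.

```SystemVerilog
// Illustrative sketch only: local stand-ins, not the real fpnew_pkg.
package unit_types_example_pkg;
  localparam int unsigned NUM_OPGROUPS   = 5; // ADDMUL, DIVSQRT, NONCOMP, CONV, DOTP
  localparam int unsigned NUM_FP_FORMATS = 6;

  // Enumerator encoding here is illustrative
  typedef enum logic [1:0] { DISABLED, PARALLEL, MERGED } unit_type_t;
  typedef unit_type_t      [0:NUM_FP_FORMATS-1] fmt_unit_types_t;
  typedef fmt_unit_types_t [0:NUM_OPGROUPS-1]   opgrp_fmt_unit_types_t;

  // Same as the documented default, except that DOTP is switched on
  localparam opgrp_fmt_unit_types_t EXAMPLE_UNIT_TYPES =
      '{'{default: PARALLEL}, // ADDMUL
        '{default: MERGED},   // DIVSQRT
        '{default: PARALLEL}, // NONCOMP
        '{default: MERGED},   // CONV
        '{default: MERGED}};  // DOTP (only MERGED is available)
endpackage
```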

@@ -350,7 +358,33 @@ The configuration `pipe_config_t` is an enumeration of type `logic [1:0]` holdi
| `INSIDE` | All registers are inserted at roughly the middle of the operational unit (if not possible, `BEFORE`) |
| `DISTRIBUTED` | Registers are evenly distributed to `INSIDE`, `BEFORE`, and `AFTER` (if no `INSIDE`, all `BEFORE`) |

#### `StochasticRndImplementation` - Stochastic Rounding Implementation

The `StochasticRndImplementation` parameter is used to configure the RSR support.
It is of type `rsr_impl_t` which is defined as:
```SystemVerilog
typedef struct packed {
logic EnableRSR;
int unsigned RsrPrecision;
int unsigned LfsrInternalPrecision;
} rsr_impl_t;
```
The fields of this struct behave as follows:

##### `EnableRSR` - Enable RSR support
Enables stochastic rounding support in the `DOTP` operation group block. It instantiates an `LFSR` in the rounding module.

*Default*: `1'b0`

##### `RsrPrecision`
Specifies the number of trailing bits considered for the stochastic rounding decision.

*Default*: `12`
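
To make the role of the trailing bits concrete, here is a minimal sketch of one common way to take a stochastic rounding decision: add a pseudorandom value of `RsrPrecision` bits to the discarded trailing bits and round up when the sum carries out. The module name and ports are hypothetical, and this is not claimed to be the logic used in `fpnew_rounding`.

```SystemVerilog
// Illustrative sketch only, not the FPnew rounding module.
module rsr_decision_sketch #(
  parameter int unsigned RsrPrecision = 12 // trailing bits considered
) (
  input  logic [RsrPrecision-1:0] trailing_bits_i, // bits below the result LSB
  input  logic [RsrPrecision-1:0] random_bits_i,   // pseudorandom value, e.g. from an LFSR
  output logic                    round_up_o       // 1: increment the truncated result
);
  logic [RsrPrecision:0] sum;

  // One extra bit captures the carry-out of the addition
  assign sum        = {1'b0, trailing_bits_i} + {1'b0, random_bits_i};
  assign round_up_o = sum[RsrPrecision];
endmodule
```

If `random_bits_i` is uniformly distributed, the carry occurs with probability `trailing_bits_i / 2**RsrPrecision`, so the expected rounded value matches the unrounded one up to the bits that are not considered.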

##### `LfsrInternalPrecision`
Specifies the LFSR internal bitwidth, thus controlling the pseudorandom number periodicity.

*Default*: `32`
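
As a configuration example, the sketch below builds an `rsr_impl_t` value with RSR enabled and the other fields left at the defaults listed above. The struct is re-declared locally so the snippet stands alone; the package name and `EXAMPLE_RSR` are hypothetical. On periodicity: if the LFSR is maximal-length (an assumption, the text does not say), an n-bit register repeats after 2^n - 1 states.

```SystemVerilog
// Illustrative sketch only: local copy of the struct described above.
package rsr_example_pkg;
  typedef struct packed {
    logic        EnableRSR;
    int unsigned RsrPrecision;
    int unsigned LfsrInternalPrecision;
  } rsr_impl_t;

  // Hypothetical configuration: RSR on, documented default precisions
  localparam rsr_impl_t EXAMPLE_RSR = '{
    EnableRSR:             1'b1,
    RsrPrecision:          12,
    LfsrInternalPrecision: 32
  };
endpackage
```

A value like this would be passed through the `StochasticRndImplementation` parameter; operations then select stochastic rounding with the `RSR` rounding mode, which the rounding-mode table above lists as available only on SDOTP operations.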

### Adding Custom Formats

@@ -391,14 +425,15 @@ The *operation group* is the highest level of grouping within FPnew and signifie

![FPnew](fig/top_block.png)

There are currently four operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
There are currently five operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:

| Enumerator | Description | Associated Operations |
|------------|-----------------------------------------------|---------------------------------------|
| `ADDMUL` | Addition and Multiplication | `FMADD`, `FNMSUB`, `ADD`, `MUL` |
| `DIVSQRT` | Division and Square Root | `DIV`, `SQRT` |
| `NONCOMP` | Non-Computational Operations like Comparisons | `SGNJ`, `MINMAX`, `CMP`, `CLASS` |
| `CONV` | Conversions | `F2I`, `I2F`, `F2F`, `CPKAB`, `CPKCD` |
| `DOTP` | Dot Products | `SDOTP`, `EXVSUM`, `VSUM` |

Most architectural decisions for FPnew are made at very fine granularity.
The big exception to this is the generation of vectorial hardware which is decided at top level through the `EnableVectors` parameter.
8 changes: 7 additions & 1 deletion src/fpnew_cast_multi.sv
@@ -544,11 +544,17 @@ module fpnew_cast_multi #(
assign pre_round_abs = dst_is_int_q ? ifmt_pre_round_abs[int_fmt_q2] : fmt_pre_round_abs[dst_fmt_q2];

fpnew_rounding #(
.AbsWidth ( WIDTH )
.AbsWidth ( WIDTH ),
.EnableRSR ( 0 )
) i_fpnew_rounding (
.clk_i,
.rst_ni,
.id_i ( '0 ),
.en_rsr_i ( 1'b0 ),
.abs_value_i ( pre_round_abs ),
.sign_i ( input_sign_q ), // source format
.round_sticky_bits_i ( round_sticky_bits ),
.stochastic_rounding_bits_i ( '0 ),
.rnd_mode_i ( rnd_mode_q ),
.effective_subtraction_i ( 1'b0 ), // no operation happened
.abs_rounded_o ( rounded_abs ),
8 changes: 7 additions & 1 deletion src/fpnew_fma.sv
@@ -597,11 +597,17 @@ module fpnew_fma #(

// Perform the rounding
fpnew_rounding #(
.AbsWidth ( EXP_BITS + MAN_BITS )
.AbsWidth ( EXP_BITS + MAN_BITS ),
.EnableRSR ( 0 )
) i_fpnew_rounding (
.clk_i,
.rst_ni,
.id_i ( '0 ),
.en_rsr_i ( 1'b0 ),
.abs_value_i ( pre_round_abs ),
.sign_i ( pre_round_sign ),
.round_sticky_bits_i ( round_sticky_bits ),
.stochastic_rounding_bits_i ( '0 ),
.rnd_mode_i ( rnd_mode_q ),
.effective_subtraction_i ( effective_subtraction_q ),
.abs_rounded_o ( rounded_abs ),
8 changes: 7 additions & 1 deletion src/fpnew_fma_multi.sv
@@ -720,11 +720,17 @@ module fpnew_fma_multi #(

// Perform the rounding
fpnew_rounding #(
.AbsWidth ( SUPER_EXP_BITS + SUPER_MAN_BITS )
.AbsWidth ( SUPER_EXP_BITS + SUPER_MAN_BITS ),
.EnableRSR ( 0 )
) i_fpnew_rounding (
.clk_i,
.rst_ni,
.id_i ( '0 ),
.en_rsr_i ( 1'b0 ),
.abs_value_i ( pre_round_abs ),
.sign_i ( pre_round_sign ),
.round_sticky_bits_i ( round_sticky_bits ),
.stochastic_rounding_bits_i ( '0 ),
.rnd_mode_i ( rnd_mode_q ),
.effective_subtraction_i ( effective_subtraction_q ),
.abs_rounded_o ( rounded_abs ),
10 changes: 8 additions & 2 deletions src/fpnew_opgroup_block.sv
@@ -26,6 +26,8 @@ module fpnew_opgroup_block #(
parameter fpnew_pkg::pipe_config_t PipeConfig = fpnew_pkg::BEFORE,
parameter type TagType = logic,
parameter int unsigned TrueSIMDClass = 0,
parameter logic CompressedVecCmpResult = 0,
parameter fpnew_pkg::rsr_impl_t StochasticRndImplementation = fpnew_pkg::DEFAULT_NO_RSR,
// Do not change
localparam int unsigned NUM_FORMATS = fpnew_pkg::NUM_FP_FORMATS,
localparam int unsigned NUM_OPERANDS = fpnew_pkg::num_operands(OpGroup),
@@ -34,6 +36,7 @@ module fpnew_opgroup_block #(
) (
input logic clk_i,
input logic rst_ni,
input logic [31:0] hart_id_i,
// Input signals
input logic [NUM_OPERANDS-1:0][Width-1:0] operands_i,
input logic [NUM_FORMATS-1:0][NUM_OPERANDS-1:0] is_boxed_i,
@@ -110,7 +113,8 @@ module fpnew_opgroup_block #(
.NumPipeRegs ( FmtPipeRegs[fmt] ),
.PipeConfig ( PipeConfig ),
.TagType ( TagType ),
.TrueSIMDClass ( TrueSIMDClass )
.TrueSIMDClass ( TrueSIMDClass ),
.CompressedVecCmpResult ( CompressedVecCmpResult )
) i_fmt_slice (
.clk_i,
.rst_ni,
@@ -182,10 +186,12 @@ module fpnew_opgroup_block #(
.PulpDivsqrt ( PulpDivsqrt ),
.NumPipeRegs ( REG ),
.PipeConfig ( PipeConfig ),
.TagType ( TagType )
.TagType ( TagType ),
.StochasticRndImplementation ( StochasticRndImplementation )
) i_multifmt_slice (
.clk_i,
.rst_ni,
.hart_id_i,
.operands_i,
.is_boxed_i,
.rnd_mode_i,