Add FP8alt, low and mixed-precision SDOTP with stochastic rounding support, and compressed vector cmp (pulp-platform#3)

Added support for:
- FP8alt (1, 4, 3)
- low and mixed-precision SDOTP with stochastic rounding support
- compressed vector compare results (one bit per comparison in the LSBs)

---------

Co-authored-by: Gianna Paulin <[email protected]>
Lynx005F · May 4, 2023 · 3b1f7af
1 parent 16c1d2f
commit 3b1f7af
Showing 18 changed files with 2,464 additions and 114 deletions.
diff --git a/Bender.yml b/Bender.yml
@@ -29,9 +29,12 @@ sources:
   - src/fpnew_divsqrt_multi.sv
   - src/fpnew_fma.sv
   - src/fpnew_fma_multi.sv
+  - src/fpnew_sdotp_multi.sv
+  - src/fpnew_sdotp_multi_wrapper.sv
   - src/fpnew_noncomp.sv
   - src/fpnew_opgroup_block.sv
   - src/fpnew_opgroup_fmt_slice.sv
   - src/fpnew_opgroup_multifmt_slice.sv
   - src/fpnew_rounding.sv
+  - src/lfsr_sr.sv
   - src/fpnew_top.sv
diff --git a/README.md b/README.md
@@ -165,6 +165,23 @@ If you use FPnew in your work, you can cite us:
 }
 ```
 
+If you use FPnew SDOTP in your work, you can cite us:
+
+<details>
+<summary>SDOTP Publication</summary>
+<p>
+
+```
+@inproceedings{bertaccini2022minifloat,
+  title={MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V Cores},
+  author={Bertaccini, Luca and Paulin, Gianna and Fischer, Tim and Mach, Stefan and Benini, Luca},
+  booktitle={2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)},
+  pages={1--8},
+  year={2022},
+  organization={IEEE}
+}
+```
+
 </p>
 </details>
 

diff --git a/docs/CHANGELOG-PULP.md b/docs/CHANGELOG-PULP.md
@@ -0,0 +1,15 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
+In this sense, we interpret the "Public API" of a hardware module as its port/parameter list.
+Versions of the IP in the same major release are "pin-compatible" with each other. Minor releases are permitted to add new parameters as long as their default bindings ensure backwards compatibility.
+
+## [0.1.0] - 2023-05-04
+
+### Added
+- Add low and mixed-precision SDOTP with support for stochastic rounding
+- Add `FP8alt (1,4,3)` format
+- Add support for compressed vector compare results (one bit per comparison in the LSBs)
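The "one bit per comparison in the LSBs" layout from the changelog entry above can be pictured with a small sketch. The module below is hypothetical and not taken from the FPnew sources: the module name, the `LaneWidth` and `Compressed` parameters, and the `lane_cmp_i` input are assumptions made for illustration, and the uncompressed layout (each flag in the LSB of its own lane) is likewise an assumption.

```SystemVerilog
// Hypothetical sketch, not the FPnew implementation.
// Packs one compare flag per vector lane into the result word.
module vec_cmp_pack_sketch #(
  parameter int unsigned  Width      = 32,   // datapath width
  parameter int unsigned  LaneWidth  = 8,    // width of one vector element
  parameter bit           Compressed = 1'b1, // 1: flags packed into the LSBs
  localparam int unsigned NumLanes   = Width / LaneWidth
) (
  input  logic [NumLanes-1:0] lane_cmp_i, // compare outcome of each lane
  output logic [Width-1:0]    result_o
);
  always_comb begin
    result_o = '0;
    for (int unsigned i = 0; i < NumLanes; i++) begin
      if (Compressed) result_o[i]             = lane_cmp_i[i]; // bit i of the word
      else            result_o[i * LaneWidth] = lane_cmp_i[i]; // LSB of lane i
    end
  end
endmodule
```

With the defaults above and `lane_cmp_i = 4'b1010`, the compressed result is `32'h0000_000A`, while the assumed uncompressed layout gives `32'h0100_0100`. The compressed form lets a 32-bit core read all flags from the low bits of a single register, which appears to be the motivation for targeting RV32FD cores.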
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
@@ -10,6 +10,11 @@ Versions of the IP in the same major relase are "pin-compatible" with each other
 
 ## [Unreleased]
 
+### Added
+- Add support for alternative FP32-only DivSqrt unit
+
+## [0.7.0] - 2023-03-20
+
 ### Added
 - Citation file `CITATION.cff`
 - Add support for RISC-V compliant classify in vectorial mode when the vector element width is at least 10 bits

diff --git a/docs/README.md b/docs/README.md
@@ -40,6 +40,8 @@ For more in-depth explanations on how to configure the unit and the layout of th
 | `TagType`        | The SystemVerilog data type of the operation tag                                                                             |
 | `TrueSIMDClass`  | If enabled, the result of a classify operation in vectorial mode will be RISC-V compliant if each output has at least 10 bits|
 | `EnableSIMDMask` | Enable the RISC-V floating-point status flags masking of inactive vectorial lanes. When disabled, `simd_mask_i` is inactive  |
+| `StochasticRndImplementation` | Enable stochastic rounding support for SDOTP, define LFSR bitwidth and number of trailing bits considered for the SR decision  |
+| `CompressedVecCmpResult` | Compress the result of a vector compare in the LSBs, conceived for RV32FD cores                                      |
 
 ### Ports
 
@@ -50,6 +52,7 @@ As the width of some input/output signals is defined by the configuration, it is
 |------------------|-----------|----------------------|----------------------------------------------------------------|
 | `clk_i`          | in        | `logic`              | Clock, synchronous, rising-edge triggered                      |
 | `rst_ni`         | in        | `logic`              | Asynchronous reset, active low                                 |
+| `hart_id_i`      | in        | `logic [31:0]`       | Core ID, used only when stochastic rounding is enabled         |
 | `operands_i`     | in        | `logic [2:0][W-1:0]` | Operands, henceforth referred to as `op[`*i*`]`                |
 | `rnd_mode_i`     | in        | `roundmode_e`        | Floating-point rounding mode                                   |
 | `op_i`           | in        | `operation_e`        | Operation select                                               |
@@ -79,15 +82,16 @@ Default values from the package are listed.
 
 Enumeration of type `logic [2:0]` holding available rounding modes, encoded for use in RISC-V cores:
 
-| Enumerator |  Value   |                    Rounding Mode                     |
-|------------|----------|------------------------------------------------------|
-| `RNE`      | `3'b000` | To nearest, tie to even (default)                    |
-| `RTZ`      | `3'b001` | Toward zero                                          |
-| `RDN`      | `3'b010` | Toward negative infinity                             |
-| `RUP`      | `3'b011` | Toward positive infinity                             |
-| `RMM`      | `3'b100` | To nearest, tie away from zero                       |
-| `ROD`      | `3'b101` | To odd                                               |
-| `DYN`      | `3'b111` | *RISC-V Dynamic RM, invalid if passed to operations* |
+| Enumerator |  Value   |                    Rounding Mode                         |
+|------------|----------|----------------------------------------------------------|
+| `RNE`      | `3'b000` | To nearest, tie to even (default)                        |
+| `RTZ`      | `3'b001` | Toward zero                                              |
+| `RDN`      | `3'b010` | Toward negative infinity                                 |
+| `RUP`      | `3'b011` | Toward positive infinity                                 |
+| `RMM`      | `3'b100` | To nearest, tie away from zero                           |
+| `ROD`      | `3'b101` | To odd                                                   |
+| `RSR`      | `3'b110` | Stochastic Rounding (available only on SDOTP operations) |
+| `DYN`      | `3'b111` | *RISC-V Dynamic RM, invalid if passed to operations*     |
 
 ##### `operation_e` - FP Operation
 
@@ -104,6 +108,8 @@ Unless noted otherwise, the first operand `op[0]` is used for the operation.
 | `ADD`      | `0`      | Addition (`op[1] + op[2]`) *note the operand indices*                                                                                                                                                            |
 | `ADD`      | `1`      | Subtraction (`op[1] - op[2]`) *note the operand indices*                                                                                                                                                         |
 | `MUL`      | `0`      | Multiplication (`op[0] * op[1]`)                                                                                                                                                                                 |
+| `SDOTP`    | `0`      | Sum of dot product )                                                                                                                                                                                 |
+| `VSUM`     | `0`      | Vector Inner Sum )                                                                                                                                                                                 |
 | `DIV`      | `0`      | Division (`op[0] / op[1]`)                                                                                                                                                                                       |
 | `SQRT`     | `0`      | Square root                                                                                                                                                                                                      |
 | `SGNJ`     | `0`      | Sign injection, operation encoded in rounding mode<br>`RNE`: `op[0]` with `sign(op[1])`<br>`RTZ`: `op[0]` with `~sign(op[1])`<br>`RDN`: `op[0]` with `sign(op[0]) ^ sign(op[1])`<br>`RUP`: `op[0]` (passthrough) |
@@ -132,10 +138,11 @@ Enumeration of type `logic [2:0]` holding the supported FP formats.
 | `FP16`     | IEEE binary16 | 16 bit | 5         | 10        |
 | `FP8`      | binary8       | 8 bit  | 5         | 2         |
 | `FP16ALT`  | binary16alt   | 16 bit | 8         | 7         |
+| `FP8ALT`   | binary8alt    | 8 bit  | 4         | 3         |
 
 The following global parameters associated with FP formats are set in `fpnew_pkg`:
 ```SystemVerilog
-localparam int unsigned NUM_FP_FORMATS = 5;
+localparam int unsigned NUM_FP_FORMATS = 6;
 localparam int unsigned FP_FORMAT_BITS = $clog2(NUM_FP_FORMATS);
 ```
 
@@ -230,7 +237,7 @@ typedef struct packed {
 ```
 The fields of this struct behave as follows:
 
-##### `Width` - Datapath Wdith
+##### `Width` - Datapath Width
 
 Specifies the width of the FPU datapath and of the input and output data ports (`operands_i`/`result_o`).
 It must be larger or equal to the width of the widest enabled FP and integer format.
@@ -278,7 +285,7 @@ Otherwise, synthesis tools can optimize away any logic associated with this form
 
 #### `Implementation` - Implementation Options
 
-The FPU is divided into four operation groups,  `ADDMUL`, `DIVSQRT`, `NONDOMP`, and `CONV` (see [Architecture: Top-Level](#top-level)).
+The FPU is divided into five operation groups, `ADDMUL`, `DIVSQRT`, `NONCOMP`, `CONV`, and `DOTP` (see [Architecture: Top-Level](#top-level)).
 The `Implementation` parameter controls the implementation of these operation groups.
 It is of type `fpu_implementation_t` which is defined as:
 ```SystemVerilog
@@ -320,17 +327,18 @@ The unit type `unit_type_t` is an enumeration of type `logic [1:0]` holding the
 The `UnitTypes` parameter allows to control resources used for the FPU by either removing operation units for certain formats and operations, or merging multiple formats into one.
 Currently, the follwoing unit types are available for the FPU operation groups:
 
-|            |      `ADDMUL`      |     `DIVSQRT`      |     `NONCOMP`      |       `CONV`       |
-|------------|--------------------|--------------------|--------------------|--------------------|
-| `PARALLEL` | :heavy_check_mark: |                    | :heavy_check_mark: |                    |
-| `MERGED`   | :heavy_check_mark: | :heavy_check_mark: |                    | :heavy_check_mark: |
+|            |      `ADDMUL`      |     `DIVSQRT`      |     `NONCOMP`      |       `CONV`       |       `DOTP`       |
+|------------|--------------------|--------------------|--------------------|--------------------|--------------------|
+| `PARALLEL` | :heavy_check_mark: |                    | :heavy_check_mark: |                    |                    |
+| `MERGED`   | :heavy_check_mark: | :heavy_check_mark: |                    | :heavy_check_mark: | :heavy_check_mark: |
 
 *Default*:
 ```SystemVerilog
 '{'{default: PARALLEL}, // ADDMUL
   '{default: MERGED},   // DIVSQRT
   '{default: PARALLEL}, // NONCOMP
-  '{default: MERGED}}   // CONV`
+  '{default: MERGED},   // CONV
+  '{default: DISABLED}} // DOTP
 ```
 (all formats within operation group use same type)
 
@@ -350,7 +358,33 @@ The configuration  `pipe_config_t` is an enumeration of type `logic [1:0]` holdi
 | `INSIDE`      | All registers are inserted at roughly the middle of the operational unit (if not possible, `BEFORE`) |
 | `DISTRIBUTED` | Registers are evenly distributed to `INSIDE`, `BEFORE`, and `AFTER` (if no `INSIDE`, all `BEFORE`)   |
 
+### `Stochastic Rounding Implementation`
 
+The `StochasticRndImplementation` parameter is used to configure the RSR support.
+It is of type `rsr_impl_t` which is defined as:
+```SystemVerilog
+typedef struct packed {
+  logic        EnableRSR;
+  int unsigned RsrPrecision;
+  int unsigned LfsrInternalPrecision;
+} rsr_impl_t;
+```
+The fields of this struct behave as follows:
+
+##### `EnableRSR` - Enable RSR support
+Enables stochastic rounding support in the `DOTP` operation group block. It instantiates an `LFSR` in the rounding module.
+
+*Default*: `1'b0`
+
+##### `RsrPrecision`
+Specifies the number of trailing bits considered for the stochastic rounding decision.
+
+*Default*: `12`
+
+##### `LfsrInternalPrecision`
+Specifies the LFSR internal bitwidth, thus controlling the pseudorandom number periodicity.
+
+*Default*: `32`
 
 ### Adding Custom Formats
 
@@ -391,14 +425,15 @@ The *operation group* is the highest level of grouping within FPnew and signifie
 
 ![FPnew](fig/top_block.png)
 
-There are currently four operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
+There are currently five operation groups in FPnew which are enumerated in `opgroup_e` as outlined in the following table:
 
 | Enumerator |                  Description                  |         Associated Operations         |
 |------------|-----------------------------------------------|---------------------------------------|
 | `ADDMUL`   | Addition and Multiplication                   | `FMADD`, `FNMSUB`, `ADD`, `MUL`       |
 | `DIVSQRT`  | Division and Square Root                      | `DIV`, `SQRT`                         |
 | `NONCOMP`  | Non-Computational Operations like Comparisons | `SGNJ`, `MINMAX`, `CMP`, `CLASS`      |
 | `CONV`     | Conversions                                   | `F2I`, `I2F`, `F2F`, `CPKAB`, `CPKCD` |
+| `DOTP`     | Dot Products                                  | `SDOTP`, `EXVSUM`, `VSUM`             |
 
 Most architectural decisions for FPnew are made at very fine granularity.
 The big exception to this is the generation of vectorial hardware which is decided at top level through the `EnableVectors` parameter.
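As a rough illustration of the `StochasticRndImplementation` parameter documented in this file, the fragment below sketches a non-default `rsr_impl_t` value. The package name `my_fpu_cfg_pkg` and the constant name `RSR_ENABLED` are hypothetical. The field names and values follow the struct definition and defaults listed in the diff, and the fragment assumes `fpnew_pkg` from this commit is compiled first.

```SystemVerilog
// Hypothetical configuration package, not part of FPnew.
package my_fpu_cfg_pkg;

  // Enable stochastic rounding, keeping the documented default widths.
  localparam fpnew_pkg::rsr_impl_t RSR_ENABLED = '{
    EnableRSR:             1'b1, // instantiate the LFSR in the rounding module
    RsrPrecision:          12,   // trailing bits considered for the SR decision
    LfsrInternalPrecision: 32    // internal LFSR width (sets the period)
  };

endpackage
```

A value like this would be bound to `StochasticRndImplementation` in place of the `fpnew_pkg::DEFAULT_NO_RSR` default. Because the `DOTP` operation group defaults to `DISABLED`, it presumably also has to be set to `MERGED` in `UnitTypes` before the `RSR` rounding mode (`3'b110`) has any effect.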

diff --git a/src/fpnew_cast_multi.sv b/src/fpnew_cast_multi.sv
@@ -544,11 +544,17 @@ module fpnew_cast_multi #(
   assign pre_round_abs = dst_is_int_q ? ifmt_pre_round_abs[int_fmt_q2] : fmt_pre_round_abs[dst_fmt_q2];
 
   fpnew_rounding #(
-    .AbsWidth ( WIDTH )
+    .AbsWidth ( WIDTH ),
+    .EnableRSR ( 0 )
   ) i_fpnew_rounding (
+    .clk_i,
+    .rst_ni,
+    .id_i                    ( '0                ),
+    .en_rsr_i                ( 1'b0              ),
     .abs_value_i             ( pre_round_abs     ),
     .sign_i                  ( input_sign_q      ), // source format
     .round_sticky_bits_i     ( round_sticky_bits ),
+    .stochastic_rounding_bits_i ( '0             ),
     .rnd_mode_i              ( rnd_mode_q        ),
     .effective_subtraction_i ( 1'b0              ), // no operation happened
     .abs_rounded_o           ( rounded_abs       ),

diff --git a/src/fpnew_fma.sv b/src/fpnew_fma.sv
@@ -597,11 +597,17 @@ module fpnew_fma #(
 
   // Perform the rounding
   fpnew_rounding #(
-    .AbsWidth ( EXP_BITS + MAN_BITS )
+    .AbsWidth  ( EXP_BITS + MAN_BITS ),
+    .EnableRSR ( 0 )
   ) i_fpnew_rounding (
+    .clk_i,
+    .rst_ni,
+    .id_i                    ( '0                      ),
+    .en_rsr_i                ( 1'b0                    ),
     .abs_value_i             ( pre_round_abs           ),
     .sign_i                  ( pre_round_sign          ),
     .round_sticky_bits_i     ( round_sticky_bits       ),
+    .stochastic_rounding_bits_i ( '0                   ),
     .rnd_mode_i              ( rnd_mode_q              ),
     .effective_subtraction_i ( effective_subtraction_q ),
     .abs_rounded_o           ( rounded_abs             ),

diff --git a/src/fpnew_fma_multi.sv b/src/fpnew_fma_multi.sv
@@ -720,11 +720,17 @@ module fpnew_fma_multi #(
 
   // Perform the rounding
   fpnew_rounding #(
-    .AbsWidth ( SUPER_EXP_BITS + SUPER_MAN_BITS )
+    .AbsWidth  ( SUPER_EXP_BITS + SUPER_MAN_BITS ),
+    .EnableRSR ( 0 )
   ) i_fpnew_rounding (
+    .clk_i,
+    .rst_ni,
+    .id_i                    ( '0                      ),
+    .en_rsr_i                ( 1'b0                    ),
     .abs_value_i             ( pre_round_abs           ),
     .sign_i                  ( pre_round_sign          ),
     .round_sticky_bits_i     ( round_sticky_bits       ),
+    .stochastic_rounding_bits_i ( '0                   ),
     .rnd_mode_i              ( rnd_mode_q              ),
     .effective_subtraction_i ( effective_subtraction_q ),
     .abs_rounded_o           ( rounded_abs             ),
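The hunks above tie off the new stochastic rounding ports (`en_rsr_i`, `id_i`, `stochastic_rounding_bits_i`) in units that do not use `RSR`. For readers unfamiliar with the technique, the sketch below shows one common way to make a stochastic rounding decision. It is a hypothetical illustration, not the logic in `src/fpnew_rounding.sv` or `src/lfsr_sr.sv`, which may differ; the module and port names are assumptions.

```SystemVerilog
// Hypothetical sketch of a stochastic rounding decision.
// The magnitude is rounded up with probability proportional to the
// value of the bits that are about to be discarded.
module sr_decision_sketch #(
  parameter int unsigned RsrPrecision = 12 // trailing bits considered
) (
  input  logic [RsrPrecision-1:0] trailing_bits_i, // bits below the result LSB
  input  logic [RsrPrecision-1:0] random_bits_i,   // uniform pseudorandom bits, e.g. from an LFSR
  output logic                    round_up_o
);
  // For uniform random_bits_i, P(round_up_o) = trailing_bits_i / 2**RsrPrecision.
  assign round_up_o = random_bits_i < trailing_bits_i;
endmodule
```

With this scheme, an exact result (`trailing_bits_i == 0`) is never rounded up, and a result just below the next representable value is rounded up almost always, so the rounding error averages out over many operations.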

diff --git a/src/fpnew_opgroup_block.sv b/src/fpnew_opgroup_block.sv
@@ -26,6 +26,8 @@ module fpnew_opgroup_block #(
   parameter fpnew_pkg::pipe_config_t    PipeConfig    = fpnew_pkg::BEFORE,
   parameter type                        TagType       = logic,
   parameter int unsigned                TrueSIMDClass = 0,
+  parameter logic                       CompressedVecCmpResult = 0,
+  parameter fpnew_pkg::rsr_impl_t       StochasticRndImplementation = fpnew_pkg::DEFAULT_NO_RSR,
   // Do not change
   localparam int unsigned NUM_FORMATS  = fpnew_pkg::NUM_FP_FORMATS,
   localparam int unsigned NUM_OPERANDS = fpnew_pkg::num_operands(OpGroup),
@@ -34,6 +36,7 @@ module fpnew_opgroup_block #(
 ) (
   input logic                                     clk_i,
   input logic                                     rst_ni,
+  input logic [31:0]                              hart_id_i,
   // Input signals
   input logic [NUM_OPERANDS-1:0][Width-1:0]       operands_i,
   input logic [NUM_FORMATS-1:0][NUM_OPERANDS-1:0] is_boxed_i,
@@ -110,7 +113,8 @@ module fpnew_opgroup_block #(
         .NumPipeRegs   ( FmtPipeRegs[fmt]             ),
         .PipeConfig    ( PipeConfig                   ),
         .TagType       ( TagType                      ),
-        .TrueSIMDClass ( TrueSIMDClass                )
+        .TrueSIMDClass ( TrueSIMDClass                ),
+        .CompressedVecCmpResult ( CompressedVecCmpResult )
       ) i_fmt_slice (
         .clk_i,
         .rst_ni,
@@ -182,10 +186,12 @@ module fpnew_opgroup_block #(
       .PulpDivsqrt   ( PulpDivsqrt      ),
       .NumPipeRegs   ( REG              ),
       .PipeConfig    ( PipeConfig       ),
-      .TagType       ( TagType          )
+      .TagType       ( TagType          ),
+      .StochasticRndImplementation ( StochasticRndImplementation )
     ) i_multifmt_slice (
       .clk_i,
       .rst_ni,
+      .hart_id_i,
       .operands_i,
       .is_boxed_i,
       .rnd_mode_i,