From 1c16c530ec2c2c12ae2693893236bd17791a718f Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 12:49:13 -0500
Subject: [PATCH 1/8] Adding bfloat16 chapter

---
 src/riscv-unprivileged.adoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/riscv-unprivileged.adoc b/src/riscv-unprivileged.adoc
index 673f38047..c34b0c1c4 100644
--- a/src/riscv-unprivileged.adoc
+++ b/src/riscv-unprivileged.adoc
@@ -172,6 +172,7 @@ include::zawrs.adoc[]
 include::zacas.adoc[]
 include::rvwmo.adoc[]
 include::ztso-st-ext.adoc[]
+include::bfloat16.adoc[]
 include::cmo.adoc[]
 include::f-st-ext.adoc[]
 include::d-st-ext.adoc[]

From a19f27b1760b07e0e13bada720605765c8d5cc25 Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 13:24:43 -0500
Subject: [PATCH 2/8] Adding bfloat16 chapter contents

---
 bfloat16.adoc | 723 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 723 insertions(+)
 create mode 100644 bfloat16.adoc

diff --git a/bfloat16.adoc b/bfloat16.adoc
new file mode 100644
index 000000000..715228415
--- /dev/null
+++ b/bfloat16.adoc
@@ -0,0 +1,723 @@
+[[BF16_introduction]]
+=== Introduction
+
+When FP16 (officially called binary16) was first introduced by the IEEE-754 standard,
+it was just an interchange format. It was intended as a space/bandwidth-efficient
+encoding that would be used to transfer information. This is in line with the Zfhmin
+extension.
+
+However, there were some applications (notably graphics) that found that the smaller
+precision and dynamic range were sufficient for their needs. So, FP16 started to see
+widespread adoption as an arithmetic format. This is in line with
+the Zfh extension.
+
+While it was not the intention of '754 to have FP16 be an arithmetic format, it is
+supported by the standard. Even though the '754 committee recognized that FP16 was
+gaining popularity, the committee decided to hold off on making it a basic format
+in the 2019 release. This means that a '754-compliant implementation of binary
+floating point, which needs to support at least one basic format, cannot support
+only FP16; it needs to support at least one of binary32, binary64, and binary128.
+
+Experts working in machine learning noticed that FP16 was a much more compact way of
+storing operands and often provided sufficient precision for them. However, they also
+found that results were much better when intermediate values were accumulated into a higher precision.
+The final computations were then typically converted back into the more compact FP16
+encoding. This approach has become very common in machine learning
+(ML) inference, where the weights and
+activations are stored in FP16 encodings. There was the added benefit that smaller
+multiplication blocks could be created for FP16's smaller number of significant bits. At this
+point, widening multiply-accumulate instructions became much more common. Also, more
+complicated dot-product instructions started to show up, including those that packed two
+FP16 numbers in a 32-bit register, multiplied these by another pair of FP16 numbers in
+another register, added these two products to an FP32 accumulate value in a third register,
+and returned an FP32 result.
+
+Experts working in machine learning at Google who continued to work with FP32 values
+noted that the least significant 16 bits of their mantissas were not always needed
+for good results, even in training. They proposed a truncated version of FP32, which was
+the 16 most significant bits of the FP32 encoding. This format was named BFloat16
+(or BF16).
+The B in BF16 stands for Brain, since it was initially introduced
+by the Google Brain team. Not only did they find that the number of
+significant bits in BF16 tended to be sufficient for their work (despite being fewer than
+in FP16), but it was very easy for them to reuse their existing data; FP32 numbers could
+be readily rounded to BF16 with a minimal amount of work. Furthermore, the even smaller
+number of BF16 significant bits enabled even smaller
+multiplication blocks to be built. Similar
+to FP16, BF16 multiply-accumulate widening and dot-product instructions started to
+proliferate.
+
+include::riscv-bfloat16-audience.adoc[]
+
+[[BF16_format]]
+=== Number Format
+
+==== BF16 Operand Format
+
+BF16 bits::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: 'frac'},
+{bits: 8, name: 'expo'},
+{bits: 1, name: 'S'},
+]}
+....
+
+IEEE Compliance: While BF16 (also known as BFloat16) is not an IEEE-754 _standard_ format, it is a valid
+floating-point format as defined by IEEE-754.
+There are three parameters that specify a format: radix (b), number of digits in the significand (p),
+and maximum exponent (emax).
+For BF16 these values are:
+
+[%autowidth]
+.BF16 parameters
+[cols = "2,1"]
+|===
+| Parameter | Value

+|radix (b)|2
+|significand (p)|8
+|emax|127
+|===
+
+[%autowidth]
+.Obligatory Floating Point Format Table
+[cols = "1,1,1,1,1,1,1,1"]
+|===
+|Format|Sign Bits|Expo Bits|Fraction Bits|Padded 0s|Encoding Bits|Expo Max/Bias|Expo Min
+
+|FP16 |1| 5|10| 0|16| 15| -14
+|BF16|1| 8| 7| 0|16| 127|-126
+|TF32 |1| 8|10|13|32| 127|-126
+|FP32 |1| 8|23| 0|32| 127|-126
+|FP64 |1|11|52| 0|64|1023|-1022
+|FP128 |1|15|112|0|128|16,383|-16,382
+|===
+
+==== BF16 Behavior
+
+For these BF16 extensions, instruction behavior on BF16 operands is the same as for other floating-point
+instructions in the RISC-V ISA. For easy reference, some of this behavior is repeated here.
+
+===== Subnormal Numbers
+Floating-point values that are too small to be represented as normal numbers, but that can still be expressed
+by the format's smallest exponent value with a "0" integer bit and at least one "1" bit
+in the trailing fractional bits, are called subnormal numbers. Basically, the idea is that precision
+is traded off to support _gradual underflow_.
+
+All of the BF16 instructions in the extensions defined in this specification (i.e., Zfbfmin, Zvfbfmin,
+and Zvfbfwma) fully support subnormal numbers. That is, these instructions are able to accept subnormal values as
+inputs, and they can produce subnormal results.
+
+
+[NOTE]
+====
+Future floating-point extensions, including those that operate on BF16 values, may choose not to support subnormal numbers.
+The comments about supporting subnormal BF16 values are limited to those instructions defined in this specification.
+====
+
+===== Infinities
+Infinities are used to represent values that are too large to be represented by the target format.
+These are usually produced as a result of overflows (depending on the rounding mode), but can also
+be provided as inputs. Infinities have a sign associated with them: there are positive infinities and negative infinities.
+
+Infinities are important for keeping meaningless results from being operated upon.
+
+===== NaNs
+
+NaN stands for Not a Number.
+
+There are two types of NaNs: signalling (sNaN) and quiet (qNaN). No computational
+instruction will ever produce an sNaN; these are only provided as input data. Operating on an sNaN will cause
+an invalid operation exception.
+Operating on a quiet NaN usually does not cause an exception.
+
+qNaNs are provided as the result of an operation when the result cannot be represented
+as a number or an infinity. For example, performing the square root of -1 will result in a qNaN because
+there is no real number that can represent the result. NaNs can also be used as inputs.
+
+NaNs include a sign bit, but the bit has no meaning.
+
+NaNs are important for keeping meaningless results from being operated upon.
+
+Except where otherwise explicitly stated, when the result of a floating-point operation is a qNaN, it
+is the RISC-V canonical NaN. For BF16, the RISC-V canonical NaN corresponds to the pattern _0x7fc0_, which
+is the most significant 16 bits of the RISC-V single-precision canonical NaN.
+
+===== Scalar NaN Boxing
+
+RISC-V applies NaN boxing to scalar results and checks for NaN boxing when a floating-point operation
+--- even a vector-scalar operation --- consumes a value from a scalar floating-point register.
+If the value is properly NaN-boxed, its least significant bits are used as the operand; otherwise,
+it is treated as if it were the canonical qNaN.
+
+NaN boxing is nothing more than putting the smaller encoding in the least significant bits of a register
+and setting all of the more significant bits to "1". This matches the encoding of a qNaN (although
+not the canonical NaN) in the larger precision.
+
+NaN boxing never affects the value of the operand itself; it just changes the bits of the register that
+are more significant than the operand's most significant bit.
+
+
+===== Rounding Modes
+
+As is the case with other floating-point instructions,
+the BF16 instructions support all five RISC-V floating-point rounding modes.
+These modes can be specified in the `rm` field of scalar instructions
+as well as in the `frm` CSR.
+
+[%autowidth]
+.RISC-V Floating Point Rounding Modes
+[cols = "1,1,1"]
+|===
+|Rounding Mode | Mnemonic | Meaning
+|000 | RNE | Round to Nearest, ties to Even
+|001 | RTZ | Round towards Zero
+|010 | RDN | Round Down (towards −∞)
+|011 | RUP | Round Up (towards +∞)
+|100 | RMM | Round to Nearest, ties to Max Magnitude
+|===
+
+As with other scalar floating-point instructions, the rounding mode field
+`rm` can also take on the
+`DYN` encoding, which indicates that the instruction uses the rounding
+mode specified in the `frm` CSR.
+
+[%autowidth]
+.Additional encoding for the `rm` field of scalar instructions
+[cols = "1,1,1"]
+|===
+|Rounding Mode | Mnemonic | Meaning
+|111 | DYN | select dynamic rounding mode
+|===
+
+In practice, the default IEEE rounding mode (round to nearest, ties to even) is generally used for arithmetic.
+
+===== Handling Exceptions
+RISC-V supports IEEE-defined default exception handling. BF16 is no exception.
+
+Default exception handling, as defined by IEEE, is a simple and effective approach to producing results
+in exceptional cases. So that the programmer can see what has happened, and take further action if needed,
+BF16 instructions set floating-point exception flags the same way as all other floating-point instructions
+in RISC-V.
+
+====== Underflow
+
+The IEEE-defined underflow exception requires that a result be inexact and tiny, where tininess can be
+detected before or after rounding. In RISC-V, tininess is detected after rounding.
+
+It is important to note that the detection of tininess after rounding requires its own rounding,
+which is different from the final result rounding. This tininess detection requires rounding as if the
+exponent were unbounded.
+This means that the input to the rounder is always a normal number.
+This is different from the final result rounding, where the input to the rounder is a subnormal number when
+the value is too small to be represented as a normal number in the target format.
+The two different roundings can result in underflow being signalled for results that are rounded
+back to the normal range.
+
+As is defined in '754, under default exception handling, underflow is only signalled when the result is tiny
+and inexact. In such a case, both the underflow and inexact flags are raised.
+
+
+[[BF16_extensions]]
+=== Extensions
+
+The group of extensions introduced by the BF16 Instruction Set
+Extensions is listed here.
+
+Detection of individual BF16 extensions uses the
+unified software-based RISC-V discovery method.
+
+[NOTE]
+====
+At the time of writing, these discovery mechanisms are still a work in
+progress.
+====
+
+The BF16 extensions defined in this specification (i.e., `Zfbfmin`,
+`Zvfbfmin`, and `Zvfbfwma`) depend on the single-precision floating-point extension
+`F`. Furthermore, the vector BF16 extensions (i.e., `Zvfbfmin` and
+`Zvfbfwma`) depend on the `"V"` Vector Extension for Application
+Processors or the `Zve32f` Vector Extension for Embedded Processors.
+
+As stated later in this specification,
+there exists a dependency between the newly defined extensions:
+`Zvfbfwma` depends on `Zfbfmin`
+and `Zvfbfmin`.
+
+This initial set of BF16 extensions provides very basic functionality,
+including scalar and vector conversion between BF16 and
+single-precision values, and vector widening multiply-accumulate
+instructions.
+
+
+// include::riscv-bfloat16-zfbfmin.adoc[]
+[[zfbfmin, Zfbfmin]]
+==== `Zfbfmin` - Scalar BF16 Converts
+
+This extension provides the minimal set of instructions needed to enable scalar support
+of the BF16 format. It enables BF16 as an interchange format by providing conversion
+between BF16 values and FP32 values.
+
+This extension requires the single-precision floating-point extension
+`F`, and the `FLH`, `FSH`, `FMV.X.H`, and `FMV.H.X` instructions as
+defined in the `Zfh` extension.
+
+[NOTE]
+====
+While conversion instructions tend to include all supported formats, in these extensions we
+only support conversion between BF16 and FP32, as we are targeting a special use case.
+These extensions are intended to support the case where BF16 values are used as reduced-precision
+versions of FP32 values, where use of BF16 provides a two-fold advantage for
+storage, bandwidth, and computation. In this use case, the BF16 values are typically
+multiplied by each other and accumulated into FP32 sums.
+These sums are typically converted to BF16
+and then used as subsequent inputs. The operations on the BF16 values can be performed
+on the CPU or a loosely coupled coprocessor.
+
+Subsequent extensions might provide support for native BF16 arithmetic. Such extensions
+could add additional conversion
+instructions to allow all supported formats to be converted to and from BF16.
+====
+
+[NOTE]
+====
+BF16 addition, subtraction, multiplication, division, and square-root operations can be
+faithfully emulated by converting the BF16 operands to single-precision, performing the
+operation using single-precision arithmetic, and then converting back to BF16. Performing
+BF16 fused multiply-addition using this method can produce results that differ by 1-ulp
+on some inputs for the RNE and RMM rounding modes.
+
+
+Conversions between BF16 and formats larger than FP32 can be
+emulated.
+Exact widening conversions from BF16 can be synthesized by first
+converting to FP32 and then converting from FP32 to the target
+precision.
+Conversions narrowing to BF16 can be synthesized by first
+converting to FP32 through a series of halving steps and then
+converting from FP32 to BF16.
+As with the fused multiply-addition operation described above,
+this method of converting values to BF16 can be off by 1-ulp
+on some inputs for the RNE and RMM rounding modes.
+====
+
+[%autowidth]
+[%header,cols="2,4"]
+|===
+|Mnemonic
+|Instruction
+|FCVT.BF16.S | <>
+|FCVT.S.BF16 | <>
+|FLH |
+|FSH |
+|FMV.H.X |
+|FMV.X.H |
+|===
+
+// include::riscv-bfloat16-zvfbfmin.adoc[]
+[[zvfbfmin,Zvfbfmin]]
+==== `Zvfbfmin` - Vector BF16 Converts
+
+This extension provides the minimal set of instructions needed to enable vector support of the BF16
+format. It enables BF16 as an interchange format by providing conversion between BF16 values
+and FP32 values.
+
+This extension requires either the
+"V" extension or the `Zve32f` embedded vector extension.
+
+[NOTE]
+====
+While conversion instructions tend to include all supported formats, in these extensions we
+only support conversion between BF16 and FP32, as we are targeting a special use case.
+These extensions are intended to support the case where BF16 values are used as reduced-precision
+versions of FP32 values, where use of BF16 provides a two-fold advantage for
+storage, bandwidth, and computation. In this use case, the BF16 values are typically
+multiplied by each other and accumulated into FP32 sums.
+These sums are typically converted to BF16
+and then used as subsequent inputs. The operations on the BF16 values can be performed
+on the CPU or a loosely coupled coprocessor.
+
+Subsequent extensions might provide support for native BF16 arithmetic. Such extensions
+could add additional conversion
+instructions to allow all supported formats to be converted to and from BF16.
+====
+
+[NOTE]
+====
+BF16 addition, subtraction, multiplication, division, and square-root operations can be
+faithfully emulated by converting the BF16 operands to single-precision, performing the
+operation using single-precision arithmetic, and then converting back to BF16. Performing
+BF16 fused multiply-addition using this method can produce results that differ by 1-ulp
+on some inputs for the RNE and RMM rounding modes.
+
+Conversions between BF16 and formats larger than FP32 can be
+faithfully emulated.
+Exact widening conversions from BF16 can be synthesized by first
+converting to FP32 and then converting from FP32 to the target
+precision. Conversions narrowing to BF16 can be synthesized by first
+converting to FP32 through a series of halving steps using
+vector round-towards-odd narrowing conversion instructions
+(_vfncvt.rod.f.f.w_). The final convert from FP32 to BF16 would use
+the desired rounding mode.
+
+====
+
+[%autowidth]
+[%header,cols="^2,4"]
+|===
+|Mnemonic
+|Instruction
+| vfncvtbf16.f.f.w | <>
+| vfwcvtbf16.f.f.v | <>
+|===
+
+// include::riscv-bfloat16-zvfbfwma.adoc[]
+[[zvfbfwma,Zvfbfwma]]
+==== `Zvfbfwma` - Vector BF16 Widening Mul-Add
+
+This extension provides
+a vector widening BF16 mul-add instruction that accumulates into FP32.
+
+This extension requires the `Zvfbfmin` extension and the `Zfbfmin` extension.
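+
+[NOTE]
+====
+For context: the vector-scalar form of the widening multiply-accumulate
+instruction reads its scalar BF16 operand from a floating-point register,
+and the scalar extension's `FLH` instruction loads such a value NaN-boxed.
+The following minimal sketch is illustrative only; the register
+assignments are hypothetical and not part of this specification.
+
+[source,asm]
+--
+# Illustrative register use: a0 = scalar address, a1 = AVL, a2 = vector address.
+flh     ft0, 0(a0)                # FLH NaN-boxes the 16-bit value into ft0
+vsetvli t0, a1, e16, m1, ta, ma   # SEW=16 for the BF16 vector source
+vle16.v v2, (a2)                  # BF16 vector operand
+vfwmaccbf16.vf v8, ft0, v2        # accumulate into FP32 sums in v8 (EEW=32)
+--
+====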
+
+[%autowidth]
+[%header,cols="2,4"]
+|===
+|Mnemonic
+|Instruction
+
+|VFWMACCBF16 | <>
+|===
+
+
+[[BF16_insns, reftext="BF16 Instructions"]]
+=== Instructions
+
+// include::insns/fcvt_BF16_S.adoc[]
+// <<<
+[[insns-fcvt.bf16.s, Convert FP32 to BF16]]
+
+==== fcvt.bf16.s
+
+Synopsis::
+Convert FP32 value to a BF16 value
+
+Mnemonic::
+fcvt.bf16.s rd, rs1
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: '1010011', attr: ['OP-FP']},
+{bits: 5, name: 'rd'},
+{bits: 3, name: 'rm'},
+{bits: 5, name: 'rs1'},
+{bits: 5, name: '01000', attr: ['bf16.s']},
+{bits: 2, name: '10', attr: ['h']},
+{bits: 5, name: '01000', attr: 'fcvt'},
+]}
+....
+
+
+[NOTE]
+====
+.Encoding
+While the mnemonic of this instruction is consistent with that of the other RISC-V floating-point convert instructions,
+a new encoding is used in bits 24:20.
+
+`BF16.S` and `H` are used to signify that the source is FP32 and the destination is BF16.
+====
+
+
+Description::
+Narrowing convert of an FP32 value to a BF16 value. Round according to the _rm_ field.
+
+This instruction is similar to other narrowing
+floating-point-to-floating-point conversion instructions.
+
+
+Exceptions: Overflow, Underflow, Inexact, Invalid
+
+Included in: <>
+
+// include::insns/fcvt_S_BF16.adoc[]
+// <<<
+[[insns-fcvt.s.bf16, Convert BF16 to FP32]]
+==== fcvt.s.bf16
+
+Synopsis::
+Convert BF16 value to an FP32 value
+
+Mnemonic::
+fcvt.s.bf16 rd, rs1
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: '1010011', attr: ['OP-FP']},
+{bits: 5, name: 'rd'},
+{bits: 3, name: 'rm'},
+{bits: 5, name: 'rs1'},
+{bits: 5, name: '00110', attr: ['bf16']},
+{bits: 2, name: '00', attr: ['s']},
+{bits: 5, name: '01000', attr: 'fcvt'},
+]}
+....
+
+[NOTE]
+====
+.Encoding
+While the mnemonic of this instruction is consistent with that of the other RISC-V floating-point
+convert instructions, a new encoding is
+used in bits 24:20 to indicate that the source is BF16.
+====
+
+
+Description::
+Converts a BF16 value to an FP32 value. The conversion is exact.
+
+This instruction is similar to other widening
+floating-point-to-floating-point conversion instructions.
+
+[NOTE]
+====
+If the input is normal or infinity, the BF16 encoded value is shifted
+to the left by 16 places and the
+least significant 16 bits are written with 0s.
+
+The result is NaN-boxed by writing the most significant `FLEN`-32 bits with 1s.
+====
+
+
+
+Exceptions: Invalid
+
+Included in: <>
+
+
+// include::insns/vfncvtbf16_f_f_w.adoc[]
+// <<<
+[[insns-vfncvtbf16.f.f.w, Vector convert FP32 to BF16]]
+==== vfncvtbf16.f.f.w
+
+Synopsis::
+Vector convert FP32 to BF16
+
+Mnemonic::
+vfncvtbf16.f.f.w vd, vs2, vm
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: '1010111', attr:['OP-V']},
+{bits: 5, name: 'vd'},
+{bits: 3, name: '001', attr:['OPFVV']},
+{bits: 5, name: '11101', attr:['vfncvtbf16']},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: 'vm'},
+{bits: 6, name: '010010', attr:['VFUNARY0']},
+]}
+....
+
+Reserved Encodings::
+* `SEW` is any value other than 16
+
+Arguments::
+
+[%autowidth]
+[%header,cols="4,2,2,2"]
+|===
+|Register
+|Direction
+|EEW
+|Definition
+
+| Vs2 | input | 32 | FP32 Source
+| Vd | output | 16 | BF16 Result
+|===
+
+
+
+Description::
+Narrowing convert from FP32 to BF16. Round according to the _frm_ register.
+
+This instruction is similar to `vfncvt.f.f.w`, which converts a
+floating-point value in a 2*SEW-width format into an SEW-width format.
+However, here the SEW-width format is limited to BF16.
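+
+[NOTE]
+====
+A minimal usage sketch, illustrative only (the register and pointer
+assignments are hypothetical): convert `a0` FP32 elements at address `a1`
+into BF16 elements at address `a2` with a standard strip-mined loop.
+
+[source,asm]
+--
+# Illustrative register use: a0 = element count, a1 = FP32 src, a2 = BF16 dst.
+loop:
+  vsetvli t0, a0, e16, m1, ta, ma # SEW=16 for the BF16 destination
+  vle32.v v8, (a1)                # FP32 source elements (EEW=32, EMUL=2)
+  vfncvtbf16.f.f.w v4, v8         # narrow to BF16, rounded per frm
+  vse16.v v4, (a2)                # store BF16 results
+  sub  a0, a0, t0                 # elements remaining
+  slli t1, t0, 2
+  add  a1, a1, t1                 # advance source by 4 bytes per element
+  slli t1, t0, 1
+  add  a2, a2, t1                 # advance destination by 2 bytes per element
+  bnez a0, loop
+--
+====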
+
+Exceptions: Overflow, Underflow, Inexact, Invalid
+
+Included in: <>
+
+
+// include::insns/vfwcvtbf16_f_f_v.adoc[]
+// <<<
+[[insns-vfwcvtbf16.f.f.v, Vector convert BF16 to FP32]]
+==== vfwcvtbf16.f.f.v
+
+Synopsis::
+Vector convert BF16 to FP32
+
+Mnemonic::
+vfwcvtbf16.f.f.v vd, vs2, vm
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: '1010111', attr:['OP-V']},
+{bits: 5, name: 'vd'},
+{bits: 3, name: '001', attr:['OPFVV']},
+{bits: 5, name: '01101', attr:['vfwcvtbf16']},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: 'vm'},
+{bits: 6, name: '010010', attr:['VFUNARY0']},
+]}
+....
+
+Reserved Encodings::
+* `SEW` is any value other than 16
+
+Arguments::
+[%autowidth]
+[%header,cols="4,2,2,2"]
+|===
+|Register
+|Direction
+|EEW
+|Definition
+
+| Vs2 | input | 16 | BF16 Source
+| Vd | output | 32 | FP32 Result
+|===
+
+Description::
+Widening convert from BF16 to FP32. The conversion is exact.
+
+This instruction is similar to `vfwcvt.f.f.v`, which converts a
+floating-point value in an SEW-width format into a 2*SEW-width format.
+However, here the SEW-width format is limited to BF16.
+
+[NOTE]
+====
+If the input is normal or infinity, the BF16 encoded value is shifted
+to the left by 16 places and the
+least significant 16 bits are written with 0s.
+====
+
+Exceptions: Invalid
+
+Included in: <>
+
+
+// include::insns/vfwmaccbf16.adoc[]
+// <<<
+[#insns-vfwmaccbf16, reftext="Vector BF16 widening multiply-accumulate"]
+==== vfwmaccbf16
+
+Synopsis::
+Vector BF16 widening multiply-accumulate
+
+Mnemonic::
+vfwmaccbf16.vv vd, vs1, vs2, vm
+
+vfwmaccbf16.vf vd, rs1, vs2, vm
+
+
+Encoding (Vector-Vector)::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: '1010111', attr:['OP-V']},
+{bits: 5, name: 'vd'},
+{bits: 3, name: '001', attr:['OPFVV']},
+{bits: 5, name: 'vs1'},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: 'vm'},
+{bits: 6, name: '111011', attr:['vfwmaccbf16']},
+]}
+....
+
+Encoding (Vector-Scalar)::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: '1010111', attr:['OP-V']},
+{bits: 5, name: 'vd'},
+{bits: 3, name: '101', attr:['OPFVF']},
+{bits: 5, name: 'rs1'},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: 'vm'},
+{bits: 6, name: '111011', attr:['vfwmaccbf16']},
+]}
+....
+
+Reserved Encodings::
+* `SEW` is any value other than 16
+
+Arguments::
+[%autowidth]
+[%header,cols="4,2,2,2"]
+|===
+|Register
+|Direction
+|EEW
+|Definition
+
+| Vd | input | 32 | FP32 Accumulate
+| Vs1/rs1 | input | 16 | BF16 Source
+| Vs2 | input | 16 | BF16 Source
+| Vd | output | 32 | FP32 Result
+|===
+
+Description::
+
+This instruction performs a widening fused multiply-accumulate
+operation, where each pair of BF16 values is multiplied and the
+unrounded product is added to the corresponding FP32 accumulate value.
+The sum is rounded according to the _frm_ register.
+
+
+In the vector-vector version, the BF16 elements are read from `vs1`
+and `vs2`, and the FP32 accumulate value is read from `vd`. The FP32 result
+is written to the destination register `vd`.
+
+The vector-scalar version is similar, but instead of reading elements
+from `vs1`, a scalar BF16 value is read from the floating-point register `rs1`.
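+
+[NOTE]
+====
+A minimal usage sketch, illustrative only (the register and pointer
+assignments are hypothetical): the common ML-style pattern of multiplying
+two BF16 vectors and accumulating into FP32 sums, computing
+`c[i] += a[i]*b[i]` over `a0` elements.
+
+[source,asm]
+--
+# Illustrative register use: a0 = count, a1 = BF16 a, a2 = BF16 b, a3 = FP32 c.
+loop:
+  vsetvli t0, a0, e16, m1, ta, ma # SEW=16 for the BF16 sources
+  vle16.v v2, (a1)                # BF16 vector a
+  vle16.v v3, (a2)                # BF16 vector b
+  vle32.v v8, (a3)                # FP32 accumulate c (EEW=32, EMUL=2)
+  vfwmaccbf16.vv v8, v2, v3       # v8[i] += a[i]*b[i], rounded once per frm
+  vse32.v v8, (a3)                # store updated FP32 sums
+  sub  a0, a0, t0
+  slli t1, t0, 1
+  add  a1, a1, t1                 # 2 bytes per BF16 element
+  add  a2, a2, t1
+  slli t1, t0, 2
+  add  a3, a3, t1                 # 4 bytes per FP32 element
+  bnez a0, loop
+--
+====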
+
+
+Exceptions: Overflow, Underflow, Inexact, Invalid
+
+Operation::
+
+The `vfwmaccbf16.vv` instruction is equivalent to widening each of the BF16 inputs to
+FP32 and then performing an FMACC, as shown in the following
+instruction sequence:
+
+[source,asm]
+--
+vfwcvtbf16.f.f.v T1, vs1, vm
+vfwcvtbf16.f.f.v T2, vs2, vm
+vfmacc.vv vd, T1, T2, vm
+--
+
+Likewise, `vfwmaccbf16.vf` is equivalent to the following instruction sequence:
+
+[source,asm]
+--
+fcvt.s.bf16 T1, rs1
+vfwcvtbf16.f.f.v T2, vs2, vm
+vfmacc.vf vd, T1, T2, vm
+--
+
+Included in: <>
+
+
+// include::../bibliography.adoc[ieee]
+[bibliography]
+=== Bibliography
+
+bibliography::[]
+https://ieeexplore.ieee.org/document/8766229[754-2019 - IEEE Standard for Floating-Point Arithmetic]
+https://ieeexplore.ieee.org/document/4610935[754-2008 - IEEE Standard for Floating-Point Arithmetic]

From ca25d6bec2f794aff675d012bc65359c8a8425d5 Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 13:38:34 -0500
Subject: [PATCH 3/8] Fixed remaining include

---
 bfloat16.adoc | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/bfloat16.adoc b/bfloat16.adoc
index 715228415..a25fcf1b6 100644
--- a/bfloat16.adoc
+++ b/bfloat16.adoc
@@ -46,7 +46,53 @@ multiplication blocks to be built. Similar
 to FP16, BF16 multiply-accumulate widening and dot-product instructions started to
 proliferate.
 
-include::riscv-bfloat16-audience.adoc[]
+// include::riscv-bfloat16-audience.adoc[]
+[[BF16_audience]]
+=== Intended Audience
+Floating-point arithmetic is a specialized subject, requiring people with many different
+backgrounds to cooperate in its correct and efficient implementation.
+Where possible, we have written this specification to be understandable by
+all, though we recognize that the motivations and references to
+algorithms or other specifications and standards may be unfamiliar to those
+who are not domain experts.
+
+This specification anticipates being read and acted on by various people
+with different backgrounds.
+We have tried to capture these backgrounds
+here, with a brief explanation of what we expect them to know, and how
+it relates to the specification.
+We hope this aids people's understanding of which aspects of the specification
+are particularly relevant to them, and which they may (safely!) ignore or
+pass to a colleague.
+
+Software developers::
+These are the people we expect to write code using the instructions
+in this specification.
+They should understand the motivations for the
+instructions we include, and be familiar with most of the algorithms
+and outside standards to which we refer.
+
+Computer architects::
+We expect architects to have some basic floating-point background.
+Furthermore, we expect architects to be able to examine our instructions
+for implementation issues, understand how the instructions will be used
+in context, and advise on how best to fit the functionality.
+
+Digital design engineers & micro-architects::
+These are the people who will implement the specification inside a
+core. Floating-point expertise is assumed, as not all of the corner
+cases are pointed out in the specification.
+
+Verification engineers::
+Responsible for ensuring the correct implementation of the extension
+in hardware. These people are expected to have some floating-point
+expertise so that they can identify and generate the interesting corner
+cases --- including exceptions --- that are common in floating-point
+architectures and implementations.
+
+
+These are by no means the only people concerned with the specification,
+but they are the ones we considered most while writing it.
 
 [[BF16_format]]
 === Number Format

From 8ff4309e40bd75c51e80f21d75d06615e103bf73 Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 13:45:52 -0500
Subject: [PATCH 4/8] Fixed extraneous bibliography::[]

---
 bfloat16.adoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/bfloat16.adoc b/bfloat16.adoc
index a25fcf1b6..75bd42619 100644
--- a/bfloat16.adoc
+++ b/bfloat16.adoc
@@ -764,6 +764,7 @@ Included in: <>
 [bibliography]
 === Bibliography
 
-bibliography::[]
+// bibliography::[]
+
 https://ieeexplore.ieee.org/document/8766229[754-2019 - IEEE Standard for Floating-Point Arithmetic]
+
 https://ieeexplore.ieee.org/document/4610935[754-2008 - IEEE Standard for Floating-Point Arithmetic]

From 272c3884ba7d74f2cc31b7194261bb6fbd3484f2 Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 13:54:10 -0500
Subject: [PATCH 5/8] Moved bfloat16.adoc to src

---
 bfloat16.adoc => src/bfloat16.adoc | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename bfloat16.adoc => src/bfloat16.adoc (100%)

diff --git a/bfloat16.adoc b/src/bfloat16.adoc
similarity index 100%
rename from bfloat16.adoc
rename to src/bfloat16.adoc

From 4dc23d6229de1b811dd6b1afcc3b5004d3face0d Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 15:00:06 -0500
Subject: [PATCH 6/8] Added Chapter title to BF16

---
 src/bfloat16.adoc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/bfloat16.adoc b/src/bfloat16.adoc
index 75bd42619..9078fe551 100644
--- a/src/bfloat16.adoc
+++ b/src/bfloat16.adoc
@@ -1,3 +1,6 @@
+[[bf16]]
+== "BF16" Extensions for BFloat16-precision Floating-Point, Version 1.0
+
 [[BF16_introduction]]
 === Introduction
 

From 010853055b352749ff11559528ed3c36874457d4 Mon Sep 17 00:00:00 2001
From: kdockser
Date: Tue, 23 Apr 2024 16:42:27 -0500
Subject: [PATCH 7/8] Added back new-pages after each instruction

---
 src/bfloat16.adoc | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/bfloat16.adoc b/src/bfloat16.adoc
index 9078fe551..7374edbcb 100644
--- a/src/bfloat16.adoc
+++ b/src/bfloat16.adoc
@@ -266,7 +266,7 @@ back to the normal range.
 As is defined in '754, under default exception handling, underflow is only signalled when the result is tiny
 and inexact. In such a case, both the underflow and inexact flags are raised.
-
+<<<
 [[BF16_extensions]]
 === Extensions
@@ -489,7 +489,7 @@ floating-point-to-floating-point conversion instructions.
 Exceptions: Overflow, Underflow, Inexact, Invalid
 Included in: <>
-
+<<<
 // include::insns/fcvt_S_BF16.adoc[]
 // <<<
@@ -544,7 +544,7 @@ The result is NaN-boxed by writing the most significant `FLEN`-32 bits with 1s.
 Exceptions: Invalid
 Included in: <>
-
+<<<
 // include::insns/vfncvtbf16_f_f_w.adoc[]
 // <<<
@@ -600,7 +600,7 @@ However, here the SEW-width format is limited to BF16.
 Exceptions: Overflow, Underflow, Inexact, Invalid
 Included in: <>
-
+<<<
 // include::insns/vfwcvtbf16_f_f_v.adoc[]
 // <<<
@@ -660,7 +660,7 @@ least significant 16 bits are written with 0s.
Exceptions: Invalid Included in: <> - +<<< // include::insns/vfwmaccbf16.adoc[] // <<< From c128ce67174caf0f997bc9310018ae52bba240b9 Mon Sep 17 00:00:00 2001 From: kdockser Date: Tue, 23 Apr 2024 17:21:51 -0500 Subject: [PATCH 8/8] Added mandatory space before forced page break --- src/bfloat16.adoc | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/bfloat16.adoc b/src/bfloat16.adoc index 7374edbcb..ba3e8bc86 100644 --- a/src/bfloat16.adoc +++ b/src/bfloat16.adoc @@ -123,12 +123,12 @@ For BF16 these values are: [cols = "2,1"] |=== | Parameter | Value - |radix (b)|2 |significand (p)|8 |emax|127 |=== + [%autowidth] .Obligatory Floating Point Format Table [cols = "1,1,1,1,1,1,1,1"] @@ -267,6 +267,7 @@ As is defined in '754, under default exception handling, underflow is only signa and inexact. In such a case, both the underflow and inexact flags are raised. <<< + [[BF16_extensions]] === Extensions @@ -489,6 +490,7 @@ floating-point-to-floating-point conversion instructions. Exceptions: Overflow, Underflow, Inexact, Invalid Included in: <> + <<< // include::insns/fcvt_S_BF16.adoc[] // <<< @@ -544,6 +546,7 @@ The result is NaN-boxed by writing the most significant `FLEN`-32 bits with 1s. Exceptions: Invalid Included in: <> + <<< // include::insns/vfncvtbf16_f_f_w.adoc[] @@ -600,6 +603,7 @@ However, here the SEW-width format is limited to BF16. Exceptions: Overflow, Underflow, Inexact, Invalid Included in: <> + <<< // include::insns/vfwcvtbf16_f_f_v.adoc[] @@ -660,6 +664,7 @@ least significant 16 bits are written with 0s. Exceptions: Invalid Included in: <> + <<< // include::insns/vfwmaccbf16.adoc[]