New learning path: Understanding Libamath's vector accuracy modes #2020

Open · wants to merge 3 commits into base: main
@@ -0,0 +1,52 @@
---
title: Understanding Libamath's vector accuracy modes

minutes_to_complete: 20
author: Joana Cruz

who_is_this_for: This is an introductory topic for software developers who want to learn how to use the different accuracy modes present in Libamath, a component of ArmPL. This feature was introduced in ArmPL 25.04.

learning_objectives:
- Understand how accuracy is defined in Libamath.
- Choose an accuracy mode suited to your application.

# [Libamath](https://developer.arm.com/documentation/101004/2504/) is a component of [ArmPL (Arm Performance Libraries)](https://developer.arm.com/documentation/101004/2504/General-information/Arm-Performance-Libraries?lang=en). Since Libamath only provides vector functions on Linux, we assume you are working in a Linux environment where ArmPL is installed (meaning you completed [ArmPL's installation guide](https://learn.arm.com/install-guides/armpl/)).

prerequisites:
- An Arm computer running Linux
- Build and install [ArmPL](https://learn.arm.com/install-guides/armpl/)

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
- Neoverse
tools_software_languages:
- ArmPL
- GCC
- Libamath
operatingsystems:
- Linux

further_reading:
- resource:
title: ArmPL Libamath Documentation
link: https://developer.arm.com/documentation/101004/2410/General-information/Arm-Performance-Libraries-math-functions
type: documentation
# - resource:
# title: PLACEHOLDER BLOG
# link: PLACEHOLDER BLOG LINK
# type: blog
- resource:
title: ArmPL Installation Guide
link: https://learn.arm.com/install-guides/armpl/
type: website



### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
@@ -0,0 +1,82 @@
---
title: Examples
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# Example

Here is an example invoking all accuracy modes of the Neon single precision exp function (where `ulp_error.h` is the implementation of ULP error explained in [this section](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp-error/)):

```C { line_numbers = "true" }
#include <amath.h>
#include <arm_neon.h>  // float32x4_t and the Neon intrinsics used below
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#include "ulp_error.h"

void check_accuracy(float32x4_t (__attribute__((aarch64_vector_pcs)) *vexp_fun)(float32x4_t),
                    float arg, const char *label) {
    float32x4_t varg = vdupq_n_f32(arg);
    float32x4_t vres = vexp_fun(varg);
    double want = exp((double)arg);
    float got = vgetq_lane_f32(vres, 0);

    printf(label, arg);
    printf("\n got = %a\n", got);
    printf(" (float)want = %a\n", (float)want);
    printf(" want = %.12a\n", want);
    printf(" ULP error = %.4f\n\n", ulp_error(got, want));
}

int main(void) {
    // Inputs that trigger worst-case errors for each accuracy mode
    printf("Libamath example:\n");
    printf("-----------------------------------------------\n");
    printf(" // Display worst-case ULP error in expf for each\n");
    printf(" // accuracy mode, along with approximate (`got`) and exact results (`want`)\n\n");

    check_accuracy(armpl_vexpq_f32_u10, 0x1.ab312p+4, "armpl_vexpq_f32_u10(%a) delivers error under 1.0 ULP");
    check_accuracy(armpl_vexpq_f32, 0x1.8163ccp+5, "armpl_vexpq_f32(%a) delivers error under 3.5 ULP");
    check_accuracy(armpl_vexpq_f32_umax, -0x1.5b7322p+6, "armpl_vexpq_f32_umax(%a) delivers result with half correct bits");

    return 0;
}
```

You can compile the above program with:
```bash
gcc -O2 -o example example.c -lamath -lm
```

Running the example returns:
```bash
$ ./example
Libamath example:
-----------------------------------------------
// Display worst-case ULP error in expf for each
// accuracy mode, along with approximate (`got`) and exact results (`want`)

armpl_vexpq_f32_u10(0x1.ab312p+4) delivers error under 1.0 ULP
got = 0x1.6ee554p+38
(float)want = 0x1.6ee556p+38
want = 0x1.6ee555bb01d1p+38
ULP error = 0.8652

armpl_vexpq_f32(0x1.8163ccp+5) delivers error under 3.5 ULP
got = 0x1.6a09ep+69
(float)want = 0x1.6a09e4p+69
want = 0x1.6a09e3e3d585p+69
ULP error = 1.9450

armpl_vexpq_f32_umax(-0x1.5b7322p+6) delivers result with half correct bits
got = 0x1.9b56bep-126
(float)want = 0x1.9b491cp-126
want = 0x1.9b491b9376d3p-126
ULP error = 1745.2120
```

The inputs used for each variant correspond to the worst cases known to date (the argmax of the ULP error).
This means the ULP error you observe should not exceed the values demonstrated here, which remain below the thresholds defined for each accuracy mode.
@@ -0,0 +1,138 @@
---
title: Floating Point Representation
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

# Floating-Point Representation Basics

Floating-point numbers are a finite, discrete approximation of the real numbers, allowing us to implement and compute functions over a continuous domain with adequate (but limited) resolution.

A floating-point number is typically expressed as:

```
+/-d.dddd...d x B^e
```

where:
* B is the base;
* e is the exponent;
* d.dddd...d is the mantissa (or significand), a p-digit word, where p represents the precision;
* +/- is the sign, which is usually stored separately.

If the leading digit is non-zero, the representation is normalized (and the value is called a normal number).

{{% notice Example 1 %}}
Fixing `B=2, p=24`

`0.1 = 1.10011001100110011001101 × 2^-4` is a normalized representation of 0.1

`0.1 = 0.000110011001100110011001 × 2^0` is a non-normalized representation of 0.1

{{% /notice %}}

For a fixed base and precision, a floating-point number usually has multiple non-normalized representations but only one normalized representation (assuming the leading digit is strictly smaller than the base).


## Building a Floating-Point Ruler

Given a base `B`, a precision `p`, a maximum exponent `emax` and a minimum exponent `emin`, we can create the set of all the normalized values in this system.

{{% notice Example 3 %}}
`B=2, p=3, emax=2, emin=-1`

| Significand | × 2⁻¹ | × 2⁰ | × 2¹ | × 2² |
|-------------|-------|------|------|------|
| 1.00 (1.0) | 0.5 | 1.0 | 2.0 | 4.0 |
| 1.01 (1.25) | 0.625 | 1.25 | 2.5 | 5.0 |
| 1.10 (1.5) | 0.75 | 1.5 | 3.0 | 6.0 |
| 1.11 (1.75) | 0.875 | 1.75 | 3.5 | 7.0 |


{{% /notice %}}

Note that, for any given integer n, numbers are evenly spaced between 2ⁿ and 2ⁿ⁺¹. But the gap between them (also called the [ULP](/learning-paths/servers-and-cloud-computing/multi-accuracy-libamath/ulp/), which we explain in more detail in the next section) grows as the exponent increases. So the spacing between floating-point numbers gets larger as the numbers get bigger.

### The Floating-Point bitwise representation
Since there are `B^p` possible mantissas and `emax-emin+1` possible exponents, we need `log2(B^p) + log2(emax-emin+1) + 1` (sign) bits to represent a given floating-point number in this system.
In Example 3, that is 3+2+1=6 bits.

We can then define Floating Point's bitwise representation in our system to be:

```
b0 b1 b2 b3 b4 b5
```

where

```
b0 -> sign (S)
b1, b2 -> exponent (E)
b3, b4, b5 -> mantissa (M)
```

However, this is not enough. In this bitwise definition, the possible values of E are 0, 1, 2, and 3.
But in the system we are trying to define, we are only interested in the integer exponents in the range [-1, 2].

For this reason, E is called the biased exponent; to retrieve the value it actually represents (the unbiased exponent), we subtract an offset (in this case, 1):

```
x = (-1)^S x M x 2^(E-1)
```

# IEEE-754 Single Precision

Single precision (also called float) is a 32-bit format defined by the [IEEE-754 Floating-Point Standard](https://ieeexplore.ieee.org/document/8766229).

In this standard the sign is represented using 1 bit, the exponent uses 8 bits and the mantissa uses 23 bits.

The value of a (normalized) floating-point number in IEEE-754 can be represented as:

```
x = (-1)^S x 1.M x 2^(E-127)
```

The exponent bias of 127 allows storage of unbiased exponents from -126 to +127. In normalized numbers the leading digit is implicitly 1, so we effectively have 24 bits of precision.

{{% notice Special Cases in IEEE-754 Single Precision %}}
Since we have 8 bits of exponent storage, E ranges between 0 and 2^8-1=255. However, not all of these 256 values are used for normal numbers.

If the exponent E is:
* 0, then we are in the presence of a subnormal (denormalized) number, or zero (if M is 0 as well);
* 1 to 254, then we are in the normalized range;
* 255, then we are in the presence of Inf (if M == 0) or NaN (if M != 0).

Subnormal numbers (also called denormal numbers) are special floating-point values defined by the IEEE-754 standard.

They allow the representation of numbers very close to zero, smaller than what is normally possible with the standard exponent range.

Subnormal numbers do not have an implicit leading 1 in their representation, and their stored exponent field is 0.

The interpretation of a denormal floating-point number in IEEE-754 is:

```
x = (-1)^S x 0.M x 2^(-126)
```

{{% /notice %}}

If you're interested in diving deeper into this subject, [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) by David Goldberg is a good place to start.

@@ -0,0 +1,110 @@
---
title: Accuracy Modes in Libamath
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---


# The 3 Accuracy Modes of Libamath

Libamath vector functions come in multiple accuracy modes for the same mathematical function.
This means some of our functions allow users and compilers to choose between:
- **High accuracy** (≤ 1 ULP)
- **Default accuracy** (≤ 3.5 ULP)
- **Low accuracy / max performance** (approx. ≤ 4096 ULP)


# How Accuracy Modes Are Encoded in Libamath

You can recognize the accuracy mode of a function by inspecting the **suffix** in its symbol:

- **`_u10`** → High accuracy
E.g., `armpl_vcosq_f32_u10`
Ensures results stay within **1 Unit in the Last Place (ULP)**.

- *(no suffix)* → Default accuracy
E.g., `armpl_vcosq_f32`
Keeps errors within **3.5 ULP** — a sweet spot for many workloads.

- **`_umax`** → Low accuracy
E.g., `armpl_vcosq_f32_umax`
  Prioritizes speed, tolerating errors up to **4096 ULP**, or roughly **12 correct bits** in single precision.


# Applications

Selecting an appropriate accuracy level helps avoid unnecessary compute cost while preserving output quality where it matters.


### High Accuracy (≤ 1 ULP)

Use when **near-exact numerical results** are a priority. These routines rely on precise algorithms (e.g., high-degree polynomials, careful range reduction, FMA usage) and are ideal for:

- **Scientific computing**
e.g., simulations, finite element analysis
- **Signal processing pipelines** [1,2]
  especially recursive filters or transforms
- **Validation & reference implementations**

While slower, these functions provide **near-bitwise reproducibility** — critical in sensitive domains.


### Default Accuracy (≤ 3.5 ULP)

The default mode strikes a **practical balance** between performance and numerical fidelity. It’s optimized for:

- **General-purpose math libraries**
- **Analytics workloads** [3]
e.g., log/sqrt during feature extraction
- **Inference pipelines** [4]
especially on edge devices where latency matters

Also suitable for many **scientific workloads** that can tolerate modest error in exchange for **faster throughput**.


### Low Accuracy / Max Performance (≤ 4096 ULP)

This mode trades precision for speed — aggressively. It's designed for:

- **Games, graphics, and shaders** [5]
e.g., approximating sin/cos for animation curves
- **Monte Carlo simulations**
where statistical convergence outweighs per-sample accuracy [6]
- **Genetic algorithms, audio processing, and embedded DSP**

Avoid in control-flow-critical code or where **errors amplify**.


# Summary

| Accuracy Mode | Libamath example | Approx. Error | Performance | Typical Applications |
|---------------|------------------------|------------------|-------------|-----------------------------------------------------------|
| `_u10` | _ZGVnN4v_cosf_u10 | ≤1.0 ULP | Low | Scientific computing, backpropagation, validation |
| *(default)* | _ZGVnN4v_cosf | ≤3.5 ULP | Medium | General compute, analytics, inference |
| `_umax` | _ZGVnN4v_cosf_umax | ≤4096 ULP | High | Real-time graphics, DSP, approximations, simulations |



**Pro tip:** If your workload has mixed precision needs, you can *selectively call different accuracy modes* for different parts of your pipeline. Libamath lets you tailor precision where it matters — and boost performance where it doesn’t.


#### References
1. Higham, N. J. (2002). *Accuracy and Stability of Numerical Algorithms* (2nd ed.). SIAM.

2. Texas Instruments. Overflow Avoidance Techniques in Cascaded IIR Filter Implementations on the TMS320 DSPs. Application Report SPRA509, 1999.
https://www.ti.com/lit/pdf/spra509

3. Ma, S., & Huai, J. (2019). Approximate Computation for Big Data Analytics. arXiv:1901.00232.
https://arxiv.org/pdf/1901.00232

4. Gupta, S., Agrawal, A., Gopalakrishnan, K., & Narayanan, P. (2015). Deep Learning with Limited Numerical Precision. In Proceedings of the 32nd International Conference on Machine Learning (ICML), PMLR 37.
https://proceedings.mlr.press/v37/gupta15.html

5. Unity Technologies. *Precision Modes*. Unity Shader Graph Documentation.
[https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html](https://docs.unity3d.com/Packages/[email protected]/manual/Precision-Modes.html)

6. Croci, M., Gorman, G. J., & Giles, M. B. (2021). Rounding Error using Low Precision Approximate Random Variables. arXiv:2012.09739.
https://arxiv.org/abs/2012.09739
