Description
Background
Currently the core crate doesn't provide support for mathematical functions like sqrt or sin.
To do math in a #![no_std] program one has the following options:
- Link to a C implementation of libm, i.e. libm.a. This is cumbersome as the programmer needs to
  obtain a compiled version of libm for their target, or compile libm themselves, which implies a C
  cross toolchain when the target system and the build system are not the same architecture / OS.
- Use a pure Rust implementation of libm, like the libm crate. On stable, either (a) the performance
  of such an implementation won't be on par with a C implementation, or (b) to achieve the same
  performance the user would require a C (cross) toolchain.
To elaborate on (a) and (b), consider the following contrived program that computes the square root
of a number:
#![no_std]

extern crate libm;

use core::ptr;

use libm::F32Ext;

#[no_mangle]
pub unsafe fn foo() {
    // volatile memory accesses to prevent the compiler from optimizing away everything
    let x: f32 = ptr::read_volatile(0x2000_0000 as *const _);
    let y = x.sqrt();
    ptr::write_volatile(0x2000_1000 as *mut _, y);
}
When compiled for the thumbv7em-none-eabihf target it produces the following machine code:
00000000 <foo>:
0: f04f 5000 mov.w r0, #536870912 ; 0x20000000
4: ed90 0a00 vldr s0, [r0]
8: ee10 1a10 vmov r1, s0
c: f001 40ff and.w r0, r1, #2139095040 ; 0x7f800000
10: f1b0 4fff cmp.w r0, #2139095040 ; 0x7f800000
14: d108 bne.n 28 <foo+0x28>
16: ee00 0a00 vmla.f32 s0, s0, s0
(..)
2f4: ed80 0a00 vstr s0, [r0]
2f8: 4770 bx lr
This is extremely inefficient machine code because the target has a hardware FPU that supports
computing the square root in a single instruction. Ideally, the program should compile down to the
following machine code:
00000000 <foo>:
0: f04f 5000 mov.w r0, #536870912 ; 0x20000000
4: ed90 0a00 vldr s0, [r0]
8: f241 0000 movw r0, #4096 ; 0x1000
c: f2c2 0000 movt r0, #8192 ; 0x2000
10: eeb1 0ac0 vsqrt.f32 s0, s0
14: ed80 0a00 vstr s0, [r0]
18: 4770 bx lr
If the target had access to the standard library the program would compile down to that machine code
because the implementation of f32.sqrt in std looks like this:
#![feature(core_intrinsics)]

use std::intrinsics;

impl f32 {
    fn sqrt(self) -> Self {
        intrinsics::sqrtf32(self)
    }
}
sqrtf32 is an unstable, thin wrapper around an LLVM intrinsic that either compiles down to a
hardware implementation of square root if the target architecture supports it in its instruction
set, or produces a call to the sqrtf routine if it doesn't (*). std makes use of 30+ such LLVM
intrinsics to make its math functions fast.
(*) The llvm.sqrt.* LLVM intrinsic, which sqrtf32 wraps, is not quite specified like that but
that's the observable effect.
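For comparison, here is a minimal sketch of what math looks like on a target that does have std: the inherent method is simply available, with no extra crate or feature gate. The function name `norm` is made up for this illustration and is not part of any existing API.

```rust
/// Hypothetical helper (name invented for this example): Euclidean length of
/// the vector (a, b). On a `std` target `f32::sqrt` is an inherent method, so
/// nothing beyond the standard library is needed.
pub fn norm(a: f32, b: f32) -> f32 {
    (a * a + b * b).sqrt()
}
```

Calling `norm(3.0, 4.0)` computes the square root of 25.0. The same source does not build in a #![no_std] crate today, which is the gap this issue is about.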
The libm crate can't make use of this intrinsic on stable because it's unstable and feature
gated. However, the libm crate could replicate the behavior of the sqrtf32 intrinsic using
conditional compilation and external assembly files as shown below:
// crate: libm
// NOTE heavily simplified because it ignores architectures other than ARM
impl F32Ext for f32 {
    #[cfg(target_arch = "arm")]
    fn sqrt(self) -> Self {
        extern "C" {
            // provided by an external assembly file
            fn vsqrt_f32(x: f32) -> f32;
        }

        unsafe { vsqrt_f32(self) }
    }

    #[cfg(not(target_arch = "arm"))]
    fn sqrt(self) -> Self {
        // software implementation
    }
}
But this would heavily complicate the implementation of the libm crate, which would likely
introduce bugs. Also, as it's not possible to use inline assembly (asm!) on stable, the vsqrt.f32
instruction would have to be invoked via FFI and an external assembly file. External assembly files
mean that the user would require a C (cross) toolchain to build the crate, negating the main benefit
of using a pure Rust implementation of libm.
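To give a feel for what the "software implementation" branch in the sketch above has to do, here is a rough illustration written with only operations that core already provides. This is not the algorithm the libm crate uses; `soft_sqrt` is a hypothetical name, the routine uses a bit-level initial guess followed by Newton-Raphson steps, and it makes no attempt at correct rounding or at accuracy for subnormal inputs.

```rust
/// Hypothetical software fallback (NOT the libm crate's algorithm): square
/// root via a bit-level initial guess refined with Newton-Raphson steps.
/// Uses only `core` functionality: comparisons, arithmetic, `to_bits`,
/// `from_bits`. Accuracy for subnormal inputs is not addressed here.
pub fn soft_sqrt(x: f32) -> f32 {
    // NaN and negative inputs have no real square root.
    if x.is_nan() || x < 0.0 {
        return f32::NAN;
    }
    // Zero (either sign) and +infinity are their own square roots.
    if x == 0.0 || x == f32::INFINITY {
        return x;
    }
    // Initial guess: roughly halve the exponent by shifting the bit pattern
    // right and re-adding half of the exponent bias (127 << 22).
    let mut y = f32::from_bits((x.to_bits() >> 1) + 0x1fc0_0000);
    // Each Newton-Raphson step roughly squares the relative error.
    for _ in 0..4 {
        y = 0.5 * (y + x / y);
    }
    y
}
```

Even this simplified version needs several branches, divisions and multiplications, which is the kind of code behind the long disassembly shown earlier, compared with a single vsqrt.f32 instruction on a target with an FPU.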
Possible solutions
I see two options for improving the situation here:
a. We stabilize the family of sqrtf32 LLVM intrinsics. This way crates like libm can achieve the
performance of the std implementation on stable without requiring complex conditional
compilation and C toolchains. Or,
b. We move all the existing math support from std to core. For the user this means that e.g.
f32.sqrt will also work in #![no_std] programs.
Option (a) is kind of bad (maybe?) for alternative backends like cranelift as they would have to
support / implement these LLVM intrinsics to be on par with the rustc+LLVM compiler.
Option (b) requires us (*) to provide an implementation of math functions (symbols) like sqrtf
for targets that do not link to libm by default. If we don't do this those targets will hit
"undefined reference to sqrtf" linker errors when using math methods like f32.sqrt.
(*) "us" as in: we must provide symbols like sqrtf in the compiler-builtins crate. Note that we
are already providing such symbols for the wasm32-unknown-unknown target, and we are
using the libm crate to do that.
If we go ahead with option (b) we must be careful to not provide the math symbols in
compiler-builtins for targets that are currently using system libm (e.g.
x86_64-unknown-linux-gnu). If we do provide the symbols, all existing programs will
start using the libm crate implementation instead of the system libm implementation. This is due
to how we invoke the linker (libcompiler_builtins.rlib appears before -lm in the linker
arguments), and it may degrade performance in some cases where system libm has architecture
optimized implementations of some functions.
With option (b) I believe that #![no_std] programs that are currently linking to some C
implementation of libm for math support will end up using the libm crate implementation as a side
effect. I don't see a way to avoid this: even if we mark the math symbols in compiler-builtins as
weak, the way we invoke the linker will cause the program to use the libm crate implementation.
Final thoughts
IMO, math support should be in the core crate as it doesn't depend on OS, or I/O, abstractions
like other std-only APIs do (e.g. std::fs, std::net). Also, std makes math like sqrt
feel built-in because the functionality is provided as inherent methods; it feels weird that such
"built-in" functionality is not available in #![no_std].
Thoughts? Should we do (a) or (b)? Or is there some other solution? Or should we leave math out of core?
cc @SimonSapin (T-libs), @jethrogb @Ericson2314 (T-portability), @joshtriplett @korken89 (some stakeholders)