
[no_std] support? #13


Closed
LawrenceEsswood opened this issue Feb 21, 2024 · 4 comments

@LawrenceEsswood

Seems to work fine without Box (which requires alloc), since both types are Sized:

fn transmute<A, B>(obj: A) -> B {
	use std::hint::black_box; // would be core::hint::black_box in an actual no_std build

	// The layout of `DummyEnum` is approximately
	// DummyEnum {
	//     is_a_or_b: u8,
	//     data: usize,
	// }
	// Note that `data` is shared between `DummyEnum::A` and `DummyEnum::B`.
	// This should hopefully be more reliable than spamming the stack with a value and hoping the memory
	// is placed correctly by the compiler.
	enum DummyEnum<A, B> {
		A(Option<A>),
		B(Option<B>),
	}

	#[inline(never)]
	fn transmute_inner<A, B>(dummy: &mut DummyEnum<A, B>, obj: A) -> B {
		let DummyEnum::B(ref_to_b) = dummy else {
			unreachable!()
		};
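		// `expand_mut` is cve-rs's lifetime-extension helper: it launders the
		// borrow of `dummy` into a longer lifetime so it can survive the
		// overwrite of `dummy` below.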
		let ref_to_b = expand_mut(ref_to_b);
		*dummy = DummyEnum::A(Some(obj));
		black_box(dummy);

		ref_to_b.take().unwrap()
	}

	transmute_inner(black_box(&mut DummyEnum::B(None)), obj)
}
@Creative0708 self-assigned this Feb 21, 2024
@Creative0708
Collaborator

The types are wrapped in Boxes because of the niche optimizations the compiler applies to Option<T>. For example, bool and u8 have the same size, but Option<bool> is 1 byte while Option<u8> is 2 bytes. (From testing with Compiler Explorer, Option<bool> uses the bit pattern 2 to represent None.)
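
For reference, the size difference is easy to check (the exact layouts are rustc implementation details rather than guarantees, but this is what current compilers produce):

use core::mem::size_of;

fn main() {
	// `bool` has only two valid bit patterns (0 and 1), so `None` can be
	// encoded as the otherwise-invalid pattern 2 inside the same byte.
	assert_eq!(size_of::<Option<bool>>(), 1);

	// Every bit pattern of `u8` is valid, so `Option<u8>` needs a
	// separate tag byte.
	assert_eq!(size_of::<Option<u8>>(), 2);
}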

With a Box-less DummyEnum, cve_rs::transmute::<bool, u8>(true) panics with called `Option::unwrap()` on a `None` value, while std::mem::transmute::<bool, u8>(true) gives 1. This is bad, because we don't want our memory-safe memory unsafety to panic.

If there is a way to work around this, please let us know in another comment/issue/pull request! Closing for now as this implementation introduces bugs.

@Creative0708 closed this as not planned Feb 21, 2024
@LawrenceEsswood
Author

Ah, I see!

I think the correct solution is repr(C), since I believe that guarantees the "tagged union" representation of an enum. That makes the option a tag followed by a value, but you then have to make sure the value "follows" at the same offset in both variants.

The solution to that is to wrap again with a struct that contains a zero-length array of the other type so that the alignment of the two wrappers is the same:

https://godbolt.org/z/MKz93zjfK
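
A minimal sketch of that layout trick (illustrative names, not necessarily the exact code behind the Godbolt link):

use core::mem::{align_of, size_of};

// The zero-length array contributes `align_of::<U>()` but no size, so
// `CPayload<A, B>` and `CPayload<B, A>` end up with the same alignment.
#[repr(C)]
#[allow(dead_code)]
struct CPayload<T, U> {
	_align_like_other: [U; 0],
	value: T,
}

// `#[repr(C)]` forces the plain "tag followed by a union of payloads"
// layout, so no niche optimization (like the 1-byte `Option<bool>`) can
// rearrange it, and the payload sits at the same offset in both
// instantiations because the wrappers share alignment.
#[repr(C)]
#[allow(dead_code)]
enum COption<T, U> {
	None,
	Some(CPayload<T, U>),
}

fn main() {
	// Unlike `Option<bool>` (1 byte) vs `Option<u8>` (2 bytes), the two
	// instantiations now agree on size and alignment.
	assert_eq!(size_of::<COption<bool, u8>>(), size_of::<COption<u8, bool>>());
	assert_eq!(align_of::<COption<bool, u8>>(), align_of::<COption<u8, bool>>());
}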

Embedded systems could really do with memory-safe memory unsafety.

@Bright-Shard
Collaborator

Bright-Shard commented Feb 24, 2024

That's genius! I was working on a new implementation of transmute that exploits a different known compiler bug, but couldn't get it working because of misaligned options. It now works and reliably outperforms the standard library for some reason! xD

Will hopefully PR this soon.

Benchmarks (from an M1 MacBook Air)

Benchmark names:

  • cve_rs transmute: My new transmute impl
  • cve_rs transmute_old: The current cve-rs implementation of transmute
  • cve_rs transmute_ref: A version of my new transmute implementation that works with references (and should be faster?)
  • std transmute: The standard library's core::mem::transmute function

Small transmutes on numbers:

  • cve_rs transmute: 425.03 ps
  • cve_rs transmute_old: 15.736 ns
  • cve_rs transmute_ref: 423.70 ps
  • std transmute: 425.11 ps
Raw benchmark output
cve_rs transmute f32 -> i32
                        time:   [424.06 ps 425.03 ps 426.27 ps]
                        change: [+0.0111% +0.2321% +0.4599%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

Benchmarking cve_rs transmute_old f32 -> i32: Collecting 100 samples in estimated 5.0000 s (318
cve_rs transmute_old f32 -> i32
                        time:   [15.704 ns 15.736 ns 15.769 ns]
                        change: [+0.0261% +0.2573% +0.4933%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking cve_rs transmute_ref f32 -> i32: Collecting 100 samples in estimated 5.0000 s (12B
cve_rs transmute_ref f32 -> i32
                        time:   [423.44 ps 423.70 ps 423.98 ps]
                        change: [-0.1679% -0.0130% +0.1182%] (p = 0.86 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Benchmarking std transmute f32 -> i32: Collecting 100 samples in estimated 5.0000 s (12B iterat
std transmute f32 -> i32
                        time:   [424.19 ps 425.11 ps 426.25 ps]
                        change: [+0.1116% +0.2966% +0.4888%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Larger transmutes on slices:

  • cve_rs transmute: 1.0478 µs
  • cve_rs transmute_old: 1.5562 µs
  • cve_rs transmute_ref: 1.3612 µs (Note: I dereference the slice in the benchmark, which may cause Rust to copy it, which could make this slower than it should be)
  • std transmute: 1.3622 µs
Raw benchmark output
Benchmarking cve_rs transmute [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimated 5.
cve_rs transmute [f64; 1024] -> [u8; 8192]
                        time:   [1.0464 µs 1.0478 µs 1.0502 µs]
                        change: [-0.1385% +0.0703% +0.3048%] (p = 0.56 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking cve_rs transmute_old [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimate
cve_rs transmute_old [f64; 1024] -> [u8; 8192]
                        time:   [1.5546 µs 1.5562 µs 1.5580 µs]
                        change: [+0.1917% +0.3941% +0.6279%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

Benchmarking cve_rs transmute_ref [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimate
cve_rs transmute_ref [f64; 1024] -> [u8; 8192]
                        time:   [1.3604 µs 1.3612 µs 1.3620 µs]
                        change: [-0.2034% -0.0568% +0.0963%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking std transmute [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimated 5.006
std transmute [f64; 1024] -> [u8; 8192]
                        time:   [1.3604 µs 1.3622 µs 1.3650 µs]
                        change: [-0.0725% +0.1093% +0.3168%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Note: These benchmarks don't include the changes from #26; those might make the current implementation faster than these benches show as well.

@dekrain

dekrain commented Mar 2, 2024

I wonder if it could also be achieved using a Cell, since UnsafeCell is excluded from niche optimizations: it introduces a region of mutable, shareable memory, which would interfere with the containing type's layout. (MaybeUninit also prevents layout optimizations, since it is a union and may contain uninitialized data, as the name says, but getting the data back out may not be possible.)
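
A quick check of that on current rustc (again, layout is an implementation detail, so treat this as an observation rather than a guarantee):

use core::cell::Cell;
use core::mem::size_of;

fn main() {
	// The `bool` niche is used directly here...
	assert_eq!(size_of::<Option<bool>>(), 1);
	// ...but `Cell` (i.e. `UnsafeCell`) hides its contents' niches from
	// the surrounding type, so this `Option` needs its own tag byte.
	assert_eq!(size_of::<Option<Cell<bool>>>(), 2);
}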
