
[no_std] support? #13


Closed
LawrenceEsswood opened this issue Feb 21, 2024 · 4 comments

@LawrenceEsswood

Seems to work fine without Box (which requires alloc), since both types are Sized:

fn transmute<A, B>(obj: A) -> B {
	use std::hint::black_box; // would be core::hint::black_box in an actual no_std build

	// The layout of `DummyEnum` is approximately
	// DummyEnum {
	//     is_a_or_b: u8,
	//     data: usize,
	// }
	// Note that `data` is shared between `DummyEnum::A` and `DummyEnum::B`.
	// This should hopefully be more reliable than spamming the stack with a value and hoping the memory
	// is placed correctly by the compiler.
	enum DummyEnum<A, B> {
		A(Option<A>),
		B(Option<B>),
	}

	#[inline(never)]
	fn transmute_inner<A, B>(dummy: &mut DummyEnum<A, B>, obj: A) -> B {
		let DummyEnum::B(ref_to_b) = dummy else {
			unreachable!()
		};
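		// `expand_mut` is cve-rs's lifetime-extension helper: it launders the
		// borrow of `dummy` into a longer lifetime so it can survive the
		// overwrite of `dummy` below.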
		let ref_to_b = expand_mut(ref_to_b);
		*dummy = DummyEnum::A(Some(obj));
		black_box(dummy);

		ref_to_b.take().unwrap()
	}

	transmute_inner(black_box(&mut DummyEnum::B(None)), obj)
}
@Creative0708 self-assigned this Feb 21, 2024
@Creative0708
Collaborator

The types are wrapped in Boxes because of the niche optimizations the compiler applies to Option<T>. For example, bool and u8 have the same size, but Option<bool> is 1 byte while Option<u8> is 2 bytes. (From testing with Compiler Explorer, Option<bool> uses the bit pattern 2 to represent None.)
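
For reference, the size difference is easy to check (the exact layouts are rustc implementation details rather than guarantees, but this is what current compilers produce):

use core::mem::size_of;

fn main() {
	// `bool` has only two valid bit patterns (0 and 1), so `None` can be
	// encoded as the otherwise-invalid pattern 2 inside the same byte.
	assert_eq!(size_of::<Option<bool>>(), 1);

	// Every bit pattern of `u8` is valid, so `Option<u8>` needs a
	// separate tag byte.
	assert_eq!(size_of::<Option<u8>>(), 2);
}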

With a Box-less DummyEnum, cve_rs::transmute::<bool, u8>(true) panics with called `Option::unwrap()` on a `None` value, while std::mem::transmute::<bool, u8>(true) gives 1. This is bad, because we don't want our memory-safe memory unsafety to panic.

If there is a way to work around this, please let us know in another comment/issue/pull request! Closing for now as this implementation introduces bugs.

@Creative0708 closed this as not planned Feb 21, 2024
@LawrenceEsswood
Author

Ah, I see!

I think the correct solution is repr(C), since I believe that guarantees the "tagged union" representation of an enum. That makes the option a tag followed by a value, but you then have to make sure the value "follows" at the same offset in both variants.

The solution to that is to wrap again with a struct that contains a zero-length array of the other type so that the alignment of the two wrappers is the same:

https://godbolt.org/z/MKz93zjfK
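
A minimal sketch of that layout trick (illustrative names, not necessarily the exact code behind the Godbolt link):

use core::mem::{align_of, size_of};

// The zero-length array contributes `align_of::<U>()` but no size, so
// `CPayload<A, B>` and `CPayload<B, A>` end up with the same alignment.
#[repr(C)]
#[allow(dead_code)]
struct CPayload<T, U> {
	_align_like_other: [U; 0],
	value: T,
}

// `#[repr(C)]` forces the plain "tag followed by a union of payloads"
// layout, so no niche optimization (like the 1-byte `Option<bool>`) can
// rearrange it, and the payload sits at the same offset in both
// instantiations because the wrappers share alignment.
#[repr(C)]
#[allow(dead_code)]
enum COption<T, U> {
	None,
	Some(CPayload<T, U>),
}

fn main() {
	// Unlike `Option<bool>` (1 byte) vs `Option<u8>` (2 bytes), the two
	// instantiations now agree on size and alignment.
	assert_eq!(size_of::<COption<bool, u8>>(), size_of::<COption<u8, bool>>());
	assert_eq!(align_of::<COption<bool, u8>>(), align_of::<COption<u8, bool>>());
}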

Embedded systems could really do with memory-safe memory unsafety.

@Bright-Shard
Collaborator

Bright-Shard commented Feb 24, 2024

That's genius! I was working on a new implementation of transmute that exploits a different known compiler bug, but couldn't get it working because of misaligned options. It now works and reliably outperforms the standard library for some reason! xD

Will hopefully PR this soon.

Benchmarks (from an M1 MacBook Air)

Benchmark names:

  • cve_rs transmute: My new transmute impl
  • cve_rs transmute_old: The current cve-rs implementation of transmute
  • cve_rs transmute_ref: A version of my new transmute implementation that works with references (and should be faster?)
  • std transmute: The standard library's core::mem::transmute function

Small transmutes on numbers:

  • cve_rs transmute: 425.03 ps
  • cve_rs transmute_old: 15.736 ns
  • cve_rs transmute_ref: 423.70 ps
  • std transmute: 425.11 ps
Raw benchmark output
cve_rs transmute f32 -> i32
                        time:   [424.06 ps 425.03 ps 426.27 ps]
                        change: [+0.0111% +0.2321% +0.4599%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

Benchmarking cve_rs transmute_old f32 -> i32: Collecting 100 samples in estimated 5.0000 s (318
cve_rs transmute_old f32 -> i32
                        time:   [15.704 ns 15.736 ns 15.769 ns]
                        change: [+0.0261% +0.2573% +0.4933%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

Benchmarking cve_rs transmute_ref f32 -> i32: Collecting 100 samples in estimated 5.0000 s (12B
cve_rs transmute_ref f32 -> i32
                        time:   [423.44 ps 423.70 ps 423.98 ps]
                        change: [-0.1679% -0.0130% +0.1182%] (p = 0.86 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Benchmarking std transmute f32 -> i32: Collecting 100 samples in estimated 5.0000 s (12B iterat
std transmute f32 -> i32
                        time:   [424.19 ps 425.11 ps 426.25 ps]
                        change: [+0.1116% +0.2966% +0.4888%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

Larger transmutes on slices:

  • cve_rs transmute: 1.0478 µs
  • cve_rs transmute_old: 1.5562 µs
  • cve_rs transmute_ref: 1.3612 µs (Note: I dereference the slice in the benchmark, which may cause Rust to copy it, which could make this slower than it should be)
  • std transmute: 1.3622 µs
Raw benchmark output
Benchmarking cve_rs transmute [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimated 5.
cve_rs transmute [f64; 1024] -> [u8; 8192]
                        time:   [1.0464 µs 1.0478 µs 1.0502 µs]
                        change: [-0.1385% +0.0703% +0.3048%] (p = 0.56 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking cve_rs transmute_old [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimate
cve_rs transmute_old [f64; 1024] -> [u8; 8192]
                        time:   [1.5546 µs 1.5562 µs 1.5580 µs]
                        change: [+0.1917% +0.3941% +0.6279%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

Benchmarking cve_rs transmute_ref [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimate
cve_rs transmute_ref [f64; 1024] -> [u8; 8192]
                        time:   [1.3604 µs 1.3612 µs 1.3620 µs]
                        change: [-0.2034% -0.0568% +0.0963%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking std transmute [f64; 1024] -> [u8; 8192]: Collecting 100 samples in estimated 5.006
std transmute [f64; 1024] -> [u8; 8192]
                        time:   [1.3604 µs 1.3622 µs 1.3650 µs]
                        change: [-0.0725% +0.1093% +0.3168%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Note: These benchmarks don't include the changes from #26; those might make the current implementation faster than these benches show as well.

@dekrain

dekrain commented Mar 2, 2024

I wonder if it could also be achieved using a Cell, since UnsafeCell is excluded from niche optimizations: it introduces a region of mutable, shareable memory, which would interfere with the containing type's layout. (MaybeUninit also prevents layout optimizations, since it is a union and may contain uninitialized data, as the name says, but getting the data back out may not be possible.)
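
A quick check of that on current rustc (again, layout is an implementation detail, so treat this as an observation rather than a guarantee):

use core::cell::Cell;
use core::mem::size_of;

fn main() {
	// The `bool` niche is used directly here...
	assert_eq!(size_of::<Option<bool>>(), 1);
	// ...but `Cell` (i.e. `UnsafeCell`) hides its contents' niches from
	// the surrounding type, so this `Option` needs its own tag byte.
	assert_eq!(size_of::<Option<Cell<bool>>>(), 2);
}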
