Skip to content

Commit dd4495f

Browse files
authored
Merge pull request #27 from bodil/safe-layout
Stop relying on undefined behaviour. `smartstring` now implements its own boxed string type rather than deferring directly to `String`, so it no longer makes assumptions it shouldn't be making about the layout of the `String` struct. This also allows us to organise the boxed struct in a way that will let us rely only on our basic assumption that heap memory is word aligned on both big and little endian architectures. The most immediate consequence of this is that `smartstring` will now compile on 32-bit big endian architectures such as `mips`. Closes #4 #8 #14
2 parents 22f9049 + 0b1aa69 commit dd4495f

File tree

17 files changed

+652
-633
lines changed

17 files changed

+652
-633
lines changed

.github/workflows/ci.yml

+24-2
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ jobs:
1414
rust:
1515
- stable
1616
- nightly
17-
- 1.56.0 # lowest supported version
17+
- 1.57.0 # lowest supported version
1818
flags:
1919
- --all-features
2020
- --no-default-features
@@ -38,7 +38,7 @@ jobs:
3838
rust:
3939
- stable
4040
- nightly
41-
- 1.56.0 # lowest supported version
41+
- 1.57.0 # lowest supported version
4242
steps:
4343
- uses: actions/checkout@v2
4444
- uses: actions-rs/toolchain@v1
@@ -51,6 +51,27 @@ jobs:
5151
command: test
5252
args: --all-features
5353

54+
nostd:
55+
name: no_std build
56+
runs-on: ubuntu-latest
57+
strategy:
58+
matrix:
59+
rust:
60+
- stable
61+
- nightly
62+
- 1.57.0 # lowest supported version
63+
steps:
64+
- uses: actions/checkout@v2
65+
- uses: actions-rs/toolchain@v1
66+
with:
67+
profile: minimal
68+
toolchain: ${{ matrix.rust }}
69+
override: true
70+
- uses: actions-rs/cargo@v1
71+
with:
72+
command: build
73+
args: --no-default-features
74+
5475
fmt:
5576
name: Rustfmt
5677
runs-on: ubuntu-latest
@@ -88,6 +109,7 @@ jobs:
88109
name: Clippy-${{ matrix.rust }}
89110
token: ${{ secrets.GITHUB_TOKEN }}
90111
args: --all-features
112+
91113
# miri:
92114
# name: Miri
93115
# runs-on: ubuntu-latest

CHANGELOG.md

+21
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
55
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project
66
adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
77

8+
## [Unreleased]
9+
10+
### CHANGED
11+
12+
- `smartstring` now implements its own boxed string type rather than deferring directly to
13+
`String`, so it no longer makes assumptions it shouldn't be making about the layout of the
14+
`String` struct. This also allows us to organise the boxed struct in a way that will let us rely
15+
only on our basic assumption that heap memory is word aligned on both big and little endian
16+
architectures. The most immediate consequence of this is that `smartstring` will now compile on
17+
32-bit big endian architectures such as `mips`.
18+
19+
In short: `smartstring` no longer relies on undefined behaviour, and should be safe to use
20+
anywhere.
21+
22+
- The above means that the boxed `SmartString` is no longer pointer compatible with `String`, so
23+
if you were relying on that despite the documentation telling you not to, you'll really have to
24+
stop it now. Converting between `SmartString` and `String` using `From` and `Into` traits is
25+
still efficient and allocation free.
26+
27+
- The minimum supported rustc version is now 1.57.0.
28+
829
## [0.2.10] - 2022-02-20
930

1031
### CHANGED

Cargo.toml

+1-1
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ readme = "./README.md"
1111
categories = ["data-structures"]
1212
keywords = ["cache-local", "cpu-cache", "small-string", "sso", "inline-string"]
1313
exclude = ["release.toml", "proptest-regressions/**"]
14-
rust-version = "1.56"
14+
rust-version = "1.57"
1515

1616
[package.metadata.docs.rs]
1717
features = ["arbitrary", "proptest", "serde"]

README.md

+3-39
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,5 @@
11
# smartstring
22

3-
[![Travis CI](https://travis-ci.org/bodil/smartstring.svg?branch=master&status=passed)](https://travis-ci.org/github/bodil/smartstring)
4-
53
Compact inlined strings.
64

75
## tl;dr
@@ -18,42 +16,9 @@ beyond the inline capacity. This has the advantage of avoiding heap allocations
1816
well as improving performance thanks to keeping the strings on the stack.
1917

2018
This is all accomplished without the need for an external discriminant, so a `SmartString` is
21-
exactly the same size as a `String` on the stack, regardless of whether it's inlined or not, and
22-
when not inlined it's pointer compatible with `String`, meaning that you can safely coerce a
23-
`SmartString` to a `String` using `std::mem::replace()` or `pointer::cast()` and go on using it as
24-
if it had never been a `SmartString`. (But please don't do that, there's an `Into<String>`
25-
implementation that's much safer.)
26-
27-
## Supported architectures
28-
`smartstring` currently doesn't run on 32-bit big endian architectures like `powerpc`, so its use
29-
in any crates that intend to run on those architectures should ideally be gated behind a
30-
[platform specific dependency](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#platform-specific-dependencies)
31-
in your `Cargo.toml`, like so:
32-
```toml
33-
[target.'cfg(not(all(target_endian = "big", target_pointer_width = "32")))'.dependencies]
34-
smartstring = "0.2"
35-
```
36-
37-
This will ensure that `cargo` does not try to build `smartstring` on these unsupported
38-
architectures, which will otherwise [always fail](https://github.com/bodil/smartstring/blob/v0.2.9/src/config.rs#L91-L93).
39-
40-
## Caveat
41-
42-
The way `smartstring` gets by without a discriminant is dependent on the memory layout of the
43-
`std::string::String` struct, which isn't something the Rust compiler and standard library make any
44-
guarantees about. `smartstring` makes an assumption about how it's been laid out, which has held
45-
basically since rustc came into existence, but is nonetheless not a safe assumption to make, and if
46-
the layout ever changes, `smartstring` will stop working properly (at least on little-endian
47-
architectures, the assumptions made on big-endian archs will hold regardless of the actual memory
48-
layout). Its test suite does comprehensive validation of these assumptions, and as long as the
49-
[CI build](https://travis-ci.org/github/bodil/smartstring) is passing for any given rustc version,
50-
you can be sure it will do its job properly on all tested architectures. You can also check out the
51-
`smartstring` source tree yourself and run `cargo test` to validate it for your particular
52-
configuration.
53-
54-
As an extra precaution, some runtime checks are made as well, so that if the memory layout
55-
assumption no longer holds, `smartstring` will not work correctly, but there should be no security
56-
implications and it should crash early.
19+
exactly the same size as a `String` on the stack, regardless of whether it's inlined or not.
20+
Converting a heap allocated `SmartString` into a `String` and vice versa is also a zero cost
21+
operation, as one will reuse the allocated memory of the other.
5722

5823
## Documentation
5924

@@ -71,5 +36,4 @@ was not distributed with this file, You can obtain one at <http://mozilla.org/MP
7136
Please note that this project is released with a [Contributor Code of Conduct][coc]. By
7237
participating in this project you agree to abide by its terms.
7338

74-
[immutable.rs]: https://immutable.rs/
7539
[coc]: https://github.com/bodil/sized-chunks/blob/master/CODE_OF_CONDUCT.md

proptest-regressions/test.txt

+2
Original file line numberDiff line numberDiff line change
@@ -7,3 +7,5 @@
77
cc 371b459819b92f730318ad7fb8b15be10ac03393f9eb776d966a9af8bc489ae9 # shrinks to constructor = New, actions = [InsertStr(1, "")]
88
cc 5f343832f658239791b2754c8f1ec82e4caa9efb34bf1393636df2480ec9f176 # shrinks to constructor = New, actions = [InsertStr(1, "")]
99
cc b11c06f9d964d4fd4d4b6e36a7b04c383138422ff7f0f1d37d0c706de451d770 # shrinks to constructor = New, actions = [PushStr("{%:A¥%🕴🕴"), PushStr("%{{%¥{"), PushStr("?:%"), PushStr("🕴"), PushStr("%{?2"), Retain("?{¥:2🕴%")]
10+
cc 36b6f0fa95e8925cda11c176d3f606208e8085d3367c74c2a5f6df0538277b7a # shrinks to constructor = FromString("AΣA א \u{16af0}אff𑌓"), actions = [InsertStr(6, "")]
11+
cc 746a6d4c7bc53760e936eb5b7c332a9228f0a5209abd9538685e53c04d26ac71 # shrinks to constructor = New, actions = [PushStr("00𐲀Ὑ𞺋 🡐\u{abc}a0"), InsertStr(3, "")]

src/arbitrary.rs

+4
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,7 @@
1+
// This Source Code Form is subject to the terms of the Mozilla Public
2+
// License, v. 2.0. If a copy of the MPL was not distributed with this
3+
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
4+
15
use crate::{SmartString, SmartStringMode};
26
use alloc::string::String;
37
use arbitrary::{Arbitrary, Result, Unstructured};

src/boxed.rs

+149-34
Original file line numberDiff line numberDiff line change
@@ -2,57 +2,172 @@
22
// License, v. 2.0. If a copy of the MPL was not distributed with this
33
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
44

5-
use alloc::string::String;
6-
use core::cmp::Ordering;
5+
use alloc::{alloc::Layout, string::String};
6+
use core::{
7+
mem::forget,
8+
ops::{Deref, DerefMut},
9+
ptr::NonNull,
10+
};
711

8-
#[allow(unreachable_pub)]
9-
pub trait BoxedString {
10-
fn string(&self) -> &String;
11-
fn string_mut(&mut self) -> &mut String;
12-
fn into_string(self) -> String;
12+
use crate::{ops::GenericString, MAX_INLINE};
1313

14-
fn cmp_with_str(&self, other: &str) -> Ordering;
15-
fn cmp_with_self(&self, other: &Self) -> Ordering;
16-
fn eq_with_str(&self, other: &str) -> bool;
17-
fn eq_with_self(&self, other: &Self) -> bool;
14+
#[cfg(not(endian = "big"))]
15+
#[repr(C)]
16+
pub(crate) struct BoxedString {
17+
ptr: NonNull<u8>,
18+
cap: usize,
19+
len: usize,
20+
}
21+
22+
#[cfg(endian = "big")]
23+
#[repr(C)]
24+
pub(crate) struct BoxedString {
25+
length: usize,
26+
cap: usize,
27+
ptr: NunNull<u8>,
28+
}
1829

19-
fn len(&self) -> usize {
20-
self.string().len()
30+
impl GenericString for BoxedString {
31+
fn set_size(&mut self, size: usize) {
32+
self.len = size;
33+
debug_assert!(self.len <= self.cap);
34+
}
35+
36+
fn as_mut_capacity_slice(&mut self) -> &mut [u8] {
37+
#[allow(unsafe_code)]
38+
unsafe {
39+
core::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.capacity())
40+
}
2141
}
2242
}
2343

24-
impl BoxedString for String {
25-
#[inline(always)]
26-
fn string(&self) -> &String {
27-
self
44+
impl BoxedString {
45+
const MINIMAL_CAPACITY: usize = MAX_INLINE * 2;
46+
47+
fn layout_for(cap: usize) -> Layout {
48+
let layout = Layout::array::<u8>(cap).unwrap();
49+
assert!(
50+
layout.size() <= isize::MAX as usize,
51+
"allocation too large!"
52+
);
53+
layout
54+
}
55+
56+
fn alloc(cap: usize) -> NonNull<u8> {
57+
let layout = Self::layout_for(cap);
58+
#[allow(unsafe_code)]
59+
let ptr = unsafe { alloc::alloc::alloc(layout) };
60+
match NonNull::new(ptr) {
61+
Some(ptr) => ptr,
62+
None => alloc::alloc::handle_alloc_error(layout),
63+
}
64+
}
65+
66+
fn realloc(&mut self, cap: usize) {
67+
let layout = Self::layout_for(cap);
68+
let old_layout = Self::layout_for(self.cap);
69+
let old_ptr = self.ptr.as_ptr();
70+
#[allow(unsafe_code)]
71+
let ptr = unsafe { alloc::alloc::realloc(old_ptr, old_layout, layout.size()) };
72+
self.ptr = match NonNull::new(ptr) {
73+
Some(ptr) => ptr,
74+
None => alloc::alloc::handle_alloc_error(layout),
75+
};
76+
self.cap = cap;
2877
}
2978

30-
#[inline(always)]
31-
fn string_mut(&mut self) -> &mut String {
32-
self
79+
pub(crate) fn ensure_capacity(&mut self, target_cap: usize) {
80+
let mut cap = self.cap;
81+
while cap < target_cap {
82+
cap *= 2;
83+
}
84+
self.realloc(cap)
3385
}
3486

35-
fn into_string(self) -> String {
36-
self
87+
pub(crate) fn new(cap: usize) -> Self {
88+
let cap = cap.max(Self::MINIMAL_CAPACITY);
89+
Self {
90+
cap,
91+
len: 0,
92+
ptr: Self::alloc(cap),
93+
}
3794
}
3895

39-
#[inline(always)]
40-
fn cmp_with_str(&self, other: &str) -> Ordering {
41-
self.as_str().cmp(other)
96+
pub(crate) fn from_str(cap: usize, src: &str) -> Self {
97+
let mut out = Self::new(cap);
98+
out.len = src.len();
99+
out.as_mut_capacity_slice()[..src.len()].copy_from_slice(src.as_bytes());
100+
out
42101
}
43102

44-
#[inline(always)]
45-
fn cmp_with_self(&self, other: &Self) -> Ordering {
46-
self.cmp(other)
103+
pub(crate) fn capacity(&self) -> usize {
104+
self.cap
47105
}
48106

49-
#[inline(always)]
50-
fn eq_with_str(&self, other: &str) -> bool {
51-
self == other
107+
pub(crate) fn shrink_to_fit(&mut self) {
108+
self.realloc(self.len);
52109
}
110+
}
111+
112+
impl Drop for BoxedString {
113+
fn drop(&mut self) {
114+
#[allow(unsafe_code)]
115+
unsafe {
116+
alloc::alloc::dealloc(self.ptr.as_ptr(), Self::layout_for(self.cap))
117+
}
118+
}
119+
}
120+
121+
impl Clone for BoxedString {
122+
fn clone(&self) -> Self {
123+
Self::from_str(self.capacity(), self.deref())
124+
}
125+
}
126+
127+
impl Deref for BoxedString {
128+
type Target = str;
129+
130+
fn deref(&self) -> &Self::Target {
131+
#[allow(unsafe_code)]
132+
unsafe {
133+
core::str::from_utf8_unchecked(core::slice::from_raw_parts(self.ptr.as_ptr(), self.len))
134+
}
135+
}
136+
}
137+
138+
impl DerefMut for BoxedString {
139+
fn deref_mut(&mut self) -> &mut Self::Target {
140+
#[allow(unsafe_code)]
141+
unsafe {
142+
core::str::from_utf8_unchecked_mut(core::slice::from_raw_parts_mut(
143+
self.ptr.as_ptr(),
144+
self.len,
145+
))
146+
}
147+
}
148+
}
149+
150+
impl From<String> for BoxedString {
151+
fn from(mut s: String) -> Self {
152+
if s.is_empty() {
153+
Self::new(s.capacity())
154+
} else {
155+
// TODO: Use String::into_raw_parts when stabilised, meanwhile let's get unsafe
156+
let len = s.len();
157+
let cap = s.capacity();
158+
#[allow(unsafe_code)]
159+
let ptr = unsafe { NonNull::new_unchecked(s.as_mut_ptr()) };
160+
forget(s);
161+
Self { cap, len, ptr }
162+
}
163+
}
164+
}
53165

54-
#[inline(always)]
55-
fn eq_with_self(&self, other: &Self) -> bool {
56-
self == other
166+
impl From<BoxedString> for String {
167+
fn from(s: BoxedString) -> Self {
168+
#[allow(unsafe_code)]
169+
let out = unsafe { String::from_raw_parts(s.ptr.as_ptr(), s.len(), s.capacity()) };
170+
forget(s);
171+
out
57172
}
58173
}

src/casts.rs

+7-7
Original file line numberDiff line numberDiff line change
@@ -2,19 +2,19 @@
22
// License, v. 2.0. If a copy of the MPL was not distributed with this
33
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
44

5-
use crate::{inline::InlineString, SmartStringMode};
5+
use crate::{boxed::BoxedString, inline::InlineString};
66

7-
pub(crate) enum StringCast<'a, Mode: SmartStringMode> {
8-
Boxed(&'a Mode::BoxedString),
7+
pub(crate) enum StringCast<'a> {
8+
Boxed(&'a BoxedString),
99
Inline(&'a InlineString),
1010
}
1111

12-
pub(crate) enum StringCastMut<'a, Mode: SmartStringMode> {
13-
Boxed(&'a mut Mode::BoxedString),
12+
pub(crate) enum StringCastMut<'a> {
13+
Boxed(&'a mut BoxedString),
1414
Inline(&'a mut InlineString),
1515
}
1616

17-
pub(crate) enum StringCastInto<Mode: SmartStringMode> {
18-
Boxed(Mode::BoxedString),
17+
pub(crate) enum StringCastInto {
18+
Boxed(BoxedString),
1919
Inline(InlineString),
2020
}

0 commit comments

Comments
 (0)