Skip to content

RFC: Support char::encode_utf8 in const scenarios. #3696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
90 changes: 90 additions & 0 deletions text/3696-const-char-encode-utf8.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
- Feature Name: `const_char_encode_utf8`
- Start Date: 2024-09-17
- RFC PR: [rust-lang/rfcs#3696](https://github.com/rust-lang/rfcs/pull/3696)
- Rust Issue: [rust-lang/rust#130512](https://github.com/rust-lang/rust/issues/130512)

# Summary
[summary]: #summary

`char::encode_utf8` should be marked const to allow for compile-time conversions.
Considering mutable references now being stable in const environments, this implementation would be trivial even without compiler magic.

# Motivation
[motivation]: #motivation

The `encode_utf8` method (in `char`) is currently **not** marked as "const" and is therefore rendered unusable in scenarios that require const-compatibility.

With the recent stabilisation of [`const_mut_refs`](https://github.com/rust-lang/rust/issues/57349/), implementing `encode_utf8` with the current signature is trivial and would (in practice) yield no incompatibilities with existing code.

I expect that implementing this RFC – despite its limited scope – will however prove useful in supporting compile-time string handling in the future.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Currently, the `encode_utf8` method has the following prototype:

```rust
pub fn encode_utf8(self, dst: &mut [u8]) -> &mut str;
```

This is to simply be marked as const:

```rust
pub const fn encode_utf8(self, dst: &mut [u8]) -> &mut str;
```

This is not a breaking change.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Other than just adding the `const` qualifier to the function prototype, the function body would have to be changed due to some constructs currently not being supported in constant expressions.

A working implementation can be found at [`bjoernager/rust:const-char-encode-utf8`](https://github.com/bjoernager/rust/tree/const-char-encode-utf8).
Required changes are in [`/library/core/src/char/methods.rs`](https://github.com/bjoernager/rust/blob/const-char-encode-utf8/library/core/src/char/methods.rs/).

Note that this implementation assumes [`const_slice_from_raw_parts_mut`](https://github.com/rust-lang/rust/issues/67456/).

# Drawbacks
[drawbacks]: #drawbacks

Implementing this RFC at the current moment could degenerate diagnostics as the `assert` call in the `encode_utf8_raw` function relies on formatters that are non-const.

The reference implementation resolves this by instead using a generic message, although this may not be desired:

```
encode_utf8: buffer does not have enough bytes to encode code point
```

This *could* be changed to have the number of bytes required hard-coded, but doing so may instead sacrifice code readability.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

If the initial diagnostics are deemed to be worth more than const-compatibility then an `encode_utf8_unchecked` method could be considered instead:

```rust
pub const unsafe fn encode_utf8_unchecked(self, dst: &mut [u8]) -> &mut str;

// ... or...

pub const unsafe fn encode_utf8_unchecked(self, dst: *mut u8) -> *mut str;
```

This function would perform the same operation but without testing the length of `dst`, allowing for const conversions at least in the short-term (until formatters are stabilised).

# Prior art
[prior-art]: #prior-art

Currently none that I know of.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

The problem with diagnostic degeneration could be solved by allowing the used formatters in const environments.
I do not know if there already exists such a feature for use by the standard library.

# Future possibilities
[future-possibilities]: #future-possibilities

I suspect that having a similar `decode_utf8` method may be desired.