RFC: Numeric literal types #2507
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,318 @@ | ||
- Feature Name: numeric_literal_types | ||
- Start Date: 2018-07-30 | ||
- RFC PR: _ | ||
- Rust Issue: _ | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
This RFC introduces two new types: `ulit` and `flit`. These are the *numeric | ||
literal types*, i.e., the type of an integer literal `42` or a float literal | ||
`1.0`. These types exist to give a name to literals that do not have a fixed | ||
size. Consider the following error: | ||
``` | ||
error: int literal is too large | ||
--> src/main.rs:2:32 | ||
| | ||
2 | const VEGETA_CANT_EVEN: u128 = 9_000_000_000_000_000_000_000_000_000_000_000_000_001; | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
``` | ||
This expression could instead be given the type `ulit`, which could later | ||
be narrowed into a "real" integer, or fed into a `const` constructor, | ||
like `BigInt::new()`, without having to restrict itself to `u128`. These | ||
types, while unsized, are capable of being coerced into integers, | ||
following the current literal typing rules. | ||
|
||
Introducing language-level bignums is a *non-goal*. | ||
This RFC lays the groundwork for custom literals, but custom literals | ||
themselves are *also* a non-goal. | ||
|
||
Note: this proposal is given in full generality, with a series of weakened | ||
subsets that might be easier to implement or stabilize. The guide-level | ||
explanation is written only with this full generality in mind, since I don't | ||
think it's too difficult to explain the weakenings. Accepting this RFC will | ||
probably entail picking a weakening and applying it to both explanations. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
This proposal has a few motivating use cases: | ||
- Untyped compile-time constants, as in Go or C (via `#define`). | ||
- Bignum constructors. | ||
- Custom integer literals, à la `operator""`. | ||
|
||
The first is valuable because it allows us to hoist several occurrences | ||
of the same literal in different typed contexts, without having to type | ||
it as the largest possible numeric type and explicitly narrow, i.e. | ||
```rust | ||
let foo = my_u8() & 0b0101_0101; | ||
let bar = my_i32() & 0b0101_0101; | ||
// becomes | ||
const MY_MASK: ulit = 0b0101_0101; | ||
let foo = my_u8() & MY_MASK; | ||
let bar = my_i32() & MY_MASK; | ||
// instead of | ||
const MY_MASK: u128 = 0b0101_0101; | ||
let foo = my_u8() & (MY_MASK as u8); | ||
let bar = my_i32() & (MY_MASK as i32); | ||
``` | ||
This can be emulated by a macro that expands to the given literal, but | ||
that is unergonomic, and calling `MY_MASK!()` does not make it clear | ||
that this is a compile-time constant (`ALL_CAPS` notwithstanding). | ||
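For comparison, the macro workaround described above can be sketched in today's Rust roughly as follows. The names `my_mask!`, `mask_u8`, and `mask_i32` are illustrative assumptions, not part of the RFC.

```rust
// Hypothetical macro standing in for an untyped constant:
// it expands to the bare literal at every use site.
macro_rules! my_mask {
    () => {
        0b0101_0101
    };
}

// The literal is inferred as `u8` here...
fn mask_u8(x: u8) -> u8 {
    x & my_mask!()
}

// ...and as `i32` here, with no explicit `as` cast in either place.
fn mask_i32(x: i32) -> i32 {
    x & my_mask!()
}
```

Each expansion site sees the literal itself, so ordinary literal inference picks the type; the cost is the `my_mask!()` call syntax the paragraph above objects to.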
|
||
The latter two are essentially the same proposal: access to arbitrary-precision | ||
integers for constructing bignums (and other custom literals). | ||
Custom literals need to take a number as input; while | ||
C++, the only language with custom literals, simply takes its versions of | ||
`u64` and `f64` as arguments for literals, this is an unnecessary restriction | ||
in Rust, given that we recently stabilized the `u128` type. This problem | ||
cannot be neatly worked around, as far as we know. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Consider the expression `42`. What's its type? The basic language introduction | ||
might lead you to believe that, like in C++ and Java, it's `i32`. In reality, | ||
the compiler assigns this expression the type `ulit`: the type of all *integer | ||
literals*. Note the `u`: this is because all integer literals are unsigned! | ||
The float equivalent is `flit`. | ||
|
||
`ulit` and `flit` are both DSTs, so they can't be passed to functions like | ||
normal values. Unlike other DSTs, however, if they are used in a `Sized` | ||
context, they will attempt to coerce into a sized integer type, defaulting | ||
to `i32` or `f64` if there isn't an obvious choice. This occurs silently, | ||
since one almost always wants a sized integer: | ||
```rust | ||
let x = 42; // 42 types as ulit, but since a let binding requires a | ||
// Sized value, it tries to coerce to a sized integer. since | ||
// there isn't an obvious one, it picks i32. Hence, x: i32. | ||
|
||
let y = 42u32; // 42u32 has type u32, so no coercion occurs. | ||
let z: u32 = 42; // ulit coerces to u32, since it's the required type | ||
``` | ||
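The fallback behaviour described here matches what stable Rust already does for literals, and that can be observed directly. The sketch below is only an illustration of the current rules; the helper `type_of` and the wrapper function names are hypothetical.

```rust
use std::any::type_name;

/// Hypothetical helper: reports the type the compiler inferred for a value.
fn type_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

// No constraint on the literal: the integer fallback applies.
fn unconstrained_int() -> &'static str {
    type_of(&42)
}

// No constraint on the literal: the float fallback applies.
fn unconstrained_float() -> &'static str {
    type_of(&1.0)
}

// A suffix fixes the type outright.
fn suffixed_int() -> &'static str {
    type_of(&42u32)
}

// An annotation on the binding drives inference of the literal.
fn annotated_int() -> &'static str {
    let z: u32 = 42;
    type_of(&z)
}
```

Under this RFC the same outcomes would be explained as `ulit`/`flit` coercing in a `Sized` context rather than as inference defaults.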
|
||
Literal types are otherwise *mostly* like normal integers. They support | ||
arithmetic and comparisons (but don't implement any `std::ops` traits, since | ||
Does this include arithmetic for

Yes; I guess that's not clear. I'll update the RFC noting that later.

Why? Where's the motivation for arbitrary-precision "float" arithmetic? Also: how? Completely accurate arithmetic beyond integers is a difficult area without any local minima that are clearly good enough to bake into the language. Usually people envision bignum rationals for this sort of thing, but besides not being able to handle anything beyond add/sub/mul/div, it has serious downsides: even if you always use irreducible fractions and eat the significant overhead of doing so, the space requirements are pretty bad for many numbers (and truly awful for some).
|
||
they're not `Sized`). Like any DST, they can be passed around behind references. | ||
You can even write | ||
```rust | ||
const REALLY_BIG: &ulit = &1000000000000000000000; | ||
// analogous to | ||
const HELLO_WORLD: &str = "Hello, world!"; | ||
``` | ||
You can then use `REALLY_BIG` anywhere you'd use the literal, instead. The | ||
reference types `&'static ulit` and `&'static flit` will automatically coerce | ||
into any numeric type, via dereferencing. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
This RFC introduces two new DSTs: `ulit` and `flit`. Note that we do not | ||
introduce `ilit`; this is because it is not possible to write down a literal | ||
for a negative number, since `-1` is *really* `1.neg()`. These types behave | ||
like most DSTs, with a few exceptions: | ||
|
||
If either type is used in a context that requires a `Sized` type | ||
(a function call, a let binding, a generic parameter, etc), they will | ||
coerce according to the current typing rules for literals: whatever | ||
is inferred as correct, or `i32`/`f64` as a fallback. Note that `as` casts | ||
do what is expected. See [unresolved-questions] for alternative | ||
ways we could perform the coercions. | ||
|
||
For ergonomic reasons, | ||
static references to either type are dereferenced automatically in `Sized` | ||
context. This is to support the following pattern: | ||
```rust | ||
const FOO: &ulit = &0b0101_0101; | ||
|
||
let x: u8 = 5; | ||
let y = x & FOO; // here `FOO` is coerced from `&ulit` to `u8` | ||
``` | ||
|
||
The representation of `ulit` and `flit` is unspecified, but this RFC suggests | ||
representations. Note that the compiler is *not* required to use these; | ||
they are merely a suggestion for what a good representation would be. | ||
```rust | ||
// represented as an array of bytes in the target endianness; | ||
// this endianness choice means a coercion is just a memcpy | ||
struct ulit([u8]); | ||
|
||
// represented as a ratio of target-endian, arbitrary-length | ||
// integers. this is chosen over an IEEE-like base-2 notation, | ||
// which would require rounding. unfortunately, this requires | ||
// an fdiv for coercion, though this is not a problem, since | ||
// it is unlikely that an flit will need to be coerced at runtime | ||
struct flit { | ||
middle: usize, | ||
bytes: [u8] | ||
Please update the fields to reflect that

In the ratio representation, how to ensure the compiler won't be DoS'ed with the following (it prints

fn main() {
    let m = 1.0e+999_999_999_999_999_999_999_999_999;
    println!("{:?}", m);
}

This representation doesn't work either, at least not the "fdiv to convert" bit, since to do that you first need to convert numerator and denominator to the float format. The "ratio of bignums" aspect is at least lossless and can therefore be salvaged by a more expensive conversion algorithms, but it's not more useful than other representations for those algorithms. Frankly, I don't think that there's anything better than a plain old string for representing literals. In any case, the conversion will be rather expensive when done at runtime. In the worst case it requires at least one >1000 bit integer division+remainder calculations, to the best of my knowledge even multiple. Yet another reason to not permit such values to leak into runtime.

@kennytm I don't quite understand what you mean-- the Also... this seems like a problem, since neither a ratio or IEEE representation seem better... the former makes scientific notation awful, and the latter makes most decimals require truncation. I think @rkruppe has a point- I don't intend these values to ever reach a runtime context, so in practice a string representation with a very expensive conversion is fine. I'll update the RFC later with a list of possible representations and their drawbacks.

If

Can the standard float parsing facilities handle arbitrary-size floats?

I assume you're referring to As I mentioned below, I think an opaque type (which could even just be a lang item!) might be a neater abstraction.

Hold on. Let's be careful about what the language/compiler can actually provide to a user-defined type in this area. It can't reasonably do base conversion and rounding for them. For example, a "bignum rational" library type, a base 10 floating point, and a base 2 fixed point type all do very different things with the decimal digits in the source code. Library types will often have to do their own custom parsing, period. Furthermore, even if there's code to be shared between many such libraries, it can simply be yet more library code. It doesn't have to be built into the compiler (just as

For sure! Here's what I imagine if we go for the string-based alternative:

// mod core::ops? core::num is private iirc
// the compiler needs to know about this type, since it needs to
// be able to construct it from literals in source code.
#[lang = "float_lit"]
// sticking with flit for now, though since it's not a primitive
// will definitely want to go with FloatLit... which I would be ok with
struct flit(str);
impl flit {
    /// The literal, as it appears in source code.
    /// (Canonicalized to remove underscores.)
    ///
    /// Consider, e.g. `lit.verbatim().parse::<f64>().expect("...")`
    const fn verbatim(&self) -> &str { .. }
    // the following are all *very* expensive str-manipulation fns
    const fn numer_bytes(&self) -> &[u8] { .. }
    const fn denom_bytes(&self) -> &[u8] { .. }
    const fn mantissa_bytes(&self, base: usize) -> &[u8] { .. }
    const fn exponent_bytes(&self, base: usize) -> &[u8] { .. }
}

Who would benefit from As for What you can write is a parsing function for IEEE 754 floating point parametrized by the base, precision, and exponent range (that's the way the standard is written, even!) but that function is even more niche than the rational-based functions discussed before. And yet again my most important contention remains unaddressed: what justifies putting these conversion functions into the standard library rather than letting those who need that conversion either do it themselves or import a third party library for it?
|
||
} | ||
``` | ||
`ulit` and `flit` do not implement any `ops` arithmetic traits, since | ||
those require `Self: Sized`. They do, however, support all the usual arithmetic | ||
operations primitively. Since these types are *not* meant to be used at runtime | ||
as bignums, the compiler is encouraged to implement these naively, and | ||
to warn when the constant time expression evaluator can't fold them away. | ||
|
||
Again, we emphasize that the representation of these types is unspecified, and | ||
the above is only a discussion of *possible* layouts. | ||
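To make the byte-array layout concrete, here is a minimal sketch of how narrowing such a value to `u32` could look. It assumes little-endian magnitude bytes (the RFC says target-endian, so this is a simplification), and `narrow_to_u32` is a hypothetical helper, not an API proposed by this RFC.

```rust
/// Hypothetical narrowing of an arbitrary-length, little-endian magnitude
/// (a stand-in for the bytes of a `ulit`) into a `u32`.
/// Returns `None` when the value does not fit.
fn narrow_to_u32(bytes: &[u8]) -> Option<u32> {
    // Any nonzero byte past the fourth means the value exceeds u32::MAX.
    if bytes.iter().skip(4).any(|&b| b != 0) {
        return None;
    }
    // Otherwise the coercion is a plain copy of the low bytes,
    // zero-padded if the literal is shorter than four bytes.
    let mut buf = [0u8; 4];
    let n = bytes.len().min(4);
    buf[..n].copy_from_slice(&bytes[..n]);
    Some(u32::from_le_bytes(buf))
}
```

The copy in the success path is the "just a memcpy" case from the layout comment; the range check is what a compiler would turn into the existing "literal out of range" diagnostic.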
|
||
Furthermore, they support the following, self-explanatory interfaces: | ||
```rust | ||
impl ulit { | ||
fn lit_bytes(&self) -> &[u8]; | ||
} | ||
|
||
impl flit { | ||
fn lit_numer(&self) -> &[u8]; | ||
fn lit_denom(&self) -> &[u8]; | ||
} | ||
``` | ||
The documentation should point out that the endianness of the returned slices | ||
is platform-dependent. Alternatively, we could make it little-endian by default | ||
and add some mechanism to get it in the platform endianness. We may want to | ||
guarantee that, e.g., | ||
```rust | ||
42i32 == transmute_copy::<[u8], i32>(&42.lit_bytes()[0..4]) | ||
``` | ||
|
||
`ulit` and `flit` implement `PartialEq, Eq, PartialOrd, Ord`. Note that | ||
`flit` cannot take on the IEEE values `Infinity`, `-Infinity`, or `NaN`, so | ||
we can *actually* get away with this. | ||
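For contrast, today's `f64` only implements `PartialOrd`, precisely because of `NaN`. The small sketch below shows that behaviour on stable Rust; `compare_f64` is a hypothetical wrapper used only for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical wrapper around `f64::partial_cmp`, which returns `None`
/// whenever either operand is NaN.
fn compare_f64(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}
```

A type whose values are always written as finite literals never hits the `None` case, which is why a total order is plausible for `flit`.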
|
||
`ulit` and `flit` are *never* inferred as the value of type variables solely | ||
on the basis that they are the type of a literal. It is unclear if we | ||
should allow the last case here: | ||
```rust | ||
fn foo<T: ?Sized>(x: &'static T) -> Box<T> { .. } | ||
|
||
let _ = foo(&42); // T types as i32, not ulit! | ||
let _ = foo::<ulit>(&42); // T is explicitly typed as ulit. this is OK! | ||
|
||
let _: Box<ulit> = foo(&42); // T types as ulit | ||
``` | ||
|
||
## Weakenings | ||
|
||
The following are ways in which we can weaken this proposal into a workable | ||
subset: | ||
- Arithmetic is not implemented as a polyfill; instead, literal operands | ||
are coerced to sized types first. Thus, | ||
```rust | ||
1 + 1 // coerces to | ||
1i32 + 1i32 // and thus types as i32, not ulit | ||
``` | ||
- Static references are not automatically dereferenced, so you'd have to write | ||
```rust | ||
let y = x & *FOO; | ||
``` | ||
- Either type can *only* appear as the `T` in `&'static T`, and in no | ||
other place. Type aliases are ok, but not associated types. I.e., | ||
```rust | ||
fn foo<T: ?Sized>(x: &'static T) -> Box<T> { .. } | ||
|
||
let _ = foo(&42); // T types as i32, not ulit! | ||
let _ = foo::<ulit>(&42); // Error: cannot use ulit as type parameter right now | ||
|
||
let _: Box<ulit> = foo(&42); // Error: cannot use ulit as type parameter right now | ||
``` | ||
|
||
The compiler actually has a name for these types: `{integer}` and `{float}`, | ||
as they appear in error messages. We may want to use these with `ulit` and | ||
`flit`, but it is up for debate whether this will confuse beginners who | ||
shouldn't be worrying about an advanced language feature. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
This adds some rather subtle rules to typeck, so we should be *very* careful | ||
to implement this without introducing soundness bugs or regressions. | ||
|
||
In fact, this might cause regressions in numeric literals, a core language | ||
feature! | ||
|
||
The stronger versions of this proposal also introduce a confusing footgun- | ||
these literal types are *not* meant to be used as runtime bignums, and this | ||
may confuse users if there isn't a big warning in the documentation. | ||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
This is the best way to do this because it's the simplest. This proposal shows | ||
what all of the knobs we could add *are*, but at the end of the day, it's a | ||
DST with a magic coercion rule. | ||
|
||
I don't know of any good alternatives to this that aren't implementation | ||
details. While we can sidestep untyped `const`s with macros, we can't do | ||
it anywhere near as cleanly for `BigNum::new()` and custom literals. | ||
|
||
We can just... not do this, and, for custom literals, accept `u128` and | ||
`f64`, following C++. However, it is conceivable that we will get | ||
bigger numeric types, e.g. `u256`, which would require breaking whatever | ||
`ops` trait is used to implement custom integer literals. | ||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
Scala's dotty compiler has explicit literal types: the type of 1 is 1.type, | ||
which is a subtype of Int (corresponding to the JVM int type). In addition, | ||
String literals also have types (e.g. `"foo".type`), but that is beyond the scope of | ||
this proposal. These types are mostly intended to be used in generics; I don’t | ||
know of any language that uses a single type for all int/float literals. | ||
|
||
As pointed out at the start of this RFC, many languages have untyped constants, | ||
but this is often opt-out, if at all. We believe the proposed opt-in mechanism | ||
for untyped constants is not the enormous footgun typeless-by-default is. | ||
|
||
See below for alternatives regarding coercion. | ||
|
||
C++ has custom literals, but custom literals are beyond the scope of this | ||
proposal. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
The main problem is the following: | ||
- How much should we weaken the proposal, to get a tractable subset? | ||
|
||
We also don't know exactly in what situations a literal | ||
type coerces to a sized type. This RFC proposes doing so when `ulit` | ||
and `flit` appear in a `Sized` context. We could, alternatively: | ||
- Coerce whenever they're used in a *runtime* setting | ||
- Coerce whenever a type needs to be deduced (so that `ulit` and | ||
`flit` bindings must be manually typed). | ||
|
||
Finally, some other minor considerations: | ||
- The names of the literals. `u__` appeared in the Pre-RFC for this | ||
proposal, and `IntLit` has also been proposed, though this does not | ||
agree with the naming convention for other numeric types. | ||
- Should we consider a more granular approach, like Scala’s? | ||
- What should `&ulit` look like through FFI? | ||
|
||
# Future extensions | ||
[future-extensions]: #future-extensions | ||
|
||
A major future use of this proposal is allowing arbitrary precision | ||
in custom literals, like in the following strawman (imagine | ||
that we have `const fn` in traits for now): | ||
|
||
```rust | ||
// core::ops | ||
#[lang = "int_lit"] | ||
pub trait IntLit { | ||
const fn int_lit(lit: &'static ulit) -> Self; | ||
} | ||
|
||
// .. | ||
struct BigInt { | ||
negative: bool, | ||
bytes: Vec<u8> | ||
} | ||
|
||
impl IntLit for BigInt { | ||
const fn int_lit(lit: &'static ulit) -> BigInt { | ||
BigInt { negative: false, bytes: lit.lit_bytes().to_vec() } | ||
} | ||
} | ||
|
||
// at last, our original example can be made to compile! | ||
const VEGETA_CANT_EVEN: BigInt = 9_000_000_000_000_000_000_000_000_000_000_000_000_001_BigInt; | ||
``` |
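Since `ulit` and `lit_bytes()` do not exist, the strawman above cannot be compiled today. As a rough stand-in, the sketch below shows the work a bignum constructor would otherwise do itself: turning the decimal digits of a literal into little-endian base-256 bytes. The function name `decimal_to_le_bytes` and the little-endian choice are assumptions made for this illustration.

```rust
/// Hypothetical helper: converts a decimal literal (underscores allowed)
/// into little-endian base-256 bytes, i.e. roughly what `lit_bytes()`
/// is imagined to hand back. Returns `None` on any non-digit character.
fn decimal_to_le_bytes(digits: &str) -> Option<Vec<u8>> {
    let mut bytes: Vec<u8> = vec![0];
    for c in digits.chars().filter(|&c| c != '_') {
        // The digit to append is the initial carry of the multiply-by-10 pass.
        let mut carry = c.to_digit(10)?;
        // bytes = bytes * 10 + digit, one byte at a time.
        for b in bytes.iter_mut() {
            let v = u32::from(*b) * 10 + carry;
            *b = (v & 0xFF) as u8;
            carry = v >> 8;
        }
        // Grow the buffer if the multiplication overflowed the top byte.
        while carry > 0 {
            bytes.push((carry & 0xFF) as u8);
            carry >>= 8;
        }
    }
    Some(bytes)
}
```

For values that fit in `u128` the result agrees with `u128::to_le_bytes` (ignoring trailing zero bytes); past that point the vector simply keeps growing, which is the property the `VEGETA_CANT_EVEN` example relies on.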
The far simpler solution for custom literals is to just take a string. In fact C++ supports that too.

What type would a function-style proc macro get if it was "passed" a superlong literal like this? Just a numeric literal node containing a string?

@Ixrec you get a `Literal`. Note that there's no public methods to extract the content besides `.to_string()`.

I don't quite like the idea of taking literally `&'static str`, since any application using a size that doesn't fit in the biggest type (and thus has a `FromStr` implementation) will need to parse the literal. I'm in favor of a string-based representation, but I think it should be opaque, with methods to extract components, like the exponent of a scientific notation string.

Why? Chopping the string into its basic components is the easiest part of the parsing process by a mile. The compiler can't help with any of the actually hard parts, and inventing a whole new (thoroughly weird) kind of primitive type just to save a few library types the trouble of doing `.split('.')` and `.split('e')` seems disproportionate.
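To illustrate the "chop the string into components" step this thread is arguing about, here is a minimal sketch on stable Rust. The type `FloatParts` and the function `split_float_literal` are hypothetical names, and the sketch assumes a decimal float literal with no type suffix.

```rust
/// Hypothetical container for the textual pieces of a float literal.
#[derive(Debug, PartialEq)]
struct FloatParts {
    integer: String,
    fraction: String,
    exponent: String,
}

/// Splits a decimal float literal such as `1.5e-3` into its integer,
/// fraction, and exponent digits. Underscores are dropped first; a missing
/// fraction becomes "" and a missing exponent becomes "0".
fn split_float_literal(lit: &str) -> FloatParts {
    let cleaned: String = lit.chars().filter(|&c| c != '_').collect();
    // Separate the exponent, if any.
    let (mantissa, exponent) = match cleaned.split_once(|c: char| c == 'e' || c == 'E') {
        Some((m, e)) => (m, e),
        None => (cleaned.as_str(), "0"),
    };
    // Separate the fractional digits, if any.
    let (integer, fraction) = match mantissa.split_once('.') {
        Some((i, f)) => (i, f),
        None => (mantissa, ""),
    };
    FloatParts {
        integer: integer.to_string(),
        fraction: fraction.to_string(),
        exponent: exponent.to_string(),
    }
}
```

This covers only the easy part; base conversion and rounding for a particular numeric type are left to whichever library consumes the pieces, which is the point the reviewer is making.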