RFC: Numeric literal types #2507

Closed
318 changes: 318 additions & 0 deletions text/0000-numeric-literal-types.md
@@ -0,0 +1,318 @@
- Feature Name: numeric_literal_types
- Start Date: 2018-07-30
- RFC PR: _
- Rust Issue: _

# Summary
[summary]: #summary

This RFC introduces two new types: `ulit` and `flit`. These are the *numeric
literal types*, i.e., the type of an integer literal `42` or a float literal
`1.0`. These types exist to give a name to literals that do not have a fixed
size. Consider the following error:
```
error: int literal is too large
--> src/main.rs:2:32
|
2 | const VEGETA_CANT_EVEN: u128 = 9_000_000_000_000_000_000_000_000_000_000_000_000_001;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
This expression could instead be given the type `ulit`, which could later
be narrowed into a "real" integer, or fed into a `const` constructor,
like `BigInt::new()`, without having to restrict itself to `u128`. These
types, while unsized, are capable of being coerced into integers,
following the current literal typing rules.
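
Without such a type, a literal that exceeds `u128` can only reach a bignum constructor as a string today. The following is a minimal sketch of that workaround, assuming a hypothetical helper `decimal_to_le_bytes` (not part of this RFC or of any library) that turns a decimal literal, underscores included, into little-endian base-256 bytes:

```rust
/// Hypothetical helper (an assumption for illustration): parse a decimal
/// literal into little-endian base-256 bytes, skipping `_` separators.
/// Returns `None` if the input has no digits or contains a non-digit.
fn decimal_to_le_bytes(literal: &str) -> Option<Vec<u8>> {
    let mut bytes: Vec<u8> = vec![0];
    let mut seen_digit = false;
    for c in literal.chars() {
        if c == '_' {
            continue;
        }
        let digit = c.to_digit(10)?;
        seen_digit = true;
        // Multiply the accumulated number by 10 and add the new digit,
        // propagating the carry from the lowest byte upwards.
        let mut carry = digit;
        for b in bytes.iter_mut() {
            let v = (*b as u32) * 10 + carry;
            *b = (v & 0xFF) as u8;
            carry = v >> 8;
        }
        if carry > 0 {
            bytes.push(carry as u8);
        }
    }
    if seen_digit {
        Some(bytes)
    } else {
        None
    }
}

fn main() {
    let big = "9_000_000_000_000_000_000_000_000_000_000_000_000_001";
    let bytes = decimal_to_le_bytes(big).expect("valid decimal literal");
    // A u128 holds 16 bytes; this literal needs more than that.
    println!("{} bytes", bytes.len());
}
```

This is the kind of runtime parsing that, under this proposal, a constructor taking a `ulit` would presumably not need to repeat.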

Introducing language-level bignums is a *non-goal*.
This RFC lays the groundwork for custom literals, but custom literals
themselves are *also* a non-goal.

Note: this proposal is given in full generality, with a series of weakened
subsets that might be easier to implement or stabilize. The guide-level
explanation is written only with this full generality in mind, since I don't
think it's too difficult to explain the weakenings. Accepting this RFC will
probably entail picking a weakening and applying it to both explanations.

# Motivation
[motivation]: #motivation

This proposal has a few motivating use cases:
- Untyped compile-time constants, as in Go or C (via `#define`).
- Bignum constructors.
- Custom integer literals, à la `operator""`.

The first is valuable because it allows us to hoist several occurrences
of the same literal in different typed contexts, without having to type
it as the largest possible numeric type and explicitly narrow, i.e.
```rust
let foo = my_u8() & 0b0101_0101;
let bar = my_i32() & 0b0101_0101;
// becomes
const MY_MASK: ulit = 0b0101_0101;
let foo = my_u8() & MY_MASK;
let bar = my_i32() & MY_MASK;
// instead of
const MY_MASK: u128 = 0b0101_0101;
let foo = my_u8() & (MY_MASK as u8);
let bar = my_i32() & (MY_MASK as i32);
```
This can be emulated by a macro that expands to the given literal, but
that is unergonomic, and calling `MY_MASK!()` does not make it clear
that this is a compile-time constant (`ALL_CAPS` notwithstanding).
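
For comparison, the macro workaround looks roughly like this in today's Rust. The names `my_mask!`, `mask_u8`, and `mask_i32` are made up for this sketch:

```rust
// Stand-in for an untyped constant: the macro expands to the bare literal,
// so the literal is typed afresh at every use site.
macro_rules! my_mask {
    () => {
        0b0101_0101
    };
}

// The expanded literal is inferred as u8 here...
fn mask_u8(value: u8) -> u8 {
    value & my_mask!()
}

// ...and as i32 here, with no explicit cast in either place.
fn mask_i32(value: i32) -> i32 {
    value & my_mask!()
}

fn main() {
    println!("{:#010b}", mask_u8(0xFF));
    println!("{:#x}", mask_i32(-1));
}
```

It works, but the call syntax hides the fact that the value is a plain constant, which is the ergonomic complaint made above.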

The latter two are essentially the same proposal: access to arbitrary-precision
integers for constructing bignums (and other custom literals).
Custom literals need to take a number as input; while
C++, the only language with custom literals, simply takes its versions of
`u64` and `f64` as arguments for literals, this is an unnecessary restriction
in Rust, given that we recently stabilized the `u128` type. This problem
cannot be neatly worked around, as far as we know.

The far simpler solution for custom literals is to just take a string. In fact C++ supports that too.

Contributor

What type would a function-style proc macro get if it was "passed" a superlong literal like this? Just a numeric literal node containing a string?

Member

@kennytm kennytm Jul 30, 2018

@Ixrec you get a Literal. Note that there's no public methods to extract the content besides .to_string().

Author

I don't quite like the idea of taking literally &'static str, since any application using a size that doesn't fit in the biggest type (and thus has a FromStr implementation) will need to parse the literal. I'm in favor of a string-based representation, but I think it should be opaque, with methods to extract components, like the exponent of a scientific notation string.

Why? Chopping the string into its basic components is the easiest part of the parsing process by a mile. The compiler can't help with any of the actually hard parts, and inventing a whole new (thoroughly weird) kind of primitive type just to save a few library types the trouble of doing .split('.') and .split('e') seems disproportionate.


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Consider the expression `42`. What's its type? The basic language introduction
might lead you to believe that, like in C++ and Java, it's `i32`. In reality,
the compiler assigns this expression the type `ulit`: the type of all *integer
literals*. Note the `u`: this is because all integer literals are unsigned!
The float equivalent is `flit`.

`ulit` and `flit` are both DSTs, so they can't be passed to functions like
normal values. Unlike other DSTs, however, if they are used in a `Sized`
context, they will attempt to coerce into a sized integer type, defaulting
to `i32` or `f64` if there isn't an obvious choice. This occurs silently,
since one almost always wants a sized integer:
```rust
let x = 42; // 42 types as ulit, but since a let binding requires a
// Sized value, it tries to coerce to a sized integer. since
// there isn't an obvious one, it picks i32. Hence, x: i32.

let y = 42u32; // 42u32 has type u32, so no coercion occurs.
let z: u32 = 42; // ulit coerces to u32, since it's the required type
```
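
The fallback behaviour described here matches what the compiler already does for `{integer}` and `{float}` today, which can be observed with `std::any::type_name`. This is only an illustrative sketch: `type_of` is a helper invented for it, and the exact strings returned by `type_name` are documented as best-effort rather than guaranteed.

```rust
use std::any::type_name;

/// Helper for this sketch: report the type the compiler picked for a value.
fn type_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let x = 42; // no constraint: falls back to i32
    let y = 42u32; // suffix fixes the type
    let z: u8 = 42; // annotation fixes the type
    let w = 1.0; // no constraint: falls back to f64

    println!("x: {}", type_of(&x));
    println!("y: {}", type_of(&y));
    println!("z: {}", type_of(&z));
    println!("w: {}", type_of(&w));
}
```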

Literal types are otherwise *mostly* like normal integers. They support
arithmetic and comparisons (but don't implement any `std::ops` traits, since

Does this include arithmetic for flit?

Author

Yes; I guess that's not clear. I'll update the RFC noting that later.

Why? Where's the motivation for arbitrary-precision "float" arithmetic?

Also: how? Completely accurate arithmetic beyond integers is a difficult area without any local minima that are clearly good enough to bake into the language. Usually people envision bignum rationals for this sort of thing, but besides not being able to handle anything beyond add/sub/mul/div, it has serious downsides: even if you always use irreducible fractions and eat the significant overhead of doing so, the space requirements are pretty bad for many numbers (and truly awful for some).

they're not `Sized`). Like any DST, they can be passed around behind references.
You can even write
```rust
const REALLY_BIG: &ulit = &1000000000000000000000;
// analogous to
const HELLO_WORLD: &str = "Hello, world!";
```
You can then use `REALLY_BIG` anywhere you'd use the literal, instead. The
reference types `&'static ulit` and `&'static flit` will automatically coerce
into any numeric type, via dereferencing.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This RFC introduces two new DSTs: `ulit` and `flit`. Note that we do not
introduce `ilit`; this is because it is not possible to write down a literal
for a negative number, since `-1` is *really* `1.neg()`. These types behave
like most DSTs, with a few exceptions:

If either type is used in a context that requires a `Sized` type
(a function call, a let binding, a generic parameter, etc), they will
coerce according to the current typing rules for literals: whatever
is inferred as correct, or `i32`/`f64` as a fallback. Note that `as` casts
do what is expected. See [unresolved-questions] for alternative
ways we could perform the coercions.

For ergonomic reasons,
static references to either type are dereferenced automatically in `Sized`
context. This is to support the following pattern:
```rust
const FOO: &ulit = &0b0101_0101;

let x: u8 = 5;
let y = x & FOO; // here `FOO` is coerced from `&ulit` to `u8`
```

The representation of `ulit` and `flit` is unspecified, but this RFC suggests
representations. Note that the compiler is *not* required to use these;
they are merely a suggestion for what a good representation would be.
```rust
// represented as an array of bytes in the target endianness;
// this endianness choice means a coercion is just a memcpy
struct ulit([u8]);

// represented as a ratio of target-endian, arbitrary-length
// integers. this is chosen over an IEEE-like base-2 notation,
// which would require rounding. unfortunately, this requires
// an fdiv for coercion, though this is not a problem, since
// it is unlikely that an flit will need to be coerced at runtime
struct flit {
    middle: usize,
    bytes: [u8]
Member

Please update the fields to reflect that flit is now a ratio.

In the ratio representation, how to ensure the compiler won't be DoS'ed with the following (it prints inf today)?

fn main() {
    let m = 1.0e+999_999_999_999_999_999_999_999_999;
    println!("{:?}", m);
}

This representation doesn't work either, at least not the "fdiv to convert" bit, since to do that you first need to convert numerator and denominator to the float format.

The "ratio of bignums" aspect is at least lossless and can therefore be salvaged by a more expensive conversion algorithms, but it's not more useful than other representations for those algorithms. Frankly, I don't think that there's anything better than a plain old string for representing literals.

In any case, the conversion will be rather expensive when done at runtime. In the worst case it requires at least one >1000 bit integer division+remainder calculations, to the best of my knowledge even multiple. Yet another reason to not permit such values to leak into runtime.

Author

@kennytm I don't quite understand what you mean-- the middle field indicated where the numerator ends and the denominator starts? I wrote ([u8], [u8]) originally but I didn't want to have to explain the pseudocode.

Also... this seems like a problem, since neither a ratio or IEEE representation seem better... the former makes scientific notation awful, and the latter makes most decimals require truncation. I think @rkruppe has a point- I don't intend these values to ever reach a runtime context, so in practice a string representation with a very expensive conversion is fine. I'll update the RFC later with a list of possible representations and their drawbacks.


@hanna-kruppe hanna-kruppe Jul 30, 2018

If flit is going to be just a string without (I assume) any compile-time arbitrary-precision arithmetic and the conversion happens at compile time anyway, I don't see any reason to have flit in the first place: just use strings and use the standard float parsing facilities (which aren't currently available at compile time, but should be).

Author

Can the standard float parsing facilities handle arbitrary-size floats? I assume you're referring to f64::parse and friends. For most use-cases what you suggest is totally fine, but I worry about restricting ourselves to the capacity of f64.

As I mentioned below, I think an opaque type (which could even just be a lang item!) might be a neater abstraction.

Hold on. Let's be careful about what the language/compiler can actually provide to a user-defined type in this area. It can't reasonably do base conversion and rounding for them. For example, a "bignum rational" library type, a base 10 floating point, and a base 2 fixed point type all do very different things with the decimal digits in the source code. Library types will often have to do their own custom parsing, period.

Furthermore, even if there's code to be shared between many such libraries, it can simply be yet more library code. It doesn't have to be built into the compiler (just as f64::parse isn't!).

Author

@mcy mcy Jul 30, 2018

For sure! Here's what I imagine if we go for the string-based alternative:

// mod core::ops? core::num is private iirc

// the compiler needs to know about this type, since it needs to
// be able to construct it from literals in source code.
#[lang = "float_lit"] 
// sticking with flit for now, though since it's not a primitive 
// will definitely want to go with FloatLit... which I would be ok with
struct flit(str); 

impl flit {
    /// The literal, as it appears in source code.
    /// (Canonicalized to remove underscores.)
    ///
    /// Consider, e.g. `lit.verbatim().parse::<f64>().expect("...")`
    const fn verbatim(&self) -> &str { .. }

    // the following are all *very* expensive str-manipulation fns
    const fn numer_bytes(&self) -> &[u8] { .. }
    const fn denom_bytes(&self) -> &[u8] { .. }

    const fn mantissa_bytes(&self, base: usize) -> &[u8] { .. }
    const fn exponent_bytes(&self, base: usize) -> &[u8] { .. }
}

Who would benefit from {numer,denom}_bytes and how much? The only types that could use it that comes to mind are rationals, and it only saves them calling a function they'd probably have anyway (parsing decimal strings) -- and they may not otherwise have a way to interpret these u8 slices.

As for mantissa_bytes, exponent_bytes, I struggle to understand what these even do, let alone how they're useful for anything. The (mantissa, exponent) representation generally requires a periodic mantissa, since it's just a different way to write scientific notation. So, does it round? Then it's missing the target precision. But even the target precision is not enough for types with fixed-size exponent field, because overflowing the exponent range can impact rounding (e.g., if you have subnormals like IEEE 754).

What you can write is a parsing function for IEEE 754 floating point parametrized by the base, precision, and exponent range (that's the way the standard is written, even!) but that function is even more niche than the rational-based functions discussed before.


And yet again my most important contention remains unaddressed: what justifies putting these conversion functions into the standard library rather than letting those who need that conversion either do it themselves or import a third party library for it?

}
```
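
To make the ratio idea concrete, here is a rough sketch in today's Rust of how a decimal literal without an exponent maps to a numerator/denominator pair. The function `decimal_to_ratio` is hypothetical, uses `u128` in place of arbitrary-length integers, and does not reduce the fraction:

```rust
/// Hypothetical sketch: convert a decimal literal such as `12.5` into an
/// unreduced (numerator, denominator) pair. Exponents are not handled.
/// Returns `None` on malformed input or if either part overflows u128.
fn decimal_to_ratio(literal: &str) -> Option<(u128, u128)> {
    // Canonicalize by dropping `_` separators.
    let cleaned: String = literal.chars().filter(|&c| c != '_').collect();
    let (int_part, frac_part) = match cleaned.split_once('.') {
        Some((i, f)) => (i, f),
        None => (cleaned.as_str(), ""),
    };
    let all_digits = int_part
        .chars()
        .chain(frac_part.chars())
        .all(|c| c.is_ascii_digit());
    if int_part.is_empty() || !all_digits {
        return None;
    }
    // "12.5" becomes 125 / 10^1: concatenate the digits for the numerator
    // and use one power of ten per fractional digit for the denominator.
    let digits = format!("{}{}", int_part, frac_part);
    let numer = digits.parse::<u128>().ok()?;
    let denom = 10u128.checked_pow(frac_part.len() as u32)?;
    Some((numer, denom))
}

fn main() {
    println!("{:?}", decimal_to_ratio("12.5"));
    println!("{:?}", decimal_to_ratio("1_000.25"));
}
```

Scientific notation is deliberately left out. As the review comment above notes, a literal such as `1.0e+999_999_999_999_999_999_999_999_999` would make the numerator enormous in this representation.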
`ulit` and `flit` do not implement any `ops` arithmetic traits, since
those require `Self: Sized`. They do, however, support all the usual arithmetic
operations primitively. Since these types are *not* meant to be used at runtime
as bignums, the compiler is encouraged to implement these naively, and
to warn when the compile-time constant evaluator can't fold them away.

Again, we emphasize that the representation of these types is unspecified, and
the above is only a discussion of *possible* layouts.

Furthermore, they support the following, self-explanatory interfaces:
```rust
impl ulit {
    fn lit_bytes(&self) -> &[u8];
}

impl flit {
    fn lit_numer(&self) -> &[u8];
    fn lit_denom(&self) -> &[u8];
}
```
The documentation should point out that the endianness of the returned slices
is platform-dependent. Alternatively, we could make it little-endian by default
and add some mechanism to get it in the platform endianness. We may want to
guarantee that, e.g.,
```rust
42i32 == transmute_copy::<[u8], i32>(&42.lit_bytes()[0..4])
```
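
A sketch of how such a guarantee could be exercised under the little-endian alternative, using only today's standard library. Both `lit_bytes_le` (a stand-in for the proposed `lit_bytes`, backed by a `u128`) and `narrow_to_u32` are assumptions made for this example; `u32` is used instead of `i32` to keep the range check simple.

```rust
/// Stand-in for the proposed `ulit::lit_bytes`, assuming little-endian
/// output. A real `ulit` would not be limited to 16 bytes.
fn lit_bytes_le(value: u128) -> [u8; 16] {
    value.to_le_bytes()
}

/// Narrow a little-endian byte slice to `u32`, or return `None` if any
/// byte beyond the first four is nonzero (the value does not fit).
fn narrow_to_u32(bytes: &[u8]) -> Option<u32> {
    let split = bytes.len().min(4);
    let (low, high) = bytes.split_at(split);
    if high.iter().any(|&b| b != 0) {
        return None;
    }
    // Zero-extend short inputs, then reinterpret the low four bytes.
    let mut buf = [0u8; 4];
    buf[..low.len()].copy_from_slice(low);
    Some(u32::from_le_bytes(buf))
}

fn main() {
    println!("{:?}", narrow_to_u32(&lit_bytes_le(42)));
    println!("{:?}", narrow_to_u32(&lit_bytes_le(1u128 << 40)));
}
```

With a fixed little-endian layout, taking the first four bytes yields the low-order part on every target. With target-endian bytes, the same prefix slice would hold the high-order bytes on a big-endian platform, which is worth keeping in mind when wording the guarantee.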

`ulit` and `flit` implement `PartialEq, Eq, PartialOrd, Ord`. Note that
`flit` cannot take on the IEEE values `Infinity`, `-Infinity`, or `NaN`, so
we can *actually* get away with this.
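
For contrast, this is why `f64` itself only implements `PartialEq` and `PartialOrd`: `NaN` is unordered and not equal to itself. The helpers `compare` and `equals_itself` below exist only for this illustration.

```rust
use std::cmp::Ordering;

/// Compare two floats; `None` means the operands are unordered.
fn compare(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}

/// Reflexivity check; it fails for NaN, which rules out `Eq` for f64.
fn equals_itself(x: f64) -> bool {
    x == x
}

fn main() {
    println!("{:?}", compare(1.0, 2.0));
    println!("{:?}", compare(f64::NAN, 1.0));
    println!("{}", equals_itself(f64::NAN));
}
```

A literal type that can never hold `NaN` or an infinity has no such unordered values, which is the property the paragraph above relies on.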

`ulit` and `flit` are *never* inferred as the value of type variables solely
on the basis that they are the type of a literal. It is unclear if we
should allow the last case here:
```rust
fn foo<T: ?Sized>(x: &'static T) -> Box<T> { .. }

let _ = foo(&42); // T types as i32, not ulit!
let _ = foo::<ulit>(&42); // T is explicitly typed as ulit. this is OK!

let _: Box<ulit> = foo(&42); // T types as ulit
```

## Weakenings

The following are ways in which we can weaken this proposal into a workable
subset:
- Arithmetic on literal types is not implemented directly; instead, both
operands are coerced to a sized type first. Thus,
```rust
1 + 1 // coerces to
1i32 + 1i32 // and thus types as i32, not ulit
```
- Static references are not automatically dereferenced, so you'd have to write
```rust
let y = x & *FOO;
```
- Either type can *only* appear as the `T` in `&'static T`, and in no
other place. Type aliases are ok, but not associated types. I.e.,
```rust
fn foo<T: ?Sized>(x: &'static T) -> Box<T> { .. }

let _ = foo(&42); // T types as i32, not ulit!
let _ = foo::<ulit>(&42); // Error: cannot use ulit as type parameter right now

let _: Box<ulit> = foo(&42); // Error: cannot use ulit as type parameter right now
```

The compiler actually has a name for these types: `{integer}` and `{float}`,
as they appear in error messages. We may want to replace these with `ulit` and
`flit`, but it is up for debate whether this will confuse beginners who
shouldn't be worrying about an advanced language feature.

# Drawbacks
[drawbacks]: #drawbacks

This adds some rather subtle rules to typeck, so we should be *very* careful
to implement this without introducing either unsoundness or regressions.

In fact, this might trigger regressions among numeric literals, a core language
feature!

The stronger versions of this proposal also introduce a confusing footgun:
these literal types are *not* meant to be used as runtime bignums, and this
may confuse users if there isn't a big warning in the documentation.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

This is the best way to do this because it's the simplest. This proposal shows
what all of the knobs we could add *are*, but at the end of the day, it's a
DST with a magic coercion rule.

I don't know of any good alternatives to this that aren't implementation
details. While we can sidestep untyped `const`s with macros, we can't do
so anywhere near as cleanly for `BigNum::new()` and custom literals.

We can just... not do this, and, for custom literals, accept `u128` and
`f64`, following C++. However, it is conceivable that we will get
bigger numeric types, e.g. `u256`, which would require breaking whatever
`ops` trait is used to implement custom integer literals.

# Prior art
[prior-art]: #prior-art

Scala's dotty compiler has explicit literal types: the type of `1` is `1.type`,
which is a subtype of Int (corresponding to the JVM int type). In addition,
String literals also have types (e.g. `"foo".type`), but that is beyond the scope of
this proposal. These types are mostly intended to be used in generics; I don’t
know of any language that uses a single type for all int/float literals.

As pointed out at the start of this RFC, many languages have untyped constants,
but this is often opt-out, if at all. We believe the proposed opt-in mechanism
for untyped constants is not the enormous footgun typeless-by-default is.

See below for alternatives regarding coercion.

C++ has custom literals, but custom literals are beyond the scope of this
proposal.

# Unresolved questions
[unresolved-questions]: #unresolved-questions
The main problem is the following:
- How much should we weaken the proposal, to get a tractable subset?

We also don't know exactly in what situations a literal
type coerces to a sized type. This RFC proposes doing so when `ulit`
and `flit` appear in a `Sized` context. We could, alternatively:
- Coerce whenever they're used in a *runtime* setting
- Coerce whenever a type needs to be deduced (so that `ulit` and
`flit` bindings must be manually typed).

Finally, some other minor considerations:
- The names of the literals. `u__` appeared in the Pre-RFC for this
proposal, and `IntLit` has also been proposed, though this does not
agree with the naming convention for other numeric types.
- Should we consider a more granular approach, like Scala’s?
- What should `&ulit` look like through FFI?

# Future extensions
[future-extensions]: #future-extensions

A major future use of this proposal is allowing arbitrary precision
in custom literals, like in the following strawman (imagine
that we have `const fn` in traits for now):

```rust
// core::ops
#[lang = "int_lit"]
pub trait IntLit {
    const fn int_lit(lit: &'static ulit) -> Self;
}

// ..
struct BigInt {
    negative: bool,
    bytes: Vec<u8>
}

impl IntLit for BigInt {
    const fn int_lit(lit: &'static ulit) -> BigInt {
        BigInt { negative: false, bytes: lit.lit_bytes().to_vec() }
    }
}

// at last, our original example can be made to compile!
const VEGETA_CANT_EVEN: BigInt = 9_000_000_000_000_000_000_000_000_000_000_000_000_001_BigInt;
```
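
The shape of this strawman can be approximated in current Rust by substituting a `&'static [u8]` for the `&'static ulit` argument. The version below is only an emulation under that assumption: there is no lang item, no literal suffix syntax, and no `const fn` in the trait, so the "literal" has to be spelled out as little-endian bytes by hand.

```rust
/// Emulation of the strawman `IntLit` trait. The byte slice (assumed to be
/// little-endian) stands in for the proposed `&'static ulit`.
pub trait IntLit {
    fn int_lit(lit: &'static [u8]) -> Self;
}

#[derive(Debug, PartialEq)]
pub struct BigInt {
    negative: bool,
    bytes: Vec<u8>,
}

impl IntLit for BigInt {
    fn int_lit(lit: &'static [u8]) -> BigInt {
        BigInt {
            negative: false,
            bytes: lit.to_vec(),
        }
    }
}

// 2^128 + 1 as 17 little-endian bytes: one more byte than a u128 can hold.
const TOO_BIG_FOR_U128: &[u8] = &[
    1, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    1,
];

fn main() {
    let n = BigInt::int_lit(TOO_BIG_FOR_U128);
    println!("{:?}", n);
}
```

What the emulation cannot provide is the part this RFC is about: having the compiler produce those bytes from a literal written in source code.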