RFC: Numeric literal types #2507
// it is unlikely that an flit will need to be coerced at runtime
struct flit {
    middle: usize,
    bytes: [u8]
Please update the fields to reflect that `flit` is now a ratio.
In the ratio representation, how do we ensure the compiler won't be DoS'ed by the following (it prints `inf` today)?
fn main() {
    let m = 1.0e+999_999_999_999_999_999_999_999_999;
    println!("{:?}", m);
}
This representation doesn't work either, at least not the "fdiv to convert" bit, since to do that you first need to convert the numerator and denominator to the float format.
The "ratio of bignums" aspect is at least lossless and can therefore be salvaged by more expensive conversion algorithms, but it's no more useful than other representations for those algorithms. Frankly, I don't think that there's anything better than a plain old string for representing literals.
In any case, the conversion will be rather expensive when done at runtime. In the worst case it requires at least one integer division-with-remainder on operands of more than 1000 bits, and to the best of my knowledge even several. Yet another reason not to permit such values to leak into runtime.
@kennytm I don't quite understand what you mean: the `middle` field indicates where the numerator ends and the denominator starts? I wrote `([u8], [u8])` originally but I didn't want to have to explain the pseudocode.
Also... this seems like a problem, since neither a ratio nor an IEEE representation seems better: the former makes scientific notation awful, and the latter makes most decimals require truncation. I think @rkruppe has a point. I don't intend these values to ever reach a runtime context, so in practice a string representation with a very expensive conversion is fine. I'll update the RFC later with a list of possible representations and their drawbacks.
If `flit` is going to be just a string without (I assume) any compile-time arbitrary-precision arithmetic, and the conversion happens at compile time anyway, I don't see any reason to have `flit` in the first place: just use strings and the standard float parsing facilities (which aren't currently available at compile time, but should be).
Can the standard float parsing facilities handle arbitrary-size floats? I assume you're referring to `f64::parse` and friends. For most use-cases what you suggest is totally fine, but I worry about restricting ourselves to the capacity of `f64`.
As I mentioned below, I think an opaque type (which could even just be a lang item!) might be a neater abstraction.
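For reference, a minimal sketch of what the standard parser does with an oversized literal. The helper name `parse_literal_as_f64` is hypothetical; the standard entry point is `str::parse::<f64>` (there is no inherent `f64::parse`), and it rejects `_` separators, so the sketch strips them first.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper: parse literal text as `f64` after removing the
/// `_` digit separators that Rust source allows but `str::parse` rejects.
fn parse_literal_as_f64(src: &str) -> Result<f64, ParseFloatError> {
    let cleaned: String = src.chars().filter(|&c| c != '_').collect();
    cleaned.parse::<f64>()
}
```

Under this sketch, `parse_literal_as_f64("1.0e+999_999_999")` succeeds and yields infinity, so anything beyond the range of `f64` is lost without an error.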
Hold on. Let's be careful about what the language/compiler can actually provide to a user-defined type in this area. It can't reasonably do base conversion and rounding for them. For example, a "bignum rational" library type, a base 10 floating point, and a base 2 fixed point type all do very different things with the decimal digits in the source code. Library types will often have to do their own custom parsing, period.
Furthermore, even if there's code to be shared between many such libraries, it can simply be yet more library code. It doesn't have to be built into the compiler (just as `f64::parse` isn't!).
For sure! Here's what I imagine if we go for the string-based alternative:
// mod core::ops? core::num is private iirc
// the compiler needs to know about this type, since it needs to
// be able to construct it from literals in source code.
#[lang = "float_lit"]
// sticking with flit for now, though since it's not a primitive
// will definitely want to go with FloatLit... which I would be ok with
struct flit(str);
impl flit {
    /// The literal, as it appears in source code.
    /// (Canonicalized to remove underscores.)
    ///
    /// Consider, e.g. `lit.verbatim().parse::<f64>().expect("...")`
    const fn verbatim(&self) -> &str { .. }
    // the following are all *very* expensive str-manipulation fns
    const fn numer_bytes(&self) -> &[u8] { .. }
    const fn denom_bytes(&self) -> &[u8] { .. }
    const fn mantissa_bytes(&self, base: usize) -> &[u8] { .. }
    const fn exponent_bytes(&self, base: usize) -> &[u8] { .. }
}
Who would benefit from `{numer,denom}_bytes`, and how much? The only types that come to mind that could use them are rationals, and it only saves them calling a function they'd probably have anyway (parsing decimal strings). They also may not otherwise have a way to interpret these `u8` slices.
As for `mantissa_bytes` and `exponent_bytes`, I struggle to understand what these even do, let alone how they're useful for anything. The `(mantissa, exponent)` representation generally requires a periodic mantissa, since it's just a different way to write scientific notation. So, does it round? Then it's missing the target precision. But even the target precision is not enough for types with a fixed-size exponent field, because overflowing the exponent range can impact rounding (e.g., if you have subnormals like IEEE 754).
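To make the "periodic mantissa" point concrete, here is a small sketch (the function name `binary_fraction_digits` is hypothetical) that expands a fraction in base 2 by long division. For the decimal literal `0.1`, i.e. 1/10, the digits never terminate, so any finite base-2 mantissa has to round.

```rust
/// First `n` binary digits after the point of `numer / denom`,
/// computed by long division. Assumes `numer < denom` and
/// `denom <= u64::MAX / 2`, so that doubling cannot overflow.
fn binary_fraction_digits(mut numer: u64, denom: u64, n: usize) -> String {
    let mut digits = String::with_capacity(n);
    for _ in 0..n {
        numer *= 2;
        if numer >= denom {
            digits.push('1');
            numer -= denom;
        } else {
            digits.push('0');
        }
    }
    digits
}
```

`binary_fraction_digits(1, 10, 12)` gives `000110011001`: after the leading `0`, the group `0011` repeats forever. By contrast `binary_fraction_digits(1, 4, 4)` gives `0100` and stays at zero from then on.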
What you can write is a parsing function for IEEE 754 floating point parametrized by the base, precision, and exponent range (that's the way the standard is written, even!) but that function is even more niche than the rational-based functions discussed before.
And yet again my most important contention remains unaddressed: what justifies putting these conversion functions into the standard library rather than letting those who need that conversion either do it themselves or import a third party library for it?
Am I right in understanding that there shall never be a runtime value of these types? So no coercion from Otherwise we run into not so nice situations where the runtime value is silently losing bits due to a coercion and we can't even guarantee that we'd lint it. We can still do a best-effort variant, but that feels less nice.
It's not clear to me that Go-style "untyped" constants and custom literals should be using the same underlying types, or that we want custom literals.
My understanding of Go's "untyped" arbitrary-precision constants is that the only advantage they have relative to type inference of constants (which we still haven't figured out) is that Go can do arbitrary-precision arithmetic with those constants, deferring any coercion into a fixed-size type until some non-constant code is involved.
If I understand this RFC correctly, Of course, if they did provide arbitrary-precision arithmetic, then they wouldn't be "literal" types anymore and the names
Since this RFC doesn't mention arbitrary-precision arithmetic or type inference of constants at all, I'm confused as to what this RFC on its own actually accomplishes or what the argument for it is supposed to be. Right now, it just feels like half of a custom literals proposal that doesn't make sense to discuss in isolation (especially since my concern with custom literals is motivation, not mechanism).
Did I misunderstand Go or this RFC somehow?
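As a point of comparison, a sketch of how today's Rust treats the kind of constant expression that Go evaluates at arbitrary precision. The names `WIDE`, `NARROW`, and `shift_in_u64` are hypothetical; the assumption being illustrated is that every intermediate value of a Rust constant must fit its declared fixed-size type.

```rust
// Rust constants carry a fixed-size type, so each intermediate value of a
// constant expression has to fit in that type.
const WIDE: u128 = 1 << 100; // representable only because u128 is wide enough
const NARROW: u8 = (WIDE >> 98) as u8; // 2^100 / 2^98 = 4

/// In a 64-bit type the same intermediate value does not exist:
/// `checked_shl` reports the out-of-range shift amount as `None`.
fn shift_in_u64() -> Option<u64> {
    1u64.checked_shl(100)
}
```

Here `NARROW` is `4` only because the author picked a wide enough intermediate type by hand; `shift_in_u64()` returns `None`. An arbitrary-precision constant would remove the need to choose that intermediate type.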
I think making custom literal machinery involve an arbitrary-precision type (unlike C++, where you only get the builtin fixed-size number and char types) only makes sense if we plan on going all the way with it and supporting arbitrary-precision arithmetic of custom-literal-using types, e.g.:
Though I have no idea how we'd want to design custom
C++, the only language with custom literals, simply takes its versions of
`u64` and `f64` as arguments for literals; this is an unnecessary restriction
in Rust, given that we recently stabilized the `u128` type. This problem
cannot be neatly worked around, as far as we know.
The far simpler solution for custom literals is to just take a string. In fact C++ supports that too.
What type would a function-style proc macro get if it was "passed" a superlong literal like this? Just a numeric literal node containing a string?
I don't quite like the idea of taking literally `&'static str`, since any application using a size that doesn't fit in the biggest type (and thus has a `FromStr` implementation) will need to parse the literal. I'm in favor of a string-based representation, but I think it should be opaque, with methods to extract components, like the exponent of a scientific notation string.
Why? Chopping the string into its basic components is the easiest part of the parsing process by a mile. The compiler can't help with any of the actually hard parts, and inventing a whole new (thoroughly weird) kind of primitive type just to save a few library types the trouble of doing `.split('.')` and `.split('e')` seems disproportionate.
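A rough sketch of the "easy part" described above, written as ordinary library code. The helper `split_literal` is hypothetical, and it assumes the input is a well-formed decimal literal with underscores already removed; it does no validation, base conversion, or rounding.

```rust
/// Hypothetical helper: split decimal literal text such as "12.5e-3" into
/// (integer digits, fraction digits, exponent text). Parts that are absent
/// come back as empty strings.
fn split_literal(src: &str) -> (&str, &str, &str) {
    // Exponent marker first: everything after 'e' or 'E' is the exponent.
    let (mantissa, exponent) = match src.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&src[..i], &src[i + 1..]),
        None => (src, ""),
    };
    // Then the decimal point inside the mantissa.
    let (int_part, frac_part) = match mantissa.find('.') {
        Some(i) => (&mantissa[..i], &mantissa[i + 1..]),
        None => (mantissa, ""),
    };
    (int_part, frac_part, exponent)
}
```

For example, `split_literal("12.5e-3")` returns `("12", "5", "-3")`. Everything a library type would do next (choosing a base, rounding, range checks) is specific to that type.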
```

Literal types are otherwise *mostly* like normal integers. They support
arithmetic and comparisons (but don't implement any `std::ops` traits, since
Does this include arithmetic for `flit`?
Yes; I guess that's not clear. I'll update the RFC noting that later.
Why? Where's the motivation for arbitrary-precision "float" arithmetic?
Also: how? Completely accurate arithmetic beyond integers is a difficult area without any local minima that are clearly good enough to bake into the language. Usually people envision bignum rationals for this sort of thing, but besides not being able to handle anything beyond add/sub/mul/div, it has serious downsides: even if you always use irreducible fractions and eat the significant overhead of doing so, the space requirements are pretty bad for many numbers (and truly awful for some).
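The space concern with bignum rationals can be seen even with small inputs. The following sketch uses `u128` as a stand-in for a bignum (an assumption made to keep the example dependency-free) and sums 1/1 + 1/2 + ... + 1/n in lowest terms; the names `gcd`, `harmonic`, and `bit_len` are hypothetical.

```rust
/// Greatest common divisor by Euclid's algorithm.
fn gcd(mut a: u128, mut b: u128) -> u128 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// n-th harmonic number 1/1 + 1/2 + ... + 1/n as a fraction in lowest
/// terms. Returns `None` if an intermediate value overflows `u128`.
fn harmonic(n: u128) -> Option<(u128, u128)> {
    let (mut numer, mut denom) = (0u128, 1u128);
    for k in 1..=n {
        // numer/denom + 1/k = (numer*k + denom) / (denom*k)
        let new_numer = numer.checked_mul(k)?.checked_add(denom)?;
        let new_denom = denom.checked_mul(k)?;
        // Reduce after every step: this is the overhead mentioned above.
        let g = gcd(new_numer, new_denom);
        numer = new_numer / g;
        denom = new_denom / g;
    }
    Some((numer, denom))
}

/// Number of bits needed to store `x`.
fn bit_len(x: u128) -> u32 {
    128 - x.leading_zeros()
}
```

`harmonic(4)` is `Some((25, 12))`, but the reduced denominator keeps growing as terms are added: comparing `bit_len` of the denominators for `harmonic(4)` and `harmonic(30)` shows the storage cost rising even though every input fits in five bits.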
@oli-obk @Ixrec
I'm proposing this RFC now as a result of this internals thread, in an attempt to start resolving the big open questions.
Cool, thanks for clarifying. In that case my major concern is, as mentioned above, whether this is possible without designing some sort of bignum type that's built into the core runtime language. I see the appeal of a "const-only bignum", but that would mean a massive divergence between const code and runtime code (instead of the status quo where const code is mostly a subset of runtime code), and getting the final bignum value into runtime code raises a lot of questions that Go didn't have to answer. If we just say that bignum is const-only and it coerces to fixed-size types, does that mean we can never have that bignum at runtime in the future? Does that create compatibility hazards for adding type inference in constants someday? How does a "const-only type" with implicit coercions interact with generics, e.g. can I write a
Taking a step back from the various technical issues, I find it extremely difficult to read this as an RFC. It seems more like a superposition of several different and only slightly overlapping proposals. While in some sense the union of all the features needed for the various proposals gives something that would technically address all the use cases, the result feels more like a hard-to-follow Frankenstein proposal than a single general design to me. As I see it, there are at least three mostly-to-entirely separate mechanisms that this RFC tries to define:
I admit I'm not convinced by any of these three proposals, but it seems clear to me that muddling them together hurts all of them.
Thanks all for the feedback! It looks like I accidentally coupled a bunch of orthogonal features, and it'll take me a couple days to disentangle them. I'm going to close the PR for now, and I'll re-open it once I'm done consulting with Dr Frankenstein (i.e., once I have something a bit more workable)!
This RFC proposes two new primitive types, `ulit` and `flit`, which represent unsized numeric literals. These are intended to be used as the types of untyped constants (like, for example, C-style `#define`s or Go-style constants), and as the argument types for a future custom literals feature.
Rendered.
CC @scottmcm