Commit 0574612

Merge pull request #2151 from cuviper/raw_identifiers
RFC: Raw Identifiers
2 parents 16cfef8 + d049d6c commit 0574612

1 file changed

+178
-0
lines changed
text/2151-raw-identifiers.md

@@ -0,0 +1,178 @@
- Feature Name: `raw_identifiers`
- Start Date: 2017-09-14
- RFC PR: [rust-lang/rfcs#2151](https://github.com/rust-lang/rfcs/pull/2151)
- Rust Issue: [rust-lang/rust#48589](https://github.com/rust-lang/rust/issues/48589)

# Summary
[summary]: #summary

Add a raw identifier format `r#ident`, so crates written in future language
epochs/versions can still use an older API that overlaps with new keywords.

# Motivation
[motivation]: #motivation

One of the primary examples of breaking changes in the epoch RFC is to add new
keywords, and specifically `catch` is the first candidate. However, since
the epoch RFC also seeks crate compatibility across epochs, this would leave a
crate in a newer epoch unable to use `catch` identifiers in the API of a crate
in an older epoch. [@matklad found] 28 crates using `catch` identifiers, some
public.

A raw syntax that's *always* an identifier would allow these to remain
compatible, so one can write `r#catch` where `catch`-as-identifier is needed.

[@matklad found]: https://internals.rust-lang.org/t/pre-rfc-raw-identifiers/5502/40

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Although some identifiers are reserved by the Rust language as keywords, it is
still possible to write them as raw identifiers using the `r#` prefix, like
`r#ident`. When written this way, it will *always* be treated as a plain
identifier equivalent to a bare `ident` name, never as a keyword.

For instance, the following is an erroneous use of the `match` keyword:

```rust
fn match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}
```

```text
error: expected identifier, found keyword `match`
 --> src/lib.rs:1:4
  |
1 | fn match(needle: &str, haystack: &str) -> bool {
  |    ^^^^^
```

It can instead be written as `fn r#match(needle: &str, haystack: &str)`, using
the `r#match` raw identifier, and the compiler will accept this as a true
`match` function.

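As a minimal sketch of the rewritten function in use (the `main` function and
the sample strings are illustrative additions, not part of the RFC text), both
the definition and every call site carry the `r#` prefix:

```rust
// Defining a function whose name is the keyword `match` requires the raw form.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    // Call sites need the `r#` prefix too, because `match` is still a keyword.
    let found = r#match("raw", "raw identifiers");
    println!("found = {}", found);
}
```
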
Generally when defining items, you should just avoid keywords altogether and
choose a different name. Raw identifiers require the `r#` prefix every time
they are mentioned, making them cumbersome for both the developer and users.
Usually an alternative name is preferable: `crate` -> `krate`,
`const` -> `constant`, etc.

However, new Rust epochs may add to the list of reserved keywords, so that a
formerly legal identifier is now interpreted as a keyword. Since compatibility
is maintained between crates of different epochs, this could mean that code
written in a new epoch might not be able to name an identifier in the API of
another crate. Using a raw identifier, it can still be named and used.

```rust
//! baseball.rs in epoch 2015
pub struct Ball;
pub struct Player;
impl Player {
    pub fn throw(&mut self) -> Result<Ball> { ... }
    pub fn catch(&mut self, ball: Ball) -> Result<()> { ... }
}
```

```rust
//! main.rs in epoch 2018 -- `catch` is now a keyword!
use baseball::*;
fn main() {
    let mut player = Player;
    let ball = player.throw()?;
    player.r#catch(ball)?;
}
```

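The two files above elide the method bodies and the `Result` type. A
self-contained sketch of the same idea might look like the following, under
several assumptions: the `baseball` module stands in for the older-epoch
crate, and the `Fumble` error type, the `holding` field, and the `play` helper
are hypothetical names introduced only for illustration. Because `r#catch` and
`catch` name the same identifier, the sketch does not depend on whether the
compiler in use treats `catch` as a keyword.

```rust
// A single-file sketch: the `baseball` module stands in for the older crate.
mod baseball {
    // Hypothetical error type; the RFC example leaves `Result` unspecified.
    #[derive(Debug, PartialEq)]
    pub struct Fumble;

    pub struct Ball;

    pub struct Player {
        pub holding: bool,
    }

    impl Player {
        pub fn throw(&mut self) -> Result<Ball, Fumble> {
            if self.holding {
                self.holding = false;
                Ok(Ball)
            } else {
                Err(Fumble)
            }
        }

        // Declared with the plain name, as older code would have written it.
        pub fn catch(&mut self, _ball: Ball) -> Result<(), Fumble> {
            if self.holding {
                Err(Fumble)
            } else {
                self.holding = true;
                Ok(())
            }
        }
    }
}

use baseball::{Fumble, Player};

// The "newer" caller spells the method `r#catch`; it resolves to `catch`.
fn play() -> Result<(), Fumble> {
    let mut player = Player { holding: true };
    let ball = player.throw()?;
    player.r#catch(ball)?;
    Ok(())
}

fn main() {
    println!("{:?}", play());
}
```

Moving the `?` operators into a helper that returns `Result` keeps `main`
itself free of error propagation.
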
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The syntax for identifiers allows an optional `r#` prefix for a raw identifier,
otherwise following the normal identifier rules. Raw identifiers are always
interpreted as plain identifiers and never as keywords, regardless of context.
They are also treated as equivalent to an identifier that wasn't raw -- for
instance, it's perfectly legal to write:

```rust
let foo = 123;
let bar = r#foo * 2;
```

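A slightly larger sketch of the same equivalence, assuming a compiler that
supports raw identifiers. The `Token` struct and the `double_foo` function are
hypothetical names chosen for illustration:

```rust
// Hypothetical struct: `type` is a keyword, so the field must be written raw.
struct Token {
    r#type: u8,
}

fn double_foo() -> i32 {
    let foo = 123;
    // `r#foo` and `foo` are the same identifier.
    r#foo * 2
}

fn main() {
    // The equivalence also holds in the other direction:
    // a binding declared raw can be read with the bare name.
    let r#bar = double_foo();
    println!("{}", bar);

    let token = Token { r#type: 7 };
    println!("{}", token.r#type);
}
```
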
# Drawbacks
[drawbacks]: #drawbacks

- New syntax is always scary/noisy/etc.
- It might not be intuitively "raw" to a user encountering it for the first time.

# Rationale and Alternatives
[alternatives]: #alternatives

If we don't have any way to refer to identifiers that were legal in prior
epochs, but later became keywords, then this may hurt interoperability between
crates of different epochs. The `r#ident` syntax enables interoperability, and
will hopefully evoke some intuition of being raw, similar to raw strings.

The `br#ident` syntax is also possible, but I see no advantage over `r#ident`.
Identifiers don't need the same kind of distinction as `str` and `[u8]`.

A small possible alternative is to also terminate it like `r#ident#`, which
could allow non-identifier characters to be part of a raw identifier. This
could take a cue from raw strings and allow repetition for internal `#`, like
`r##my #1 ident##`. That doesn't allow a leading `#` or `"` though.

A different possibility is to use backticks for a string-like `` `ident` ``,
like [Kotlin], [Scala], and [Swift]. If it allows non-identifier chars, it
could embrace escapes like `\u`, and have a raw-string-identifier ``
r`slash\ident` `` and even `` r#`tick`ident`# ``. However, backtick identifiers
are annoying to write in markdown. (e.g. ``` `` `ident` `` ```)

Backslashes could connote escaping identifiers, like `\ident`, perhaps
surrounded like `\ident\`, `\{ident}`, etc. However, the infix RFC #1579
currently seems to be leaning towards `\op` syntax already.

Alternatives which already start legal tokens, like [C#]'s `@ident`, [Dart]'s
`#ident`, or alternate prefixes like `identifier#catch`, all break Macros 1.0
as [@kennytm demonstrated]:

```
macro_rules! x {
    (@ $a:ident) => {};
    (# $a:ident) => {};
    ($a:ident # $b:ident) => {};
    ($a:ident) => { should error };
}
x!(@catch);
x!(#catch);
x!(identifier#catch);
x!(keyword#catch);
```

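To make the arm selection observable, here is a variant of that macro in which
each arm expands to a label instead of nothing. This is only a sketch: the
macro name `which_arm` and the label strings are hypothetical, and the last
invocation assumes a compiler that already lexes `r#catch` as a single raw
identifier token.

```rust
// Each arm expands to a label naming the arm that matched.
macro_rules! which_arm {
    (@ $a:ident) => { "at-prefix" };
    (# $a:ident) => { "hash-prefix" };
    ($a:ident # $b:ident) => { "ident-hash-ident" };
    ($a:ident) => { "single identifier" };
}

fn main() {
    // `@catch` and `#catch` are already valid token sequences for macros.
    println!("{}", which_arm!(@catch));
    println!("{}", which_arm!(#catch));
    // Spaces around `#` keep this independent of how an edition lexes
    // a `prefix#ident` sequence.
    println!("{}", which_arm!(identifier # catch));
    // A raw identifier arrives as one identifier token.
    println!("{}", which_arm!(r#catch));
}
```

Since the first three forms already select distinct arms, giving any of them a
new meaning as a single identifier token would change which arm existing
macros pick.
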
C# allows Unicode escapes directly in identifiers, which also separates them
from keywords, so both `@class` and `cl\u0061ss` are valid `class` identifiers.
Java also allows Unicode escapes, but they don't avoid keywords.

For some new keywords, there may be contextual mitigations. In the case of
`catch`, it couldn't be a fully contextual keyword because `catch { ... }` could
be a struct literal. That context might be worked around with a path, like
`old_epoch::catch { ... }` to use an identifier instead. Contexts that don't
make sense for a `catch` expression can just be identifiers, like `foo.catch()`.
However, this might not be possible for all future keywords.

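The struct-literal collision can be sketched as follows, assuming a compiler
in which `catch` is still an ordinary identifier. The `catch` type, its
`value` field, and the `Foo` type are hypothetical names used only to show the
two contexts:

```rust
// Hypothetical type named `catch`; a lowercase type name only trips a lint.
#[allow(non_camel_case_types)]
struct catch {
    value: i32,
}

struct Foo;

impl Foo {
    // A method name is a position where a `catch` expression makes no sense,
    // so it can stay a plain identifier.
    fn catch(&self) -> i32 {
        // A struct literal: the form that a `catch { ... }` block
        // expression would be ambiguous with.
        let caught = catch { value: 42 };
        caught.value
    }
}

fn main() {
    let foo = Foo;
    println!("{}", foo.catch());
}
```
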
There might also be a need for raw keywords in the other direction, e.g. so the
older epoch can still use the new `catch` functionality somehow. I think this
particular case is already served well enough by `do catch { ... }`, if we
choose to stabilize it that way. Perhaps `br#keyword` could be used for this,
but that may not be a good intuitive relationship.

[C#]: https://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx
[Dart]: https://www.dartlang.org/guides/language/language-tour#symbols
[Kotlin]: https://kotlinlang.org/docs/reference/grammar.html
[Scala]: https://www.scala-lang.org/files/archive/spec/2.13/01-lexical-syntax.html#identifiers
[Swift]: https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html
[@kennytm demonstrated]: https://internals.rust-lang.org/t/pre-rfc-raw-identifiers/5502/28

# Unresolved questions
[unresolved]: #unresolved-questions

- Do macros need any special care with such identifier tokens?
- Should diagnostics use the `r#` syntax when printing identifiers that overlap keywords?
- Does rustdoc need to use the `r#` syntax? e.g. to document `pub use old_epoch::*`