|
| 1 | +- Feature Name: `raw_identifiers` |
| 2 | +- Start Date: 2017-09-14 |
| 3 | +- RFC PR: [rust-lang/rfcs#2151](https://github.com/rust-lang/rfcs/pull/2151) |
| 4 | +- Rust Issue: [rust-lang/rust#48589](https://github.com/rust-lang/rust/issues/48589) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Add a raw identifier format `r#ident`, so crates written in future language |
| 10 | +epochs/versions can still use an older API that overlaps with new keywords. |
| 11 | + |
| 12 | +# Motivation |
| 13 | +[motivation]: #motivation |
| 14 | + |
| 15 | +One of the primary examples of breaking changes in the epoch RFC is adding new |
| 16 | +keywords, with `catch` as the first candidate. However, since that RFC also |
| 17 | +seeks crate compatibility across epochs, this would leave a crate in a newer |
| 18 | +epoch unable to use `catch` identifiers in the API of a crate in an older |
| 19 | +epoch. [@matklad found] 28 crates using `catch` identifiers, some public. |
| 20 | + |
| 21 | +A raw syntax that's *always* an identifier would allow these to remain |
| 22 | +compatible, so one can write `r#catch` where `catch`-as-identifier is needed. |
| 23 | + |
| 24 | +[@matklad found]: https://internals.rust-lang.org/t/pre-rfc-raw-identifiers/5502/40 |
| 25 | + |
| 26 | +# Guide-level explanation |
| 27 | +[guide-level-explanation]: #guide-level-explanation |
| 28 | + |
| 29 | +Although some identifiers are reserved by the Rust language as keywords, it is |
| 30 | +still possible to write them as raw identifiers using the `r#` prefix, like |
| 31 | +`r#ident`. When written this way, it will *always* be treated as a plain |
| 32 | +identifier equivalent to a bare `ident` name, never as a keyword. |
| 33 | + |
| 34 | +For instance, the following is an erroneous use of the `match` keyword: |
| 35 | + |
| 36 | +```rust |
| 37 | +fn match(needle: &str, haystack: &str) -> bool { |
| 38 | + haystack.contains(needle) |
| 39 | +} |
| 40 | +``` |
| 41 | + |
| 42 | +```text |
| 43 | +error: expected identifier, found keyword `match` |
| 44 | + --> src/lib.rs:1:4 |
| 45 | + | |
| 46 | +1 | fn match(needle: &str, haystack: &str) -> bool { |
| 47 | + | ^^^^^ |
| 48 | +``` |
| 49 | + |
| 50 | +It can instead be written as `fn r#match(needle: &str, haystack: &str)`, using |
| 51 | +the `r#match` raw identifier, and the compiler will accept this as a true |
| 52 | +`match` function. |
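
As a minimal sketch of the accepted form, the function from the failing example above can be declared and called with the raw identifier. The `main` wrapper and the sample strings are illustrative assumptions, not part of the RFC text.

```rust
// Declaring the function with the raw identifier makes `match` a plain name here.
fn r#match(needle: &str, haystack: &str) -> bool {
    haystack.contains(needle)
}

fn main() {
    // Call sites need the `r#` prefix too, because `match` is still a keyword.
    assert!(r#match("foo", "foobar"));
    assert!(!r#match("baz", "foobar"));
}
```

Every mention of the function carries the prefix, which is the cost that the following paragraph warns about.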
| 53 | + |
| 54 | +Generally when defining items, you should just avoid keywords altogether and |
| 55 | +choose a different name. Raw identifiers require the `r#` prefix every time |
| 56 | +they are mentioned, making them cumbersome to both the developer and users. |
| 57 | +Usually an alternate is preferable: `crate` -> `krate`, `const` -> `constant`, |
| 58 | +etc. |
| 59 | + |
| 60 | +However, new Rust epochs may add to the list of reserved keywords, so that a |
| 61 | +formerly legal identifier is now parsed as a keyword. Since compatibility is |
| 62 | +maintained between crates of different epochs, this could mean that code written |
| 63 | +in a new epoch might not be able to name an identifier in the API of another |
| 64 | +crate. Using a raw identifier, it can still be named and used. |
| 65 | + |
| 66 | +```rust |
| 67 | +//! baseball.rs in epoch 2015 |
| 68 | +pub struct Ball; |
| 69 | +pub struct Player; |
| 70 | +impl Player { |
| 71 | + pub fn throw(&mut self) -> Result<Ball> { ... } |
| 72 | + pub fn catch(&mut self, ball: Ball) -> Result<()> { ... } |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +```rust |
| 77 | +//! main.rs in epoch 2018 -- `catch` is now a keyword! |
| 78 | +use baseball::*; |
| 79 | +fn main() { |
| 80 | + let mut player = Player; |
| 81 | + let ball = player.throw()?; |
| 82 | + player.r#catch(ball)?; |
| 83 | +} |
| 84 | +``` |
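
The two-crate example above elides the method bodies and the `Result` alias. A self-contained variant might look like the following sketch, which folds both crates into one file. The `baseball` module, the `Result` alias with a `String` error, the trivial bodies, and the `play` helper are all assumptions made so that it compiles; only the `r#catch` usage is taken from the example.

```rust
// Hypothetical single-file stand-in for the "older epoch" crate.
mod baseball {
    // Assumed alias; the original example does not say which `Result` it uses.
    pub type Result<T> = std::result::Result<T, String>;

    pub struct Ball;
    pub struct Player;

    impl Player {
        pub fn throw(&mut self) -> Result<Ball> {
            Ok(Ball)
        }

        // Written in raw form so the definition parses whether or not
        // `catch` is reserved; it still names a method called `catch`.
        pub fn r#catch(&mut self, _ball: Ball) -> Result<()> {
            Ok(())
        }
    }
}

// Stand-in for the "newer epoch" caller: `?` needs a function returning `Result`.
fn play() -> baseball::Result<()> {
    let mut player = baseball::Player;
    let ball = player.throw()?;
    player.r#catch(ball)?;
    Ok(())
}

fn main() {
    play().expect("this sketch has no failing path");
}
```

Moving the calls into a function that returns `Result` is a choice made for this sketch, since `?` cannot be used in a `main` that returns `()`.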
| 85 | + |
| 86 | +# Reference-level explanation |
| 87 | +[reference-level-explanation]: #reference-level-explanation |
| 88 | + |
| 89 | +The syntax for identifiers allows an optional `r#` prefix for a raw identifier, |
| 90 | +otherwise following the normal identifier rules. Raw identifiers are always |
| 91 | +interpreted as plain identifiers and never as keywords, regardless of context. |
| 92 | +They are also treated as equivalent to an identifier that wasn't raw -- for |
| 93 | +instance, it's perfectly legal to write: |
| 94 | + |
| 95 | +```rust |
| 96 | +let foo = 123; |
| 97 | +let bar = r#foo * 2; |
| 98 | +``` |
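
The equivalence can be sketched a little further: a raw identifier and its bare spelling name the same binding, and a keyword such as `type` can serve as a field name when written raw. The `Token` struct and the helper functions below are hypothetical names introduced only for illustration.

```rust
// Hypothetical struct with a field whose name is the keyword `type`.
struct Token {
    r#type: u32,
}

// `foo` and `r#foo` refer to the same local binding.
fn double_foo() -> i32 {
    let foo = 123;
    r#foo * 2
}

// A keyword-named field must be written in raw form at every use.
fn token_kind() -> u32 {
    let token = Token { r#type: 7 };
    token.r#type
}

fn main() {
    assert_eq!(double_foo(), 246);
    assert_eq!(token_kind(), 7);
}
```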
| 99 | + |
| 100 | +# Drawbacks |
| 101 | +[drawbacks]: #drawbacks |
| 102 | + |
| 103 | +- New syntax is always scary/noisy/etc. |
| 104 | +- It might not be intuitively "raw" to a user coming upon this for the first time. |
| 105 | + |
| 106 | +# Rationale and Alternatives |
| 107 | +[alternatives]: #alternatives |
| 108 | + |
| 109 | +If we don't have any way to refer to identifiers that were legal in prior |
| 110 | +epochs, but later became keywords, then this may hurt interoperability between |
| 111 | +crates of different epochs. The `r#ident` syntax enables interoperability, and |
| 112 | +will hopefully evoke some intuition of being raw, similar to raw strings. |
| 113 | + |
| 114 | +The `br#ident` syntax is also possible, but I see no advantage over `r#ident`. |
| 115 | +Identifiers don't need the same kind of distinction as `str` and `[u8]`. |
| 116 | + |
| 117 | +A small possible alternative is to also terminate it like `r#ident#`, which |
| 118 | +could allow non-identifier characters to be part of a raw identifier. This |
| 119 | +could take a cue from raw strings and allow repetition for internal `#`, like |
| 120 | +`r##my #1 ident##`. That doesn't allow a leading `#` or `"` though. |
| 121 | + |
| 122 | +A different possibility is to use backticks for a string-like `` `ident` ``, |
| 123 | +like [Kotlin], [Scala], and [Swift]. If it allows non-identifier chars, it |
| 124 | +could embrace escapes like `\u`, and have a raw-string-identifier `` |
| 125 | +r`slash\ident` `` and even `` r#`tick`ident`# ``. However, backtick identifiers |
| 126 | +are annoying to write in markdown. (e.g. ``` `` `ident` `` ```) |
| 127 | + |
| 128 | +Backslashes could connote escaping identifiers, like `\ident`, perhaps |
| 129 | +surrounded like `\ident\`, `\{ident}`, etc. However, the infix RFC #1579 |
| 130 | +currently seems to be leaning towards `\op` syntax already. |
| 131 | + |
| 132 | +Alternatives which already start legal tokens, like [C#]'s `@ident`, [Dart]'s |
| 133 | +`#ident`, or alternate prefixes like `identifier#catch`, all break Macros 1.0 |
| 134 | +as [@kennytm demonstrated]: |
| 135 | + |
| 136 | +```rust |
| 137 | +macro_rules! x { |
| 138 | + (@ $a:ident) => {}; |
| 139 | + (# $a:ident) => {}; |
| 140 | + ($a:ident # $b:ident) => {}; |
| 141 | + ($a:ident) => { should error }; |
| 142 | +} |
| 143 | +x!(@catch); |
| 144 | +x!(#catch); |
| 145 | +x!(identifier#catch); |
| 146 | +x!(keyword#catch); |
| 147 | +``` |
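
To make the conflict concrete, here is a reduced sketch of the macro above in which each arm returns a label for the pattern that matched. The `which_arm` name and the labels are assumptions. The `identifier#catch` arm is left out to keep the sketch independent of the edition it is compiled under.

```rust
// Each arm reports which token pattern the input matched.
macro_rules! which_arm {
    (@ $a:ident) => { "@ident" };
    (# $a:ident) => { "#ident" };
    ($a:ident) => { "ident" };
}

fn main() {
    // `@catch` and `#catch` are already valid two-token inputs to a macro,
    // so giving them a new meaning would change which arm is selected.
    assert_eq!(which_arm!(@catch), "@ident");
    assert_eq!(which_arm!(#catch), "#ident");
    // A raw identifier reaches the macro as a single identifier token.
    assert_eq!(which_arm!(r#catch), "ident");
    assert_eq!(which_arm!(catch), "ident");
}
```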
| 148 | + |
| 149 | +C# allows Unicode escapes directly in identifiers, which also separates them |
| 150 | +from keywords, so both `@class` and `cl\u0061ss` are valid `class` identifiers. |
| 151 | +Java also allows Unicode escapes, but they don't avoid keywords. |
| 152 | + |
| 153 | +For some new keywords, there may be contextual mitigations. In the case of |
| 154 | +`catch`, it couldn't be a fully contextual keyword because `catch { ... }` could |
| 155 | +be a struct literal. That context might be worked around with a path, like |
| 156 | +`old_epoch::catch { ... }` to use an identifier instead. Contexts that don't |
| 157 | +make sense for a `catch` expression can just be identifiers, like `foo.catch()`. |
| 158 | +However, this might not be possible for all future keywords. |
| 159 | + |
| 160 | +There might also be a need for raw keywords in the other direction, e.g. so the |
| 161 | +older epoch can still use the new `catch` functionality somehow. I think this |
| 162 | +particular case is already served well enough by `do catch { ... }`, if we |
| 163 | +choose to stabilize it that way. Perhaps `br#keyword` could be used for this, |
| 164 | +but that may not be a good intuitive relationship. |
| 165 | + |
| 166 | +[C#]: https://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx |
| 167 | +[Dart]: https://www.dartlang.org/guides/language/language-tour#symbols |
| 168 | +[Kotlin]: https://kotlinlang.org/docs/reference/grammar.html |
| 169 | +[Scala]: https://www.scala-lang.org/files/archive/spec/2.13/01-lexical-syntax.html#identifiers |
| 170 | +[Swift]: https://developer.apple.com/library/content/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html |
| 171 | +[@kennytm demonstrated]: https://internals.rust-lang.org/t/pre-rfc-raw-identifiers/5502/28 |
| 172 | + |
| 173 | +# Unresolved questions |
| 174 | +[unresolved]: #unresolved-questions |
| 175 | + |
| 176 | +- Do macros need any special care with such identifier tokens? |
| 177 | +- Should diagnostics use the `r#` syntax when printing identifiers that overlap keywords? |
| 178 | +- Does rustdoc need to use the `r#` syntax? e.g. to document `pub use old_epoch::*` |