From 3778433036f35b0c56bef8e726b49c074b696985 Mon Sep 17 00:00:00 2001
From: Connor Horman
Date: Wed, 21 Aug 2024 21:30:23 -0400
Subject: [PATCH 1/3] Add identifier syntax to identifiers.md

---
 src/identifiers.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/src/identifiers.md b/src/identifiers.md
index c760f6826..66b9d1af2 100644
--- a/src/identifiers.md
+++ b/src/identifiers.md
@@ -1,5 +1,8 @@
 # Identifiers

+r[ident]
+
+r[ident.syntax]
 > **Lexer:**\
 > IDENTIFIER_OR_KEYWORD :\
 >       XID_Start XID_Continue\*\
@@ -13,6 +16,7 @@
 > NON_KEYWORD_IDENTIFIER | RAW_IDENTIFIER

+r[ident.unicode]
 Identifiers follow the specification in [Unicode Standard Annex #31][UAX31] for Unicode version 15.0, with the additions described below. Some examples of identifiers:

 * `foo`
@@ -21,6 +25,7 @@ Identifiers follow the specification in [Unicode Standard Annex #31][UAX31] for
 * `Москва`
 * `東京`

+r[ident.profile]
 The profile used from UAX #31 is:

 * Start := [`XID_Start`], plus the underscore character (U+005F)
@@ -31,28 +36,47 @@ with the additional constraint that a single underscore character is not an iden

 > **Note**: Identifiers starting with an underscore are typically used to indicate an identifier that is intentionally unused, and will silence the unused warning in `rustc`.

+r[ident.keyword]
 Identifiers may not be a [strict] or [reserved] keyword without the `r#` prefix described below in [raw identifiers](#raw-identifiers).

+r[ident.zero-width-chars]
 Zero width non-joiner (ZWNJ U+200C) and zero width joiner (ZWJ U+200D) characters are not allowed in identifiers.

+r[ident.ascii-limitations]
 Identifiers are restricted to the ASCII subset of [`XID_Start`] and [`XID_Continue`] in the following situations:

+r[ident.ascii-extern-crate]
 * [`extern crate`] declarations
+
+r[ident.ascii-extern-prelude]
 * External crate names referenced in a [path]
+
+r[ident.ascii-outlined-module]
 * [Module] names loaded from the filesystem without a [`path` attribute]
+
+r[ident.ascii-no_mangle]
 * [`no_mangle`] attributed items
+
+r[ident.ascii-extern-item]
 * Item names in [external blocks]

 ## Normalization

+r[ident.normalize]
+
 Identifiers are normalized using Normalization Form C (NFC) as defined in [Unicode Standard Annex #15][UAX15].
 Two identifiers are equal if their NFC forms are equal.

 [Procedural][proc-macro] and [declarative][mbe] macros receive normalized identifiers in their input.

 ## Raw identifiers

+r[ident.raw]
+
+r[ident.raw.intro]
 A raw identifier is like a normal identifier, but prefixed by `r#`. (Note that the `r#` prefix is not included as part of the actual identifier.)
+
+r[ident.raw.allowed]
 Unlike a normal identifier, a raw identifier may be any strict or reserved keyword except the ones listed above for `RAW_IDENTIFIER`.

From c4d99b2ff7f25e64f278ad0476b93ac736a57067 Mon Sep 17 00:00:00 2001
From: Eric Huss
Date: Tue, 17 Sep 2024 15:51:02 -0700
Subject: [PATCH 2/3] Switch to normalization

Use the same word as used in the section title.
---
 src/identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/identifiers.md b/src/identifiers.md
index 66b9d1af2..1013a58c8 100644
--- a/src/identifiers.md
+++ b/src/identifiers.md
@@ -62,7 +62,7 @@ r[ident.ascii-extern-item]

 ## Normalization

-r[ident.normalize]
+r[ident.normalization]

 Identifiers are normalized using Normalization Form C (NFC) as defined in [Unicode Standard Annex #15][UAX15].
 Two identifiers are equal if their NFC forms are equal.

From 02596d2b5e9a550d6a0def41817b4f6e68d3bb9e Mon Sep 17 00:00:00 2001
From: Eric Huss
Date: Tue, 17 Sep 2024 15:52:37 -0700
Subject: [PATCH 3/3] Remove ascii-limitations specific rules

This particular section didn't feel like it needed separate rule identifiers for each list item. I think anything needing to refer to the restrictions around ascii can just link to `ident.ascii-limitations`, and the user should be able to see what it is referring to.
---
 src/identifiers.md | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/src/identifiers.md b/src/identifiers.md
index 1013a58c8..f34ae482e 100644
--- a/src/identifiers.md
+++ b/src/identifiers.md
@@ -45,19 +45,10 @@ Zero width non-joiner (ZWNJ U+200C) and zero width joiner (ZWJ U+200D) character
 r[ident.ascii-limitations]
 Identifiers are restricted to the ASCII subset of [`XID_Start`] and [`XID_Continue`] in the following situations:

-r[ident.ascii-extern-crate]
 * [`extern crate`] declarations
-
-r[ident.ascii-extern-prelude]
 * External crate names referenced in a [path]
-
-r[ident.ascii-outlined-module]
 * [Module] names loaded from the filesystem without a [`path` attribute]
-
-r[ident.ascii-no_mangle]
 * [`no_mangle`] attributed items
-
-r[ident.ascii-extern-item]
 * Item names in [external blocks]

 ## Normalization