From 5d61dcaad56a3b6aeaba71c45a7d47cebe9bb4b0 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Mon, 30 May 2022 17:10:51 -0700 Subject: [PATCH 01/11] Add documentation on v0 symbol mangling. --- src/doc/rustc/book.toml | 3 + src/doc/rustc/src/SUMMARY.md | 1 + src/doc/rustc/src/codegen-options/index.md | 6 +- .../src/codegen-options/symbol-mangling.md | 1225 +++++++++++++++++ 4 files changed, 1233 insertions(+), 2 deletions(-) create mode 100644 src/doc/rustc/src/codegen-options/symbol-mangling.md diff --git a/src/doc/rustc/book.toml b/src/doc/rustc/book.toml index cea6033ede208..14ae1a7207ac0 100644 --- a/src/doc/rustc/book.toml +++ b/src/doc/rustc/book.toml @@ -6,3 +6,6 @@ title = "The rustc book" [output.html] git-repository-url = "https://github.com/rust-lang/rust/tree/master/src/doc/rustc" edit-url-template = "https://github.com/rust-lang/rust/edit/master/src/doc/rustc/{path}" + +[output.html.playground] +runnable = false diff --git a/src/doc/rustc/src/SUMMARY.md b/src/doc/rustc/src/SUMMARY.md index 73343ba9df51b..b06f62f89166e 100644 --- a/src/doc/rustc/src/SUMMARY.md +++ b/src/doc/rustc/src/SUMMARY.md @@ -3,6 +3,7 @@ - [What is rustc?](what-is-rustc.md) - [Command-line Arguments](command-line-arguments.md) - [Codegen Options](codegen-options/index.md) + - [Symbol Mangling](codegen-options/symbol-mangling.md) - [Lints](lints/index.md) - [Lint Levels](lints/levels.md) - [Lint Groups](lints/groups.md) diff --git a/src/doc/rustc/src/codegen-options/index.md b/src/doc/rustc/src/codegen-options/index.md index 1041d5026690f..f77851cdec2de 100644 --- a/src/doc/rustc/src/codegen-options/index.md +++ b/src/doc/rustc/src/codegen-options/index.md @@ -569,13 +569,15 @@ for the purpose of generating object code and linking. Supported values for this option are: -* `v0` — The "v0" mangling scheme. The specific format is not specified at - this time. +* `v0` — The "v0" mangling scheme. 
The default, if not specified, will use a compiler-chosen default which may change in the future. +See the [Symbol Mangling] chapter for details on symbol mangling and the mangling format. + [name mangling]: https://en.wikipedia.org/wiki/Name_mangling +[Symbol Mangling]: symbol-mangling.md ## target-cpu diff --git a/src/doc/rustc/src/codegen-options/symbol-mangling.md b/src/doc/rustc/src/codegen-options/symbol-mangling.md new file mode 100644 index 0000000000000..1f7c4b805e3d6 --- /dev/null +++ b/src/doc/rustc/src/codegen-options/symbol-mangling.md @@ -0,0 +1,1225 @@ +# Symbol Mangling + +[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation. +The encoded names are used by the linker to associate the name with the thing it refers to. + +The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option. + +[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling +[`-C symbol-mangling-version`]: index.md#symbol-mangling-version + +## Per-item control + +The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item. + +The [`#[export_name]` attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static. + +Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item. +The [`#[link_name]` attribute][reference-link_name] can be used to change that name. + + + +[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute +[reference-export_name]: ../../reference/abi.html#the-export_name-attribute +[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute +[reference-extern-block]: ../../reference/items/external-blocks.html + +## Decoding + +The encoded names may need to be decoded in some situations.
+For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user. +Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers. +In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names. +[`rustfilt`] is a CLI tool which can demangle names. + +An example of running rustfilt: + +```text +$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function +foo::example_function +``` + +[`rustc-demangle`]: https://crates.io/crates/rustc-demangle +[`rustfilt`]: https://crates.io/crates/rustfilt + +## Mangling versions + +`rustc` supports different mangling versions which encode the names in different ways. +The legacy version (which is currently the default) is not described here. +The "v0" mangling scheme addresses several limitations of the legacy format, +and is [described below](#v0-mangling-format). + +## v0 mangling format + +The v0 mangling format was introduced in [RFC 2603]. +It has the following properties: + +- It provides an unambiguous string encoding for everything that can end up in a binary's symbol table. +- It encodes information about generic parameters in a reversible way. +- The mangled symbols are *decodable* such that demangled form should be easily identifiable as some concrete instance of e.g. a polymorphic function. +- It has a consistent definition that does not rely on pretty-printing certain language constructs. +- Symbols can be restricted to only consist of the characters `A-Z`, `a-z`, `0-9`, and `_`. + This helps ensure that it is platform-independent, + where other characters might have special meaning in some context (e.g. `.` for MSVC `DEF` files). + Unicode symbols are optionally supported. +- It tries to stay efficient, avoiding unnecessarily long names, + and avoiding computationally expensive operations to demangle. 
+ +The v0 format is not intended to be compatible with other mangling schemes (such as C++). + +The v0 format is not presented as a stable ABI for Rust. +This format is currently intended to be well-defined enough that a demangler can produce a reasonable human-readable form of the symbol. +There are several implementation-defined portions that result in it not being possible to entirely predict how a given Rust entity will be encoded. + +The sections below define the encoding of a v0 symbol. +There is no standardized demangled form of the symbols, +though suggestions are provided for how to demangle a symbol. +Implementers may choose to demangle in different ways. + +### Grammar notation + +The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax. +A consolidated summary can be found in the [Symbol grammar summary][summary]. + +| Name | Syntax | Example | Description | +|------|--------|---------|-------------| +| Rule | → | A → *B* *C* | A production. | +| Concatenation | whitespace | A → *B* *C* *D* | Individual elements in sequence left-to-right. | +| Alternative | \| | A → *B* \| *C* | Matches either one or the other. | +| Grouping | () | A → *B* (*C* \| *D*) *E* | Groups multiple elements as one. | +| Repetition | {} | A → {*B*} | Repeats the enclosed zero or more times. | +| Option | opt | A → *B*opt *C* | An optional element. | +| Literal | `monospace` | A → `G` | A terminal matching the exact characters case-sensitive. | + +### Symbol name +[symbol-name]: #symbol-name + +> symbol-name → `_R` *[decimal-number]*opt *[path]* *[instantiating-crate]*opt *[vendor-specific-suffix]*opt + +A mangled symbol starts with the two characters `_R` which is a prefix to identify the symbol as a Rust symbol. +The prefix can optionally be followed by a *[decimal-number]* which specifies the encoding version. +This number is currently not used, and is never present in the current encoding. 
+Following that is a *[path]* which encodes the path to an entity. +The path is followed by an optional *[instantiating-crate]* which helps to disambiguate entities which may be instantiated multiple times in separate crates. +The final part is an optional *[vendor-specific-suffix]*. + +> **Recommended Demangling** +> +> A *symbol-name* should be displayed as the *[path]*. +> The *[instantiating-crate]* and the *[vendor-specific-suffix]* usually need not be displayed. + +> Example: +> ```rust +> std::path::PathBuf::new(); +> ``` +> +> The symbol for `PathBuf::new` in crate `mycrate` is: +> +> ```text +> _RNvMsr_NtCs3ssYzQotkvD_3std4pathNtB5_7PathBuf3newCs15kBYyAo9fc_7mycrate +> ├┘└───────────────────────┬──────────────────────┘└──────────┬─────────┘ +> │ │ │ +> │ │ └── instantiating-crate path "mycrate" +> │ └───────────────────────────────────── path to std::path::PathBuf::new +> └─────────────────────────────────────────────────────────────── `_R` symbol prefix +> ``` +> +> Recommended demangling: `<std::path::PathBuf>::new` + +### Symbol path +[path]: #symbol-path + +> path → \ +>       *[crate-root]* \ +>    | *[inherent-impl]* \ +>    | *[trait-impl]* \ +>    | *[trait-definition]* \ +>    | *[nested-path]* \ +>    | *[generic-args]* \ +>    | *[backref]* + +A *path* represents a variant of a [Rust path][reference-paths] to some entity. +In addition to typical Rust path segments using identifiers, +it uses extra elements to represent unnameable entities (like an `impl`) or generic arguments for monomorphized items. + +The initial tag character can be used to determine which kind of path it represents: + +| Tag | Rule | Description | +|-----|------|-------------| +| `C` | *[crate-root]* | The root of a crate path. | +| `M` | *[inherent-impl]* | An inherent implementation. | +| `X` | *[trait-impl]* | A trait implementation. | +| `Y` | *[trait-definition]* | A trait definition. | +| `N` | *[nested-path]* | A nested path. | +| `I` | *[generic-args]* | Generic arguments.
| +| `B` | *[backref]* | A back reference. | + +#### Path: Crate root +[crate-root]: #path-crate-root + +> crate-root → `C` *[identifier]* + +A *crate-root* indicates a path referring to the root of a crate's module tree. +It consists of the character `C` followed by the crate name as an *[identifier]*. + +The crate name is the name as seen from the defining crate. +Since Rust supports linking multiple crates with the same name, +the *[disambiguator]* is used to make the name unique across the crate graph. + +> **Recommended Demangling** +> +> A *crate-root* can be displayed as the identifier such as `mycrate`. +> +> Usually the disambiguator in the identifier need not be displayed, +> but as an alternate form the disambiguator can be shown in hex such as +> `mycrate[ca63f166dbe9294]`. + +> Example: +> ```rust +> fn example() {} +> ``` +> +> The symbol for `example` in crate `mycrate` is: +> +> ```text +> _RNvCs15kBYyAo9fc_7mycrate7example +> │└────┬─────┘││└──┬──┘ +> │ │ ││ │ +> │ │ ││ └── crate-root identifier "mycrate" +> │ │ │└────── length 7 of "mycrate" +> │ │ └─────── end of base-62-number +> │ └────────────── disambiguator for crate-root "mycrate" 0xca63f166dbe9293 + 1 +> └──────────────────── crate-root +> ``` +> +> Recommended demangling: `mycrate::example` + +#### Path: Inherent impl +[inherent-impl]: #path-inherent-impl + +> inherent-impl → `M` *[impl-path]* *[type]* + +An *inherent-impl* indicates a path to an [inherent implementation][reference-inherent-impl]. +It consists of the character `M` followed by an *[impl-path]* to the impl's parent followed by the *[type]* representing the `Self` type of the impl. + +> **Recommended Demangling** +> +> An *inherent-impl* can be displayed as a qualified path segment to the *[type]* within angled brackets. +> The *[impl-path]* usually need not be displayed. 
+ +> Example: +> ```rust +> struct Example; +> impl Example { +> fn foo() {} +> } +> ``` +> +> The symbol for `foo` in the impl for `Example` is: +> +> ```text +> _RNvMCs15kBYyAo9fc_7mycrateNtB2_7Example3foo +> │└─────────┬──────────┘└────┬──────┘ +> │ │ │ +> │ │ └── Self type "Example" +> │ └─────────────────── path to the impl's parent "mycrate" +> └────────────────────────────── inherent-impl +> ``` +> +> Recommended demangling: `<mycrate::Example>::foo` + +#### Path: Trait impl +[trait-impl]: #path-trait-impl + +> trait-impl → `X` *[impl-path]* *[type]* *[path]* + +A *trait-impl* indicates a path to a [trait implementation][reference-trait-impl]. +It consists of the character `X` followed by an *[impl-path]* to the impl's parent followed by the *[type]* representing the `Self` type of the impl followed by a *[path]* to the trait. + +> **Recommended Demangling** +> +> A *trait-impl* can be displayed as a qualified path segment using the `<` *type* `as` *path* `>` syntax. +> The *[impl-path]* usually need not be displayed. + +> Example: +> ```rust +> struct Example; +> trait Trait { +> fn foo(); +> } +> impl Trait for Example { +> fn foo() {} +> } +> ``` +> +> The symbol for `foo` in the trait impl for `Example` is: +> +> ```text +> _RNvXCs15kBYyAo9fc_7mycrateNtB2_7ExampleNtB2_5Trait3foo +> │└─────────┬──────────┘└─────┬─────┘└────┬────┘ +> │ │ │ │ +> │ │ │ └── path to the trait "Trait" +> │ │ └────────────── Self type "Example" +> │ └──────────────────────────────── path to the impl's parent "mycrate" +> └─────────────────────────────────────────── trait-impl +> ``` +> +> Recommended demangling: `<mycrate::Example as mycrate::Trait>::foo` + +#### Path: Impl +[impl-path]: #path-impl + +> impl-path → *[disambiguator]*opt *[path]* + +An *impl-path* is a path used for *[inherent-impl]* and *[trait-impl]* to indicate the path to the parent of an [implementation][reference-implementations]. +It consists of an optional *[disambiguator]* followed by a *[path]*. +The *[path]* is the path to the parent that contains the impl.
+The *[disambiguator]* can be used to distinguish between multiple impls within the same parent. + +> **Recommended Demangling** +> +> An *impl-path* usually need not be displayed (unless the location of the impl is desired). + +> Example: +> ```rust +> struct Example; +> impl Example { +> fn foo() {} +> } +> impl Example { +> fn bar() {} +> } +> ``` +> +> The symbol for `foo` in the impl for `Example` is: +> +> ```text +> _RNvMCs7qp2U7fqm6G_7mycrateNtB2_7Example3foo +> └─────────┬──────────┘ +> │ +> └── path to the impl's parent crate-root "mycrate" +> ``` +> +> The symbol for `bar` is similar, though it has a disambiguator to indicate it is in a different impl block. +> +> ```text +> _RNvMs_Cs7qp2U7fqm6G_7mycrateNtB4_7Example3bar +> ├┘└─────────┬──────────┘ +> │ │ +> │ └── path to the impl's parent crate-root "mycrate" +> └────────────── disambiguator 1 +> ``` +> +> Recommended demangling: +> * `foo`: `<mycrate::Example>::foo` +> * `bar`: `<mycrate::Example>::bar` + +#### Path: Trait definition +[trait-definition]: #path-trait-definition + +> trait-definition → `Y` *[type]* *[path]* + +A *trait-definition* is a path to a [trait definition][reference-traits]. +It consists of the character `Y` followed by the *[type]* which is the `Self` type of the referrer, followed by the *[path]* to the trait definition. + +> **Recommended Demangling** +> +> A *trait-definition* can be displayed as a qualified path segment using the `<` *type* `as` *path* `>` syntax.
+ +> Example: +> ```rust +> trait Trait { +> fn example() {} +> } +> struct Example; +> impl Trait for Example {} +> ``` +> +> The symbol for `example` in the trait `Trait` implemented for `Example` is: +> +> ```text +> _RNvYNtCs15kBYyAo9fc_7mycrate7ExampleNtB4_5Trait7exampleB4_ +> │└──────────────┬───────────────┘└────┬────┘ +> │ │ │ +> │ │ └── path to the trait "Trait" +> │ └──────────────────────── path to the implementing type "mycrate::Example" +> └──────────────────────────────────────── trait-definition +> ``` +> +> Recommended demangling: `<mycrate::Example as mycrate::Trait>::example` + +#### Path: Nested path +[nested-path]: #path-nested-path + +> nested-path → `N` *[namespace]* *[path]* *[identifier]* + +A *nested-path* is a path representing an optionally named entity. +It consists of the character `N` followed by a *[namespace]* indicating the namespace of the entity, +followed by a *[path]* which is a path representing the parent of the entity, +followed by an *[identifier]* of the entity. + +The identifier of the entity may be empty when the entity is not named. +For example, entities like closures, tuple-like struct constructors, and anonymous constants may not have a name. + +> **Recommended Demangling** +> +> A *nested-path* can be displayed by first displaying the *[path]* followed by a `::` separator followed by the *[identifier]*. +> If the *[identifier]* is empty, then the separating `::` should not be displayed. +> +> If a *[namespace]* is specified, then extra context may be added such as: \ +> *[path]* `::{` *[namespace]* (`:` *[identifier]*)opt `#` *disambiguator* (as a base-10 number) `}` +> +> Here the namespace `C` may be printed as `closure` and `S` as `shim`. +> Others may be printed by their character tag. +> The `:` *name* portion may be skipped if the name is empty. +> +> The *[disambiguator]* in the *[identifier]* may be displayed if a *[namespace]* is specified. +> In other situations, it is usually not necessary to display the *[disambiguator]*.
+> If it is displayed, it is recommended to place it in brackets, for example `[284a76a8b41a7fd3]`. +> If the *[disambiguator]* is not present, then its value is 0 and it can always be omitted from display. + +> Example: +> ```rust +> fn main() { +> let x = || {}; +> let y = || {}; +> x(); +> y(); +> } +> ``` +> +> The symbol for the closure `x` in crate `mycrate` is: +> +> ```text +> _RNCNvCsgStHSCytQ6I_7mycrate4main0B3_ +> ││└─────────────┬─────────────┘│ +> ││ │ │ +> ││ │ └── identifier with length 0 +> ││ └───────────────── path to "mycrate::main" +> │└──────────────────────────────── closure namespace +> └───────────────────────────────── nested-path +> ``` +> +> The symbol for the closure `y` is similar, with a disambiguator: +> +> ```text +> _RNCNvCsgStHSCytQ6I_7mycrate4mains_0B3_ +> ││ +> │└── base-62-number 0 +> └─── disambiguator 1 (base-62-number+1) +> ``` +> +> Recommended demangling: +> * `x`: `mycrate::main::{closure#0}` +> * `y`: `mycrate::main::{closure#1}` + +#### Path: Generic arguments +[generic-args]: #path-generic-arguments +[generic-arg]: #path-generic-arguments + +> generic-args → `I` *[path]* {*[generic-arg]*} `E` +> +> generic-arg → \ +>       *[lifetime]* \ +>    | *[type]* \ +>    | `K` *[const]* + +A *generic-args* is a path representing a list of generic arguments. +It consists of the character `I` followed by a *[path]* to the defining entity, followed by zero or more [generic-arg]s terminated by the character `E`. + +Each *[generic-arg]* is either a *[lifetime]* (starting with the character `L`), a *[type]*, or the character `K` followed by a *[const]* representing a const argument. + +> **Recommended Demangling** +> +> A *generic-args* may be printed as: *[path]* `::`opt `<` comma-separated list of args `>` +> The `::` separator may be elided for type paths (similar to Rust's rules). 
+ +> Example: +> ```rust +> fn main() { +> example([123]); +> } +> +> fn example<T, const N: usize>(x: [T; N]) {} +> ``` +> +> The symbol for the function `example` is: +> +> ```text +> _RINvCsgStHSCytQ6I_7mycrate7examplelKj1_EB2_ +> │└──────────────┬───────────────┘││││││ +> │ │ │││││└── end of generic-args +> │ │ ││││└─── end of const-data +> │ │ │││└──── const value `1` +> │ │ ││└───── const type `usize` +> │ │ │└────── const generic +> │ │ └─────── generic type i32 +> │ └──────────────────────── path to "mycrate::example" +> └──────────────────────────────────────── generic-args +> ``` +> +> Recommended demangling: `mycrate::example::<i32, 1>` + +#### Namespace +[namespace]: #namespace + +> namespace → *[lower]* | *[upper]* + +A *namespace* is used to segregate names into separate logical groups, allowing otherwise identical names to avoid collisions. +It consists of a single character of an upper or lowercase ASCII letter. +Lowercase letters are reserved for implementation-internal disambiguation categories (and demanglers should never show them). +Uppercase letters are used for special namespaces which demanglers may display in a special way. + +Uppercase namespaces are: + +* `C` — A closure. +* `S` — A shim. Shims are added by the compiler in some situations where an intermediate is needed. + For example, a `fn()` pointer to a function with the [`#[track_caller]` attribute][reference-track_caller] needs a shim to deal with the implicit caller location. + +> **Recommended Demangling** +> +> See *[nested-path]* for recommended demangling. + +### Identifier +[identifier]: #identifier +[undisambiguated-identifier]: #identifier +[bytes]: #identifier + +> identifier → *[disambiguator]*opt *[undisambiguated-identifier]* +> +> undisambiguated-identifier → `u`opt *[decimal-number]* `_`opt *[bytes]* +> +> bytes → {*UTF-8 bytes*} + +An *identifier* is a named label used in a *[path]* to refer to an entity.
+It consists of an optional *[disambiguator]* followed by an *[undisambiguated-identifier]*. + +The disambiguator is used to disambiguate identical identifiers that should not otherwise be considered the same. +For example, closures have no name, so the disambiguator is the only differentiating element between two different closures in the same parent path. + +The undisambiguated-identifier starts with an optional `u` character, +which indicates that the identifier is encoded in [Punycode][Punycode identifiers]. +The next part is a *[decimal-number]* which indicates the length of the *bytes*. + +Following the identifier size is an optional `_` character which is used to separate the length value from the identifier itself. +The `_` is mandatory if the *bytes* starts with a decimal digit or `_` in order to keep it unambiguous where the *decimal-number* ends and the *bytes* starts. + +*bytes* is the identifier itself encoded in UTF-8. + +> **Recommended Demangling** +> +> The display of an *identifier* can depend on its context. +> If it is Punycode-encoded, then it may first be decoded before being displayed. +> +> The *[disambiguator]* may or may not be displayed; see recommendations for rules that use *identifier*. + +#### Punycode identifiers +[Punycode identifiers]: #punycode-identifiers + +Because some environments are restricted to ASCII alphanumerics and `_`, +Rust's [Unicode identifiers][reference-identifiers] may be encoded using a modified version of [Punycode]. + +For example, the function: + +```rust +mod gödel { + mod escher { + fn bach() {} + } +} +``` + +would be mangled as: + +```text +_RNvNtNtCsgOH4LzxkuMq_7mycrateu8gdel_5qa6escher4bach + ││└───┬───┘ + ││ │ + ││ └── gdel_5qa translates to gödel + │└─────── 8 is the length + └──────── `u` indicates it is a Unicode identifier +``` + +Standard Punycode generates strings of the form `([[:ascii:]]+-)?[[:alnum:]]+`. 
+This is problematic because the `-` character +(which is used to separate the ASCII part from the base-36 encoding) +is not in the supported character set for symbols. +For this reason, `-` characters in the Punycode encoding are replaced with `_`. + +Here are some examples: + +| Original | Punycode | Punycode + Encoding | +|-----------------|-----------------|---------------------| +| føø | f-5gaa | f_5gaa | +| α_ω | _-ylb7e | __ylb7e | +| 铁锈 | n84amf | n84amf | +| 🤦 | fq9h | fq9h | +| ρυστ | 2xaedc | 2xaedc | + +> Note: It is up to the compiler to decide whether to encode identifiers using Punycode. +> Some platforms may have native support for UTF-8 symbols, +> and the compiler may decide to use the UTF-8 encoding directly. +> Demanglers should be prepared to support either form. + +[Punycode]: https://tools.ietf.org/html/rfc3492 + +### Disambiguator +[disambiguator]: #disambiguator + +> disambiguator → `s` *[base-62-number]* + +A *disambiguator* is used in various parts of a symbol *[path]* to uniquely identify path elements that would otherwise be identical but should not be considered the same. +It starts with the character `s` and is followed by a *[base-62-number]*. + +If the *disambiguator* is not specified, then its value can be assumed to be zero. +Otherwise, when demangling, the value 1 should be added to the *[base-62-number]* +(thus a *base-62-number* of zero encoded as `_` has a value of 1). +This allows disambiguators that are encoded sequentially to use minimal bytes. + +> **Recommended Demangling** +> +> The *disambiguator* may or may not be displayed; see recommendations for rules that use *disambiguator*. + +### Lifetime +[lifetime]: #lifetime + +> lifetime → `L` *[base-62-number]* + +A *lifetime* is used to encode an anonymous (numbered) lifetime, either erased or [higher-ranked](#binder). +It starts with the character `L` and is followed by a *[base-62-number]*. +Index 0 is always erased.
+Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime bound by one of the enclosing [binder]s. + +> **Recommended Demangling** +> +> A *lifetime* may be displayed like a Rust lifetime using a single quote. +> Index 0 should be displayed as `'_`. +> +> Lifetimes starting from 1 may be translated to single lowercase letters starting with `'a`. +> For indices over 25, consider printing the numeric lifetime index, as in `_123`. +> +> Index 0 should not be displayed for lifetimes in a *[ref-type]*, *[mut-ref-type]*, or *[dyn-trait-type]*. +> +> For nested binders, consider tracking their indices so that lifetime lettering can start back with `'a` within a nested binder. +> See *[binder]* for more on lifetime indexes and ordering. + +> Example: +> ```rust +> fn main() { +> example::<for<'a, 'b> fn(&'a u8, &'b u16)>(); +> } +> +> pub fn example<T>() {} +> ``` +> +> The symbol for the function `example` is: +> +> ```text +> _RINvCs7qp2U7fqm6G_7mycrate7exampleFG0_RL1_hRL0_tEuEB2_ +> │└┬┘│└┬┘││└┬┘││ +> │ │ │ │ ││ │ │└── end of input types +> │ │ │ │ ││ │ └─── type u16 +> │ │ │ │ ││ └───── lifetime #1 'b +> │ │ │ │ │└─────── reference type +> │ │ │ │ └──────── type u8 +> │ │ │ └────────── lifetime #2 'a +> │ │ └──────────── reference type +> │ └────────────── binder with 2 lifetimes +> └──────────────── function type +> ``` +> +> Recommended demangling: `mycrate::example::<for<'a, 'b> fn(&'a u8, &'b u16)>` + +### Const +[const]: #const +[const-data]: #const +[hex-digit]: #const + +> const → \ +>       *[type]* *[const-data]* \ +>    | `p` \ +>    | *[backref]* +> +> const-data → `n`opt {*[hex-digit]*} `_` +> +> [hex-digit] → *[digit]* | `a` | `b` | `c` | `d` | `e` | `f` + +A *const* is used to encode a const value used in generics and types. +It has the following forms: + +* A constant value encoded as a *[type]* which represents the type of the constant and *[const-data]* which is the constant value, followed by `_` to terminate the *const*. +* The character `p` which represents a placeholder.
+* A *[backref]* to a previously encoded *const* of the same value. + +The encoding of the *const-data* depends on the type: + +* `bool` — The value `false` is encoded as `0_`, the value `true` is encoded as `1_`. +* `char` — The Unicode scalar value of the character is encoded in hexadecimal. +* Unsigned integers — The value is encoded in hexadecimal. +* Signed integers — The character `n` is a prefix to indicate that it is negative, + followed by the absolute value encoded in hexadecimal. + +> **Recommended Demangling** +> +> A *const* may be displayed by the const value depending on the type. +> +> The `p` placeholder should be displayed as the `_` character. +> +> For specific types: +> * `b` (bool) — Display as `true` or `false`. +> * `c` (char) — Display the character as a Rust character (such as `'A'` or `'\n'`). +> * integers — Display the integer (either in decimal or hex). + +> Example: +> ```rust +> fn main() { +> example::<0x12345678>(); +> } +> +> pub fn example<const N: u64>() {} +> ``` +> +> The symbol for function `example` is: +> +> ```text +> _RINvCs7qp2U7fqm6G_7mycrate7exampleKy12345678_EB2_ +> ││└───┬───┘ +> ││ │ +> ││ └── const-data 0x12345678 +> │└─────── const type u64 +> └──────── const generic arg +> ``` +> +> Recommended demangling: `mycrate::example::<305419896>` + + +### Type +[type]: #type +[basic-type]: #basic-type +[array-type]: #array-type +[slice-type]: #slice-type +[tuple-type]: #tuple-type +[ref-type]: #ref-type +[mut-ref-type]: #mut-ref-type +[const-ptr-type]: #const-ptr-type +[mut-ptr-type]: #mut-ptr-type +[fn-type]: #fn-type +[dyn-trait-type]: #dyn-trait-type + +> type → \ +>       *[basic-type]* \ +>    | *[array-type]* \ +>    | *[slice-type]* \ +>    | *[tuple-type]* \ +>    | *[ref-type]* \ +>    | *[mut-ref-type]* \ +>    | *[const-ptr-type]* \ +>    | *[mut-ptr-type]* \ +>    | *[fn-type]* \ +>    | *[dyn-trait-type]* \ +>    | *[path]* \ +>    | *[backref]* + +A *type* represents a Rust [type][reference-types].
+The initial character can be used to distinguish which type is encoded. +The type encodings based on the initial tag character are: + +* A *basic-type* is encoded as a single character: + * `a` — `i8` + * `b` — `bool` + * `c` — `char` + * `d` — `f64` + * `e` — `str` + * `f` — `f32` + * `h` — `u8` + * `i` — `isize` + * `j` — `usize` + * `l` — `i32` + * `m` — `u32` + * `n` — `i128` + * `o` — `u128` + * `s` — `i16` + * `t` — `u16` + * `u` — unit `()` + * `v` — variadic `...` + * `x` — `i64` + * `y` — `u64` + * `z` — `!` + * `p` — placeholder `_` + +* `A` — An [array][reference-array] `[T; N]`. + + > array-type → `A` *[type]* *[const]* + + The tag `A` is followed by the *[type]* of the array followed by a *[const]* for the array size. + +* `S` — A [slice][reference-slice] `[T]`. + + > slice-type → `S` *[type]* + + The tag `S` is followed by the *[type]* of the slice. + +* `T` — A [tuple][reference-tuple] `(T1, T2, T3, ...)`. + + > tuple-type → `T` {*[type]*} `E` + + The tag `T` is followed by one or more [type]s indicating the type of each field, followed by a terminating `E` character. + + Note that a zero-length tuple (unit) is encoded with the `u` *[basic-type]*. + +* `R` — A [reference][reference-shared-reference] `&T`. + + > ref-type → `R` *[lifetime]*opt *[type]* + + The tag `R` is followed by an optional *[lifetime]* followed by the *[type]* of the reference. + The lifetime is not included if it has been erased. + +* `Q` — A [mutable reference][reference-mutable-reference] `&mut T`. + + > mut-ref-type → `Q` *[lifetime]*opt *[type]* + + The tag `Q` is followed by an optional *[lifetime]* followed by the *[type]* of the mutable reference. + The lifetime is not included if it has been erased. + +* `P` — A [constant raw pointer][reference-raw-pointer] `*const T`. + + The tag `P` is followed by the *[type]* of the pointer. + + > const-ptr-type → `P` *[type]* + +* `O` — A [mutable raw pointer][reference-raw-pointer] `*mut T`. 
+ + > mut-ptr-type → `O` *[type]* + + The tag `O` is followed by the *[type]* of the pointer. + +* `F` — A [function pointer][reference-fn-pointer] `fn(…) -> …`. + + > fn-type → `F` *[fn-sig]* + > + > fn-sig → *[binder]*opt `U`opt (`K` *[abi]*)opt {*[type]*} `E` *[type]* + > + > abi → \ + >       `C` \ + >    | *[undisambiguated-identifier]* + + The tag `F` is followed by a *[fn-sig]* of the function signature. + A *fn-sig* is the signature for a function pointer. + + It starts with an optional *[binder]* which represents the higher-ranked trait bounds (`for<…>`). + + Following that is an optional `U` character which is present for an `unsafe` function. + + Following that is an optional `K` character which indicates that an *[abi]* is specified. + If the ABI is not specified, it is assumed to be the `"Rust"` ABI. + + The *[abi]* can be the letter `C` to indicate it is the `"C"` ABI. + Otherwise it is an *[undisambiguated-identifier]* of the ABI string with dashes converted to underscores. + + Following that is zero or more [type]s which indicate the input parameters of the function. + + Following that is the character `E` and then the *[type]* of the return value. + +[fn-sig]: #fn-sig +[abi]: #abi + +* `D` — A [trait object][reference-trait-object] `dyn Trait + Send + 'a`. + + > dyn-trait-type → `D` *[dyn-bounds]* *[lifetime]* + > + > dyn-bounds → *[binder]*opt {*[dyn-trait]*} `E` + > + > dyn-trait → *[path]* {*[dyn-trait-assoc-binding]*} + > + > dyn-trait-assoc-binding → `p` *[undisambiguated-identifier]* *[type]* + + The tag `D` is followed by a *[dyn-bounds]* which encodes the trait bounds, + followed by a *[lifetime]* of the trait object lifetime bound. + + A *dyn-bounds* starts with an optional *[binder]* which represents the higher-ranked trait bounds (`for<…>`). + Following that is a sequence of *[dyn-trait]* terminated by the character `E`. 
+ + Each *[dyn-trait]* represents a trait bound, which consists of a *[path]* to the trait followed by zero or more *[dyn-trait-assoc-binding]* which list the associated types. + + Each *[dyn-trait-assoc-binding]* consists of a character `p` followed by an *[undisambiguated-identifier]* representing the associated binding name, and finally a *[type]*. + +[dyn-bounds]: #dyn-bounds +[dyn-trait]: #dyn-trait +[dyn-trait-assoc-binding]: #dyn-trait-assoc-binding + + +* A *[path]* to a named type. + +* A *[backref]* to refer to a previously encoded type. + +> **Recommended Demangling** +> +> A *[type]* may be displayed as the type it represents, using typical Rust syntax to represent the type. + +> Example: +> ```rust +> fn main() { +> example::<[u16; 8]>(); +> } +> +> pub fn example<T>() {} +> ``` +> +> The symbol for function `example` is: +> +> ```text +> _RINvCs7qp2U7fqm6G_7mycrate7exampleAtj8_EB2_ +> │││├┘│ +> ││││ └─── end of generic args +> │││└───── const data 8 +> ││└────── const type usize +> │└─────── array element type u16 +> └──────── array type +> ``` +> +> Recommended demangling: `mycrate::example::<[u16; 8]>` + +### Binder +[binder]: #binder + +> binder → `G` *[base-62-number]* + +A *binder* represents the number of [higher-ranked trait bound][reference-hrtb] lifetimes to bind. +It consists of the character `G` followed by a *[base-62-number]*. +The value 1 should be added to the *[base-62-number]* when decoding +(such that the *base-62-number* encoding of `_` is interpreted as having 1 binder). + +A *lifetime* rule can then refer to these numbered lifetimes. +The lowest indices represent the innermost lifetimes. +The number of bound lifetimes is the value of *[base-62-number]* plus one. + +For example, in `for<'a, 'b> fn(for<'c> fn (...))`, any [lifetime]s in `...` +(but not inside more binders) will observe the indices 1, 2, and 3 to refer to `'c`, `'b`, and `'a`, respectively.
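The index numbering described above can be sketched as a small helper. This is only an illustration, not part of the format: the function name `lifetime_name` and its `bound_depth` parameter are hypothetical, and the naming scheme (letters from `'a`, then a numeric form) is one possible choice for a demangler.

```rust
/// Resolve a lifetime index to a display name.
///
/// `bound_depth` is the total number of lifetimes introduced by all
/// enclosing binders, counted from the outermost binder inward.
/// `index` is the decoded value of a lifetime; index 0 is shown as `'_`.
/// Both names are hypothetical and used only for this sketch.
fn lifetime_name(bound_depth: u64, index: u64) -> Option<String> {
    if index == 0 {
        return Some("'_".to_string());
    }
    // Index 1 is the innermost (most recently bound) lifetime, so the
    // position counted from the outermost binder is `bound_depth - index`.
    // An index larger than the number of bound lifetimes is invalid.
    let depth = bound_depth.checked_sub(index)?;
    if depth < 26 {
        let letter = (b'a' + depth as u8) as char;
        Some(format!("'{}", letter))
    } else {
        // Fall back to a numeric name once the alphabet is exhausted.
        Some(format!("'_{}", depth))
    }
}
```

With the three lifetimes of `for<'a, 'b> fn(for<'c> fn (...))` in scope, `lifetime_name(3, 1)` yields `'c` and `lifetime_name(3, 3)` yields `'a`, while an out-of-range index such as `lifetime_name(3, 4)` yields `None`.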
+ +> **Recommended Demangling** +> +> A *binder* may be printed using `for<…>` syntax listing the lifetimes as recommended in *[lifetime]*. +> See *[lifetime]* for an example. + +### Backref +[backref]: #backref + +> backref → `B` *[base-62-number]* + +A *backref* is used to refer to a previous part of the mangled symbol. +This provides a simple form of compression to reduce the length of the mangled symbol. +This can help reduce the amount of work and resources needed by the compiler, linker, and loader. + +It consists of the character `B` followed by a *[base-62-number]*. +The number indicates the 0-based offset in bytes starting from just after the `_R` prefix of the symbol. +The *backref* represents the corresponding element starting at that position. + +A *backref* always refers to a position before the *backref* itself. + +The *backref* compression relies on the fact that all substitutable symbol elements have a self-terminating mangled form. +Given the start position of the encoded node, the grammar guarantees that it is always unambiguous where the node ends. +This is ensured by not allowing optional or repeating elements at the end of substitutable productions. + +> **Recommended Demangling** +> +> A *backref* should be demangled by rendering the element that it points to. +> Care should be taken when handling deeply nested backrefs to avoid using too much stack.
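A minimal sketch of how a demangler might locate the target of a *backref* follows. The helper names `parse_base62` and `backref_target` are hypothetical, and the sketch assumes the offset fits in a `u64`, which is a simplification: a real demangler may need to handle, or explicitly reject, longer numbers.

```rust
/// Decode a base-62-number starting at `pos` in `bytes`.
/// Returns the decoded value and the position just past the closing `_`.
/// (Hypothetical helper; values that overflow `u64` are rejected.)
fn parse_base62(bytes: &[u8], mut pos: usize) -> Option<(u64, usize)> {
    // A lone `_` encodes the value zero.
    if *bytes.get(pos)? == b'_' {
        return Some((0, pos + 1));
    }
    let mut value: u64 = 0;
    loop {
        let digit = match *bytes.get(pos)? {
            // The terminator ends the number; add back the one that was
            // subtracted during encoding.
            b'_' => return Some((value.checked_add(1)?, pos + 1)),
            c @ b'0'..=b'9' => c - b'0',
            c @ b'a'..=b'z' => c - b'a' + 10,
            c @ b'A'..=b'Z' => c - b'A' + 36,
            _ => return None,
        };
        value = value.checked_mul(62)?.checked_add(u64::from(digit))?;
        pos += 1;
    }
}

/// Given the symbol text after the `_R` prefix and the position of a `B`
/// tag within it, return the offset that the backref points to.
fn backref_target(after_prefix: &str, b_pos: usize) -> Option<usize> {
    let bytes = after_prefix.as_bytes();
    if *bytes.get(b_pos)? != b'B' {
        return None;
    }
    let (offset, _end) = parse_base62(bytes, b_pos + 1)?;
    // A backref must point strictly before its own position.
    if offset < b_pos as u64 {
        Some(offset as usize)
    } else {
        None
    }
}
```

Rejecting any offset that is not strictly smaller than the position of the `B` tag also rules out a backref that points at itself. It does not bound nesting depth, so a separate recursion limit is still worth considering.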
+ +> Example: +> ```rust +> fn main() { +> example::<Example, Example>(); +> } +> +> struct Example; +> +> pub fn example<T, U>() {} +> ``` +> +> The symbol for function `example` is: +> +> ```text +> _RINvCs7qp2U7fqm6G_7mycrate7exampleNtB2_7ExampleBw_EB2_ +> │├┘ │├┘ │├┘ +> ││ ││ ││ +> ││ ││ │└── backref to offset 3 (crate-root) +> ││ ││ └─── backref for instantiating-crate path +> ││ │└────── backref to offset 33 (path to Example) +> ││ └─────── backref for second generic-arg +> │└───────────────── backref to offset 3 (crate-root) +> └────────────────── backref for first generic-arg (first segment of Example path) +> ``` +> +> Recommended demangling: `mycrate::example::<mycrate::Example, mycrate::Example>` + +### Instantiating crate +[instantiating-crate]: #instantiating-crate + +> instantiating-crate → *[path]* + +The *instantiating-crate* is an optional element of the *[symbol-name]* which can be used to indicate which crate is instantiating the symbol. +It consists of a single *[path]*. + +This helps differentiate symbols that would otherwise be identical; +for example, the monomorphization of a function from an external crate may result in a duplicate if another crate is also instantiating the same generic function with the same types. + +In practice, the instantiating crate is also the crate where the symbol is defined, +so it is usually encoded as a *[backref]* to the *[crate-root]* encoded elsewhere in the symbol. + +> **Recommended Demangling** +> +> The *instantiating-crate* usually need not be displayed.
+ +> Example: +> ```rust +> std::path::Path::new("example"); +> ``` +> +> The symbol for `Path::new::<str>` instantiated from the `mycrate` crate is: +> +> ```text +> _RINvMsY_NtCseXNvpPnDBDp_3std4pathNtB6_4Path3neweECs7qp2U7fqm6G_7mycrate +> └──┬───┘ +> │ +> └── instantiating crate identifier `mycrate` +> ``` +> +> Recommended demangling: `<std::path::Path>::new::<str>` + +### Vendor-specific suffix +[vendor-specific-suffix]: #vendor-specific-suffix +[suffix]: #vendor-specific-suffix + +> vendor-specific-suffix → (`.` | `$`) *[suffix]* +> +> suffix → {*byte*} + +The *vendor-specific-suffix* is an optional element at the end of the *[symbol-name]*. +It consists of either a `.` or `$` character followed by zero or more bytes. +There are no restrictions on the characters following the period or dollar sign. + +This suffix is added as needed by the implementation. +One example where this can happen is when locally unique names need to become globally unique. +LLVM can append a `.llvm.` suffix during LTO to ensure a unique name, +and `$` can be used for thread-local data on Mach-O. +In these situations it's generally fine to ignore the suffix; +the suffixed name has the same semantics as the original. + +> **Recommended Demangling** +> +> The *vendor-specific-suffix* usually need not be displayed. + +> Example: +> ```rust +> # use std::cell::RefCell; +> thread_local!
{ +> pub static EXAMPLE: RefCell = RefCell::new(1); +> } +> ``` +> +> The symbol for `EXAMPLE` on macOS may have the following for thread-local data: +> +> ```text +> _RNvNvNvCs7qp2U7fqm6G_7mycrate7EXAMPLE7___getit5___KEY$tlv$init +> └───┬───┘ +> │ +> └── vendor-specific-suffix +> ``` +> +> Recommended demangling: `mycrate::EXAMPLE::__getit::__KEY` + +### Common rules +[decimal-number]: #common-rules +[digit]: #common-rules +[lower]: #common-rules +[upper]: #common-rules + +> [decimal-number] → *[digit]* {*[digit]*} +> +> [digit] → `0` | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` +> +> [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` +> +> [upper] → `A` | `B` | `C` | `D` | `E` | `F` | `G` | `H` | `I` | `J` | `K` | `L` | `M` | `N` | `O` | `P` | `Q` | `R` | `S` | `T` | `U` | `V` | `W` | `X` | `Y` | `Z` + +A *decimal-number* is encoded as one or more [digit]s indicating a numeric value in decimal. + +The value zero is encoded as a single byte `0`. +Beware that there are situations where `0` may be followed by another digit that should not be decoded as part of the decimal-number. +For example, a zero-length *[identifier]* within a *[nested-path]* which is in turn inside another *[nested-path]* will result in two identifiers in a row, where the first one only has the encoding of `0`. + +A *digit* is an ASCII number. + +A *lower* and *upper* is an ASCII lower and uppercase letter respectively. + +### base-62-number +[base-62-number]: #base-62-number + +> [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_` + +A *base-62-number* is an encoding of a 64-bit numeric value. +It uses ASCII numbers and lowercase and uppercase letters. +The value is terminated with the `_` character. +If the value is 0, then the encoding is the `_` character without any digits. 
+Otherwise, one is subtracted from the value, and it is encoded with the mapping: + +* `0`-`9` maps to 0-9 +* `a`-`z` maps to 10 to 35 +* `A`-`Z` maps to 36 to 61 + +The number is repeatedly divided by 62 (with integer division round towards zero) +to choose the next character in the sequence. +The remainder of each division is used in the mapping to choose the next character. +This is repeated until the number is 0. +The final sequence of characters is then reversed. + +Decoding is a similar process in reverse. + +Examples: + +| Value | Encoding | +|-------|----------| +| 0 | `_` | +| 1 | `0_` | +| 11 | `a_` | +| 62 | `Z_` | +| 63 | `10_` | +| 1000 | `g7_` | + +### Symbol grammar summary +[summary]: #symbol-grammar-summary + +The following is a summary of all of the productions of the symbol grammar. + +> [symbol-name] → `_R` *[decimal-number]*opt *[path]* *[instantiating-crate]*opt *[vendor-specific-suffix]*opt +> +> [path] → \ +>       *[crate-root]* \ +>    | *[inherent-impl]* \ +>    | *[trait-impl]* \ +>    | *[trait-definition]* \ +>    | *[nested-path]* \ +>    | *[generic-args]* \ +>    | *[backref]* +> +> [crate-root] → `C` *[identifier]* \ +> [inherent-impl] → `M` *[impl-path]* *[type]* \ +> [trait-impl] → `X` *[impl-path]* *[type]* *[path]* \ +> [trait-definition] → `Y` *[type]* *[path]* \ +> [nested-path] → `N` *[namespace]* *[path]* *[identifier]* \ +> [generic-args] → `I` *[path]* {*[generic-arg]*} `E` +> +> [identifier] → *[disambiguator]*opt *[undisambiguated-identifier]* \ +> [undisambiguated-identifier] → `u`opt *[decimal-number]* `_`opt *[bytes]* \ +> [bytes] → {*UTF-8 bytes*} +> +> [disambiguator] → `s` *[base-62-number]* +> +> [impl-path] → *[disambiguator]*opt *[path]* +> +> [type] → \ +>       *[basic-type]* \ +>    | *[array-type]* \ +>    | *[slice-type]* \ +>    | *[tuple-type]* \ +>    | *[ref-type]* \ +>    | *[mut-ref-type]* \ +>    | *[const-ptr-type]* \ +>    | *[mut-ptr-type]* \ +>    | *[fn-type]* \ +>    | *[dyn-trait-type]* \ +>  
  | *[path]* \ +>    | *[backref]* +> +> [basic-type] → *[lower]* \ +> [array-type] → `A` *[type]* *[const]* \ +> [slice-type] → `S` *[type]* \ +> [tuple-type] → `T` {*[type]*} `E` \ +> [ref-type] → `R` *[lifetime]*opt *[type]* \ +> [mut-ref-type] → `Q` *[lifetime]*opt *[type]* \ +> [const-ptr-type] → `P` *[type]* \ +> [mut-ptr-type] → `O` *[type]* \ +> [fn-type] → `F` *[fn-sig]* \ +> [dyn-trait-type] → `D` *[dyn-bounds]* *[lifetime]* +> +> [namespace] → *[lower]* | *[upper]* +> +> [generic-arg] → \ +>       *[lifetime]* \ +>    | *[type]* \ +>    | `K` *[const]* +> +> [lifetime] → `L` *[base-62-number]* +> +> [const] → \ +>       *[type]* *[const-data]* \ +>    | `p` \ +>    | *[backref]* +> +> [const-data] → `n`opt {*[hex-digit]*} `_` +> +> [hex-digit] → *[digit]* | `a` | `b` | `c` | `d` | `e` | `f` +> +> [fn-sig] → *[binder]*opt `U`opt (`K` *[abi]*)opt {*[type]*} `E` *[type]* +> +> [abi] → \ +>       `C` \ +>    | *[undisambiguated-identifier]* +> +> [dyn-bounds] → *[binder]*opt {*[dyn-trait]*} `E` \ +> [dyn-trait] → *[path]* {*[dyn-trait-assoc-binding]*} \ +> [dyn-trait-assoc-binding] → `p` *[undisambiguated-identifier]* *[type]* +> +> [binder] → `G` *[base-62-number]* +> +> [backref] → `B` *[base-62-number]* +> +> [instantiating-crate] → *[path]* +> +> [vendor-specific-suffix] → (`.` | `$`) *[suffix]* \ +> [suffix] → {*byte*} +> +> [decimal-number] → *[digit]* {*[digit]*} +> +> [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_` +> +> [digit] → `0` | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` \ +> [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` \ +> [upper] → `A` | `B` | `C` | `D` | `E` | `F` | `G` | `H` | `I` | `J` | `K` | `L` | `M` | `N` | `O` | `P` | `Q` | `R` | `S` | `T` | `U` | `V` | `W` | `X` | `Y` | `Z` + +### Encoding of Rust entities + +The following are guidelines for how Rust entities are encoded in a symbol. 
+The compiler has some latitude in how an entity is encoded as long as the symbol is unambiguous. + +* Named functions, methods, and statics shall be represented by a *[path]* production. + +* Paths should be rooted at the inner-most entity that can act as a path root. + Roots can be crate-ids, inherent impls, trait impls, and (for items within default methods) trait definitions. + +* The compiler is free to choose disambiguation indices and namespace tags from + the reserved ranges as long as it ascertains identifier unambiguity. + +* Generic arguments that are equal to the default should not be encoded in order to save space. + + +[RFC 2603]: https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html +[reference-array]: ../../reference/types/array.html +[reference-fn-pointer]: ../../reference/types/function-pointer.html +[reference-hrtb]: ../../reference/trait-bounds.html#higher-ranked-trait-bounds +[reference-identifiers]: ../../reference/identifiers.html +[reference-implementations]: ../../reference/items/implementations.html +[reference-inherent-impl]: ../../reference/items/implementations.html#inherent-implementations +[reference-mutable-reference]: ../../reference/types/pointer.html#mutable-references-mut +[reference-paths]: ../../reference/paths.html +[reference-raw-pointer]: ../../reference/types/pointer.html#raw-pointers-const-and-mut +[reference-shared-reference]: ../../reference/types/pointer.html#shared-references- +[reference-slice]: ../../reference/types/slice.html +[reference-track_caller]: ../../reference/attributes/codegen.html#the-track_caller-attribute +[reference-trait-impl]: ../../reference/items/implementations.html#trait-implementations +[reference-trait-object]: ../../reference/types/trait-object.html +[reference-traits]: ../../reference/items/traits.html +[reference-tuple]: ../../reference/types/tuple.html +[reference-types]: ../../reference/types.html From d5d4619e98a459fa8018c906e9f0c0352231ce17 Mon Sep 17 00:00:00 2001 
From: Eric Huss Date: Wed, 1 Jun 2022 14:01:22 -0700 Subject: [PATCH 02/11] Rearrange symbol-mangling chapter out of codegen-options. --- src/doc/rustc/src/SUMMARY.md | 3 +- src/doc/rustc/src/codegen-options/index.md | 2 +- src/doc/rustc/src/symbol-mangling/index.md | 52 +++++++++ .../v0.md} | 105 +++++------------- 4 files changed, 81 insertions(+), 81 deletions(-) create mode 100644 src/doc/rustc/src/symbol-mangling/index.md rename src/doc/rustc/src/{codegen-options/symbol-mangling.md => symbol-mangling/v0.md} (94%) diff --git a/src/doc/rustc/src/SUMMARY.md b/src/doc/rustc/src/SUMMARY.md index b06f62f89166e..e5ad2f1bf592f 100644 --- a/src/doc/rustc/src/SUMMARY.md +++ b/src/doc/rustc/src/SUMMARY.md @@ -3,7 +3,6 @@ - [What is rustc?](what-is-rustc.md) - [Command-line Arguments](command-line-arguments.md) - [Codegen Options](codegen-options/index.md) - - [Symbol Mangling](codegen-options/symbol-mangling.md) - [Lints](lints/index.md) - [Lint Levels](lints/levels.md) - [Lint Groups](lints/groups.md) @@ -53,4 +52,6 @@ - [Instrumentation-based Code Coverage](instrument-coverage.md) - [Linker-plugin-based LTO](linker-plugin-lto.md) - [Exploit Mitigations](exploit-mitigations.md) +- [Symbol Mangling](symbol-mangling/index.md) + - [v0 Symbol Format](symbol-mangling/v0.md) - [Contributing to `rustc`](contributing.md) diff --git a/src/doc/rustc/src/codegen-options/index.md b/src/doc/rustc/src/codegen-options/index.md index f77851cdec2de..38666ba726bd7 100644 --- a/src/doc/rustc/src/codegen-options/index.md +++ b/src/doc/rustc/src/codegen-options/index.md @@ -577,7 +577,7 @@ change in the future. See the [Symbol Mangling] chapter for details on symbol mangling and the mangling format. 
[name mangling]: https://en.wikipedia.org/wiki/Name_mangling -[Symbol Mangling]: symbol-mangling.md +[Symbol Mangling]: ../symbol-mangling/index.md ## target-cpu diff --git a/src/doc/rustc/src/symbol-mangling/index.md b/src/doc/rustc/src/symbol-mangling/index.md new file mode 100644 index 0000000000000..be58f2b41b8dd --- /dev/null +++ b/src/doc/rustc/src/symbol-mangling/index.md @@ -0,0 +1,52 @@ +# Symbol Mangling + +[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation. +The encoded names are used by the linker to associate the name with the thing it refers to. + +The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option. + +[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling +[`-C symbol-mangling-version`]: ../codegen-options/index.md#symbol-mangling-version + +## Per-item control + +The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item. + +The [`#[export_name]`attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static. + +Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item. +The [`#[link_name]` attribute][reference-link_name] can be used to change that name. + + + +[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute +[reference-export_name]: ../../reference/abi.html#the-export_name-attribute +[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute +[reference-extern-block]: ../../reference/items/external-blocks.html + +## Decoding + +The encoded names may need to be decoded in some situations. +For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user. +Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers. 
+In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names. +[`rustfilt`] is a CLI tool which can demangle names. + +An example of running rustfilt: + +```text +$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function +foo::example_function +``` + +[`rustc-demangle`]: https://crates.io/crates/rustc-demangle +[`rustfilt`]: https://crates.io/crates/rustfilt + +## Mangling versions + +`rustc` supports different mangling versions which encode the names in different ways. +The legacy version (which is currently the default) is not described here. +The "v0" mangling scheme addresses several limitations of the legacy format, +and is described in the [v0 Symbol Format](v0.md) chapter. diff --git a/src/doc/rustc/src/codegen-options/symbol-mangling.md b/src/doc/rustc/src/symbol-mangling/v0.md similarity index 94% rename from src/doc/rustc/src/codegen-options/symbol-mangling.md rename to src/doc/rustc/src/symbol-mangling/v0.md index 1f7c4b805e3d6..408e6b1244a35 100644 --- a/src/doc/rustc/src/codegen-options/symbol-mangling.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -1,57 +1,4 @@ -# Symbol Mangling - -[Symbol name mangling] is used by `rustc` to encode a unique name for symbols that are used during code generation. -The encoded names are used by the linker to associate the name with the thing it refers to. - -The method for mangling the names can be controlled with the [`-C symbol-mangling-version`] option. - -[Symbol name mangling]: https://en.wikipedia.org/wiki/Name_mangling -[`-C symbol-mangling-version`]: index.md#symbol-mangling-version - -## Per-item control - -The [`#[no_mangle]` attribute][reference-no_mangle] can be used on items to disable name mangling on that item. - -The [`#[export_name]`attribute][reference-export_name] can be used to specify the exact name that will be used for a function or static. 
- -Items listed in an [`extern` block][reference-extern-block] use the identifier of the item without mangling to refer to the item. -The [`#[link_name]` attribute][reference-link_name] can be used to change that name. - - - -[reference-no_mangle]: ../../reference/abi.html#the-no_mangle-attribute -[reference-export_name]: ../../reference/abi.html#the-export_name-attribute -[reference-link_name]: ../../reference/items/external-blocks.html#the-link_name-attribute -[reference-extern-block]: ../../reference/items/external-blocks.html - -## Decoding - -The encoded names may need to be decoded in some situations. -For example, debuggers and other tooling may need to demangle the name so that it is more readable to the user. -Recent versions of `gdb` and `lldb` have built-in support for demangling Rust identifiers. -In situations where you need to do your own demangling, the [`rustc-demangle`] crate can be used to programmatically demangle names. -[`rustfilt`] is a CLI tool which can demangle names. - -An example of running rustfilt: - -```text -$ rustfilt _RNvCskwGfYPst2Cb_3foo16example_function -foo::example_function -``` - -[`rustc-demangle`]: https://crates.io/crates/rustc-demangle -[`rustfilt`]: https://crates.io/crates/rustfilt - -## Mangling versions - -`rustc` supports different mangling versions which encode the names in different ways. -The legacy version (which is currently the default) is not described here. -The "v0" mangling scheme addresses several limitations of the legacy format, -and is [described below](#v0-mangling-format). - -## v0 mangling format +# v0 Symbol Format The v0 mangling format was introduced in [RFC 2603]. It has the following properties: @@ -78,7 +25,7 @@ There is no standardized demangled form of the symbols, though suggestions are provided for how to demangle a symbol. Implementers may choose to demangle in different ways. 
-### Grammar notation +## Grammar notation The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax. A consolidated summary can be found in the [Symbol grammar summary][summary]. @@ -93,7 +40,7 @@ A consolidated summary can be found in the [Symbol grammar summary][summary]. | Option | opt | A → *B*opt *C* | An optional element. | | Literal | `monospace` | A → `G` | A terminal matching the exact characters case-sensitive. | -### Symbol name +## Symbol name [symbol-name]: #symbol-name > symbol-name → `_R` *[decimal-number]*opt *[path]* *[instantiating-crate]*opt *[vendor-specific-suffix]*opt @@ -128,7 +75,7 @@ The final part is an optional *[vendor-specific-suffix]*. > > Recommended demangling: `::new` -### Symbol path +## Symbol path [path]: #symbol-path > path → \ @@ -156,7 +103,7 @@ The initial tag character can be used to determine which kind of path it represe | `I` | *[generic-args]* | Generic arguments. | | `B` | *[backref]* | A back reference. | -#### Path: Crate root +### Path: Crate root [crate-root]: #path-crate-root > crate-root → `C` *[identifier]* @@ -196,7 +143,7 @@ the *[disambiguator]* is used to make the name unique across the crate graph. 
> > Recommended demangling: `mycrate::example` -#### Path: Inherent impl +### Path: Inherent impl [inherent-impl]: #path-inherent-impl > inherent-impl → `M` *[impl-path]* *[type]* @@ -230,7 +177,7 @@ It consists of the character `M` followed by an *[impl-path]* to the impl's pare > > Recommended demangling: `::foo` -#### Path: Trait impl +### Path: Trait impl [trait-impl]: #path-trait-impl > trait-impl → `X` *[impl-path]* *[type]* *[path]* @@ -268,7 +215,7 @@ It consists of the character `X` followed by an *[impl-path]* to the impl's pare > > Recommended demangling: `::foo` -#### Path: Impl +### Path: Impl [impl-path]: #path-impl > impl-path → *[disambiguator]*opt *[path]* @@ -316,7 +263,7 @@ The *[disambiguator]* can be used to distinguish between multiple impls within t > * `foo`: `::foo` > * `bar`: `::bar` -#### Path: Trait definition +### Path: Trait definition [trait-definition]: #path-trait-definition > trait-definition → `Y` *[type]* *[path]* @@ -350,7 +297,7 @@ It consists of the character `Y` followed by the *[type]* which is the `Self` ty > > Recommended demangling: `::example` -#### Path: Nested path +### Path: Nested path [nested-path]: #path-nested-path > nested-path → `N` *[namespace]* *[path]* *[identifier]* @@ -415,7 +362,7 @@ For example, entities like closures, tuple-like struct constructors, and anonymo > * `x`: `mycrate::main::{closure#0}` > * `y`: `mycrate::main::{closure#1}` -#### Path: Generic arguments +### Path: Generic arguments [generic-args]: #path-generic-arguments [generic-arg]: #path-generic-arguments @@ -462,7 +409,7 @@ Each *[generic-arg]* is either a *[lifetime]* (starting with the character `L`), > > Recommended demangling: `mycrate::example::` -#### Namespace +### Namespace [namespace]: #namespace > namespace → *[lower]* | *[upper]* @@ -482,7 +429,7 @@ Uppercase namespaces are: > > See *[nested-path]* for recommended demangling. 
-### Identifier +## Identifier [identifier]: #identifier [undisambiguated-identifier]: #identifier [bytes]: #identifier @@ -515,7 +462,7 @@ The `_` is mandatory if the *bytes* starts with a decimal digit or `_` in order > > The *[disambiguator]* may or may not be displayed; see recommendations for rules that use *identifier*. -#### Punycode identifiers +### Punycode identifiers [Punycode identifiers]: #punycode-identifiers Because some environments are restricted to ASCII alphanumerics and `_`, @@ -565,7 +512,7 @@ Here are some examples: [Punycode]: https://tools.ietf.org/html/rfc3492 -### Disambiguator +## Disambiguator [disambiguator]: #disambiguator > disambiguator → `s` *[base-62-number]* @@ -582,7 +529,7 @@ This allows disambiguators that are encoded sequentially to use minimal bytes. > > The *disambiguator* may or may not be displayed; see recommendations for rules that use *disambiguator*. -### Lifetime +## Lifetime [lifetime]: #lifetime > lifetime → `L` *[base-62-number]* @@ -632,7 +579,7 @@ Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime > > Recommended demangling: `mycrate::example:: fn(&'a u8, &'b u16)>` -### Const +## Const [const]: #const [const-data]: #const [hex-digit]: #const @@ -695,7 +642,7 @@ The encoding of the *const-data* depends on the type: > Recommended demangling: `mycrate::example::<305419896>` -### Type +## Type [type]: #type [basic-type]: #basic-type [array-type]: #array-type @@ -881,7 +828,7 @@ The type encodings based on the initial tag character are: > > Recommended demangling: `mycrate::example::<[u16; 8]>` -### Binder +## Binder [binder]: #binder > binder → `G` *[base-62-number]* @@ -903,7 +850,7 @@ For example, in `for<'a, 'b> fn(for<'c> fn (...))`, any [lifetime]s in > A *binder* may be printed using `for<…>` syntax listing the lifetimes as recommended in *[lifetime]*. > See *[lifetime]* for an example. 
-### Backref +## Backref [backref]: #backref > backref → `B` *[base-62-number]* @@ -954,7 +901,7 @@ This is ensured by not allowing optional or repeating elements at the end of sub > > Recommended demangling: `mycrate::example::` -### Instantiating crate +## Instantiating crate [instantiating-crate]: #instantiating-crate > instantiating-crate → *[path]* @@ -988,7 +935,7 @@ so it is usually encoded as a *[backref]* to the *[crate-root]* encoded elsewher > > Recommended demangling: `::new::` -### Vendor-specific suffix +## Vendor-specific suffix [vendor-specific-suffix]: #vendor-specific-suffix [suffix]: #vendor-specific-suffix @@ -1030,7 +977,7 @@ the suffixed name has the same semantics as the original. > > Recommended demangling: `mycrate::EXAMPLE::__getit::__KEY` -### Common rules +## Common rules [decimal-number]: #common-rules [digit]: #common-rules [lower]: #common-rules @@ -1054,7 +1001,7 @@ A *digit* is an ASCII number. A *lower* and *upper* is an ASCII lower and uppercase letter respectively. -### base-62-number +## base-62-number [base-62-number]: #base-62-number > [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_` @@ -1088,7 +1035,7 @@ Examples: | 63 | `10_` | | 1000 | `g7_` | -### Symbol grammar summary +## Symbol grammar summary [summary]: #symbol-grammar-summary The following is a summary of all of the productions of the symbol grammar. @@ -1189,7 +1136,7 @@ The following is a summary of all of the productions of the symbol grammar. > [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` \ > [upper] → `A` | `B` | `C` | `D` | `E` | `F` | `G` | `H` | `I` | `J` | `K` | `L` | `M` | `N` | `O` | `P` | `Q` | `R` | `S` | `T` | `U` | `V` | `W` | `X` | `Y` | `Z` -### Encoding of Rust entities +## Encoding of Rust entities The following are guidelines for how Rust entities are encoded in a symbol. 
The compiler has some latitude in how an entity is encoded as long as the symbol is unambiguous. From ddd26b46cdd7d37e88b7908d956a85bd97f1a744 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Wed, 1 Jun 2022 14:03:41 -0700 Subject: [PATCH 03/11] Remove 64-bit limit for base-62-numbers. Demanglers should be prepared for any arbitrary length number. --- src/doc/rustc/src/symbol-mangling/v0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 408e6b1244a35..1d7992e077d98 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -1006,7 +1006,7 @@ A *lower* and *upper* is an ASCII lower and uppercase letter respectively. > [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_` -A *base-62-number* is an encoding of a 64-bit numeric value. +A *base-62-number* is an encoding of a numeric value. It uses ASCII numbers and lowercase and uppercase letters. The value is terminated with the `_` character. If the value is 0, then the encoding is the `_` character without any digits. From d782e8748b441335c4bed80ae791a60b770658f2 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Thu, 9 Jun 2022 19:09:47 -0700 Subject: [PATCH 04/11] Update from review from michaelwoerister. --- src/doc/rustc/src/symbol-mangling/v0.md | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 1d7992e077d98..797491a4ab309 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -149,7 +149,8 @@ the *[disambiguator]* is used to make the name unique across the crate graph. > inherent-impl → `M` *[impl-path]* *[type]* An *inherent-impl* indicates a path to an [inherent implementation][reference-inherent-impl]. 
-It consists of the character `M` followed by an *[impl-path]* to the impl's parent followed by the *[type]* representing the `Self` type of the impl. +It consists of the character `M` followed by an *[impl-path]*, which uniquely identifies the impl block the item is defined in. +Following that is a *[type]* representing the `Self` type of the impl. > **Recommended Demangling** > @@ -167,12 +168,13 @@ It consists of the character `M` followed by an *[impl-path]* to the impl's pare > The symbol for `foo` in the impl for `Example` is: > > ```text -> _RNvMCs15kBYyAo9fc_7mycrateNtB2_7Example3foo -> │└─────────┬──────────┘└────┬──────┘ -> │ │ │ -> │ │ └── Self type "Example" -> │ └─────────────────── path to the impl's parent "mycrate" -> └────────────────────────────── inherent-impl +> _RNvMs_Cs4Cv8Wi1oAIB_7mycrateNtB4_7Example3foo +> │├┘└─────────┬──────────┘└────┬──────┘ +> ││ │ │ +> ││ │ └── Self type "Example" +> ││ └─────────────────── path to the impl's parent "mycrate" +> │└─────────────────────────────── disambiguator 1 +> └──────────────────────────────── inherent-impl > ``` > > Recommended demangling: `::foo` @@ -307,8 +309,9 @@ It consists of the character `N` followed by a *[namespace]* indicating the name followed by a *[path]* which is a path representing the parent of the entity, followed by an *[identifier]* of the entity. -The identifier of the entity may be empty when the entity is not named. +The identifier of the entity may have a length of 0 when the entity is not named. For example, entities like closures, tuple-like struct constructors, and anonymous constants may not have a name. +The identifier may still have a disambiguator unless the disambiguator is 0. > **Recommended Demangling** > @@ -912,7 +915,7 @@ It consists of a single *[path]*. 
This helps differentiate symbols that would otherwise be identical, for example the monomorphization of a function from an external crate may result in a duplicate if another crate is also instantiating the same generic function with the same types. -In practice, the instantiating crate is also the crate where the symbol is defined, +In practice, the instantiating crate is also often the crate where the symbol is defined, so it is usually encoded as a *[backref]* to the *[crate-root]* encoded elsewhere in the symbol. > **Recommended Demangling** From 769f938aea2c0d270cafc8f66c283d98461f701a Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Fri, 10 Jun 2022 11:15:13 -0700 Subject: [PATCH 05/11] Clarify missing tick. --- src/doc/rustc/src/symbol-mangling/v0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 797491a4ab309..dfd53434251ce 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -548,7 +548,7 @@ Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime > Index 0 should be displayed as `'_`. > > Lifetimes starting from 1 may be translated to single lowercase letters starting with `'a`. -> Indices over 25 may consider printing the numeric lifetime index as in `_123`. +> Indices over 25 may consider printing the numeric lifetime index as in `'_123`. > > Index 0 should not be displayed for lifetimes in a *[ref-type]*, *[mut-ref-type]*, or *[dyn-trait-type]*. > From 9e0c19d5c20d2f5e2323e1b9f0fadd3276df4daf Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Fri, 10 Jun 2022 11:16:48 -0700 Subject: [PATCH 06/11] Clarify grammar for decimal-number cannot have leading zeroes. 
--- src/doc/rustc/src/symbol-mangling/v0.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index dfd53434251ce..fbb813a339a86 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -983,12 +983,16 @@ the suffixed name has the same semantics as the original. ## Common rules [decimal-number]: #common-rules [digit]: #common-rules +[non-zero-digit]: #common-rules [lower]: #common-rules [upper]: #common-rules -> [decimal-number] → *[digit]* {*[digit]*} +> [decimal-number] → \ +>       `0` \ +>    | *[non-zero-digit]* {*[digit]*} > -> [digit] → `0` | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` +> [non-zero-digit] → `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` \ +> [digit] → `0` | *[non-zero-digit]* > > [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` > @@ -1131,11 +1135,14 @@ The following is a summary of all of the productions of the symbol grammar. 
> [vendor-specific-suffix] → (`.` | `$`) *[suffix]* \ > [suffix] → {*byte*} > -> [decimal-number] → *[digit]* {*[digit]*} +> [decimal-number] → \ +>       `0` \ +>    | *[non-zero-digit]* {*[digit]*} > > [base-62-number] → { *[digit]* | *[lower]* | *[upper]* } `_` > -> [digit] → `0` | `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` \ +> [non-zero-digit] → `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9` \ +> [digit] → `0` | *[non-zero-digit]* \ > [lower] → `a` |`b` |`c` |`d` |`e` |`f` |`g` |`h` |`i` |`j` |`k` |`l` |`m` |`n` |`o` |`p` |`q` |`r` |`s` |`t` |`u` |`v` |`w` |`x` |`y` |`z` \ > [upper] → `A` | `B` | `C` | `D` | `E` | `F` | `G` | `H` | `I` | `J` | `K` | `L` | `M` | `N` | `O` | `P` | `Q` | `R` | `S` | `T` | `U` | `V` | `W` | `X` | `Y` | `Z` From 0e66b0024cbdbc30e97c226292461e1b608a424a Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Fri, 10 Jun 2022 11:28:30 -0700 Subject: [PATCH 07/11] Rewrite recommended demangling for lifetimes using "De Bruijn level". --- src/doc/rustc/src/symbol-mangling/v0.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index fbb813a339a86..2bd17cd28fedd 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -545,14 +545,14 @@ Indices starting from 1 refer (as de Bruijn indices) to a higher-ranked lifetime > **Recommended Demangling** > > A *lifetime* may be displayed like a Rust lifetime using a single quote. -> Index 0 should be displayed as `'_`. -> -> Lifetimes starting from 1 may be translated to single lowercase letters starting with `'a`. -> Indices over 25 may consider printing the numeric lifetime index as in `'_123`. > +> Index 0 should be displayed as `'_`. > Index 0 should not be displayed for lifetimes in a *[ref-type]*, *[mut-ref-type]*, or *[dyn-trait-type]*. 
> -> Nested binders may consider tracking their indices so that lifetime lettering can start back with `'a` within a nested binder. +> A lifetime can be displayed by converting the De Bruijn index to a De Bruijn level +> (level = number of bound lifetimes - index) and selecting a unique name for each level. +> For example, starting with single lowercase letters such as `'a` for level 0. +> Levels over 25 may consider printing the numeric lifetime as in `'_123`. > See *[binder]* for more on lifetime indexes and ordering. > Example: From b2d401f6a5959a5047176fa8398e5f27a363c7d9 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Sat, 13 May 2023 19:02:30 -0700 Subject: [PATCH 08/11] Add an example of placeholders. --- src/doc/rustc/src/symbol-mangling/v0.md | 36 +++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 2bd17cd28fedd..4f130191bfa75 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -600,7 +600,7 @@ A *const* is used to encode a const value used in generics and types. It has the following forms: * A constant value encoded as a *[type]* which represents the type of the constant and *[const-data]* which is the constant value, followed by `_` to terminate the *const*. -* The character `p` which represents a placeholder. +* The character `p` which represents a [placeholder]. * A *[backref]* to a previously encoded *const* of the same value. The encoding of the *const-data* depends on the type: @@ -644,6 +644,38 @@ The encoding of the *const-data* depends on the type: > > Recommended demangling: `mycrate::example::<305419896>` +### Placeholders +[placeholder]: #placeholders + +A *placeholder* may occur in circumstances where a type or const value is not relevant. 
+ +> Example: +> ```rust +> pub struct Example<T, const N: usize>([T; N]); +> +> impl<T, const N: usize> Example<T, N> { +> pub fn foo() -> &'static () { +> static EXAMPLE_STATIC: () = (); +> &EXAMPLE_STATIC +> } +> } +> ``` +> +> In this example, the static `EXAMPLE_STATIC` would not be monomorphized by the type or const parameters `T` and `N`. +> Those will use the placeholder for those generic arguments. +> Its symbol is: +> +> ```text +> _RNvNvMCsd9PVOYlP1UU_7mycrateINtB4_7ExamplepKpE3foo14EXAMPLE_STATIC +> │ │││ +> │ ││└── const placeholder +> │ │└─── const generic argument +> │ └──── type placeholder +> └────────────────── generic-args +> ``` +> +> Recommended demangling: `<mycrate::Example<_, _>>::foo::EXAMPLE_STATIC` + ## Type [type]: #type @@ -697,7 +729,7 @@ The type encodings based on the initial tag character are: * `x` — `i64` * `y` — `u64` * `z` — `!` - * `p` — placeholder `_` + * `p` — [placeholder] `_` * `A` — An [array][reference-array] `[T; N]`. From ceda03d3c7a17708f69ef2a91b369e08c1139a3f Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Sat, 13 May 2023 19:48:45 -0700 Subject: [PATCH 09/11] Add missing word "the". --- src/doc/rustc/src/symbol-mangling/v0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 4f130191bfa75..50075e729634f 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -5,7 +5,7 @@ It has the following properties: - It provides an unambiguous string encoding for everything that can end up in a binary's symbol table. - It encodes information about generic parameters in a reversible way. -- The mangled symbols are *decodable* such that demangled form should be easily identifiable as some concrete instance of e.g. a polymorphic function. +- The mangled symbols are *decodable* such that the demangled form should be easily identifiable as some concrete instance of e.g. a polymorphic function. 
- It has a consistent definition that does not rely on pretty-printing certain language constructs. - Symbols can be restricted to only consist of the characters `A-Z`, `a-z`, `0-9`, and `_`. This helps ensure that it is platform-independent, From d376e63384d7b0ad467bb0464eaf15f737f72666 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Mon, 5 Jun 2023 11:55:16 -0700 Subject: [PATCH 10/11] Add description of forwards-compatible behavior. --- src/doc/rustc/src/symbol-mangling/v0.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 50075e729634f..13f7be315c959 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -25,6 +25,14 @@ There is no standardized demangled form of the symbols, though suggestions are provided for how to demangle a symbol. Implementers may choose to demangle in different ways. +## Extensions + +This format may be extended in the future to add new tags as Rust is extended with new language items. +To be forward compatible, demanglers should gracefully handle symbols that have encodings where it encounters a tag character not described in this document. +For example, they may fall back to displaying the mangled symbol. +The format may be extended anywhere there is a tag character, such as the [type] rule. +The meaning of existing tags and encodings will not be changed. + ## Grammar notation The format of an encoded symbol is illustrated as a context free grammar in an extended BNF-like syntax. 
From da4f62ebf7c487700c21a713c0cc3974bef11fc8 Mon Sep 17 00:00:00 2001 From: Eric Huss Date: Mon, 24 Jul 2023 08:15:53 -0700 Subject: [PATCH 11/11] Fix span for punnycode --- src/doc/rustc/src/symbol-mangling/v0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/doc/rustc/src/symbol-mangling/v0.md b/src/doc/rustc/src/symbol-mangling/v0.md index 13f7be315c959..61f747fac837c 100644 --- a/src/doc/rustc/src/symbol-mangling/v0.md +++ b/src/doc/rustc/src/symbol-mangling/v0.md @@ -493,7 +493,7 @@ would be mangled as: ```text _RNvNtNtCsgOH4LzxkuMq_7mycrateu8gdel_5qa6escher4bach - ││└───┬───┘ + ││└───┬──┘ ││ │ ││ └── gdel_5qa translates to gödel │└─────── 8 is the length