Start macro expansion chapter #26
# Macro expansion

Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
normal Rust parser, and the macro parser. During the parsing phase, the normal
Rust parser will call into the macro parser when it encounters a macro
definition or macro invocation (TODO: verify). The macro parser, in turn, may
call back out to the Rust parser when it needs to bind a metavariable (e.g.
`$my_expr`) while parsing the contents of a macro invocation. The code for macro
expansion is in [`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to
explain how macro expansion works.

### Example

It's helpful to have an example to refer to. For the remainder of this chapter,
whenever we refer to the "example _definition_", we mean the following:

```rust
macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    }; // rules of a `macro_rules` definition are separated by `;`
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    }
}
```

`$mvar` is called a _metavariable_. Unlike a normal variable, which binds to a
value in a computation, a metavariable binds _at compile time_ to a tree of
_tokens_. A _token_ is a single "unit" of the grammar, such as an identifier
(e.g. `print`) or a piece of punctuation (e.g. `=>`). In our example
definition, `print`, `$mvar`, `=>`, and `{` are all tokens (though that's not an
exhaustive list). There are also other special tokens, such as `EOF`, which
indicates that there are no more tokens. A _token tree_ is either a single
token or a group formed by a balanced pair of delimiters (`(...)`, `[...]`, or
`{...}`), including the delimiters themselves and everything between them. The
process of producing a stream of tokens from the raw bytes of the source file is
called _lexing_. For more information about _lexing_, see the [Parsing
chapter][parsing] of this book.

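The following minimal sketch groups an already-lexed token list into token trees, which makes the difference between tokens and token trees concrete. `TokenTree` and `build_trees` are hypothetical names used only for illustration. rustc's own token types are richer, and here tokens are modeled as plain strings.

```rust
/// A hypothetical, simplified token tree: a single token, or a group
/// delimited by `(...)`, `[...]`, or `{...}` that holds its inner trees.
#[derive(Debug, PartialEq)]
enum TokenTree {
    Token(String),
    Delimited(char, Vec<TokenTree>),
}

/// Returns the closing delimiter that pairs with an opening one.
fn closing(open: char) -> char {
    match open {
        '(' => ')',
        '[' => ']',
        _ => '}',
    }
}

/// Groups a flat list of (already lexed) tokens into token trees,
/// failing if the delimiters are not balanced.
fn build_trees(tokens: &[&str]) -> Result<Vec<TokenTree>, String> {
    // Stack of currently open groups; the bottom entry (no delimiter)
    // collects the top-level trees.
    let mut stack: Vec<(Option<char>, Vec<TokenTree>)> = vec![(None, Vec::new())];
    for &tok in tokens {
        // Delimiters are single-character tokens.
        let mut chars = tok.chars();
        let single = match (chars.next(), chars.next()) {
            (Some(c), None) => Some(c),
            _ => None,
        };
        match single {
            Some(open @ ('(' | '[' | '{')) => stack.push((Some(open), Vec::new())),
            Some(close @ (')' | ']' | '}')) => {
                // Never empty here: we return as soon as the bottom entry is popped.
                let (open, inner) = stack.pop().expect("stack is non-empty");
                match (open, stack.last_mut()) {
                    (Some(o), Some(parent)) if closing(o) == close => {
                        parent.1.push(TokenTree::Delimited(o, inner));
                    }
                    _ => return Err(format!("unexpected `{}`", close)),
                }
            }
            _ => stack
                .last_mut()
                .expect("stack is non-empty")
                .1
                .push(TokenTree::Token(tok.to_string())),
        }
    }
    if stack.len() != 1 {
        return Err("unclosed delimiter".to_string());
    }
    Ok(stack.pop().expect("bottom entry").1)
}
```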
Whenever we refer to the "example _invocation_", we mean the following snippet:

```rust
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```

The process of expanding the macro invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, and it is the topic of this chapter.

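The sketch below shows what the first step of expansion produces. It uses `printer_str!`, a hypothetical variant of the example definition that calls `format!` instead of `println!`, so the result of the expansion can be inspected as a `String`. The invocation `printer_str!(print foo)` is replaced by the body of the matching arm with `$mvar` substituted by `foo`, so it behaves like the hand-written `format!("{}", foo)`.

```rust
// `printer_str!` is a hypothetical variant of the example definition that
// uses `format!` instead of `println!`, so each arm produces a `String`.
macro_rules! printer_str {
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}{}", $mvar, $mvar)
    };
}

/// `printer_str!(print foo)` expands to `format!("{}", foo)`.
fn expand_once() -> String {
    let foo = 42;
    printer_str!(print foo)
}

/// The same code, written out by hand as it looks after expansion.
fn hand_expanded() -> String {
    let foo = 42;
    format!("{}", foo)
}

/// Only the second arm matches `print twice foo`.
fn expand_twice() -> String {
    let foo = 7;
    printer_str!(print twice foo)
}
```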
### The macro parser

There are two parts to macro expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the macro parser.

At its core, the macro parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].

The interface of the macro parser is as follows (this is slightly simplified):

```rust
fn parse(
    sess: ParserSession,
    tts: TokenStream,
    ms: &[TokenTree]
) -> NamedParseResult
```

In this interface:

- `sess` is a "parsing session", which keeps track of some metadata. Most
  notably, this is used to keep track of errors that are generated so they can
  be reported to the user.
- `tts` is a stream of tokens. The macro parser's job is to consume the raw
  stream of tokens and output a binding of metavariables to corresponding token
  trees.
- `ms` is a _matcher_. This is a sequence of token trees that we want to match
  `tts` against.

In the analogy of a regex parser, `tts` is the input and we are matching it
against the pattern `ms`. Using our examples, `tts` could be the stream of
tokens containing the inside of the example invocation `print foo`, while `ms`
might be the sequence of token trees `print $mvar:ident`.

The output of the parser is a `NamedParseResult`, which indicates which of
three cases has occurred:

- Success: `tts` matches the given matcher `ms`, and we have produced a binding
  from metavariables to the corresponding token trees.
- Failure: `tts` does not match `ms`. This results in an error message such as
  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
  happens if `tts` can be matched against `ms` in more than one way, since that
  indicates the macro is ambiguous.

The full interface is defined [here][code_parse_int].

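The regex analogy can be made concrete with a toy matcher. This is a minimal sketch under simplifying assumptions and not rustc's implementation. Tokens are plain strings, and a matcher may only contain literal tokens, `$name:ident`, and flat `$( ... )*` repetitions. All names (`Matcher`, `Outcome`, `Item`, `parse`, ...) are hypothetical. Like an NFA simulation, it keeps a set of matcher positions ("items") and advances all of them in lockstep over the input instead of backtracking. `Outcome` loosely mirrors the three cases listed above, and more than one complete match is reported as an ambiguity error. Unlike the real macro parser, it never calls out to the Rust parser; identifiers are recognized directly.

```rust
/// One element of a (hypothetical, simplified) macro matcher.
#[derive(Debug)]
enum Matcher {
    /// A literal token that must appear verbatim, e.g. `print`.
    Lit(&'static str),
    /// `$name:ident`: matches one identifier token and binds it to `name`.
    Ident(&'static str),
    /// `$( ... )*`: zero or more repetitions of a non-empty, flat body.
    Star(Vec<Matcher>),
}

/// Loosely mirrors the success / failure / error cases described above.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success(Vec<(&'static str, String)>),
    Failure,
    Error(String),
}

/// A position in the matcher ("item") plus the bindings collected so far.
#[derive(Clone, Debug)]
struct Item {
    top: usize,           // index into the top-level matcher
    inner: Option<usize>, // index into a `Star` body while inside a repetition
    binds: Vec<(&'static str, String)>,
}

fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    matches!(chars.next(), Some(c) if c == '_' || c.is_alphabetic())
        && chars.all(|c| c == '_' || c.is_alphanumeric())
}

/// Follows the epsilon moves: skip or enter a repetition, and repeat or leave
/// it at the end of its body. Items that need a token are collected in `out`.
fn closure(ms: &[Matcher], item: Item, out: &mut Vec<Item>) {
    let at_star_boundary = match (ms.get(item.top), item.inner) {
        (Some(Matcher::Star(_)), None) => true,
        (Some(Matcher::Star(body)), Some(k)) => k == body.len(),
        _ => false,
    };
    if at_star_boundary {
        closure(ms, Item { inner: Some(0), ..item.clone() }, out);
        closure(ms, Item { top: item.top + 1, inner: None, ..item }, out);
    } else {
        out.push(item);
    }
}

/// Advances `item` over `tok` if the element it points at accepts that token.
fn step(ms: &[Matcher], item: &Item, tok: &str) -> Option<Item> {
    let elem = match (ms.get(item.top)?, item.inner) {
        (Matcher::Star(body), Some(k)) => &body[k],
        (elem, _) => elem,
    };
    let mut next = item.clone();
    match elem {
        Matcher::Lit(s) if *s == tok => {}
        Matcher::Ident(name) if is_ident(tok) => next.binds.push((*name, tok.to_string())),
        _ => return None,
    }
    match next.inner {
        Some(k) => next.inner = Some(k + 1),
        None => next.top += 1,
    }
    Some(next)
}

/// Matches `tts` against `ms` by advancing every live item in lockstep.
fn parse(tts: &[&str], ms: &[Matcher]) -> Outcome {
    let mut items = Vec::new();
    closure(ms, Item { top: 0, inner: None, binds: Vec::new() }, &mut items);
    for tok in tts {
        let mut next = Vec::new();
        for item in &items {
            if let Some(advanced) = step(ms, item, tok) {
                closure(ms, advanced, &mut next);
            }
        }
        if next.is_empty() {
            return Outcome::Failure; // no item accepts this token
        }
        items = next;
    }
    // Items that reached the end of the matcher are complete matches.
    let mut done: Vec<Item> = items.into_iter().filter(|i| i.top == ms.len()).collect();
    match done.len() {
        0 => Outcome::Failure, // the input ended before the matcher did
        1 => Outcome::Success(done.pop().unwrap().binds),
        n => Outcome::Error(format!("ambiguity: {} successful parses", n)),
    }
}
```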
The macro parser works much like a normal regex parser, with one exception: to
parse certain kinds of metavariables, such as `ident`, `block`, and `expr`, it
must call back into the normal Rust parser.

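The effect of handing a fragment to the Rust parser can be observed from ordinary Rust code. In the sketch below, the macro names are made up for illustration. `$e:expr` is parsed by the Rust parser into a single complete expression, so `1 + 2` stays grouped when it is substituted. The `tt` version only collects raw tokens and pastes them in, so normal operator precedence applies to the resulting `2 * 1 + 2`.

```rust
// Hypothetical macros contrasting a parsed fragment with raw token trees.
macro_rules! double_expr {
    // The Rust parser turns `$e` into one complete expression, which stays
    // grouped when it is substituted below.
    ($e:expr) => {
        2 * $e
    };
}

macro_rules! double_tt {
    // `$($t:tt)*` only collects raw token trees; nothing is parsed as an
    // expression until after substitution.
    ($($t:tt)*) => {
        2 * $($t)*
    };
}

fn expr_version() -> i32 {
    double_expr!(1 + 2) // behaves like 2 * (1 + 2)
}

fn tt_version() -> i32 {
    double_tt!(1 + 2) // becomes 2 * 1 + 2
}
```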
As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is quite unintuitive and self-referential. The code to
parse macro _definitions_ is in
[`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for
matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree, with consecutive
rules separated by `;`. When the compiler comes to a `macro_rules` definition,
it uses this pattern to match the two token trees per rule in the definition of
the macro _using the macro parser itself_. In our example definition, the
metavariable `$lhs` would match the patterns of both arms: `(print
$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
bodies of both arms: `{ println!("{}", $mvar); }` and
`{ println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.

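The `$( $lhs:tt => $rhs:tt );+` pattern is itself an ordinary `macro_rules` matcher, so its behavior can be tried out directly. The hypothetical `split_rules!` macro below is user-level code, not the compiler's. It uses that same pattern to split the arms of our example definition into left- and right-hand sides, and turns each side into a string with `stringify!`. The exact spacing `stringify!` produces is not guaranteed, so only the number of arms and their contents matter here.

```rust
/// Matches the `$( $lhs:tt => $rhs:tt );+` pattern and returns each arm's
/// matcher and body as strings.
macro_rules! split_rules {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

/// Feeds the arms of the example definition to `split_rules!`.
fn example_arms() -> Vec<(&'static str, &'static str)> {
    split_rules!(
        (print $mvar:ident) => {
            println!("{}", $mvar);
        };
        (print twice $mvar:ident) => {
            println!("{}", $mvar);
            println!("{}", $mvar);
        }
    )
}
```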
When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matchers
used are the `$lhs` token trees (the first token tree of each arm) extracted
from the macro _definition_. Using our example, we would try to match the token
stream `print foo` from the invocation against the matchers `print $mvar:ident`
and `print twice $mvar:ident` that we previously extracted from the definition.
The algorithm is exactly the same, but when the macro parser comes to a place in
the current matcher where it needs to match a _non-terminal_ (e.g.
`$mvar:ident`), it calls back to the normal Rust parser to get the contents of
that non-terminal. In this case, the Rust parser would look for an `ident`
token, which it finds (`foo`) and returns to the macro parser. Then, the macro
parser proceeds in parsing as normal. Also, note that the arms are tried in the
order they are written: the first matcher that matches the invocation is the one
whose body is used, and an invocation that matches none of the arms is an error.

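`macro_rules!` tries the arms in order and uses the first one that matches. The sketch below makes this observable with two hypothetical macros whose arms expand to a value identifying the arm. `print twice foo` falls through to the second arm of a copy of our example's matchers. When two arms could both match, the earlier one wins rather than producing an error.

```rust
// Hypothetical macros; each arm expands to a value identifying which arm ran.
macro_rules! which_arm {
    (print $mvar:ident) => {
        1
    };
    (print twice $mvar:ident) => {
        2
    };
}

macro_rules! overlapping {
    ($first:ident) => {
        "ident arm"
    };
    ($anything:tt) => {
        "tt arm"
    };
}

fn arms() -> (i32, i32, &'static str, &'static str) {
    (
        which_arm!(print foo),       // the first arm matches
        which_arm!(print twice foo), // the first arm fails at `foo`; the second matches
        overlapping!(foo),           // both arms could match; the earlier one wins
        overlapping!(42),            // `42` is not an identifier, so the `tt` arm is used
    )
}
```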
For more information about the macro parser's implementation, see the comments
in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].

### Hygiene

TODO

### Procedural Macros

TODO

### Custom Derive

TODO

[code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt
[code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_parser.rs
[code_mr]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_rules.rs
[code_parse_int]: https://github.com/rust-lang/rust/blob/a97cd17f5d71fb4ec362f4fbd79373a6e7ed7b82/src/libsyntax/ext/tt/macro_parser.rs#L421
[parsing]: ./the-parser.md