
Undeclared identifiers in tokens assignment? #40

Open

philregier opened this issue Sep 12, 2019 · 5 comments

@philregier

Please pardon my ignorance, but I'm having a very difficult time understanding what happens when I assign the tokens attribute of a Lexer subclass.

I see in the documentation

Token names should be specified using all-caps as shown.

and some testing confirms that any identifier seems to be legal so long as its name is in all caps, but I don't understand what happens in Python when these identifiers are provided, nor do I recognize anything in lex.py that enables this behavior.

What is so special about tokens that allows it to accept previously undeclared names?
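
For reference, the declaration I'm asking about is the one from the introductory example in the SLY documentation, which looks roughly like this:

```python
from sly import Lexer

class CalcLexer(Lexer):
    # The set of token names. None of these identifiers (ID, NUMBER, ...)
    # is defined anywhere before this line, yet the assignment is accepted.
    tokens = { ID, NUMBER, PLUS, MINUS, TIMES, DIVIDE, ASSIGN, LPAREN, RPAREN }

    ignore = ' \t'

    # Regular expressions for each token
    ID      = r'[a-zA-Z_][a-zA-Z0-9_]*'
    NUMBER  = r'\d+'
    PLUS    = r'\+'
    MINUS   = r'-'
    TIMES   = r'\*'
    DIVIDE  = r'/'
    ASSIGN  = r'='
    LPAREN  = r'\('
    RPAREN  = r'\)'
```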

@dabeaz (Owner) commented Sep 12, 2019

Many magical behaviors can be achieved through the questionable use of metaclasses.
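
In concrete terms: a metaclass can return a custom mapping from its __prepare__ hook, and that mapping is used as the namespace while the class body executes, so every name lookup in the body goes through it. A dict subclass that resolves missing ALL-CAPS names to their own name is enough to make the tokens assignment above legal. A minimal sketch of the technique (not the actual sly source):

```python
class MagicDict(dict):
    """Class-body namespace that resolves undefined ALL-CAPS names to their own name."""
    def __missing__(self, key):
        if key.isupper():
            return key               # looking up ID yields the string 'ID'
        raise KeyError(key)

class MagicMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The mapping returned here becomes the namespace for the class body.
        return MagicDict()

    def __new__(mcls, name, bases, namespace, **kwds):
        # Hand a plain dict to type.__new__ once the body has run.
        return super().__new__(mcls, name, bases, dict(namespace))

class Base(metaclass=MagicMeta):
    pass

class Demo(Base):                    # inherits the metaclass from Base
    tokens = { ID, NUMBER }          # ID and NUMBER were never defined...

print(Demo.tokens)                   # ...yet this prints {'ID', 'NUMBER'} (order may vary)
```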

@philregier (Author)

So if the mechanism itself requires a level of understanding of the Python data model beyond what I can muster, is it reasonable (if imprecise) to say that, where the derived lexer is concerned, the required tokens attribute -- whatever magic may be used to assemble it -- is there to identify which attributes define specific lexer characteristics as the class is built?

For example, is the ID attribute of CalcLexer in the introductory example "special" by virtue of the fact that its name first appeared in the tokens attribute when the derived class was prepared? And if I were to add another attribute, say AMPERSAND = r'&', would it not be special because its name does not appear in tokens?

@dabeaz (Owner) commented Sep 13, 2019

The primary purpose of tokens is to precisely specify the set of terminals needed for constructing parsers. If you were to add an attribute not listed in tokens, there would be no way for the parser to know about it. As for the underlying magic, it's not connected to tokens so much as the entire enclosing environment defined when you inherit from the Lexer base class.
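
To make the parser connection concrete, the two classes are wired together roughly like this in the documentation (a trimmed sketch; the grammar rules here are only illustrative). An attribute defined on the lexer but not listed in tokens, such as the hypothetical AMPERSAND above, would never show up in the terminal set the parser receives:

```python
from sly import Lexer, Parser

class CalcLexer(Lexer):
    tokens = { NAME, NUMBER, PLUS, ASSIGN }
    ignore = ' \t'
    NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'
    NUMBER = r'\d+'
    PLUS   = r'\+'
    ASSIGN = r'='

class CalcParser(Parser):
    # The parser's set of terminals is taken directly from the lexer's
    # tokens attribute; anything not listed there is invisible here.
    tokens = CalcLexer.tokens

    @_('NAME ASSIGN expr')
    def statement(self, p):
        return ('assign', p.NAME, p.expr)

    @_('expr PLUS term')
    def expr(self, p):
        return p.expr + p.term

    @_('term')
    def expr(self, p):
        return p.term

    @_('NUMBER')
    def term(self, p):
        return int(p.NUMBER)

lexer = CalcLexer()
parser = CalcParser()
print(parser.parse(lexer.tokenize('x = 1 + 2')))   # ('assign', 'x', 3)
```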

@EkremDincel

@dabeaz, could you explain how this works in a bit more detail, please? I read the sly.lex module but couldn't work out which section performs this magic, even though I'm familiar with metaclasses.

@dabeaz (Owner) commented Jun 29, 2020

The mechanism is metaclasses.
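
For anyone who wants to find the relevant section: the magic lives in the metaclass machinery near the top of sly/lex.py (internal names below are from memory and may differ between versions). The metaclass's __prepare__ hook hands the class body a custom dictionary that supplies missing all-caps names, and the effect is easy to observe without reading the source:

```python
# Assumes sly is installed; this only demonstrates the observable effect.
from sly import Lexer

class CalcLexer(Lexer):
    tokens = { NUMBER }          # NUMBER is undeclared at this point
    NUMBER = r'\d+'

print(type(CalcLexer))           # the metaclass behind Lexer subclasses (LexerMeta in sly.lex)
print(CalcLexer.tokens)          # the undeclared name ended up as the string 'NUMBER'
```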
