
Commit

ng code generation (#332)
* [walkers] make walk_children public
* [grammar] fix name to TatSu
* [mixins][indent] allow control over the amount of indentation
* [ngcodegen] allow naming the parser
* [mixins][indent] always trim left spacing in arguments
* [ngcodegen][model] bootstrap model generation
* [ngcodegen][objectmodel] add model generator
* [mixins][indent] honor print() kwargs
* [tool] arg documentation
* [docs] deprecate declarative translation
* [ngcodegen][model] use topological sort for order of model classes
* [util][misc] document topsort
* [ngcodegen][model] do not generate model classes for builtins
* [bootstrap] make the generated parser be the bootstrap parser
* [ngcodegen][model] refactor and optimize
* [dist] bump up version number
apalala authored Dec 11, 2023
1 parent c8b3ae9 commit 4181d08
Showing 30 changed files with 898 additions and 548 deletions.
4 changes: 2 additions & 2 deletions docs/antlr.rst
@@ -1,8 +1,8 @@
.. include:: links.rst


Using ANTLR Grammars
--------------------
ANTLR Grammars
--------------

.. _grammars: https://github.com/antlr/grammars-v4

25 changes: 0 additions & 25 deletions docs/asjson.rst

This file was deleted.

19 changes: 0 additions & 19 deletions docs/grako.rst

This file was deleted.

3 changes: 0 additions & 3 deletions docs/index.rst
@@ -46,13 +46,10 @@ input, much like the `re`_ module does with regular expressions, or it can gener
ast
semantics
models
asjson
print_translation
translation
left_recursion
mini-tutorial
traces
grako
antlr
examples
support
3 changes: 2 additions & 1 deletion docs/install.rst
@@ -9,5 +9,6 @@ Installation
$ pip install tatsu
.. warning::
Versions of |TatSu| since 5.0.0 may require Python>=3.8. Python 2.7 is no longer supported
Modern versions of |TatSu| require active versions of Python (if the Python
version is more than one and a half years old, things may not work).

49 changes: 39 additions & 10 deletions docs/models.rst
@@ -1,8 +1,12 @@
.. include:: links.rst


Models
------


Building Models
---------------
~~~~~~~~~~~~~~~

Naming elements in grammar rules makes the parser discard uninteresting
parts of the input, like punctuation, to produce an *Abstract Syntax
@@ -41,6 +45,32 @@ You can also use `Python`_'s built-in types as node types, and
default behavior can be overridden by defining a method to handle the
result of any particular grammar rule.



Viewing Models as JSON
~~~~~~~~~~~~~~~~~~~~~~


Models generated by |TatSu| can be viewed by converting them to a JSON-compatible structure
with the help of ``tatsu.util.asjson()``. The protocol tries to provide the best
representation for common types, and can handle any type using ``repr()``. There are provisions for structures with back-references, so there's no infinite recursion.

.. code:: python

    import json

    from tatsu.util import asjson

    print(json.dumps(asjson(model), indent=2))

The ``model``, with richer semantics, remains unaltered.

Conversion to a JSON-compatible structure relies on the protocol defined by
``tatsu.util.AsJSONMixin``. The mixin defines a ``__json__(seen=None)``
method that allows classes to define their best translation. You can use ``AsJSONMixin``
as a base class in your own models to take advantage of ``asjson()``, and you can
specialize the conversion by overriding ``AsJSONMixin.__json__()``.

You can also write your own version of ``asjson()`` to handle special cases that are recurrent in your context.
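
As a rough illustration of the protocol described above, the following is a minimal, self-contained sketch of an ``asjson()``-style converter. It is not |TatSu|'s implementation: the names ``JSONMixinSketch``, ``asjson_sketch``, and ``Node`` are hypothetical, and the back-reference marker string is an assumption chosen for the example. It shows how a ``seen`` set passed through ``__json__(seen=None)`` can stop infinite recursion, with ``repr()`` as the fallback for unknown types.

```python
import json


class JSONMixinSketch:
    """Hypothetical stand-in for a mixin like AsJSONMixin (not TatSu's code)."""

    def __json__(self, seen=None):
        # Default translation: class name plus every instance attribute.
        return {
            '__class__': type(self).__name__,
            **{k: asjson_sketch(v, seen=seen) for k, v in vars(self).items()},
        }


def asjson_sketch(obj, seen=None):
    """Convert obj to a JSON-compatible structure, guarding against cycles."""
    seen = set() if seen is None else seen
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if id(obj) in seen:
        # Back-reference: emit a marker instead of recursing forever.
        return f'<back-reference to {type(obj).__name__}>'
    seen.add(id(obj))
    try:
        if hasattr(obj, '__json__'):
            return obj.__json__(seen=seen)
        if isinstance(obj, dict):
            return {str(k): asjson_sketch(v, seen=seen) for k, v in obj.items()}
        if isinstance(obj, (list, tuple, set)):
            return [asjson_sketch(v, seen=seen) for v in obj]
        # Anything else falls back to its repr().
        return repr(obj)
    finally:
        # Only objects on the current path count as "seen".
        seen.discard(id(obj))


class Node(JSONMixinSketch):
    """Hypothetical model node with a parent link that creates a cycle."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


root = Node('root')
leaf = Node('leaf')
leaf.parent = root
root.children.append(leaf)

print(json.dumps(asjson_sketch(root), indent=2))
```

Removing an object from ``seen`` once its conversion finishes means only true cycles are replaced by a marker; an object that is merely shared by two parents is converted in full both times.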

Walking Models
~~~~~~~~~~~~~~

@@ -82,19 +112,18 @@ methods such as:
return s
def walk_object(self, o):
raise Exception('Unexpected tyle %s walked', type(o).__name__)
raise Exception(f'Unexpected type {type(o).__name__} walked')
Predeclared classes can be passed to ``ModelBuilderSemantics`` instances
through the ``types=`` parameter:

.. code:: python
Which nodes get *walked* is up to the ``NodeWalker`` implementation. Some
strategies for walking *all* or *most* nodes are implemented as classes
in ``tatsu.walkers``, such as ``PreOrderWalker`` and ``DepthFirstWalker``.

from mymodel import AddOperator, MulOperator
Sometimes nodes must be walked more than once for the purpose at hand, and it's
up to the walker how and when to do that.

semantics=ModelBuilderSemantics(types=[AddOperator, MulOperator])
Take a look at ``tatsu.ngcodegen.PythonCodeGenerator`` for the walker that generates
a parser in Python from the model of a parsed grammar.

``ModelBuilderSemantics`` assumes nothing about ``types=``, so any
constructor (a function, or a partial function) can be used.
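
The idea of ``walk_<NodeType>`` methods with a ``walk_object`` fallback, as used by the walkers discussed in this section, can be sketched without |TatSu| at all. The code below is only an illustration under stated assumptions: ``MiniWalker``, ``Number``, ``Add``, and ``Evaluator`` are hypothetical names, and the lookup over the class hierarchy is one plausible dispatch strategy, not a description of how ``NodeWalker`` is implemented.

```python
class MiniWalker:
    """Hypothetical walker that dispatches on class names (not TatSu's NodeWalker)."""

    def walk(self, node):
        # Try walk_<ClassName> for the node's class, then for each base class.
        # Because every class derives from object, walk_object acts as fallback.
        for cls in type(node).__mro__:
            method = getattr(self, f'walk_{cls.__name__}', None)
            if callable(method):
                return method(node)
        raise TypeError(f'no walker method for {type(node).__name__}')


class Number:
    """Hypothetical leaf node."""

    def __init__(self, value):
        self.value = value


class Add:
    """Hypothetical binary node."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


class Evaluator(MiniWalker):
    def walk_Number(self, node):
        return node.value

    def walk_Add(self, node):
        # The walker decides which children to visit, and in what order.
        return self.walk(node.left) + self.walk(node.right)

    def walk_object(self, node):
        raise TypeError(f'Unexpected type {type(node).__name__} walked')


tree = Add(Number(1), Add(Number(2), Number(3)))
total = Evaluator().walk(tree)
print(total)
```

In this sketch it is the ``walk_Add`` method that chooses to descend into both operands, which mirrors the point that which nodes get walked, and how often, is up to the walker.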

Model Class Hierarchies
~~~~~~~~~~~~~~~~~~~~~~~
41 changes: 0 additions & 41 deletions docs/print_translation.rst

This file was deleted.

64 changes: 59 additions & 5 deletions docs/translation.rst
@@ -5,12 +5,68 @@
.. _pegen: https://github.com/we-like-parsers/pegen
.. _PEG parser: https://peps.python.org/pep-0617/

Declarative Translation
-----------------------
Translation
-----------

Translation is one of the most common tasks in language processing.
Analysis often summarizes the parsed input, and *walkers* are good for that.
In translation, the output can often be as verbose as the input, so a systematic approach that avoids bookkeeping as much as possible is convenient.


|TatSu| doesn't impose a way to create translators, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.


Print Translation
~~~~~~~~~~~~~~~~~

Translation in |TatSu| is based on subclasses of ``NodeWalker``. Print-based translation
relies on classes that inherit from ``IndentPrintMixin``, a strategy copied from
the new PEG_ parser in Python_ (see `PEP 617`_).

``IndentPrintMixin`` provides an ``indent()`` method, which is a context manager,
and should be used thus:

.. code:: python

    class MyTranslationWalker(NodeWalker, IndentPrintMixin):

        def walk_SomeNodeType(self, node: NodeType):
            self.print('some preamble')
            with self.indent():
                # continue walking the tree
                self.print('something else')

The ``self.print()`` method takes note of the current level of indentation, so
output will be indented by the ``indent`` passed to
the ``IndentPrintMixin`` constructor, or to the ``indent(amount: int)`` method.
The mixin keeps a stack of the indent amounts so it can go back to where it
was after each ``with indent(amount=n):`` statement:


.. code:: python

    def walk_SomeNodeType(self, node: NodeType):
        with self.indent(amount=2):
            self.print(node.exp)

The printed code can be retrieved using the ``printed_text()`` method, but other
possibilities are available by assigning a stream-like object to
``self.output_stream`` in the ``__init__()`` method.
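
To make the indentation stack concrete, here is a small stand-alone sketch of the technique. It is an assumption-laden re-creation, not the code of ``IndentPrintMixin``: the class name ``IndentPrinterSketch`` and its attributes ``default_indent`` and ``indent_stack`` are hypothetical, and only the method names ``indent()``, ``print()``, and ``printed_text()`` and the ``output_stream`` attribute are taken from the description above.

```python
import io
from contextlib import contextmanager


class IndentPrinterSketch:
    """Hypothetical re-creation of the idea; not TatSu's IndentPrintMixin."""

    def __init__(self, indent=4):
        self.default_indent = indent
        # Each entry is the total number of leading spaces at that depth.
        self.indent_stack = [0]
        self.output_stream = io.StringIO()

    @contextmanager
    def indent(self, amount=None):
        amount = self.default_indent if amount is None else amount
        self.indent_stack.append(self.indent_stack[-1] + amount)
        try:
            yield
        finally:
            # Leaving the with-block restores the previous indentation.
            self.indent_stack.pop()

    def print(self, *args):
        text = ' '.join(str(a) for a in args)
        prefix = ' ' * self.indent_stack[-1]
        for line in text.splitlines() or ['']:
            # Blank lines are written without trailing indentation.
            print(prefix + line if line else '', file=self.output_stream)

    def printed_text(self):
        return self.output_stream.getvalue()


out = IndentPrinterSketch(indent=4)
out.print('def f():')
with out.indent():
    out.print('if x:')
    with out.indent(amount=2):
        out.print('return 1')
    out.print('return 0')

print(out.printed_text())
```

Storing running totals on the stack keeps ``print()`` simple: it only reads the top entry, and nested ``with`` blocks of different amounts unwind correctly even if an exception is raised inside one of them.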

A good example of how to do code generation with a ``NodeWalker`` and ``IndentPrintMixin``
is |TatSu|'s own code generator, which can be
found in ``tatsu/ngcodegen/python.py``, or the model
generation found in ``tatsu/ngcodegen/objectmodel.py``.


.. _PEP 617: https://peps.python.org/pep-0617/


Declarative Translation (deprecated)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


|TatSu| provides support for template-based code generation ("translation", see below)
in the ``tatsu.codegen`` module.
@@ -26,8 +82,6 @@ breadth or depth first, using only standard Python_. The procedural code must kn
to navigate it, although other strategies are available with ``PreOrderWalker``, ``DepthFirstWalker``,
and ``ContextWalker``.

**deprecated**

|TatSu| doesn't impose a way to create translators with it, but it
exposes the facilities it uses to generate the `Python`_ source code for
parsers.
2 changes: 1 addition & 1 deletion grammar/tatsu.ebnf
@@ -1,4 +1,4 @@
@@grammar :: Tatsu
@@grammar :: TatSu
@@whitespace :: /\s+/
@@comments :: ?"(?sm)[(][*](?:.|\n)*?[*][)]"
@@eol_comments :: ?"#[^\n]*$"
1 change: 1 addition & 0 deletions ruff.toml
@@ -20,6 +20,7 @@ ignore = [
"PLR0904", # too-many-public-methods
"PLR0913", # too-many-arguments
"PLR0915", # too-many-statements
"PLR0917", # too-many-positional-arguments
"PLR2004", # magic-value-comparison
"PLW1514", # unspecified-encoding
# "PLW0603", # global-statement
2 changes: 1 addition & 1 deletion tatsu/_version.py
@@ -1 +1 @@
__version__ = '5.10.7b1'
__version__ = '5.11.0b1'