PEP 536: Final Grammar for Literal String Interpolation #155

Merged (1 commit), Dec 14, 2016
183 changes: 183 additions & 0 deletions pep-0536.txt
@@ -0,0 +1,183 @@
PEP: 536
Title: Final Grammar for Literal String Interpolation
Version: $Revision$
Last-Modified: $Date$
Author: Philipp Angerer <[email protected]>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Dec-2016
Python-Version: 3.7
Post-History: 12-Dec-2016

Abstract
========

PEP 498 introduced Literal String Interpolation (or “f-strings”).
The expression portions of those literals, however, are subject to
certain restrictions. This PEP proposes a formal grammar lifting
those restrictions, promoting “f-strings” to “f expressions” or f-literals.

This PEP expands upon the f-strings introduced by PEP 498,
so this text requires familiarity with PEP 498.

Terminology
===========

This text will refer to the existing grammar as “f-strings”,
and the proposed one as “f-literals”.

Furthermore, it will refer to the ``{}``-delimited expressions in
f-literals/f-strings as “expression portions” and the static string content
around them as “string portions”.

Motivation
==========

The current implementation of f-strings in CPython relies on the existing
string parsing machinery and on post-processing of its tokens. This results in
several restrictions on the expressions usable within f-strings:

#. It is impossible to use the quote character delimiting the f-string
   within the expression portion::

       >>> f'Magic wand: { bag['wand'] }'
                                ^
       SyntaxError: invalid syntax

#. A previously considered way around it would lead to escape sequences
   in executed code and is prohibited in f-strings::

       >>> f'Magic wand { bag[\'wand\'] } string'
       SyntaxError: f-string expression part cannot include a backslash

#. Comments are forbidden even in multi-line f-strings::

       >>> f'''A complex trick: {
       ... bag['bag'] # recursive bags!
       ... }'''
       SyntaxError: f-string expression part cannot include '#'

#. Expressions containing ``':'`` or ``'!'`` need to be wrapped
   in parentheses::

       >>> f'Useless use of lambdas: { lambda x: x*2 }'
       SyntaxError: unexpected EOF while parsing
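
For comparison, the restrictions listed above can be worked around under the
existing f-string grammar. The following is a minimal sketch of such
workarounds; the ``bag`` dictionary and its contents are invented for this
illustration and do not come from PEP 498.

```python
# Hypothetical data, invented for this illustration.
bag = {'wand': 'elder', 'bag': {'wand': 'holly'}}

# Restriction 1: use the other quote character inside the expression portion.
other_quotes = f'Magic wand: { bag["wand"] }'

# Restrictions 1 and 2: hoist the lookup out of the f-string entirely,
# so neither a nested quote nor a backslash is needed.
wand = bag['wand']
hoisted = f'Magic wand: {wand}'

# Restriction 3: keep the comment outside the literal.
# recursive bags!
nested = f'''A complex trick: {
bag['bag']['wand']
}'''

# Restriction 4: wrap the lambda in parentheses so that its ':' is not
# mistaken for the start of a format specifier.
doubled = f'Doubled: { (lambda x: x*2)(21) }'
```

Each workaround forces the author to restructure code around the
implementation of f-strings, which is the inconvenience this PEP aims to remove.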

These limitations serve no purpose from a language user's perspective and
can be lifted by giving f-literals a regular grammar without exceptions
and implementing it using dedicated parsing code.

Rationale
=========

.. https://mail.python.org/pipermail/python-ideas/2016-August/041727.html

The restrictions mentioned in Motivation_ are non-obvious and counter-intuitive
unless the user is familiar with the implementation details of f-strings.

As mentioned, a previous version of PEP 498 allowed escape sequences
anywhere in f-strings, including as ways to encode the braces delimiting
the expression portions and in their code. They would have been expanded
before the code was parsed, which would have had several important ramifications:

#. It would not be clear to human readers which portions are expressions
   and which are strings. Great material for an “obfuscated/underhanded
   Python challenge”.
#. Syntax highlighters are good at parsing nested grammar, but not
   at recognizing escape sequences. ECMAScript 2016 (JavaScript) allows
   escape sequences in its identifiers [1]_ and the author knows of no
   syntax highlighter able to correctly highlight code making use of this.

As a consequence, the expression portions would be harder to recognize
with and without the aid of syntax highlighting. With the new grammar,
it is easy to extend syntax highlighters to correctly parse
and display f-literals:

.. raw:: html

   <pre><span style=color:#ff5500>f'Magic wand: </span><span style=color:#3daee9>{</span>bag[<span style=color:#bf0303>'wand'</span>]<span style=color:#3daee9>:^10}</span><span style=color:#ff5500>'</span></pre>

.. This is the output of kate-syntax-highlighter when given that code
   (with some quotes stripped)

Highlighting expression portions with possible escape sequences would
mean creating a modified copy of all rules of the complete expression
grammar, accounting for the possibility of escape sequences in keywords,
delimiters, and all other language syntax. One such duplication would
yield one level of escaping depth and would have to be repeated for each
deeper level of escaping in a recursive f-literal. This is the case since
no highlighting engine known to the author supports expanding escape
sequences before applying rules to a certain context. Nesting contexts,
however, is a standard feature of all highlighting engines.

Familiarity also plays a role: Arbitrary nesting of expressions
without expansion of escape sequences is available in every single
other language employing a string interpolation method that uses
expressions instead of just variable names. [2]_

Specification
=============

PEP 498 specifies f-strings as follows, but places restrictions on them::

    f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '

All restrictions mentioned in the PEP are lifted from f-literals,
as explained below:

#. Expression portions may now contain strings delimited with the same
   kind of quote that is used to delimit the f-literal.
#. Backslashes may now appear within expressions just like anywhere else
   in Python code. In case of strings nested within f-literals,
   escape sequences are expanded when the innermost string is evaluated.
#. Comments, using the ``'#'`` character, are possible only in multi-line
   f-literals, since comments are terminated by the end of the line
   (which makes closing a single-line f-literal impossible).
#. Expression portions may contain ``':'`` or ``'!'`` wherever
   syntactically valid. The first ``':'`` or ``'!'`` that is not part
   of an expression has to be followed by a valid coercion or format specifier.
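
One possible reading of the last rule is that a ``':'`` or ``'!'`` ends the
expression only if the text before it already forms a complete expression.
The sketch below illustrates that reading by using ``ast.parse`` as an
oracle. The helper names ``is_expression`` and ``split_expression_portion``
are hypothetical, and this is not how CPython implements (or would have to
implement) the grammar. Skipping ``!=`` mirrors the special case already
described in PEP 498.

```python
import ast


def is_expression(source):
    """Return True if `source` parses as a single Python expression."""
    source = source.strip()
    if not source:
        return False
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


def split_expression_portion(body):
    """Split the text between '{' and '}' into
    (expression, conversion, format_spec); missing parts are None.
    """
    for index, char in enumerate(body):
        if char not in ":!":
            continue
        if char == "!" and body[index + 1:index + 2] == "=":
            continue  # '!=' is a comparison operator, not a conversion
        if not is_expression(body[:index]):
            continue  # this ':' or '!' still belongs to the expression
        expression = body[:index].strip()
        rest = body[index + 1:]
        if char == ":":
            return expression, None, rest
        conversion, separator, format_spec = rest.partition(":")
        if conversion not in {"s", "r", "a"}:
            raise SyntaxError(f"invalid conversion: {conversion!r}")
        return expression, conversion, format_spec if separator else None
    if not is_expression(body):
        raise SyntaxError(f"invalid expression portion: {body!r}")
    return body.strip(), None, None
```

With this sketch, ``split_expression_portion("lambda x: x*2")`` keeps the
whole lambda as the expression, while ``"bag['wand']:^10"`` is split into the
expression ``bag['wand']`` and the format specifier ``^10``. Re-parsing every
prefix is wasteful; a dedicated parser would make the same decision in a
single pass.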

A remaining restriction not explicitly mentioned by PEP 498 concerns line
breaks in expression portions. Since strings delimited by single ``'`` or ``"``
characters are expected to be single-line, line breaks remain illegal
in expression portions of single-line strings.
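
This distinction already exists for f-strings. As a small illustration
(the variable names are invented for this example), a triple-quoted literal
accepts a line break inside its expression portion, whereas the single-quoted
form has to stay on one line:

```python
x = 0

# Triple-quoted literal: the expression portion may span several lines.
multi_line = f'''{x
+ 1}'''

# Single-quoted literal: the whole literal, including its expression
# portion, has to fit on one line.
single_line = f'{x + 1}'
```

Both literals evaluate the same expression; only the triple-quoted one can
spread it over more than one line.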

.. note:: Is lifting of the restrictions sufficient,
   or should we specify a more complete grammar?

Member

An actual diff of Python's Grammar/Grammar file would be best.

Contributor Author

could you point me to another PEP where this is done so i have a reference?

Member

I don't know of a specific one off the top of my head. You're going to have to do this for the implementation anyway and so you can just embed the diff to the grammar file.


Backwards Compatibility
=======================

f-literals are fully backwards compatible with f-strings
and only expand the syntax considered legal.

Reference Implementation
========================

TBD

Member

Is an implementation being worked on?

Contributor Author

not yet. isn’t the usual procedure to wait for feedback first?

Member

The usual procedure is to ask on python-ideas if the idea seems viable, and then if it does then you do a draft PEP to python-ideas, and then if it holds up you do a rough implementation and send it along with the PEP to python-dev because that's when the question of maintenance comes into play.

Contributor Author

OK. in this case people already confirmed that they think this is a good idea, but i never coded anything in C. so where do i turn to for guidance?

Member

Either you ask for help upfront when you post your PEP to python-ideas and end up with a co-author or you hope that python-dev thinks your idea is so good that a core dev agrees to do it for you (but that's not very common).


References
==========

.. [1] ECMAScript ``IdentifierName`` specification
   ( http://ecma-international.org/ecma-262/6.0/#sec-names-and-keywords )

Yes, ``const cthulhu = { H̹̙̦̮͉̩̗̗ͧ̇̏̊̾Eͨ͆͒̆ͮ̃͏̷̮̣̫̤̣Cͯ̂͐͏̨̛͔̦̟͈̻O̜͎͍͙͚̬̝̣̽ͮ͐͗̀ͤ̍̀͢M̴̡̲̭͍͇̼̟̯̦̉̒͠Ḛ̛̙̞̪̗ͥͤͩ̾͑̔͐ͅṮ̴̷̷̗̼͍̿̿̓̽͐H̙̙̔̄͜\u0042: 42 }`` is valid ECMAScript 2016

.. [2] Wikipedia article on string interpolation
   ( https://en.wikipedia.org/wiki/String_interpolation )

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: