From c99c42e066ae3fd4f93b8ae87050af969670d698 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Tue, 3 Jan 2017 15:19:37 +1000 Subject: [PATCH 01/36] PEP 538: add Background section on locale handling --- pep-0538.txt | 97 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 86 insertions(+), 11 deletions(-) diff --git a/pep-0538.txt b/pep-0538.txt index a509ee928fd..768273a049b 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -38,8 +38,66 @@ may also choose to opt in to this behaviour for earlier Python 3.x releases by applying the necessary changes as a downstream patch to those versions. -Specification -============= +Background +========== + +While the CPython interpreter is starting up, it may need to convert from +the ``char *`` format to the ``wchar_t *`` format, or from one of those formats +to ``PyUnicodeObject *``, before its own text encoding handling machinery is +fully configured. It handles these cases by relying on the operating system to +do the conversion and then ensuring that the text encoding name reported by +``sys.getfilesystemencoding()`` matches the encoding used during this early +bootstrapping process. + +On Mac OS X, this is straightforward, as Apple guarantees that these operations +will always use UTF-8 to do the conversion. + +On Windows, the limitations of the ``mbcs`` format used by default in these +conversions proved sufficiently problematic that PEP 528 and PEP 529 were +implemented to bypass the operating system supplied interfaces for binary data +handling and force the use of UTF-8 instead. + +On non-Apple \*nix systems however, these operations are handled using the C +locale system, which has the following characteristics [4_]: + +* by default, all processes start in the ``C`` locale, which uses ``ASCII`` + for these conversions. 
This is almost never what anyone doing multilingual + text processing actually wants (including CPython) +* calling ``setlocale(LC_ALL, "")`` reconfigures the active locale based on + the locale categories configured in the current process environment +* if the locale requested by the current environment is unknown, or no specific + locale is configured, then the default ``C`` locale will remain active + +The specific locale category that covers the APIs that CPython depends on is +``LC_CTYPE``, which applies to "classification and conversion of characters, +and to multibyte and wide characters" [5_]. Accordingly, CPython includes the +following key calls to ``setlocale``: + +* in ``Py_Initialize``, CPython calls ``setlocale(LC_CTYPE, "")``, such that + the configured locale settings for that category *always* match those set in + the environment. It does this unconditionally, and it *doesn't* revert the + process state change in ``Py_Finalize`` +* in the main ``python`` binary, CPython calls ``setlocale(LC_ALL, "")`` to + configure the entire C locale subsystem according to the process environment. + It does this prior to making any calls into the shared CPython library + +These calls are usually sufficient to provide sensible behaviour, but they can +still fail in the following cases: + +* SSH environment forwarding means that SSH clients will often forward + client locale settings to servers that don't have that locale installed +* some process environments (such as Linux containers) may not have any + explicit locale configured at all + + +Proposal +======== + +To better handle the cases where CPython would otherwise end up attempting +to operate in the ``C`` locale, this PEP proposes changes to CPython's +behaviour both when it is run as a standalone command line application, as well +as when it is used as a shared library to embed a Python runtime as part of a +larger application. 
When ``Py_Initialize`` is called and CPython detects that the configured locale is the default ``C`` locale, the following warning will be issued:: @@ -49,16 +107,24 @@ is the default ``C`` locale, the following warning will be issued:: `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar environment when running Python directly. +This warning informs both system and application integrators that they're +running Python 3 in a configuration that we don't expect to work properly. + By contrast, when CPython *is* the main application, it will instead automatically coerce the legacy C locale to the multilingual C.UTF-8 locale:: Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set PYTHONALLOWCLOCALE to disable this locale coercion behaviour). -This coercion is implemented by actually setting the ``LANG`` and ``LC_ALL`` -environment variables to ``C.UTF-8``, such that future calls to ``setlocale()`` -will see them, as will other components looking for those settings (such as -GUI development frameworks). +This locale coercion will mean that the standard Python binary should once +again "just work" in the two main failure cases we're aware of (missing locale +settings and SSH forwarding of unknown locales), as long as the target +platform provides the ``C.UTF-8`` locale. + +This coercion will be implemented by actually setting the ``LANG`` and +``LC_ALL`` environment variables to ``C.UTF-8``, such that future calls to +``setlocale()`` will see them, as will other components looking for those +settings (such as GUI development frameworks). The locale coercion will be skipped if the ``PYTHONALLOWCLOCALE`` environment variable is set to a non-empty string. The interpreter will always check for @@ -96,7 +162,9 @@ and instead made it a deployment requirement that systems be configured to use UTF-8 as the text encoding for operating system interfaces. 
Similarly, Node.js assumes UTF-8 by default (a behaviour inherited from the V8 JavaScript engine) and requires custom build settings to indicate it should use the system -locale settings for locale-aware operations. +locale settings for locale-aware operations. Both the JVM and the .NET CLR +use UTF-16-LE as their primary encoding for passing text between applications +and the underlying platform. The challenge for CPython has been the fact that in addition to being used for network service development, it is also extensively used as an embedded @@ -127,8 +195,9 @@ We've been trying to get strict bytes/text separation to work reliably in the legacy C locale for over a decade at this point. Not only haven't we been able to get it to work, neither has anyone else - the only viable alternatives identified have been to pass the bytes along verbatim without eagerly decoding -them to text (Python 2, Ruby, etc), or else to ignore the nominal locale -encoding entirely and assume the use of UTF-8 (Rust, Go, Node.js, etc). +them to text (Python 2.x, Ruby, etc), or else to ignore the nominal C/C++ locale +encoding entirely and assume the use of either UTF-8 (Rust, Go, Node.js, etc) +or UTF-16-LE (JVM, .NET CLR). While this PEP ensures that developers that need to do so can still opt-in to running their Python code in the legacy C locale, it also makes clear that we @@ -212,8 +281,8 @@ Implementation ============== A draft implementation of the change (including test cases) has been -posted to issue 28180 [1_](which requests that ``sys.getfilesystemencoding()`` -default to ``utf-8``) +posted to issue 28180 [1_], which is an end user request that +``sys.getfilesystemencoding()`` default to ``utf-8`` rather than ``ascii``. Backporting to earlier Python 3 releases @@ -266,6 +335,12 @@ References .. [3] Fedora: force C.UTF-8 when Python 3 is run under the C locale (https://bugzilla.redhat.com/show_bug.cgi?id=1404918) +.. 
[4] GNU C: How Programs Set the Locale + ( https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html) + +.. [5] GNU C: Locale Categories + (https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html) + Copyright ========= From 6f0928ebfd965b8ffc3f271b75e6a99aab07eb7c Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Tue, 3 Jan 2017 17:47:26 +1000 Subject: [PATCH 02/36] PEP 538: Clarify rationale for warning wording --- pep-0538.txt | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/pep-0538.txt b/pep-0538.txt index 768273a049b..af905376a5c 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -108,7 +108,10 @@ is the default ``C`` locale, the following warning will be issued:: when running Python directly. This warning informs both system and application integrators that they're -running Python 3 in a configuration that we don't expect to work properly. +running Python 3 in a configuration that we don't expect to work properly. For +the benefit of folks working on maintaining such misconfigured systems, it +also provides instructions on how to deliberately reproduce a comparable +misconfiguration of the standalone command line application. By contrast, when CPython *is* the main application, it will instead automatically coerce the legacy C locale to the multilingual C.UTF-8 locale:: From 043254687aea6fa380f8c83ddec63ab0109915c1 Mon Sep 17 00:00:00 2001 From: Alex Chan Date: Wed, 4 Jan 2017 22:37:23 +0000 Subject: [PATCH 03/36] Fix typo in pep-0512.txt (#172) --- pep-0512.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0512.txt b/pep-0512.txt index 8ccad011c33..7987bda88cb 100644 --- a/pep-0512.txt +++ b/pep-0512.txt @@ -199,7 +199,7 @@ username corresponds to someone who has signed the CLA. Making a GET request to e.g. 
http://bugs.python.org/user?@template=clacheck&github_names=brettcannon,notanuser returns a JSON dictionary with the keys of the usernames requested -and a ``true`` value if they have sigend the CLA, ``false`` if they +and a ``true`` value if they have signed the CLA, ``false`` if they have not, and ``null`` if no corresponding GitHub username was found. From 9780f3ab4326676e4c85004b673ce8d2fc4fb1fc Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Thu, 5 Jan 2017 13:46:03 +0100 Subject: [PATCH 04/36] Add PEP 540: Add a new UTF-8 mode --- pep-0540.txt | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 286 insertions(+) create mode 100644 pep-0540.txt diff --git a/pep-0540.txt b/pep-0540.txt new file mode 100644 index 00000000000..f60088ef23c --- /dev/null +++ b/pep-0540.txt @@ -0,0 +1,286 @@ +PEP: 540 +Title: Add a new UTF-8 mode +Version: $Revision$ +Last-Modified: $Date$ +Author: Victor Stinner +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 5-January-2016 +Python-Version: 3.7 + + +Abstract +======== + +Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system +data instead of the locale encoding. Add ``-X utf8`` command line option +and ``PYTHONUTF8`` environment variable. + + +Context +======= + +Locale and operating system data +-------------------------------- + +Python uses the ``LC_CTYPE`` locale to decide how to encode and decode +data from/to the operating system: + +* file content +* command line arguments: ``sys.argv`` +* standard streams: ``sys.stdin``, ``sys.stdout``, ``sys.stderr`` +* environment variables: ``os.environ`` +* filenames: ``os.listdir(str)`` for example +* pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example +* error messages +* name of a timezone +* user name, terminal name: ``os``, ``grp`` and ``pwd`` modules +* host name, UNIX socket path: see the ``socket`` module +* etc. 
+ +At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user +``LC_CTYPE`` locale and then store the locale encoding, +``sys.getfilesystemencoding()``. In the whole lifetime of a Python process, +the same encoding and error handler are used to encode and decode data +from/to the operating system. + +.. note:: + In some corner case, the *current* ``LC_CTYPE`` locale must be used + instead of ``sys.getfilesystemencoding()``. For example, the ``time`` + module uses the *current* ``LC_CTYPE`` locale to decode timezone + names. + + +The POSIX locale and its encoding +--------------------------------- + +The following environment variables are used to configure the locale, in +this preference order: + +* ``LC_ALL``, most important variable +* ``LC_CTYPE`` +* ``LANG`` + +The POSIX locale,also known as "the C locale", is used: + +* if the first set variable is set to ``"C"`` +* if all these variables are unset, for example when a program is + started in an empty environment. + +The encoding of the POSIX locale must be ASCII or a superset of ASCII. + +On Linux, the POSIX locale uses the ASCII encoding. + +On FreeBSD and Solaris, ``nl_langinfo(CODESET)`` announces an alias of +the ASCII encoding, whereas ``mbstowcs()`` and ``wcstombs()`` functions +use the ISO 8859-1 encoding (Latin1) in practice. The problem is that +``os.fsencode()`` and ``os.fsdecode()`` use +``locale.getpreferredencoding()`` codec. For example, if command line +arguments are decoded by ``mbstowcs()`` and encoded back by +``os.fsencode()``, an ``UnicodeEncodeError`` exception is raised instead +of retrieving the original byte string. + +To fix this issue, Python now checks since Python 3.4 if ``mbstowcs()`` +really uses the ASCII encoding if the the ``LC_CTYPE`` uses the the +POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an +alias to ASCII). 
If not (the effective encoding is not ASCII), Python +uses its own ASCII codec instead of using ``mbstowcs()`` and +``wcstombs()`` functions for operating system data. + +See the `POSIX locale (2016 Edition) +`_. + + +C.UTF-8 and C.utf8 locales +-------------------------- + +Some operating systems provide a variant of the POSIX locale using the +UTF-8 encoding: + +* Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"`` +* Debian (eglibc 2.13-1, 2011): ``"C.UTF-8"`` +* HP-UX: ``"C.utf8"`` + +It was proposed to add a ``C.UTF-8`` locale to glibc: `glibc C.UTF-8 +proposal `_. + + +Popularity of the UTF-8 encoding +-------------------------------- + +Python 3 uses UTF-8 by default for Python source files. + +On Mac OS X, Windows and Android, Python always use UTF-8 for operating +system data instead of the locale encoding. For Windows, see the `PEP +529: Change Windows filesystem encoding to UTF-8 +`_. + +On Linux, UTF-8 became the defacto standard encoding by default, +replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example, +using different encodings for filenames and standard streams is likely +to create mojibake, so UTF-8 is now used *everywhere*. + +The UTF-8 encoding is the default encoding of XML and JSON file format. +In January 2017, UTF-8 was used in `more than 88% of web pages +`_ (HTML, +Javascript, CSS, etc.). + +See `utf8everywhere.org `_ for more general +information on the UTF-8 codec. + +.. note:: + Some applications and operating systems (especially Windows) use Byte + Order Markers (BOM) to indicate the used Unicode encoding: UTF-7, + UTF-8, UTF-16-LE, etc. BOM are not well supported and rarely used in + Python. + + +Old data stored in different encodings and surrogateescape +---------------------------------------------------------- + +Even if UTF-8 became the defacto standard, there are still systems in +the wild which don't use UTF-8. And there are a lot of data stored in +different encodings. 
For example, an old USB key using the ext3 +filesystem with filenames encoded to ISO 8859-1. + +The Linux kernel and the libc don't decode filenames: a filename is used +as a raw array of bytes. The common solution to support any filename is +to store filenames as bytes and don't try to decode them. When displayed to +stdout, mojibake is displayed if the filename and the terminal don't use +the same encoding. + +Python 3 promotes Unicode everywhere including filenames. A solution to +support filenames not decodable from the locale encoding was found: the +``surrogateescape`` error handler (`PEP 393 +`_), store undecodable bytes +as surrogate characters. This error handler is used by default for +operating system data, by ``os.fsdecode()`` and ``os.fsencode()`` for +example (except on Windows which uses the ``strict`` error handler). + + +Standard streams +---------------- + +Python uses the locale encoding for standard streams: stdin, stdout and +stderr. The ``strict`` error handler is used by stdin and stdout to +prevent mojibake. + +The ``backslashreplace`` error handler is used by stderr to avoid +Unicode encode error when displaying non-ASCII text. It is especially +useful when the POSIX locale is used, because this locale usually uses +the ASCII encoding. + +The problem is that operating system data like filenames are decoded +using the ``surrogateescape`` error handler (PEP 393). Displaying a +filename to stdout raises an Unicode encode error if the filename +contains an undecoded byte stored as a surrogate character. + +Python 3.6 now uses ``surrogateescape`` for stdin and stdout if the +POSIX locale is used: `issue #19977 `_. The +idea is to passthrough operating system data even if it means mojibake, because +most UNIX applications work like that. Most UNIX applications store filenames +as bytes, usually simply because bytes are first-citizen class in the used +programming language, whereas Unicode is badly supported. + +.. 
note:: + The encoding and/or the error handler of standard streams can be + overriden with the ``PYTHONIOENCODING`` environment variable. + + +Proposal +======== + +Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system data +instead of the locale encoding: + +* Add ``-X utf8`` command line option +* Add ``PYTHONUTF8=1`` environment variable + +Add also a strict UTF-8 mode, enabled by ``-X utf8=strict`` or +``PYTHONUTF8=strict``. + +The UTF-8 mode changes the default encoding and error handler used by +open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and +sys.stderr: + +============================ ======================= ======================= ====================== ====================== +Function Default, other locales Default, POSIX locale UTF-8 UTF-8 Strict +============================ ======================= ======================= ====================== ====================== +open() locale/strict locale/strict UTF-8/surrogateescape UTF-8/strict +os.fsdecode(), os.fsencode() locale/surrogateescape locale/surrogateescape UTF-8/surrogateescape UTF-8/strict +sys.stdin locale/strict locale/surrogateescape UTF-8/surrogateescape UTF-8/strict +sys.stdout locale/strict locale/surrogateescape UTF-8/surrogateescape UTF-8/strict +sys.stderr locale/backslashreplace locale/backslashreplace UTF-8/backslashreplace UTF-8/backslashreplace +============================ ======================= ======================= ====================== ====================== + +The UTF-8 mode is disabled by default to keep hard Unicode errors when +encoding or decoding operating system data failed, and to keep the +backward compatibility. The user is responsible to enable explicitly the +UTF-8 mode, and so is better prepared for mojibake than if the UTF-8 +mode would be enabled *by default*. + +The UTF-8 mode should be used on systems known to be configured with +UTF-8 where most applications speak UTF-8. 
It prevents Unicode errors if +the user overrides a locale *by mistake* or if a Python program is +started with no locale configured (and so with the POSIX locale). + +Most UNIX applications handle operating system data as bytes, so +``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a +limited impact on how these data are handled by the application. + +The Python UTF-8 mode should help to make Python more interoperable with +the other UNIX applications in the system assuming that *UTF-8* is used +everywhere and that users *expect* UTF-8. + +Ignoring ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in +Python is more convenient, since they are more commonly misconfigured +*by mistake* (configured to use an encoding different than UTF-8, +whereas the system uses UTF-8), rather than being misconfigured by intent. + + +Backward Compatibility +====================== + +Since the UTF-8 mode is disabled by default, it has no impact on the +backward compatibility. The new UTF-8 mode must be enabled explicitly. + + +Alternatives +============ + +Always use UTF-8 +---------------- + +Python already always use the UTF-8 encoding on Mac OS X, Android and Windows. +Since UTF-8 became the defacto encoding, it makes sense to always use it on all +platforms with any locale. + +The risk is to introduce mojibake if the locale uses a different encoding, +especially for locales other than the POSIX locale. + + +Force UTF-8 for the POSIX locale +-------------------------------- + +An alternative to always using UTF-8 in any case is to only use UTF-8 when the +``LC_CTYPE`` locale is the POSIX locale. + +The `PEP 538: Coercing the legacy C locale to C.UTF-8 +`_ of Nick Coghlan proposes to +implement that using the ``C.UTF-8`` locale. + + +Related Work +============ + +Perl has a ``-C`` command line option and a ``PERLUNICODE`` environment +varaible to force UTF-8: see `perlrun +`_. 
It is possible to configure +UTF-8 per standard stream, on input and output streams, etc. + + +Copyright +========= + +This document has been placed in the public domain. From 5b6b25f5d9251a1b6f0329fc4fc48e5a65fa57a0 Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Thu, 5 Jan 2017 23:54:22 +0100 Subject: [PATCH 05/36] Update PEP 540 * Enable UTF-8 mode by default if the locale is POSIX * Add Use Cases * Add "Don't modify the encoding of the POSIX locale" alternative * Rephrase Abstract and Proposal * Proposal: mention expected mojibake issues * Fix PEP number: 393 => 383 * Add links --- pep-0540.txt | 305 ++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 266 insertions(+), 39 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index f60088ef23c..df3631ca3d7 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -13,9 +13,16 @@ Python-Version: 3.7 Abstract ======== -Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system -data instead of the locale encoding. Add ``-X utf8`` command line option -and ``PYTHONUTF8`` environment variable. +Add a new UTF-8 mode, disabled by default, to ignore the locale and +force the usage of the UTF-8 encoding. + +Basically, the UTF-8 mode behaves as Python 2: it "just works" and don't +bother users with encodings, but it can produce mojibake. The UTF-8 mode +can be configured as strict to prevent mojibake. + +New ``-X utf8`` command line option and ``PYTHONUTF8`` environment +variable are added to control the UTF-8 mode. The POSIX locale enables +the UTF-8 mode.
Context @@ -33,9 +40,8 @@ data from/to the operating system: * environment variables: ``os.environ`` * filenames: ``os.listdir(str)`` for example * pipes: ``subprocess.Popen`` using ``subprocess.PIPE`` for example -* error messages -* name of a timezone -* user name, terminal name: ``os``, ``grp`` and ``pwd`` modules +* error messages: ``os.strerror(code)`` for example +* user and terminal names: ``os``, ``grp`` and ``pwd`` modules * host name, UNIX socket path: see the ``socket`` module * etc. @@ -81,7 +87,7 @@ arguments are decoded by ``mbstowcs()`` and encoded back by ``os.fsencode()``, an ``UnicodeEncodeError`` exception is raised instead of retrieving the original byte string. -To fix this issue, Python now checks since Python 3.4 if ``mbstowcs()`` +To fix this issue, Python checks since Python 3.4 if ``mbstowcs()`` really uses the ASCII encoding if the the ``LC_CTYPE`` uses the the POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an alias to ASCII). If not (the effective encoding is not ASCII), Python @@ -95,16 +101,18 @@ See the `POSIX locale (2016 Edition) C.UTF-8 and C.utf8 locales -------------------------- -Some operating systems provide a variant of the POSIX locale using the +Some UNIX operating systems provide a variant of the POSIX locale using the UTF-8 encoding: * Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"`` -* Debian (eglibc 2.13-1, 2011): ``"C.UTF-8"`` +* Debian (eglibc 2.13-1, 2011), Ubuntu: ``"C.UTF-8"`` * HP-UX: ``"C.utf8"`` -It was proposed to add a ``C.UTF-8`` locale to glibc: `glibc C.UTF-8 +It was proposed to add a ``C.UTF-8`` locale to the glibc: `glibc C.UTF-8 proposal `_. +It is not planned to add such locale to BSD systems. + Popularity of the UTF-8 encoding -------------------------------- @@ -112,11 +120,10 @@ Popularity of the UTF-8 encoding Python 3 uses UTF-8 by default for Python source files. On Mac OS X, Windows and Android, Python always use UTF-8 for operating -system data instead of the locale encoding. 
For Windows, see the `PEP -529: Change Windows filesystem encoding to UTF-8 -`_. +system data. For Windows, see the PEP 529: "Change Windows filesystem +encoding to UTF-8". -On Linux, UTF-8 became the defacto standard encoding by default, +On Linux, UTF-8 became the defacto standard encoding, replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example, using different encodings for filenames and standard streams is likely to create mojibake, so UTF-8 is now used *everywhere*. @@ -152,8 +159,7 @@ the same encoding. Python 3 promotes Unicode everywhere including filenames. A solution to support filenames not decodable from the locale encoding was found: the -``surrogateescape`` error handler (`PEP 393 -`_), store undecodable bytes +``surrogateescape`` error handler (PEP 383), store undecodable bytes as surrogate characters. This error handler is used by default for operating system data, by ``os.fsdecode()`` and ``os.fsencode()`` for example (except on Windows which uses the ``strict`` error handler). @@ -172,7 +178,7 @@ useful when the POSIX locale is used, because this locale usually uses the ASCII encoding. The problem is that operating system data like filenames are decoded -using the ``surrogateescape`` error handler (PEP 393). Displaying a +using the ``surrogateescape`` error handler (PEP 383). Displaying a filename to stdout raises an Unicode encode error if the filename contains an undecoded byte stored as a surrogate character. @@ -191,28 +197,60 @@ programming language, whereas Unicode is badly supported. Proposal ======== -Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system data -instead of the locale encoding: +Changes +------- + +Add a new UTF-8 mode, disabled by default, to ignore the locale and +force the usage of the UTF-8 encoding with the ``surrogateescape`` error +handler, instead using the locale encoding (with ``strict`` or +``surrogateescape`` error handler depending on the case). 
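The difference between the ``strict`` and ``surrogateescape`` error handlers named in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than code from the PEP: the byte string is a hypothetical filename encoded to ISO 8859-1, so it is not valid UTF-8.

```python
# Hypothetical filename "café.txt" encoded to ISO 8859-1: byte 0xE9 is not
# valid UTF-8 in this position.
raw = b"caf\xe9.txt"

# With the strict error handler, decoding fails with a hard Unicode error.
try:
    raw.decode("utf-8", "strict")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape (PEP 383), the undecodable byte 0xE9 is stored as the
# lone surrogate U+DCE9, so decoding cannot fail.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes unchanged.
round_trip = text.encode("utf-8", "surrogateescape")

# ascii() is used so that printing cannot itself raise an encoding error.
print(strict_ok, ascii(text), round_trip == raw)
```

The round trip is what lets undecodable data pass through Python unmodified. The cost is that ``text`` contains a lone surrogate, which a ``strict`` encoder (for example a text file opened with the default error handler) will reject.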
-* Add ``-X utf8`` command line option -* Add ``PYTHONUTF8=1`` environment variable +Basically, the UTF-8 mode behaves as Python 2: it "just works" and don't +bother users with encodings, but it can produce mojibake. It can be +configured as strict to prevent mojibake: the UTF-8 encoding is used +with the ``strict`` error handler in this case. -Add also a strict UTF-8 mode, enabled by ``-X utf8=strict`` or -``PYTHONUTF8=strict``. +New ``-X utf8`` command line option and ``PYTHONUTF8`` environment +variable are added to control the UTF-8 mode. The UTF-8 mode is enabled +by ``-X utf8`` or ``PYTHONUTF8=1``. The UTF-8 is configured as strict +by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. + +The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode +can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``. + +Encoding and error handler +-------------------------- The UTF-8 mode changes the default encoding and error handler used by open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and sys.stderr: -============================ ======================= ======================= ====================== ====================== -Function Default, other locales Default, POSIX locale UTF-8 UTF-8 Strict -============================ ======================= ======================= ====================== ====================== -open() locale/strict locale/strict UTF-8/surrogateescape UTF-8/strict -os.fsdecode(), os.fsencode() locale/surrogateescape locale/surrogateescape UTF-8/surrogateescape UTF-8/strict -sys.stdin locale/strict locale/surrogateescape UTF-8/surrogateescape UTF-8/strict -sys.stdout locale/strict locale/surrogateescape UTF-8/surrogateescape UTF-8/strict -sys.stderr locale/backslashreplace locale/backslashreplace UTF-8/backslashreplace UTF-8/backslashreplace -============================ ======================= ======================= ====================== ====================== +============================ ======================= 
========================== ========================== +Function Default UTF-8 or POSIX locale UTF-8 Strict +============================ ======================= ========================== ========================== +open() locale/strict **UTF-8/surrogateescape** **UTF-8**/strict +os.fsdecode(), os.fsencode() locale/surrogateescape **UTF-8**/surrogateescape **UTF-8/strict** +sys.stdin, sys.stdout locale/strict **UTF-8/surrogateescape** **UTF-8**/strict +sys.stderr locale/backslashreplace **UTF-8**/backslashreplace **UTF-8**/backslashreplace +============================ ======================= ========================== ========================== + +By comparison, Python 3.6 uses: + +============================ ======================= ========================== +Function Default POSIX locale +============================ ======================= ========================== +open() locale/strict locale/strict +os.fsdecode(), os.fsencode() locale/surrogateescape locale/surrogateescape +sys.stdin, sys.stdout locale/strict locale/**surrogateescape** +sys.stderr locale/backslashreplace locale/backslashreplace +============================ ======================= ========================== + +The UTF-8 mode uses the ``surrogateescape`` error handler instead of the +strict mode for convenience: the idea is that data not encoded to UTF-8 +are passed through "Python" without being modified, as raw bytes. + +Rationale +--------- The UTF-8 mode is disabled by default to keep hard Unicode errors when encoding or decoding operating system data failed, and to keep the @@ -238,17 +276,184 @@ Python is more convenient, since they are more commonly misconfigured *by mistake* (configured to use an encoding different than UTF-8, whereas the system uses UTF-8), rather than being misconfigured by intent. +Expected mojibake issues +------------------------ + +The UTF-8 mode only affects Python 3.7 code, other code is not aware of this +mode. 
+ +If Python 3.7 is used as a producer in a ``producer | consumer`` shell command +and the consumer may fail to decode input data if it decodes it and the locale +encoding is not UTF-8. If the consumer doesn't decode inputs, process them +as bytes, it should just work. + +If Python 3.7 is used as a consumer in a ``producer | consumer`` shell command, +it should just work. + +If Python calls third party libraries or if Python is embedded in an +application, code outside Python is not aware of the UTF-8 mode. If the other +code uses UTF-8, it's fine. If the other code uses the locale encoding, +mojibake will occur when the locale encoding is not UTF-8. + + +Use Cases +========= + +List a directory into stdout +---------------------------- + +Script listing the content of the current directory into stdout:: + + import os + for name in os.listdir(os.curdir): + print(name) + +Result: + +======================== ============================== +Python Always work? +======================== ============================== +Python 2 **Yes** +Python 3 No +Python 3.5, POSIX locale **Yes** +UTF-8 mode **Yes** +UTF-8 Strict mode No +======================== ============================== + +"Yes" means that the script cannot fail, but it can produce mojibake. + +"No" means that the script can fail on decoding or encoding a filename +depending on the locale or the filename. + + +List a directory into a text file +--------------------------------- + +Similar to the previous example, except that the listing is written into +a text file:: + + import os + names = os.listdir(os.curdir) + with open("/tmp/content.txt", "w") as fp: + for name in names: + fp.write("%s\n" % name) + +Result: + +======================== ============================== +Python Always work? 
+======================== ============================== +Python 2 **Yes** +Python 3 No +Python 3.5, POSIX locale No +UTF-8 mode **Yes** +UTF-8 Strict mode No +======================== ============================== + +"Yes" means that the script cannot fail, but it can produce mojibake. + +"No" means that the script can fail on decoding or encoding a filename +depending on the locale or the filename. Typical error:: + + $ LC_ALL=C python3 test.py + Traceback (most recent call last): + File "test.py", line 5, in + fp.write("%s\n" % name) + UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128) + + +Display Unicode characters into stdout +-------------------------------------- + +Very basic example used to illustrate a common issue, display the euro sign +(U+20AC: €):: + + print("euro: \u20ac") + +Result: + +======================== ============================== +Python Always work? +======================== ============================== +Python 2 No +Python 3 No +Python 3.5, POSIX locale No +UTF-8 mode **Yes** +UTF-8 Strict mode **Yes** +======================== ============================== + +"Yes" means that the script cannot fail, but it can produce mojibake. + +"No" means that the script can fail on encoding the euro sign depending on the +locale encoding. + + +Replace a word in a text +------------------------ + +The following scripts replaces the word "apple" with "orange". It +reads input from stdin and writes the output into stdout:: + + import sys + text = sys.stdin.read() + sys.stdout.write(text.replace("apple", "orange")) + +Result: + +======================== ============================== +Python Always work? +======================== ============================== +Python 2 **Yes** +Python 3 No +Python 3.5, POSIX locale **Yes** +UTF-8 mode **Yes** +UTF-8 Strict mode No +======================== ============================== + +"Yes" means that the script cannot fail. 
+ +"No" means that the script can fail on decoding the input depending on +the locale. + Backward Compatibility ====================== -Since the UTF-8 mode is disabled by default, it has no impact on the -backward compatibility. The new UTF-8 mode must be enabled explicitly. +The main backward incompatible change is that the UTF-8 encoding is now +used if the locale is POSIX. Since the UTF-8 encoding is used with the +``surrogateescape`` error handler, encoding errors should not occur and +so the change should not break applications. + +The more likely source of trouble comes from external libraries. Python +can successfully decode data from UTF-8, but a library using the locale +encoding can fail to encode the decoded text back to bytes. Hopefully, +encoding text in a library is a rare operation. Very few libraries +expect text; most libraries expect bytes and even manipulate bytes +internally. + +If the locale is not POSIX, the PEP has no impact on the backward +compatibility since the UTF-8 mode is disabled by default in this case: +it must be enabled explicitly. Alternatives ============ +Don't modify the encoding of the POSIX locale +--------------------------------------------- + +A first version of the PEP did not change the encoding and error handler +used for the POSIX locale. + +The problem is that adding a command line option or setting an environment +variable is not possible in some cases, or at least not convenient. + +Moreover, many users simply expect that Python 3 behaves as Python 2: +it doesn't bother them with encodings and "just works" in all cases. These +users don't worry about mojibake, or even expect mojibake because of +complex documents using multiple incompatible encodings. + + Always use UTF-8 ---------------- @@ -266,13 +471,35 @@ Force UTF-8 for the POSIX locale An alternative to always using UTF-8 in any case is to only use UTF-8 when the ``LC_CTYPE`` locale is the POSIX locale. 
-The `PEP 538: Coercing the legacy C locale to C.UTF-8 -`_ of Nick Coghlan proposes to -implement that using the ``C.UTF-8`` locale. +The PEP 538 "Coercing the legacy C locale to C.UTF-8" of Nick Coghlan +proposes to implement that using the ``C.UTF-8`` locale. -Related Work -============ +Links +===== + +PEPs: + +* PEP 538 "Coercing the legacy C locale to C.UTF-8" +* PEP 529: "Change Windows filesystem encoding to UTF-8" +* PEP 383: "Non-decodable Bytes in System Character Interfaces" + +Python issues: + +* `issue #28180: sys.getfilesystemencoding() should default to utf-8 + `_ +* `Issue #19846: Python 3 raises Unicode errors with the C locale + `_ +* `Issue #8622: Add PYTHONFSENCODING environment variable + `_: added but reverted because of + many issues, read the `Inconsistencies if locale and filesystem + encodings are different + `_ + thread on the python-dev mailing list + + +Prior Art +========= Perl has a ``-C`` command line option and a ``PERLUNICODE`` environment varaible to force UTF-8: see `perlrun From 3c6b56f10c25c808bef5b798fcdff1a9af4d0059 Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Fri, 6 Jan 2017 13:57:10 +0100 Subject: [PATCH 06/36] PEP 540 * Add "POSIX locale used by mistake" section * Add a lot of issues in the Links section --- pep-0540.txt | 70 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 67 insertions(+), 3 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index df3631ca3d7..07e2b19e200 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -98,6 +98,20 @@ See the `POSIX locale (2016 Edition) `_. +POSIX locale used by mistake +---------------------------- + +In many cases, the POSIX locale is not really expected by users who get +it by mistake. 
Examples: + +* program started in an empty environment +* User forcing LANG=C to get messages in english +* LANG=C used for bad reasons, without being aware of the ASCII encoding +* SSH shell +* User locale set to a non-existing locale, typo in the locale name for + example + + C.UTF-8 and C.utf8 locales -------------------------- @@ -484,12 +498,15 @@ PEPs: * PEP 529: "Change Windows filesystem encoding to UTF-8" * PEP 383: "Non-decodable Bytes in System Character Interfaces" -Python issues: +Main Python issues: * `issue #28180: sys.getfilesystemencoding() should default to utf-8 `_ -* `Issue #19846: Python 3 raises Unicode errors with the C locale - `_ +* `Issue #19977: Use "surrogateescape" error handler for sys.stdin and + sys.stdout on UNIX for the C locale + `_ +* `Issue #19847: Setting the default filesystem-encoding + `_ * `Issue #8622: Add PYTHONFSENCODING environment variable `_: added but reverted because of many issues, read the `Inconsistencies if locale and filesystem @@ -497,6 +514,53 @@ Python issues: `_ thread on the python-dev mailing list +Incomplete list of Python issues related to Unicode errors, especially +with the POSIX locale: + +* 2016-12-22: `LANG=C python3 -c "import os; os.path.exists('\xff')" + `_ +* 2014-07-20: `issue #22016: Add a new 'surrogatereplace' output only error handler + `_ +* 2014-04-27: `Issue #21368: Check for systemd locale on startup if current + locale is set to POSIX `_ -- read manually + /etc/locale.conf when the locale is POSIX +* 2014-01-21: `Issue #20329: zipfile.extractall fails in Posix shell with utf-8 + filename + `_ +* 2013-11-30: `Issue #19846: Python 3 raises Unicode errors with the C locale + `_ +* 2010-05-04: `Issue #8610: Python3/POSIX: errors if file system encoding is None + `_ +* 2013-08-12: `Issue #18713: Clearly document the use of PYTHONIOENCODING to + set surrogateescape `_ +* 2013-09-27: `Issue #19100: Use backslashreplace in pprint + `_ +* 2012-01-05: `Issue #13717: os.walk() + print fails with 
UnicodeEncodeError + `_ +* 2011-12-20: `Issue #13643: 'ascii' is a bad filesystem default encoding + `_ +* 2011-03-16: `issue #11574: TextIOWrapper should use UTF-8 by default for the + POSIX locale + `_, thread on python-dev: + `Low-Level Encoding Behavior on Python 3 + `_ +* 2010-04-26: `Issue #8533: regrtest: use backslashreplace error handler for + stdout `_, regrtest fails with Unicode + encode error if the locale is POSIX + +Some issues are real bugs in applications which must explicitly set the +encoding. Such code just works in the common case (locale configured +correctly), so the bug goes unnoticed. But the program "suddenly" fails when the POSIX +locale is used (probably for bad reasons). Such bugs are not well +understood by users. Examples of such issues: + +* 2013-11-21: `pip: open() uses the locale encoding to parse Python + script, instead of the encoding cookie + `_ -- pip must use the encoding + cookie to read a Python source code file +* 2011-01-21: `IDLE 3.x can crash decoding recent file list + `_ + Prior Art ========= From 8b9a0147d54bd546dd52a1d27619e84aea54fba9 Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Fri, 6 Jan 2017 10:45:05 -0800 Subject: [PATCH 07/36] Update migration status (#173) Also clearly delineate what issues are blocker, post-migration things to do, and non-blockers. 
--- pep-0512.txt | 32 ++++++++++++++++++++------------ 1 file changed, 20 insertions(+), 12 deletions(-) diff --git a/pep-0512.txt b/pep-0512.txt index 7987bda88cb..7734d16b5ec 100644 --- a/pep-0512.txt +++ b/pep-0512.txt @@ -702,41 +702,49 @@ Required: * Not started - - `Update PEP 101`_ (commitment from Ned Deily to do this) + - `Update PEP 101`_ (commitment from Ned Deily to do this; + **non-blocker**) - Email python-checkins for each commit (PR or direct) - (https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/) + (https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/; + **post-migration**) - Message #python-dev for each commit (PR or direct) (https://github.com/python/cpython/settings/hooks/new?service=irc; - commitment from R. David Murray) + commitment from R. David Murray; + **post-migration**) * In progress - `Linking a pull request to an issue`_ (http://psf.upfronthosting.co.za/roundup/meta/issue589; - field added to bugs.python.org, now need bot to notice issues in PRs) + testing; + **blocker**) - `Notify the issue if a commit is made`_ (http://psf.upfronthosting.co.za/roundup/meta/issue590; - review committal from Ezio Melotti) + review committal from Ezio Melotti; + **blocker**) - `Deprecate sys._mercurial`_ (http://bugs.python.org/issue27593; - review committal from Ned Deily) + review committal from Ned Deily; + **non-blocker**) - Migrate buildbots to be triggered and pull from GitHub - (Zach Ware is looking into requirements) + (Zach Ware thinks this won't be an issue; + **post-migration**) - `Update the linking service for mapping commit IDs to URLs`_ (code ready, needs deployment once the hg repository is made read-only; - https://gist.github.com/brettcannon/f8d97c92b0df264cd4db008ffd32daf9) - - Update commit hash detection on b.p.o to support 10- and 11-character hashes - (http://psf.upfronthosting.co.za/roundup/meta/issue610) + 
https://gist.github.com/brettcannon/f8d97c92b0df264cd4db008ffd32daf9; + **post-migration**) - Get docs built from git (https://github.com/python/docsbuild-scripts/blob/master/build_docs.py already - updated to work with git; finding out where the invocations are to make sure - they pass the appropriate ``--git`` flag) + updated; https://github.com/python/psf-salt/pull/91 to switch; + **post-migration**) * Completed - `Update the devguide`_ (including `Document steps to commit a pull request`_) (https://github.com/python/devguide/milestone/1) + - Update commit hash detection on b.p.o to support 10- and 11-character hashes + (http://psf.upfronthosting.co.za/roundup/meta/issue610) Optional features: From 9807b217f8877e209b516f82e00a602d7aa5eff5 Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Sat, 7 Jan 2017 02:35:27 +0100 Subject: [PATCH 08/36] PEP 540 * add section: "It's not a bug, you must fix your locale" is not an acceptable answer * elaborate the "Expected mojibake and surrogate character issues" section * Add the "Producer-consumer model using pipes" use case --- pep-0540.txt | 319 ++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 254 insertions(+), 65 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index 07e2b19e200..904237de25a 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -25,8 +25,53 @@ variable are added to control the UTF-8 mode. The POSIX locale enables the UTF-8 mode. -Context -======= +Rationale +========= + +"It's not a bug, you must fix your locale" is not an acceptable answer +---------------------------------------------------------------------- + +Since Python 3.0 was released in 2008, the usual answer to users getting +Unicode errors has been to ask developers to fix their code to handle Unicode +properly. Most applications and Python modules were fixed, but users +keep reporting Unicode errors regularly: see the long list of issues in +the `Links`_ section below. 
+ +In fact, a second class of bug comes from a locale which is not properly +configured. The usual answer to such a bug report is: "it is not a bug, +you must fix your locale". + +Technically, the answer is correct, but from a practical point of view, +the answer is not acceptable. In many cases, "fixing the issue" is a +hard task. Moreover, sometimes, the usage of the POSIX locale is +deliberate. + +A good example of a concrete issue is build systems which create a +fresh environment for each build using a chroot, a container, a virtual +machine or something else to get reproducible builds. Such a setup +usually uses the POSIX locale. To get 100% reproducible builds, the +POSIX locale is a good choice: see the `Locales section of +reproducible-builds.org +`_. + +UNIX users don't expect Unicode errors, since the common command line +tools like ``cat``, ``grep`` or ``sed`` never fail with Unicode errors. +These users expect that Python 3 "just works" with any locale and doesn't +bother them with encodings. From their point of view, the bug is not +their locale but is obviously Python 3. + +Since Python 2 handles data as bytes, Unicode errors are rarer in Python 2 +than in Python 3. It also explains why users +perceive Python 3 as the root cause of their Unicode errors. + +Some users expect that Python 3 just works with any locale and so don't +worry about mojibake, whereas some developers are working hard to prevent +mojibake and so expect that Python 3 fails early before creating +mojibake. + +Since different groups of users have different expectations, there is no +silver bullet which solves all issues at once. Last but not least, +backward compatibility should be preserved whenever possible. Locale and operating system data -------------------------------- @@ -108,6 +153,9 @@ it by mistake. 
Examples: * User forcing LANG=C to get messages in english * LANG=C used for bad reasons, without being aware of the ASCII encoding * SSH shell +* Linux installed with no configured locale +* chroot environment, Docker image, container, ... with no locale + configured * User locale set to a non-existing locale, typo in the locale name for example @@ -290,29 +338,57 @@ Python is more convenient, since they are more commonly misconfigured *by mistake* (configured to use an encoding different than UTF-8, whereas the system uses UTF-8), rather than being misconfigured by intent. -Expected mojibake issues ------------------------- +Expected mojibake and surrogate character issues +------------------------------------------------ + +The UTF-8 mode only affects code running directly in Python, especially +code written in pure Python. The other code, called "external code" +here, is not aware of this mode. Examples: + +* C libraries called by Python modules, like OpenSSL +* The application code when Python is embedded in an application + +In the UTF-8 mode, Python uses the ``surrogateescape`` error handler +which stores bytes not decodable from UTF-8 as surrogate characters. -The UTF-8 mode only affects Python 3.7 code, other code is not aware of this -mode. +If the external code uses the locale and the locale encoding is UTF-8, +it should work fine. -If Python 3.7 is used as a producer in a ``producer | consumer`` shell command -and the consumer may fail to decode input data if it decodes it and the locale -encoding is not UTF-8. If the consumer doesn't decode inputs, process them -as bytes, it should just work. +External code using bytes +^^^^^^^^^^^^^^^^^^^^^^^^^ -If Python 3.7 is used as a consumer in a ``producer | consumer`` shell command, -it should just work. +If the external code processes data as bytes, surrogate characters are not +an issue since they are only used inside Python. 
Python encodes +surrogate characters back to bytes at the edges, before calling external +code. -If Python calls third party libraries or if Python is embedded in an -application, code outside Python is not aware of the UTF-8 mode. If the other -code uses UTF-8, it's fine. If the other code uses the locale encoding, -mojibake will occur when the locale encoding is not UTF-8. +The UTF-8 mode can produce mojibake since Python and external code don't +care about invalid bytes, but it's a deliberate choice. The UTF-8 mode can +be configured as strict to prevent mojibake and fail early when data +is not decodable from UTF-8. + +External code using text +^^^^^^^^^^^^^^^^^^^^^^^^ + +If the external code uses a text API, for example using the ``wchar_t*`` C +type, mojibake should not occur, but the external code can fail on +surrogate characters. Use Cases ========= +The following use cases were written to help to understand the impact of +chosen encodings and error handlers on concrete examples. + +The "Always work" results were written to prove the benefit of having a +UTF-8 mode which works with any data and any locale, compared to the +existing old Python versions. + +The "Mojibake" column shows that ignoring the locale causes a practical +issue: the UTF-8 mode produces mojibake if the terminal doesn't use the +UTF-8 encoding. + List a directory into stdout ---------------------------- @@ -324,21 +400,23 @@ Script listing the content of the current directory into stdout:: Result: -======================== ============================== -Python Always work? -======================== ============================== -Python 2 **Yes** -Python 3 No -Python 3.5, POSIX locale **Yes** -UTF-8 mode **Yes** -UTF-8 Strict mode No -======================== ============================== - -"Yes" means that the script cannot fail, but it can produce mojibake. +======================== ============ ========= +Python Always work? Mojibake? 
+======================== ============ ========= +Python 2 **Yes** **Yes** +Python 3 No No +Python 3.5, POSIX locale **Yes** **Yes** +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode No No +======================== ============ ========= "No" means that the script can fail on decoding or encoding a filename depending on the locale or the filename. +To be able to always work, the program must be able to produce mojibake. +Mojibake is more user friendly than an error with a truncated or empty +output. + List a directory into a text file --------------------------------- @@ -354,20 +432,19 @@ a text file:: Result: -======================== ============================== -Python Always work? -======================== ============================== -Python 2 **Yes** -Python 3 No -Python 3.5, POSIX locale No -UTF-8 mode **Yes** -UTF-8 Strict mode No -======================== ============================== +======================== ============ ========= +Python Always work? Mojibake? +======================== ============ ========= +Python 2 **Yes** **Yes** +Python 3 No No +Python 3.5, POSIX locale No No +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode No No +======================== ============ ========= -"Yes" means that the script cannot fail, but it can produce mojibake. - -"No" means that the script can fail on decoding or encoding a filename -depending on the locale or the filename. Typical error:: +"Yes" involves that mojibake can be produced. "No" means that the script +can fail on decoding or encoding a filename depending on the locale or +the filename. Typical error:: $ LC_ALL=C python3 test.py Traceback (most recent call last): @@ -386,20 +463,18 @@ Very basic example used to illustrate a common issue, display the euro sign Result: -======================== ============================== -Python Always work? 
-======================== ============================== -Python 2 No -Python 3 No -Python 3.5, POSIX locale No -UTF-8 mode **Yes** -UTF-8 Strict mode **Yes** -======================== ============================== +======================== ============ ========= +Python Always work? Mojibake? +======================== ============ ========= +Python 2 No No +Python 3 No No +Python 3.5, POSIX locale No No +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode **Yes** **Yes** +======================== ============ ========= -"Yes" means that the script cannot fail, but it can produce mojibake. - -"No" means that the script can fail on encoding the euro sign depending on the -locale encoding. +The UTF-8 and UTF-8 Strict modes will always encode the euro sign as +UTF-8. If the terminal uses a different encoding, we get mojibake. Replace a word in a text @@ -414,20 +489,134 @@ reads input from stdin and writes the output into stdout:: Result: -======================== ============================== -Python Always work? -======================== ============================== -Python 2 **Yes** -Python 3 No -Python 3.5, POSIX locale **Yes** -UTF-8 mode **Yes** -UTF-8 Strict mode No -======================== ============================== +======================== ============ ========= +Python Always work? Mojibake? +======================== ============ ========= +Python 2 **Yes** **Yes** +Python 3 No No +Python 3.5, POSIX locale **Yes** **Yes** +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode No No +======================== ============ ========= + +Producer-consumer model using pipes +----------------------------------- + +Let's say that we have a "producer" program which writes data into its +stdout and a "consumer" program which reads data from its stdin. + +In a shell, such programs are run with the command:: + + producer | consumer + +The question is whether these programs will work with any data and any locale. 
+UNIX users don't expect Unicode errors, and so expect that such programs +"just work". + +If the producer only produces ASCII output, no error should occur. Let's +say that the producer writes at least one non-ASCII character (at least +one byte in the range ``0x80..0xff``). + +To simplify the problem, let's say that the consumer has no output +(it doesn't write its result into a file or stdout). + +A "Bytes producer" is an application which cannot fail with a Unicode +error and produces bytes into stdout. + +Let's say that a "Bytes consumer" does not decode stdin but stores data +as bytes: such a consumer always works. Common UNIX command line tools like +``cat``, ``grep`` or ``sed`` are in this category. Many Python 2 +applications are also in this category. + +"Python producer" and "Python consumer" are a producer and a consumer +implemented in Python. + +Bytes producer, Bytes consumer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +It always works, but it is out of the scope of this PEP since it doesn't +involve Python. + +Python producer, Bytes consumer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Python producer:: + + print("euro: \u20ac") + +Result: + +======================== ============ ========= +Python Always work? Mojibake? +======================== ============ ========= +Python 2 No No +Python 3 No No +Python 3.5, POSIX locale No No +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode No No +======================== ============ ========= + +The question here is not whether the consumer is able to decode the input, +but whether Python is able to produce its output. So it's similar to the +`Display Unicode characters into stdout`_ case. + +UTF-8 modes work with any locale since the consumer doesn't try to +decode its stdin. -"Yes" means that the script cannot fail. +Bytes producer, Python consumer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -"No" means that the script can fail on decoding the input depending on -the locale. 
+Python consumer:: + + import sys + text = sys.stdin.read() + result = text.replace("apple", "orange") + # ignore the result + +Result: + +======================== ============ ========= +Python Always work? Mojibake? +======================== ============ ========= +Python 2 **Yes** **Yes** +Python 3 No No +Python 3.5, POSIX locale **Yes** **Yes** +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode No No +======================== ============ ========= + +Python 3 fails on decoding stdin depending on the input and the locale. + + +Python producer, Python consumer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Python producer:: + + print("euro: \u20ac") + +Python consumer:: + + import sys + text = sys.stdin.read() + result = text.replace("apple", "orange") + # ignore the result + +Result, same Python version used for the producer and the consumer: + +======================== ============ ========= +Python Always work? Mojibake? +======================== ============ ========= +Python 2 No No +Python 3 No No +Python 3.5, POSIX locale No No +UTF-8 mode **Yes** **Yes** +UTF-8 Strict mode No No +======================== ============ ========= + +This case combines a Python producer with a Python consumer, so the +result is the subset of `Python producer, Bytes consumer`_ and `Bytes +producer, Python consumer`_. 
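The consumer rows in the tables above depend on how undecodable input bytes are handled. As a minimal sketch (not part of the patch; the sample bytes and the helper name ``strict_decode_fails`` are illustrative assumptions), the following shows the ``surrogateescape`` round trip that the UTF-8 mode relies on, next to the failure that the strict handler produces on the same input:

```python
# Input a "Bytes producer" might emit: valid UTF-8 for "café", followed by
# a stray 0xff byte, which can never appear in well-formed UTF-8.
raw = b"caf\xc3\xa9 \xff apple"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN instead of raising, so decoding always succeeds.
text = raw.decode("utf-8", errors="surrogateescape")

# Ordinary text processing works on the decoded string.
result = text.replace("apple", "orange")

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the invalid byte passes through unchanged.
out = result.encode("utf-8", errors="surrogateescape")


def strict_decode_fails(data):
    """Return True if decoding data as strict UTF-8 raises (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return True
    return False
```

With the strict handler the consumer stops at the first invalid byte, which is the "No" outcome in the "Always work?" column; with ``surrogateescape`` the byte survives the round trip, at the cost of possible mojibake downstream.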
Backward Compatibility From 221099d8765125bbd798e869846b005bcca84b47 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 7 Jan 2017 17:04:39 +1000 Subject: [PATCH 09/36] PEP 538: update for PEP 540 & linux-sig feedback - PYTHONALLOWCLOCALE=1 -> PYTHONCOERCECLOCALE=0 - reword the proposed library warning - try all of C.UTF-8, c.utf8 and en_US.UTF-8 - compare and contrast with PEP 540 - new Motivation section showing specific Docker problems - discuss implications of "strict" error handling - define configure options to turn the new behaviour off --- pep-0538.txt | 520 +++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 445 insertions(+), 75 deletions(-) diff --git a/pep-0538.txt b/pep-0538.txt index af905376a5c..bcff4558c5d 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -15,27 +15,39 @@ Abstract An ongoing challenge with Python 3 on \*nix systems is the conflict between needing to use the configured locale encoding by default for consistency with -other C/C++ components in the same process, and the fact that the standard C -locale (as defined in POSIX:2001) specifies a default encoding of ASCII, which -is entirely inappropriate for the development of networked services in a -multilingual world. - -This PEP proposes that the CPython implementation be changed such that: - -* when used as a library, ``Py_Initialize`` will warn that use of the legacy - ``C`` locale may cause various Unicode compatibility issues -* when used as a standalone binary, CPython will automatically coerce the - ``C`` locale to ``C.UTF-8`` unless the new ``PYTHONALLOWCLOCALE`` environment - variable is set - -With this change, any \*nix platform that does *not* offer the ``C.UTF-8`` -locale as part of its standard configuration will only be considered a -fully supported platform for CPython 3.7+ deployments when a non-ASCII locale -is set explicitly. 
+other C/C++ components in the same process and those invoked in subprocesses, +and the fact that the standard C locale (as defined in POSIX:2001) specifies +a default text encoding of ASCII, which is entirely inadequate for the +development of networked services and client applications in a multilingual +world. + +This PEP proposes that the way the CPython implementation handles the default +C locale be changed such that: + +* the standalone CPython binary will automatically attempt to coerce the ``C`` + locale to ``C.UTF-8`` (preferred), ``C.utf8`` or ``en_US.UTF-8`` unless the + new ``PYTHONCOERCECLOCALE`` environment variable is set to ``0`` +* if the subsequent runtime initialization process detects that the legacy + ``C`` locale remains active (e.g. locale coercion is disabled, or the runtime + is embedded in an application other than the main CPython binary), it will + emit a warning on stderr that use of the legacy ``C`` locale's default ASCII + text encoding may cause various Unicode compatibility issues + +Explicitly configuring the ``C.UTF-8`` or ``en_US.UTF-8`` locales has already +been used successfully for a number of years (including by the PEP author) to +get Python 3 running reliably in environments where no locale is otherwise +configured (such as Docker containers). + +With this change, any \*nix platform that does *not* offer at least one of the +``C.UTF-8``, ``C.utf8`` or ``en_US.UTF-8`` locales as part of its standard +configuration would only be considered a fully supported platform for CPython +3.7+ deployments when a locale other than the default ``C`` locale is +configured explicitly. Redistributors (such as Linux distributions) with a narrower target audience -may also choose to opt in to this behaviour for earlier Python 3.x releases by -applying the necessary changes as a downstream patch to those versions. 
+than the upstream CPython development team may also choose to opt in to this +behaviour for the Python 3.6.x series by applying the necessary changes as a +downstream patch when first introducing Python 3.6.0. Background @@ -49,20 +61,29 @@ do the conversion and then ensuring that the text encoding name reported by ``sys.getfilesystemencoding()`` matches the encoding used during this early bootstrapping process. -On Mac OS X, this is straightforward, as Apple guarantees that these operations -will always use UTF-8 to do the conversion. +On Apple platforms (including both Mac OS X and iOS), this is straightforward, +as Apple guarantees that these operations will always use UTF-8 to do the +conversion. On Windows, the limitations of the ``mbcs`` format used by default in these conversions proved sufficiently problematic that PEP 528 and PEP 529 were implemented to bypass the operating system supplied interfaces for binary data handling and force the use of UTF-8 instead. -On non-Apple \*nix systems however, these operations are handled using the C -locale system, which has the following characteristics [4_]: +On Android, the locale settings are of limited relevance (due to most +applications running in the UTF-16-LE based Dalvik environment) and there's +limited value in preserving backwards compatibility with other locale aware +C/C++ components in the same process (since it's a relatively new target +platform for CPython), so CPython bypasses the operating system provided APIs +and hardcodes the use of UTF-8 (similar to its behaviour on Apple platforms). + +On non-Apple and non-Android \*nix systems however, these operations are +handled using the C locale system in glibc, which has the following +characteristics [4_]: * by default, all processes start in the ``C`` locale, which uses ``ASCII`` for these conversions. 
This is almost never what anyone doing multilingual - text processing actually wants (including CPython) + text processing actually wants (including CPython and C/C++ GUI frameworks). * calling ``setlocale(LC_ALL, "")`` reconfigures the active locale based on the locale categories configured in the current process environment * if the locale requested by the current environment is unknown, or no specific @@ -73,68 +94,336 @@ The specific locale category that covers the APIs that CPython depends on is and to multibyte and wide characters" [5_]. Accordingly, CPython includes the following key calls to ``setlocale``: +* in the main ``python`` binary, CPython calls ``setlocale(LC_ALL, "")`` to + configure the entire C locale subsystem according to the process environment. + It does this prior to making any calls into the shared CPython library * in ``Py_Initialize``, CPython calls ``setlocale(LC_CTYPE, "")``, such that the configured locale settings for that category *always* match those set in the environment. It does this unconditionally, and it *doesn't* revert the process state change in ``Py_Finalize`` -* in the main ``python`` binary, CPython calls ``setlocale(LC_ALL, "")`` to - configure the entire C locale subsystem according to the process environment. 
- It does this prior to making any calls into the shared CPython library + +(This summary of the locale handling omits several technical details related +to exactly where and when the text encoding declared as part of the locale +settings is used - see PEP 540 for further discussion, as these particular +details matter more when decoupling CPython from the declared C locale than +they do when overriding the locale with one based on UTF-8) These calls are usually sufficient to provide sensible behaviour, but they can still fail in the following cases: * SSH environment forwarding means that SSH clients will often forward - client locale settings to servers that don't have that locale installed + client locale settings to servers that don't have that locale installed. This + leads to CPython running in the default ASCII-based C locale * some process environments (such as Linux containers) may not have any - explicit locale configured at all - + explicit locale configured at all. As with unknown locales, this leads to + CPython running in the default ASCII-based C locale + +The simplest way to deal with this problem for currently released versions of +CPython is to explicitly set a more sensible locale when launching the +application. For example:: + + LC_ALL=C.UTF-8 LANG=C.UTF-8 python3 ... + +In the specific case of Docker containers and similar technologies, the +appropriate locale setting can be specified directly in the container image +definition. + +Another common failure case is developers specifying ``LANG=C`` in order to +see otherwise translated user interface messages in English, rather than the +more narrowly scoped ``LC_MESSAGES=C``. 
+ + +Relationship with other PEPs +============================ + +This PEP shares a common problem statement with PEP 540 (improving Python 3's +behaviour in the default C locale), but diverges markedly in the proposed +solution: + +* PEP 540 proposes to entirely decouple CPython's default text encoding from + the C locale system in that case, allowing text handling inconsistencies to + arise between CPython and other C/C++ components running in the same process + and in subprocesses. This approach aims to make CPython behave less like a + locale-aware C/C++ application, and more like C/C++ independent language + runtimes like the JVM, .NET CLR, Go, Node.js, and Rust +* this PEP proposes to instead override the legacy C locale with a more recently + defined locale that uses UTF-8 as its default text encoding. This means that + the text encoding override will apply not only to CPython, but also to any + locale aware extension modules loaded into the current process, as well as to + locale aware C/C++ applications invoked in subprocesses that inherit their + environment from the parent process. This approach aims to retain CPython's + traditional strong support for integration with other components written + in C and C++, while actively helping to push forward the adoption and + standardisation of the C.UTF-8 locale as a Unicode-aware replacement for + the legacy C locale + +While the two PEPs present alternate proposed behavioural improvements that +align with the interests of different parts of the Python user community, they +don't actually conflict at a technical level. 
+ +That means it would be entirely possible to implement both of them, and end up +with a situation where redistributors, application integrators, and end users +can choose between: + +* coercing the default ASCII based C locale to a UTF-8 based locale +* instructing CPython to ignore the C locale and use UTF-8 instead +* doing both of the above (with this option as the default legacy C locale + handling) +* forcing use of the default ASCII based C locale by setting both + PYTHONCOERCECLOCALE=0 and PYTHONUTF8=0 + +If this approach was taken, then the proposed modifications to PEP 11 would +be adjusted to indicate that the only unsupported configurations are those where +both the legacy C locale coercion and the C locale text encoding bypass are +disabled. + +Given such a hybrid implementation, it would also be reasonable to drop the +``en_US.UTF-8`` legacy fallback from the list of UTF-8 locales tried as a +coercion target and instead rely solely on the C locale text encoding bypass +in such cases. + + +Motivation +========== -Proposal -======== +While Linux container technologies like Docker, Kubernetes, and OpenShift are +best known for their use in web service development, the related container +formats and execution models are also being adopted for Linux command line +application development. Technologies like Gnome Flatpak [7_] and +Ubuntu Snappy [8_] further aim to bring these same techniques to Linux GUI +application development. 
+ +When using Python 3 for application development in +these contexts, it isn't uncommon to see text encoding related errors akin to +the following:: + + $ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")' + Unable to decode the command from the command line: + UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed + $ docker run --rm ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")' + Unable to decode the command from the command line: + UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed + +Even though the same command is likely to work fine when run locally:: + + $ python3 -c 'print("ℙƴ☂ℌøἤ")' + ℙƴ☂ℌøἤ + +The source of the problem can be seen by instead running the ``locale`` command +in the three environments:: + + $ locale | grep -E 'LC_ALL|LC_CTYPE|LANG' + LANG=en_AU.UTF-8 + LC_CTYPE="en_AU.UTF-8" + LC_ALL= + $ docker run --rm fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG' + LANG= + LC_CTYPE="POSIX" + LC_ALL= + $ docker run --rm ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG' + LANG= + LANGUAGE= + LC_CTYPE="POSIX" + LC_ALL= + +In this particular example, we can see that the host system locale is set to +"en_AU.UTF-8", so CPython uses UTF-8 as the default text encoding. By contrast, +the base Docker images for Fedora and Debian don't have any specific locale +set, so they use the POSIX locale by default, which is an alias for the +ASCII-based default C locale. 
+ +The simplest way to get Python 3 (regardless of the exact version) to behave +sensibly in Fedora and Debian based containers is to run it in the ``C.UTF-8`` +locale that both distros provide:: + + $ docker run --rm -e LANG=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")' + ℙƴ☂ℌøἤ + $ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")' + ℙƴ☂ℌøἤ + + $ docker run --rm -e LANG=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG' + LANG=C.UTF-8 + LC_CTYPE="C.UTF-8" + LC_ALL= + $ docker run --rm -e LANG=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG' + LANG=C.UTF-8 + LANGUAGE= + LC_CTYPE="C.UTF-8" + LC_ALL= + +The Alpine Linux based Python images provided by Docker, Inc, already use the +C.UTF-8 locale by default:: + + $ docker run --rm python:3 python3 -c 'print("ℙƴ☂ℌøἤ")' + ℙƴ☂ℌøἤ + $ docker run --rm python:3 locale | grep -E 'LC_ALL|LC_CTYPE|LANG' + LANG=C.UTF-8 + LANGUAGE= + LC_CTYPE="C.UTF-8" + LC_ALL= + +Similarly, for custom container images (i.e. those adding additional content on +top of a base distro image), a more suitable locale can be set in the image +definition so everything just works by default. However, it would provide a much +nicer and more consistent user experience if CPython were able to just deal +with this problem automatically rather than relying on redistributors or end +users to handle it through system configuration changes. + +While the glibc developers are working towards making the C.UTF-8 locale +universally available for use by glibc based applications like CPython [6_], +this unfortunately doesn't help on platforms that ship older versions of glibc +without that feature, and also don't provide C.UTF-8 as an on-disk locale the +way Debian and Fedora do. 
For these platforms, the best widely available +fallback option is the ``en_US.UTF-8`` locale, which while still being +unfortunately Anglo-centric, is at least significantly less Anglo-centric than +the ASCII text encoding assumption in the default C locale. + +In the specific case of C locale coercion, the Anglo-centrism implied by the +use of ``en_US.UTF-8`` can be mitigated by configuring only the ``LC_CTYPE`` +locale category, rather than overriding all the locale categories:: + + $ docker run --rm -e LANG=C.UTF-8 centos/python-35-centos7 python3 -c 'print("ℙƴ☂ℌøἤ")' + Unable to decode the command from the command line: + UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed + + $ docker run --rm -e LC_CTYPE=en_US.UTF-8 centos/python-35-centos7 python3 -c 'print("ℙƴ☂ℌøἤ")' + ℙƴ☂ℌøἤ + + +Specification +============= To better handle the cases where CPython would otherwise end up attempting -to operate in the ``C`` locale, this PEP proposes changes to CPython's -behaviour both when it is run as a standalone command line application, as well -as when it is used as a shared library to embed a Python runtime as part of a -larger application. +to operate in the ``C`` locale, this PEP proposes that CPython automatically +attempt to coerce the legacy ``C`` locale to a UTF-8 based locale when it is +run as a standalone command line application. + +It further proposes to emit a warning on stderr if the legacy ``C`` locale +is in effect at the point where the language runtime itself is initialized, +in order to warn system and application integrators that they're running +CPython in an unsupported configuration. 
+ -When ``Py_Initialize`` is called and CPython detects that the configured locale -is the default ``C`` locale, the following warning will be issued:: +Legacy C locale coercion in the standalone Python interpreter binary +-------------------------------------------------------------------- - Py_Initialize detected LC_CTYPE=C, which limits Unicode compatibility. Some - libraries and operating system interfaces may not work correctly. Set - `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar environment - when running Python directly. +When run as a standalone application, CPython has the opportunity to +reconfigure the C locale before any locale dependent operations are executed +in the process. -This warning informs both system and application integrators that they're -running Python 3 in a configuration that we don't expect to work properly. For -the benefit of folks working on maintaining such misconfigured systems, it -also provides instructions on how to deliberately reproduce a comparable -misconfiguration of the standalone command line application. +This means that it can change the locale settings not only for the CPython +runtime, but also for any other C/C++ components running in the current +process (e.g. as part of extension modules), as well as in subprocesses that +inherit their environment from the current process. -By contrast, when CPython *is* the main application, it will instead -automatically coerce the legacy C locale to the multilingual C.UTF-8 locale:: +After calling ``setlocale(LC_ALL, "")`` to initialize the locale settings in +the current process, the main interpreter binary will be updated to include +the following call:: - Python detected LC_CTYPE=C, forcing LC_ALL & LANG to C.UTF-8 (set - PYTHONALLOWCLOCALE to disable this locale coercion behaviour). + const char *ctype_loc = setlocale(LC_CTYPE, NULL); + +This cryptic invocation is the API that C provides to query the current locale +setting without changing it. 
Given that query, it is possible to check for +exactly the ``C`` locale with ``strcmp``:: + + ctype_loc != NULL && strcmp(ctype_loc, "C") == 0 # true only in the C locale + +Given this information, CPython can then attempt to coerce the locale to one +that uses UTF-8 rather than ASCII as the default encoding. + +Three such locales will be tried: + +* ``C.UTF-8`` (available at least in Debian, Ubuntu, and Fedora 25+, and + expected to be available by default in a future version of glibc) +* ``C.utf8`` (available at least in HP-UX) +* ``en_US.UTF-8`` (available at least in RHEL and CentOS) + +For ``C.UTF-8`` and ``C.utf8``, the coercion will be implemented by actually +setting the ``LANG`` and ``LC_ALL`` environment variables to the candidate +locale name, such that future calls to ``setlocale()`` will see them, as will +other components looking for those settings (such as GUI development +frameworks). + +The last fallback isn't ideal as a coercion target (as it changes more than +just the default text encoding), but has the benefit of currently being more +widely available than the C.UTF-8 locale. To minimize the chance of side +effects, only the ``LC_CTYPE`` environment variable would be set when using +this legacy fallback option, with the other locale categories being left alone. + +Given time, more environments are expected to provide a ``C.UTF-8`` locale by +default, so falling all the way back to the ``en_US.UTF-8`` option is expected +to become less common. + +When this locale coercion is activated, the following warning will be +printed on stderr, with the warning containing whichever locale was +successfully configured:: + + Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set + PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). 
+ +When falling all the way back to the ``en_US.UTF-8`` locale, the message would +be slightly different:: + + Python detected LC_CTYPE=C, LC_CTYPE set to en_US.UTF-8 (set + PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). This locale coercion will mean that the standard Python binary should once again "just work" in the two main failure cases we're aware of (missing locale settings and SSH forwarding of unknown locales), as long as the target -platform provides the ``C.UTF-8`` locale. +platform provides at least one of the candidate UTF-8 based environments. + +If ``PYTHONCOERCECLOCALE=0`` is set, or none of the candidate locales is +successfully configured, then initialization will continue as usual in the C +locale and the Unicode compatibility warning described in the next section will +be emitted just as it would for any other application. + +The interpreter will always check for the ``PYTHONCOERCECLOCALE`` environment +variable (even when running under the ``-E`` or ``-I`` switches), as the locale +coercion check necessarily takes place before any command line argument +processing. + + +Changes to the runtime initialization process +--------------------------------------------- -This coercion will be implemented by actually setting the ``LANG`` and -``LC_ALL`` environment variables to ``C.UTF-8``, such that future calls to -``setlocale()`` will see them, as will other components looking for those -settings (such as GUI development frameworks). +By the time that ``Py_Initialize`` is called, arbitrary locale-dependent +operations may have taken place in the current process. This means that +by the time it is called, it is *too late* to switch to a different locale - +doing so would introduce inconsistencies in decoded text, even in the context +of the standalone Python interpreter binary. -The locale coercion will be skipped if the ``PYTHONALLOWCLOCALE`` environment -variable is set to a non-empty string. 
The interpreter will always check for -the ``PYTHONALLOWCLOCALE`` environment variable (even when running under the -``-E`` or ``-I`` switches), as the locale coercion check necessarily takes -place before any command line argument processing. +Accordingly, when ``Py_Initialize`` is called and CPython detects that the +configured locale is still the default ``C`` locale, the following warning will +be issued:: + Python runtime initialized with LC_CTYPE=C (a locale with default ASCII + encoding), which may cause Unicode compatibility problems. Using C.UTF-8 + (if available) as an alternative Unicode-compatible locale is recommended. + +In this case, no actual change will be made to the locale settings. + +Instead, the warning informs both system and application integrators that +they're running Python 3 in a configuration that we don't expect to work +properly. + + +New build-time configuration options +------------------------------------ + +While both of the above behaviours would be enabled by default, they would +also have new associated configuration options and preprocessor definitions +for the benefit of redistributors that want to override those default settings. + +The locale coercion behaviour would be controlled by the flag +``--with[out]-c-locale-coercion``, which would set the ``PY_COERCE_C_LOCALE`` +preprocessor definition. + +The locale warning behaviour would be controlled by the flag +``--with[out]-c-locale-warning``, which would set the ``PY_WARN_ON_C_LOCALE`` +preprocessor definition. + +On platforms where they would have no effect (e.g. Mac OS X, iOS, Android, +Windows) these preprocessor variables would always be undefined. 
Platform Support Changes ======================== @@ -145,10 +434,11 @@ A new "Legacy C Locale" section will be added to PEP 11 that states: and any Unicode handling issues that occur only in that locale and cannot be reproduced in an appropriately configured non-ASCII locale will be closed as "won't fix" -* as of Python 3.7, \*nix platforms are expected to provide the ``C.UTF-8`` - locale as an alternative to the legacy ``C`` locale. On platforms which don't - yet provide that locale, an explicit non-ASCII locale setting will be needed - to configure a supported environment for running Python 3.7+ +* as of Python 3.7, \*nix platforms are expected to provide at least one of + ``C.UTF-8``, ``C.utf8`` or ``en_US.UTF-8`` as an alternative to the legacy + ``C`` locale. On platforms which don't yet provide any of these locales, an + explicit non-ASCII locale setting will be needed to configure a fully + supported environment for running Python 3.7+ Rationale @@ -177,8 +467,9 @@ C/C++ components sharing the same process, as well as with the user's desktop locale settings, than it is with the emergent conventions of modern network service development. 
-The premise of this PEP is that for *all* of these use cases, the default "C" -locale is wrong, and furthermore that the following assumptions are valid: +The core premise of this PEP is that for *all* of these use cases, the default +"C" locale is the wrong choice, and furthermore that the following assumptions +are valid: * in desktop application use cases, the process locale will *already* be configured appropriately, and if it isn't, then that is an operating system @@ -191,6 +482,32 @@ locale is wrong, and furthermore that the following assumptions are valid: default encoding of ASCII the way CPython currently does +Using "strict" error handling by default +---------------------------------------- + +By coercing the locale away from the legacy C default and its assumption of +ASCII as the preferred text encoding, this PEP also disables the implicit use +of the "surrogateescape" error handler on the standard IO streams that was +introduced in Python 3.5. + +This is deliberate, as while UTF-8 as the preferred text encoding is a good +working assumption for network service development and for more recent releases +of client operating systems, it still isn't a universally valid assumption. + +In particular, GB 18030 [12_] is a Chinese national text encoding standard +that handles all Unicode code points, but is incompatible with both ASCII and +UTF-8. + +Similarly, Shift-JIS [13_] and ISO-2022-JP [14_] remain in widespread use in +Japan, and are incompatible with both ASCII and UTF-8. + +Using strict error handling on the standard streams means that attempting to +pass information from a host system using one of these encodings into a +container application that is assuming the use of UTF-8 or vice-versa is likely +to cause an immediate Unicode encoding or decoding error, rather than +potentially causing silent data corruption. 
+ + Dropping official support for Unicode handling in the legacy C locale --------------------------------------------------------------------- @@ -199,8 +516,8 @@ legacy C locale for over a decade at this point. Not only haven't we been able to get it to work, neither has anyone else - the only viable alternatives identified have been to pass the bytes along verbatim without eagerly decoding them to text (Python 2.x, Ruby, etc), or else to ignore the nominal C/C++ locale -encoding entirely and assume the use of either UTF-8 (Rust, Go, Node.js, etc) -or UTF-16-LE (JVM, .NET CLR). +encoding entirely and assume the use of either UTF-8 (PEP 540, Rust, Go, +Node.js, etc) or UTF-16-LE (JVM, .NET CLR). While this PEP ensures that developers that need to do so can still opt-in to running their Python code in the legacy C locale, it also makes clear that we @@ -283,6 +600,11 @@ runtimes even when running a version with this change applied. Implementation ============== +NOTE: The currently posted draft implementation is for a previous iteration +of the PEP prior to the incorporation of the feedback noted in [11_]. It was +broadly the same in concept (i.e. coercing the legacy C locale to one based on +UTF-8), but differs in several details. + A draft implementation of the change (including test cases) has been posted to issue 28180 [1_], which is an end user request that ``sys.getfilesystemencoding()`` default to ``utf-8`` rather than ``ascii``. @@ -291,12 +613,27 @@ posted to issue 28180 [1_], which is an end user request that Backporting to earlier Python 3 releases ======================================== -If this PEP is accepted for Python 3.7, backporting of the change to earlier -Python 3 releases by redistributors will be both allowed and encouraged. 
-However, to serve any useful purpose, such backports should only be undertaken -either in conjunction with the changes needed to also provide the C.UTF-8 -locale by default, or else specifically for platforms where that locale is -already consistently available. +Backporting to Python 3.6.0 +--------------------------- + +If this PEP is accepted for Python 3.7, redistributors backporting the change +specifically to their initial Python 3.6.0 release will be both allowed and +encouraged. However, such backports should only be undertaken either in +conjunction with the changes needed to also provide the C.UTF-8 locale by +default, or else specifically for platforms where that locale is already +consistently available. + + +Backporting to other 3.x releases +--------------------------------- + +While the proposed behavioural change is seen primarily as a bug fix addressing +Python 3's current misbehaviour in the default ASCII-based C locale, it still +represents a reasonably significant change in the way CPython interacts with +the C locale system. As such, while some redistributors may still choose to +backport it to even earlier Python 3.x releases based on the needs and +interests of their particular user base, this wouldn't be encouraged as a +general practice. Acknowledgements @@ -325,6 +662,13 @@ The change was originally proposed as a downstream patch for Fedora's system Python 3.6 package [3_], and then reformulated as a PEP for Python 3.7 with a section allowing for backports to earlier versions by redistributors. +The initial draft was posted to the Python Linux SIG for discussion [10_] and +then amended based on both that discussion and Victor Stinner's work in +PEP 540 [11_]. + +The "ℙƴ☂ℌøἤ" string used in the Unicode handling examples throughout this PEP +is taken from Ned Batchelder's excellent "Pragmatic Unicode" presentation [9_]. + References ========== @@ -344,6 +688,32 @@ References .. 
[5] GNU C: Locale Categories (https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html) +.. [6] glibc C.UTF-8 locale proposal + (https://sourceware.org/glibc/wiki/Proposals/C.UTF-8) + +.. [7] GNOME Flatpak + (http://flatpak.org/) + +.. [8] Ubuntu Snappy + (https://www.ubuntu.com/desktop/snappy) + +.. [9] Pragmatic Unicode + (http://nedbatchelder.com/text/unipain.html) + +.. [10] linux-sig discussion of initial PEP draft + (https://mail.python.org/pipermail/linux-sig/2017-January/000014.html) + +.. [11] Feedback notes from linux-sig discussion and PEP 540 + (https://github.com/python/peps/issues/171) + +.. [12] GB 18030 + (https://en.wikipedia.org/wiki/GB_18030) + +.. [13] Shift-JIS + (https://en.wikipedia.org/wiki/Shift_JIS) + +.. [14] ISO-2022 + (https://en.wikipedia.org/wiki/ISO/IEC_2022) Copyright ========= From 61a36c94e2d0de16562da99d17a60331ef1f2f59 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 7 Jan 2017 17:34:43 +1000 Subject: [PATCH 10/36] PEP 538: fix typo --- pep-0538.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0538.txt b/pep-0538.txt index bcff4558c5d..b84df416997 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -45,7 +45,7 @@ configuration would only be considered a fully supported platform for CPython configured explicitly. Redistributors (such as Linux distributions) with a narrower target audience -that the upstream CPython development team may also choose to opt in to this +than the upstream CPython development team may also choose to opt in to this behaviour for the Python 3.6.x series by applying the necessary changes as a downstream patch when first introducing Python 3.6.0. 
From 34d69bad02bb9da77a1281337175bb54c51c3532 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 7 Jan 2017 18:19:44 +1000 Subject: [PATCH 11/36] PEP 538: cite all 3 common failure modes --- pep-0538.txt | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/pep-0538.txt b/pep-0538.txt index b84df416997..7e432a43d41 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -368,9 +368,10 @@ be slightly different:: PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). This locale coercion will mean that the standard Python binary should once -again "just work" in the two main failure cases we're aware of (missing locale -settings and SSH forwarding of unknown locales), as long as the target -platform provides at least one of the candidate UTF-8 based environments. +again "just work" in the three main failure cases we're aware of (missing locale +settings, SSH forwarding of unknown locales, and developers explicitly +requesting ``LANG=C``), as long as the target platform provides at least one +of the candidate UTF-8 based environments. If ``PYTHONCOERCECLOCALE=0`` is set, or none of the candidate locales is successfully configured, then initialization will continue as usual in the C @@ -425,6 +426,7 @@ preprocessor definition. On platforms where they would have no effect (e.g. Mac OS X, iOS, Android, Windows) these preprocessor variables would always be undefined. 
+ Platform Support Changes ======================== From 858909a1d2791122f84663682290b8273531641f Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 7 Jan 2017 22:02:17 +1000 Subject: [PATCH 12/36] PEP 538: Add post history --- pep-0538.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pep-0538.txt b/pep-0538.txt index 7e432a43d41..f4d5a1e9acc 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -8,6 +8,8 @@ Type: Standards Track Content-Type: text/x-rst Created: 28-Dec-2016 Python-Version: 3.7 +Post-History: 03-Jan-2017 (linux-sig), + 07-Jan-2017 (python-ideas) Abstract From c2aa424e2829527dd6bb878c14b8e7af80690f7c Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 7 Jan 2017 22:14:20 +1000 Subject: [PATCH 13/36] PEP 538: clarify setlocale behaviour in POSIX locale --- pep-0538.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/pep-0538.txt b/pep-0538.txt index f4d5a1e9acc..011a2243e55 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -330,6 +330,9 @@ exactly the ``C`` locale with ``strcmp``:: ctype_loc != NULL && strcmp(ctype_loc, "C") == 0 # true only in the C locale +This call also returns ``"C"`` when either no particular locale is set, or the +nominal locale is set to an alias for the ``C`` locale (such as ``POSIX``). + Given this information, CPython can then attempt to coerce the locale to one that uses UTF-8 rather than ASCII as the default encoding. 
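Reviewer's sketch (not part of the patch): the C locale check clarified above, together with the candidate list from the Specification section, can be modelled in a few lines of Python. This is a simplified illustration under stated assumptions, not the CPython implementation, which is written in C and runs before the interpreter starts. The names ``choose_coercion``, ``CANDIDATES`` and ``is_available`` are hypothetical, and ``ctype_locale`` stands in for the value returned by ``setlocale(LC_CTYPE, NULL)``, which per this patch is ``"C"`` for aliases such as ``POSIX`` as well.

```python
# Hypothetical model of the coercion decision described in this draft of
# PEP 538. None of these names are CPython APIs; they exist only for
# illustration.

# Candidate locales in the order the draft tries them, paired with the
# environment variables the draft says would be set for each one.
CANDIDATES = [
    ("C.UTF-8", ("LC_ALL", "LANG")),
    ("C.utf8", ("LC_ALL", "LANG")),
    ("en_US.UTF-8", ("LC_CTYPE",)),  # legacy fallback: LC_CTYPE only
]


def choose_coercion(ctype_locale, environ, is_available):
    """Return the environment updates to apply, or {} if nothing is coerced.

    ctype_locale -- assumed result of setlocale(LC_CTYPE, NULL)
    environ      -- mapping of the process environment variables
    is_available -- assumed callback reporting whether a locale can be set
    """
    # Only the legacy C locale (including aliases reported as "C") is coerced.
    if ctype_locale != "C":
        return {}
    # The draft's opt-out switch disables coercion entirely.
    if environ.get("PYTHONCOERCECLOCALE") == "0":
        return {}
    # Take the first candidate that the platform actually provides.
    for name, variables in CANDIDATES:
        if is_available(name):
            return {variable: name for variable in variables}
    # No candidate available: stay in the C locale.
    return {}
```

Passing availability in as a callback keeps the sketch free of side effects; a real implementation would instead attempt ``setlocale`` with each candidate in turn and check the result.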
From e788aa908481074e91dc6436d14adddd2dd03417 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sat, 7 Jan 2017 22:20:23 +1000 Subject: [PATCH 14/36] PEP 538: tweak wording of proposed warnings --- pep-0538.txt | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pep-0538.txt b/pep-0538.txt index 011a2243e55..6376ce16b87 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -363,14 +363,14 @@ When this locale coercion is activated, the following warning will be printed on stderr, with the warning containing whichever locale was successfully configured:: - Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set - PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). + Python detected LC_CTYPE=C, LC_ALL & LANG set to C.UTF-8 (set another + locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). When falling all the way back to the ``en_US.UTF-8`` locale, the message would be slightly different:: - Python detected LC_CTYPE=C, LC_CTYPE set to en_US.UTF-8 (set - PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). + Python detected LC_CTYPE=C, LC_CTYPE set to en_US.UTF-8 (set another locale + or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour). 
This locale coercion will mean that the standard Python binary should once again "just work" in the three main failure cases we're aware of (missing locale From 927e704d7e9a439fe127fcefbbb5330adfc49c4a Mon Sep 17 00:00:00 2001 From: Mariatta Date: Sat, 7 Jan 2017 10:33:00 -0800 Subject: [PATCH 15/36] reSTify 10 PEPs (#174) Continuing work on #4 --- pep-0160.txt | 82 ++++++++++++++++++--------------- pep-0210.txt | 12 +++-- pep-0217.txt | 63 ++++++++++++++----------- pep-0220.txt | 27 ++++++----- pep-0233.txt | 127 +++++++++++++++++++++++++++++---------------------- pep-0251.txt | 100 ++++++++++++++++++++++------------------ pep-0254.txt | 27 +++++++---- pep-0260.txt | 92 ++++++++++++++++++++----------------- pep-0270.txt | 85 +++++++++++++++++++--------------- pep-0271.txt | 70 +++++++++++++++------------- 10 files changed, 384 insertions(+), 301 deletions(-) diff --git a/pep-0160.txt b/pep-0160.txt index af55bddf3c0..d5639ad2867 100644 --- a/pep-0160.txt +++ b/pep-0160.txt @@ -5,73 +5,79 @@ Last-Modified: $Date$ Author: Fred L. Drake, Jr. Status: Final Type: Informational +Content-Type: text/x-rst Created: 25-Jul-2000 Python-Version: 1.6 Post-History: Introduction +============ - This PEP describes the Python 1.6 release schedule. The CVS - revision history of this file contains the definitive historical - record. +This PEP describes the Python 1.6 release schedule. The CVS +revision history of this file contains the definitive historical +record. - This release will be produced by BeOpen PythonLabs staff for the - Corporation for National Research Initiatives (CNRI). +This release will be produced by BeOpen PythonLabs staff for the +Corporation for National Research Initiatives (CNRI). Schedule +======== - August 1 1.6 beta 1 release (planned). - August 3 1.6 beta 1 release (actual). - August 15 1.6 final release (planned). - September 5 1.6 final release (actual). +* August 1: 1.6 beta 1 release (planned). +* August 3: 1.6 beta 1 release (actual). 
+* August 15: 1.6 final release (planned). +* September 5: 1.6 final release (actual). Features +======== - A number of features are required for Python 1.6 in order to - fulfill the various promises that have been made. The following - are required to be fully operational, documented, and forward - compatible with the plans for Python 2.0: +A number of features are required for Python 1.6 in order to +fulfill the various promises that have been made. The following +are required to be fully operational, documented, and forward +compatible with the plans for Python 2.0: - * Unicode support: The Unicode object defined for Python 2.0 must - be provided, including all methods and codec support. +* Unicode support: The Unicode object defined for Python 2.0 must be provided, + including all methods and codec support. - * SRE: Fredrik Lundh's new regular expression engine will be used - to provide support for both 8-bit strings and Unicode strings. - It must pass the regression test used for the pcre-based version - of the re module. +* SRE: Fredrik Lundh's new regular expression engine will be used + to provide support for both 8-bit strings and Unicode strings. It must pass + the regression test used for the pcre-based version of the re module. - * The curses module was in the middle of a transformation to a - package, so the final form was adopted. +* The curses module was in the middle of a transformation to a package, so the + final form was adopted. Mechanism +========= - The release will be created as a branch from the development tree - rooted at CNRI's close of business on 16 May 2000. Patches - required from more recent checkins will be merged in by moving the - branch tag on individual files whenever possible in order to - reduce mailing list clutter and avoid divergent and incompatible - implementations. +The release will be created as a branch from the development tree +rooted at CNRI's close of business on 16 May 2000. 
Patches +required from more recent checkins will be merged in by moving the +branch tag on individual files whenever possible in order to +reduce mailing list clutter and avoid divergent and incompatible +implementations. - The branch tag is "cnri-16-start". +The branch tag is "cnri-16-start". - Patches and features will be merged to the extent required to pass - regression tests in effect on 16 May 2000. +Patches and features will be merged to the extent required to pass +regression tests in effect on 16 May 2000. - The beta release is tagged "r16b1" in the CVS repository, and the - final Python 1.6 release is tagged "release16" in the repository. +The beta release is tagged "r16b1" in the CVS repository, and the +final Python 1.6 release is tagged "release16" in the repository. Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0210.txt b/pep-0210.txt index 0d4d6c1aa0b..986a429ae82 100644 --- a/pep-0210.txt +++ b/pep-0210.txt @@ -5,13 +5,15 @@ Last-Modified: $Date$ Author: davida@activestate.com (David Ascher) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 15-Jul-2000 Python-Version: 2.1 Post-History: - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0217.txt b/pep-0217.txt index 908af681f0d..13f2227e0e4 100644 --- a/pep-0217.txt +++ b/pep-0217.txt @@ -5,38 +5,44 @@ Last-Modified: $Date$ Author: moshez@zadka.site.co.il (Moshe Zadka) Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 31-Jul-2000 Python-Version: 2.1 -Post-History: +Post-History: Abstract +======== + +Python's interactive mode is one of the implementation's great +strengths -- being able to write expressions on the command line +and get back a meaningful output. However, the output function +cannot be all things to all people, and the current output +function too often falls short of this goal. This PEP describes a +way to provides alternatives to the built-in display function in +Python, so users will have control over the output from the +interactive interpreter. - Python's interactive mode is one of the implementation's great - strengths -- being able to write expressions on the command line - and get back a meaningful output. However, the output function - cannot be all things to all people, and the current output - function too often falls short of this goal. This PEP describes a - way to provides alternatives to the built-in display function in - Python, so users will have control over the output from the - interactive interpreter. Interface +========= + +The current Python solution has worked for many users, and this +should not break it. Therefore, in the default configuration, +nothing will change in the REPL loop. To change the way the +interpreter prints interactively entered expressions, users +will have to rebind ``sys.displayhook`` to a callable object. +The result of calling this object with the result of the +interactively entered expression should be print-able, +and this is what will be printed on ``sys.stdout``. - The current Python solution has worked for many users, and this - should not break it. 
Therefore, in the default configuration, - nothing will change in the REPL loop. To change the way the - interpreter prints interactively entered expressions, users - will have to rebind sys.displayhook to a callable object. - The result of calling this object with the result of the - interactively entered expression should be print-able, - and this is what will be printed on sys.stdout. Solution +======== - The bytecode PRINT_EXPR will call sys.displayhook(POP()) - A displayhook() will be added to the sys builtin module, which is - equivalent to +The bytecode ``PRINT_EXPR`` will call ``sys.displayhook(POP())``. +A ``displayhook()`` will be added to the sys builtin module, which is +equivalent to:: import __builtin__ def displayhook(o): @@ -45,13 +51,16 @@ Solution __builtin__._ = None print `o` __builtin__._ = o - + + Jython Issues +============= + +The method ``Py.printResult`` will be similarly changed. - The method Py.printResult will be similarly changed. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0220.txt b/pep-0220.txt index 40e2a0b13d0..597cc4715ff 100644 --- a/pep-0220.txt +++ b/pep-0220.txt @@ -5,23 +5,26 @@ Last-Modified: $Date$ Author: gmcm@hypernet.com (Gordon McMillan) Status: Rejected Type: Informational +Content-Type: text/x-rst Created: 14-Aug-2000 Post-History: Abstract +======== - Demonstrates why the changes described in the stackless PEP are - desirable. A low-level continuations module exists. With it, - coroutines and generators and "green" threads can be written. A - higher level module that makes coroutines and generators easy to - create is desirable (and being worked on). The focus of this PEP - is on showing how coroutines, generators, and green threads can - simplify common programming problems. +Demonstrates why the changes described in the stackless PEP are +desirable. A low-level continuations module exists. 
With it, +coroutines and generators and "green" threads can be written. A +higher level module that makes coroutines and generators easy to +create is desirable (and being worked on). The focus of this PEP +is on showing how coroutines, generators, and green threads can +simplify common programming problems. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0233.txt b/pep-0233.txt index 93a1dee37ab..7215566090b 100644 --- a/pep-0233.txt +++ b/pep-0233.txt @@ -5,95 +5,114 @@ Last-Modified: $Date$ Author: paul@prescod.net (Paul Prescod) Status: Deferred Type: Standards Track +Content-Type: text/x-rst Created: 11-Dec-2000 Python-Version: 2.1 Post-History: Abstract +======== - This PEP describes a command-line driven online help facility for - Python. The facility should be able to build on existing - documentation facilities such as the Python documentation and - docstrings. It should also be extensible for new types and - modules. +This PEP describes a command-line driven online help facility for +Python. The facility should be able to build on existing +documentation facilities such as the Python documentation and +docstrings. It should also be extensible for new types and +modules. -Interactive use: +Interactive use +=============== - Simply typing "help" describes the help function (through repr() - overloading). +Simply typing "help" describes the help function (through ``repr()`` +overloading). - "help" can also be used as a function: +"help" can also be used as a function. 
- The function takes the following forms of input: +The function takes the following forms of input:: - help( "string" ) -- built-in topic or global - help( ) -- docstring from object or type - help( "doc:filename" ) -- filename from Python documentation + help( "string" ) -- built-in topic or global + help( ) -- docstring from object or type + help( "doc:filename" ) -- filename from Python documentation - If you ask for a global, it can be a fully-qualified name such as - help("xml.dom"). +If you ask for a global, it can be a fully-qualified name, such as:: - You can also use the facility from a command-line + help("xml.dom") + +You can also use the facility from a command-line:: python --help if - In either situation, the output does paging similar to the "more" - command. +In either situation, the output does paging similar to the "more" +command. Implementation +============== - The help function is implemented in an onlinehelp module which is - demand-loaded. +The help function is implemented in an ``onlinehelp`` module which is +demand-loaded. - There should be options for fetching help information from - environments other than the command line through the onlinehelp - module: +There should be options for fetching help information from +environments other than the command line through the ``onlinehelp`` +module:: - onlinehelp.gethelp(object_or_string) -> string + onlinehelp.gethelp(object_or_string) -> string - It should also be possible to override the help display function - by assigning to onlinehelp.displayhelp(object_or_string). +It should also be possible to override the help display function +by assigning to ``onlinehelp``.displayhelp(object_or_string). - The module should be able to extract module information from - either the HTML or LaTeX versions of the Python documentation. - Links should be accommodated in a "lynx-like" manner. 
+The module should be able to extract module information from +either the HTML or LaTeX versions of the Python documentation. +Links should be accommodated in a "lynx-like" manner. - Over time, it should also be able to recognize when docstrings are - in "special" syntaxes like structured text, HTML and LaTeX and - decode them appropriately. +Over time, it should also be able to recognize when docstrings are +in "special" syntaxes like structured text, HTML and LaTeX and +decode them appropriately. - A prototype implementation is available with the Python source - distribution as nondist/sandbox/doctools/onlinehelp.py. +A prototype implementation is available with the Python source +distribution as nondist/sandbox/doctools/``onlinehelp``.py. Built-in Topics +=============== + +help( "intro" ) - What is Python? Read this first! + +help( "keywords" ) - What are the keywords? + +help( "syntax" ) - What is the overall syntax? + +help( "operators" ) - What operators are available? + +help( "builtins" ) - What functions, types, etc. are built-in? + +help( "modules" ) - What modules are in the standard library? - help( "intro" ) - What is Python? Read this first! - help( "keywords" ) - What are the keywords? - help( "syntax" ) - What is the overall syntax? - help( "operators" ) - What operators are available? - help( "builtins" ) - What functions, types, etc. are built-in? - help( "modules" ) - What modules are in the standard library? - help( "copyright" ) - Who owns Python? - help( "moreinfo" ) - Where is there more information? - help( "changes" ) - What changed in Python 2.0? - help( "extensions" ) - What extensions are installed? - help( "faq" ) - What questions are frequently asked? - help( "ack" ) - Who has done work on Python lately? +help( "copyright" ) - Who owns Python? + +help( "moreinfo" ) - Where is there more information? + +help( "changes" ) - What changed in Python 2.0? + +help( "extensions" ) - What extensions are installed? 
+ +help( "faq" ) - What questions are frequently asked? + +help( "ack" ) - Who has done work on Python lately? Security Issues +=============== + +This module will attempt to import modules with the same names as +requested topics. Don't use the modules if you are not confident +that everything in your ``PYTHONPATH`` is from a trusted source. - This module will attempt to import modules with the same names as - requested topics. Don't use the modules if you are not confident - that everything in your PYTHONPATH is from a trusted source. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0251.txt b/pep-0251.txt index 08587b575d3..c7de8ad4ff0 100644 --- a/pep-0251.txt +++ b/pep-0251.txt @@ -5,85 +5,95 @@ Last-Modified: $Date$ Author: barry@python.org (Barry Warsaw), guido@python.org (Guido van Rossum) Status: Final Type: Informational +Content-Type: text/x-rst Created: 17-Apr-2001 Python-Version: 2.2 Post-History: 14-Aug-2001 + Abstract +======== - This document describes the Python 2.2 development and release - schedule. The schedule primarily concerns itself with PEP-sized - items. Small bug fixes and changes will occur up until the first - beta release. +This document describes the Python 2.2 development and release +schedule. The schedule primarily concerns itself with PEP-sized +items. Small bug fixes and changes will occur up until the first +beta release. - The schedule below represents the actual release dates of Python - 2.2. Note that any subsequent maintenance releases of Python 2.2 - should be covered by separate PEPs. +The schedule below represents the actual release dates of Python +2.2. Note that any subsequent maintenance releases of Python 2.2 +should be covered by separate PEPs. Release Schedule +================ - Tentative future release dates. Note that we've slipped this - compared to the schedule posted around the release of 2.2a1. 
+Tentative future release dates. Note that we've slipped this +compared to the schedule posted around the release of 2.2a1. - 21-Dec-2001: 2.2 [Released] (final release) - 14-Dec-2001: 2.2c1 [Released] - 14-Nov-2001: 2.2b2 [Released] - 19-Oct-2001: 2.2b1 [Released] - 28-Sep-2001: 2.2a4 [Released] - 7-Sep-2001: 2.2a3 [Released] - 22-Aug-2001: 2.2a2 [Released] - 18-Jul-2001: 2.2a1 [Released] +* 21-Dec-2001: 2.2 [Released] (final release) +* 14-Dec-2001: 2.2c1 [Released] +* 14-Nov-2001: 2.2b2 [Released] +* 19-Oct-2001: 2.2b1 [Released] +* 28-Sep-2001: 2.2a4 [Released] +* 7-Sep-2001: 2.2a3 [Released] +* 22-Aug-2001: 2.2a2 [Released] +* 18-Jul-2001: 2.2a1 [Released] Release Manager +=============== - Barry Warsaw was the Python 2.2 release manager. +Barry Warsaw was the Python 2.2 release manager. Release Mechanics +================= - We experimented with a new mechanism for releases: a week before - every alpha, beta or other release, we forked off a branch which - became the release. Changes to the branch are limited to the - release manager and his designated 'bots. This experiment was - deemed a success and should be observed for future releases. See - PEP 101 for the actual release mechanics[1]. +We experimented with a new mechanism for releases: a week before +every alpha, beta or other release, we forked off a branch which +became the release. Changes to the branch are limited to the +release manager and his designated 'bots. This experiment was +deemed a success and should be observed for future releases. See +PEP 101 for the actual release mechanics [1]_. New features for Python 2.2 +=========================== - The following new features are introduced in Python 2.2. For a - more detailed account, see Misc/NEWS[2] in the Python - distribution, or Andrew Kuchling's "What's New in Python 2.2" - document[3]. +The following new features are introduced in Python 2.2. 
For a +more detailed account, see Misc/NEWS [2]_ in the Python +distribution, or Andrew Kuchling's "What's New in Python 2.2" +document [3]_. - - iterators (PEP 234) - - generators (PEP 255) - - unifying long ints and plain ints (PEP 237) - - division (PEP 238) - - unification of types and classes (PEP 252, PEP 253) +- iterators (PEP 234) +- generators (PEP 255) +- unifying long ints and plain ints (PEP 237) +- division (PEP 238) +- unification of types and classes (PEP 252, PEP 253) References +========== - [1] PEP 101, Doing Python Releases 101 - http://www.python.org/dev/peps/pep-0101/ +.. [1] PEP 101, Doing Python Releases 101 + http://www.python.org/dev/peps/pep-0101/ - [2] Misc/NEWS file from CVS - http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Misc/NEWS?rev=1.337.2.4&content-type=text/vnd.viewcvs-markup +.. [2] Misc/NEWS file from CVS + http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Misc/NEWS?rev=1.337.2.4&content-type=text/vnd.viewcvs-markup - [3] Andrew Kuchling, What's New in Python 2.2 - http://www.python.org/doc/2.2.1/whatsnew/whatsnew22.html +.. [3] Andrew Kuchling, What's New in Python 2.2 + http://www.python.org/doc/2.2.1/whatsnew/whatsnew22.html Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0254.txt b/pep-0254.txt index c0c119ceefa..baa0b487e62 100644 --- a/pep-0254.txt +++ b/pep-0254.txt @@ -5,27 +5,34 @@ Last-Modified: $Date$ Author: guido@python.org (Guido van Rossum) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 18-June-2001 Python-Version: 2.2 Post-History: + Abstract +======== + +This PEP has not been written yet. Watch this space! - This PEP has not been written yet. Watch this space! 
Status +====== - This PEP was a stub entry and eventually abandoned without having - been filled-out. Substantially most of the intended functionality - was implemented in Py2.2 with new-style types and classes. +This PEP was a stub entry and eventually abandoned without having +been filled-out. Substantially most of the intended functionality +was implemented in Py2.2 with new-style types and classes. Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0260.txt b/pep-0260.txt index 57a0fe073f7..529f8a936be 100644 --- a/pep-0260.txt +++ b/pep-0260.txt @@ -5,81 +5,91 @@ Last-Modified: $Date$ Author: guido@python.org (Guido van Rossum) Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 26-Jun-2001 Python-Version: 2.2 Post-History: 26-Jun-2001 + Abstract +======== - This PEP proposes to strip the xrange() object from some rarely - used behavior like x[i:j] and x*n. +This PEP proposes to strip the ``xrange()`` object from some rarely +used behavior like ``x[i:j]`` and ``x*n``. Problem +======= - The xrange() function has one idiomatic use: +The ``xrange()`` function has one idiomatic use:: - for i in xrange(...): ... + for i in xrange(...): ... - However, the xrange() object has a bunch of rarely used behaviors - that attempt to make it more sequence-like. These are so rarely - used that historically they have has serious bugs (e.g. off-by-one - errors) that went undetected for several releases. +However, the xrange() object has a bunch of rarely used behaviors +that attempt to make it more sequence-like. These are so rarely +used that historically they have has serious bugs (e.g. off-by-one +errors) that went undetected for several releases. - I claim that it's better to drop these unused features. 
This will - simplify the implementation, testing, and documentation, and - reduce maintenance and code size. +I claim that it's better to drop these unused features. This will +simplify the implementation, testing, and documentation, and +reduce maintenance and code size. Proposed Solution +================= - I propose to strip the xrange() object to the bare minimum. The - only retained sequence behaviors are x[i], len(x), and repr(x). - In particular, these behaviors will be dropped: +I propose to strip the `xrange()` object to the bare minimum. The +only retained sequence behaviors are x[i], len(x), and repr(x). +In particular, these behaviors will be dropped:: - x[i:j] (slicing) - x*n, n*x (sequence-repeat) - cmp(x1, x2) (comparisons) - i in x (containment test) - x.tolist() method - x.start, x.stop, x.step attributes + x[i:j] (slicing) + x*n, n*x (sequence-repeat) + cmp(x1, x2) (comparisons) + i in x (containment test) + x.tolist() method + x.start, x.stop, x.step attributes - I also propose to change the signature of the PyRange_New() C API - to remove the 4th argument (the repetition count). +I also propose to change the signature of the `PyRange_New()` C API +to remove the 4th argument (the repetition count). - By implementing a custom iterator type, we could speed up the - common use, but this is optional (the default sequence iterator - does just fine). +By implementing a custom iterator type, we could speed up the +common use, but this is optional (the default sequence iterator +does just fine). Scope +===== - This PEP affects the xrange() built-in function and the - PyRange_New() C API. +This PEP affects the `xrange()` built-in function and the +`PyRange_New()` C API. Risks +===== - Somebody's code could be relying on the extended code, and this - code would break. However, given that historically bugs in the - extended code have gone undetected for so long, it's unlikely that - much code is affected. 
+Somebody's code could be relying on the extended code, and this +code would break. However, given that historically bugs in the +extended code have gone undetected for so long, it's unlikely that +much code is affected. Transition +========== - For backwards compatibility, the existing functionality will still - be present in Python 2.2, but will trigger a warning. A year - after Python 2.2 final is released (probably in 2.4) the - functionality will be ripped out. +For backwards compatibility, the existing functionality will still +be present in Python 2.2, but will trigger a warning. A year +after Python 2.2 final is released (probably in 2.4) the +functionality will be ripped out. Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0270.txt b/pep-0270.txt index e1af1e0f2ee..8083a383d55 100644 --- a/pep-0270.txt +++ b/pep-0270.txt @@ -5,79 +5,88 @@ Last-Modified: $Date$ Author: jp@demonseed.net (Jason Petrone) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 21-Aug-2001 Python-Version: 2.2 Post-History: Notice +====== - This PEP is withdrawn by the author. He writes: +This PEP is withdrawn by the author. He writes:: - Removing duplicate elements from a list is a common task, but - there are only two reasons I can see for making it a built-in. - The first is if it could be done much faster, which isn't the - case. The second is if it makes it significantly easier to - write code. The introduction of sets.py eliminates this - situation since creating a sequence without duplicates is just - a matter of choosing a different data structure: a set instead - of a list. + Removing duplicate elements from a list is a common task, but + there are only two reasons I can see for making it a built-in. 
+ The first is if it could be done much faster, which isn't the + case. The second is if it makes it significantly easier to + write code. The introduction of sets.py eliminates this + situation since creating a sequence without duplicates is just + a matter of choosing a different data structure: a set instead + of a list. - As described in PEP 218, sets are being added to the standard - library for Python 2.3. +As described in PEP 218, sets are being added to the standard +library for Python 2.3. Abstract +======== - This PEP proposes adding a method for removing duplicate elements to - the list object. +This PEP proposes adding a method for removing duplicate elements to +the list object. Rationale +========= - Removing duplicates from a list is a common task. I think it is - useful and general enough to belong as a method in list objects. - It also has potential for faster execution when implemented in C, - especially if optimization using hashing or sorted cannot be used. +Removing duplicates from a list is a common task. I think it is +useful and general enough to belong as a method in list objects. +It also has potential for faster execution when implemented in C, +especially if optimization using hashing or sorted cannot be used. - On comp.lang.python there are many, many, posts[1] asking about - the best way to do this task. Its a little tricky to implement - optimally and it would be nice to save people the trouble of - figuring it out themselves. +On comp.lang.python there are many, many, posts [1]_ asking about +the best way to do this task. It's a little tricky to implement +optimally and it would be nice to save people the trouble of +figuring it out themselves. Considerations +============== - Tim Peters suggests trying to use a hash table, then trying to - sort, and finally falling back on brute force[2]. Should uniq - maintain list order at the expense of speed? 
+Tim Peters suggests trying to use a hash table, then trying to +sort, and finally falling back on brute force [2]_. Should uniq +maintain list order at the expense of speed? - Is it spelled 'uniq' or 'unique'? +Is it spelled 'uniq' or 'unique'? Reference Implementation +======================== - I've written the brute force version. Its about 20 lines of code - in listobject.c. Adding support for hash table and sorted - duplicate removal would only take another hour or so. +I've written the brute force version. It's about 20 lines of code +in listobject.c. Adding support for hash table and sorted +duplicate removal would only take another hour or so. References +========== - [1] http://groups.google.com/groups?as_q=duplicates&as_ugroup=comp.lang.python +.. [1] http://groups.google.com/groups?as_q=duplicates&as_ugroup=comp.lang.python - [2] Tim Peters unique() entry in the Python cookbook: - http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560/index_txt +.. [2] Tim Peters unique() entry in the Python cookbook:: + http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560/index_txt Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -fill-column: 70 -End: + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: diff --git a/pep-0271.txt b/pep-0271.txt index 6ffd4caa33a..0cfff4172ee 100644 --- a/pep-0271.txt +++ b/pep-0271.txt @@ -2,76 +2,84 @@ PEP: 271 Title: Prefixing sys.path by command line option Version: $Revision$ Last-Modified: $Date$ -Author: fred@arakne.com (Frédéric B. Giacometti) +Author: fred@arakne.com (Frédéric B. 
Giacometti) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 15-Aug-2001 Python-Version: 2.2 Post-History: Abstract +======== - At present, setting the PYTHONPATH environment variable is the - only method for defining additional Python module search - directories. +At present, setting the ``PYTHONPATH`` environment variable is the +only method for defining additional Python module search +directories. - This PEP introduces the '-P' valued option to the python command - as an alternative to PYTHONPATH. +This PEP introduces the '-P' valued option to the python command +as an alternative to ``PYTHONPATH``. Rationale +========= - On Unix: +On Unix:: - python -P $SOMEVALUE + python -P $SOMEVALUE - will be equivalent to +will be equivalent to:: - env PYTHONPATH=$SOMEVALUE python + env PYTHONPATH=$SOMEVALUE python - On Windows 2K: +On Windows 2K:: - python -P %SOMEVALUE% + python -P %SOMEVALUE% - will (almost) be equivalent to +will (almost) be equivalent to:: + + set __PYTHONPATH=%PYTHONPATH% && set PYTHONPATH=%SOMEVALUE%\ + && python && set PYTHONPATH=%__PYTHONPATH% - set __PYTHONPATH=%PYTHONPATH% && set PYTHONPATH=%SOMEVALUE%\ - && python && set PYTHONPATH=%__PYTHONPATH% - Other Information +================= - This option is equivalent to the 'java -classpath' option. +This option is equivalent to the 'java -classpath' option. When to use this option +======================= - This option is intended to ease and make more robust the use of - Python in test or build scripts, for instance. +This option is intended to ease and make more robust the use of +Python in test or build scripts, for instance. 
Reference Implementation +======================== - A patch implementing this is available from SourceForge: +A patch implementing this is available from SourceForge:: http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=6916&aid=429614 - - with the patch discussion at: + +with the patch discussion at:: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=429614&group_id=5470 Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: From 1f1abb3b6a50b1019a50795181a63a18e9258de7 Mon Sep 17 00:00:00 2001 From: Nick Coghlan Date: Sun, 8 Jan 2017 11:54:24 +1000 Subject: [PATCH 16/36] PEP 538: document core design principles Also provides a bit more background on the rationale for using "strict" by default on stdin and stdout when coercing the locale to one based on UTF-8 --- pep-0538.txt | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/pep-0538.txt b/pep-0538.txt index 6376ce16b87..1a6a1fa6d4c 100644 --- a/pep-0538.txt +++ b/pep-0538.txt @@ -292,6 +292,34 @@ locale category, rather than overriding all the locale categories:: ℙƴ☂ℌøἤ +Design Principles +================= + +The above motivation leads to the following core design principles for the +proposed solution: + +* if a locale other than the default C locale is explicitly configured, we'll + continue to respect it +* if we're changing the locale setting without an explicit config option, we'll + emit a warning on stderr that we're doing so rather than silently changing + the process configuration. 
This will alert application and system integrators + to the change, even if they don't closely follow the PEP process or Python + release announcements. However, to minimize the chance of introducing new + problems for end users, we'll do this *without* using the warnings system, so + even running with ``-Werror`` won't turn it into a runtime exception + +The general design principle of Python 3 to prefer raising an exception over +incorrectly encoding or decoding data then leads to the following additional +design guideline: + +* if a UTF-8 based Linux container is run on a host that is explicitly + configured to use a non-UTF-8 encoding, and tries to exchange locally + encoded data with that host rather than exchanging explicitly UTF-8 encoded + data, this will ideally lead to an immediate runtime exception rather than + to silent data corruption + + + Specification ============= @@ -489,17 +517,25 @@ are valid: default encoding of ASCII the way CPython currently does -Using "strict" error handling by default ----------------------------------------- +Defaulting to "strict" error handling on the standard IO streams +---------------------------------------------------------------- By coercing the locale away from the legacy C default and its assumption of ASCII as the preferred text encoding, this PEP also disables the implicit use of the "surrogateescape" error handler on the standard IO streams that was -introduced in Python 3.5. +introduced in Python 3.5 ([15_]). + +This is deliberate, as that change was primarily aimed at handling the case +where the correct system encoding was the ASCII-compatible UTF-8 (or another +ASCII-compatible encoding), but the nominal encoding used for operating system +interfaces in the current process was ASCII. 
-This is deliberate, as while UTF-8 as the preferred text encoding is a good -working assumption for network service development and for more recent releases -of client operating systems, it still isn't a universally valid assumption. +With this PEP, that assumption is being narrowed a step further, such that +rather than assuming "an ASCII-compatible encoding", we instead assume UTF-8 +specifically. If that assumption is genuinely wrong, then it continues to be +friendlier to users of other encodings to alert them to the runtime's mistaken +assumption, rather than continuing on and potentially corrupting their data +permanently. In particular, GB 18030 [12_] is a Chinese national text encoding standard that handles all Unicode code points, but is incompatible with both ASCII and @@ -514,6 +550,10 @@ container application that is assuming the use of UTF-8 or vice-versa is likely to cause an immediate Unicode encoding or decoding error, rather than potentially causing silent data corruption. +For users that would prefer more permissive behaviour, setting +``PYTHONIOENCODING=:surrogateescape`` will continue to be supported, as this +PEP makes no changes to that feature. + Dropping official support for Unicode handling in the legacy C locale --------------------------------------------------------------------- @@ -722,6 +762,9 @@ References .. [14] ISO-2022 (https://en.wikipedia.org/wiki/ISO/IEC_2022) +.. 
[15] Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale + (https://bugs.python.org/issue19977) + Copyright ========= From a525758390400b11d336dcac9d98b4cdeb9c82b6 Mon Sep 17 00:00:00 2001 From: Thomas Samson Date: Sun, 8 Jan 2017 17:08:01 +0100 Subject: [PATCH 17/36] PEP 540: correcting english errors --- pep-0540.txt | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index 904237de25a..82fd024a8aa 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -60,12 +60,12 @@ These users expect that Python 3 "just works" with any locale and don't bother them with encodings. From their point of the view, the bug is not their locale but is obviously Python 3. -Since Python 2 handles data as bytes, it's more rare in Python 2 +Since Python 2 handles data as bytes, it's rarer in Python 2 compared to Python 3 to get Unicode errors. It also explains why users also perceive Python 3 as the root cause of their Unicode errors. Some users expect that Python 3 just works with any locale and so don't -bother of mojibake, whereas some developers are working hard to prevent +bother with mojibake, whereas some developers are working hard to prevent mojibake and so expect that Python 3 fails early before creating mojibake. @@ -185,7 +185,7 @@ On Mac OS X, Windows and Android, Python always use UTF-8 for operating system data. For Windows, see the PEP 529: "Change Windows filesystem encoding to UTF-8". -On Linux, UTF-8 became the defacto standard encoding, +On Linux, UTF-8 became the de facto standard encoding, replacing legacy encodings like ISO 8859-1 or ShiftJIS. For example, using different encodings for filenames and standard streams is likely to create mojibake, so UTF-8 is now used *everywhere*. @@ -208,7 +208,7 @@ information on the UTF-8 codec. 
Old data stored in different encodings and surrogateescape ---------------------------------------------------------- -Even if UTF-8 became the defacto standard, there are still systems in +Even if UTF-8 became the de facto standard, there are still systems in the wild which don't use UTF-8. And there are a lot of data stored in different encodings. For example, an old USB key using the ext3 filesystem with filenames encoded to ISO 8859-1. @@ -241,7 +241,7 @@ the ASCII encoding. The problem is that operating system data like filenames are decoded using the ``surrogateescape`` error handler (PEP 383). Displaying a -filename to stdout raises an Unicode encode error if the filename +filename to stdout raises a Unicode encode error if the filename contains an undecoded byte stored as a surrogate character. Python 3.6 now uses ``surrogateescape`` for stdin and stdout if the @@ -661,7 +661,7 @@ Always use UTF-8 ---------------- Python already always use the UTF-8 encoding on Mac OS X, Android and Windows. -Since UTF-8 became the defacto encoding, it makes sense to always use it on all +Since UTF-8 became the de facto encoding, it makes sense to always use it on all platforms with any locale. 
The risk is to introduce mojibake if the locale uses a different encoding, From b54f01e13fd43e4d31803b19eeba859da6364212 Mon Sep 17 00:00:00 2001 From: Mariatta Date: Mon, 9 Jan 2017 22:52:57 -0800 Subject: [PATCH 18/36] reSTify 10 more PEPs (#175) PEP 221 PEP 229 PEP 240 PEP 277 PEP 286 PEP 295 PEP 297 PEP 306 PEP 341 PEP 666 --- pep-0221.txt | 177 +++++++++++++++++++++++++------------------------ pep-0229.txt | 153 ++++++++++++++++++++++-------------------- pep-0240.txt | 114 ++++++++++++++++++-------------- pep-0277.txt | 167 ++++++++++++++++++++++++---------------------- pep-0286.txt | 152 +++++++++++++++++++++++------------------- pep-0295.txt | 183 +++++++++++++++++++++++++++------------------------ pep-0297.txt | 139 ++++++++++++++++++++------------------ pep-0306.txt | 120 +++++++++++++++++---------------- pep-0341.txt | 154 +++++++++++++++++++++++-------------------- pep-0666.txt | 166 ++++++++++++++++++++++++---------------------- 10 files changed, 813 insertions(+), 712 deletions(-) diff --git a/pep-0221.txt b/pep-0221.txt index 92389a2090f..c2608e58265 100644 --- a/pep-0221.txt +++ b/pep-0221.txt @@ -5,113 +5,120 @@ Last-Modified: $Date$ Author: thomas@python.org (Thomas Wouters) Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 15-Aug-2000 Python-Version: 2.0 Post-History: Introduction +============ - This PEP describes the `import as' proposal for Python 2.0. This - PEP tracks the status and ownership of this feature. It contains - a description of the feature and outlines changes necessary to - support the feature. The CVS revision history of this file - contains the definitive historical record. +This PEP describes the ``import as`` proposal for Python 2.0. This +PEP tracks the status and ownership of this feature. It contains +a description of the feature and outlines changes necessary to +support the feature. The CVS revision history of this file +contains the definitive historical record. 
Rationale +========= - This PEP proposes an extension of Python syntax regarding the - `import' and `from import' statements. These statements - load in a module, and either bind that module to a local name, or - binds objects from that module to a local name. However, it is - sometimes desirable to bind those objects to a different name, for - instance to avoid name clashes. This can currently be achieved - using the following idiom: - - import os - real_os = os - del os - - And similarly for the `from ... import' statement: - - from os import fdopen, exit, stat - os_fdopen = fdopen - os_stat = stat - del fdopen, stat - - The proposed syntax change would add an optional `as' clause to - both these statements, as follows: - - import os as real_os - from os import fdopen as os_fdopen, exit, stat as os_stat - - The `as' name is not intended to be a keyword, and some trickery - has to be used to convince the CPython parser it isn't one. For - more advanced parsers/tokenizers, however, this should not be a - problem. - - A slightly special case exists for importing sub-modules. The - statement - - import os.path - - stores the module `os' locally as `os', so that the imported - submodule `path' is accessible as `os.path'. As a result, - - import os.path as p - - stores `os.path', not `os', in `p'. This makes it effectively the - same as - - from os import path as p +This PEP proposes an extension of Python syntax regarding the +``import`` and ``from import`` statements. These statements +load in a module, and either bind that module to a local name, or +binds objects from that module to a local name. However, it is +sometimes desirable to bind those objects to a different name, for +instance to avoid name clashes. This can currently be achieved +using the following idiom:: + import os + real_os = os + del os -Implementation details +And similarly for the ``from ... 
import`` statement:: + + from os import fdopen, exit, stat + os_fdopen = fdopen + os_stat = stat + del fdopen, stat + +The proposed syntax change would add an optional ``as`` clause to +both these statements, as follows:: + + import os as real_os + from os import fdopen as os_fdopen, exit, stat as os_stat + +The ``as`` name is not intended to be a keyword, and some trickery +has to be used to convince the CPython parser it isn't one. For +more advanced parsers/tokenizers, however, this should not be a +problem. + +A slightly special case exists for importing sub-modules. The +statement:: + + import os.path - This PEP has been accepted, and the suggested code change has been - checked in. The patch can still be found in the SourceForge patch - manager[1]. Currently, a NAME field is used in the grammar rather - than a bare string, to avoid the keyword issue. It introduces a - new bytecode, IMPORT_STAR, which performs the `from module import - *' behaviour, and changes the behaviour of the IMPORT_FROM - bytecode so that it loads the requested name (which is always a - single name) onto the stack, to be subsequently stored by a STORE - opcode. As a result, all names explicitly imported now follow the - `global' directives. - - The special case of `from module import *' remains a special case, - in that it cannot accommodate an `as' clause, and that no STORE - opcodes are generated; the objects imported are loaded directly - into the local namespace. This also means that names imported in - this fashion are always local, and do not follow the `global' - directive. - - An additional change to this syntax has also been suggested, to - generalize the expression given after the `as' clause. Rather - than a single name, it could be allowed to be any expression that - yields a valid l-value; anything that can be assigned to. 
The - change to accommodate this is minimal, as the patch[2] proves, and - the resulting generalization allows a number of new constructs - that run completely parallel with other Python assignment - constructs. However, this idea has been rejected by Guido, as - `hypergeneralization'. +stores the module ``os`` locally as ``os``, so that the imported +submodule ``path`` is accessible as ``os.path``. As a result:: + + import os.path as p + +stores ``os.path``, not ``os``, in ``p``. This makes it effectively the +same as:: + + from os import path as p + + +Implementation details +====================== + +This PEP has been accepted, and the suggested code change has been +checked in. The patch can still be found in the SourceForge patch +manager [1]_. Currently, a ``NAME`` field is used in the grammar rather +than a bare string, to avoid the keyword issue. It introduces a +new bytecode, ``IMPORT_STAR``, which performs the ``from module import +*`` behaviour, and changes the behaviour of the ``IMPORT_FROM`` +bytecode so that it loads the requested name (which is always a +single name) onto the stack, to be subsequently stored by a ``STORE`` +opcode. As a result, all names explicitly imported now follow the +``global`` directives. + +The special case of ``from module import *`` remains a special case, +in that it cannot accommodate an ``as`` clause, and that no ``STORE`` +opcodes are generated; the objects imported are loaded directly +into the local namespace. This also means that names imported in +this fashion are always local, and do not follow the ``global`` +directive. + +An additional change to this syntax has also been suggested, to +generalize the expression given after the ``as`` clause. Rather +than a single name, it could be allowed to be any expression that +yields a valid l-value; anything that can be assigned to. 
The +change to accommodate this is minimal, as the patch [2]_ proves, and +the resulting generalization allows a number of new constructs +that run completely parallel with other Python assignment +constructs. However, this idea has been rejected by Guido, as +"hypergeneralization". Copyright +========= - This document has been placed in the Public Domain. +This document has been placed in the Public Domain. References +========== + +.. [1] https://hg.python.org/cpython/rev/18385172fac0 - [1] http://sourceforge.net/patch/?func=detailpatch&patch_id=101135&group_id=5470 +.. [2] http://sourceforge.net/patch/?func=detailpatch&patch_id=101234&group_id=5470 - [2] http://sourceforge.net/patch/?func=detailpatch&patch_id=101234&group_id=5470 - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0229.txt b/pep-0229.txt index 88d1383dc95..8d2eed39945 100644 --- a/pep-0229.txt +++ b/pep-0229.txt @@ -5,116 +5,123 @@ Last-Modified: $Date$ Author: A.M. Kuchling Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 16-Nov-2000 Post-History: Introduction +============ - The Modules/Setup mechanism has some flaws: +The Modules/Setup mechanism has some flaws: - * People have to remember to uncomment bits of Modules/Setup in - order to get all the possible modules. +* People have to remember to uncomment bits of Modules/Setup in + order to get all the possible modules. - * Moving Setup to a new version of Python is tedious; new modules - have been added, so you can't just copy the older version, but - have to reconcile the two versions. +* Moving Setup to a new version of Python is tedious; new modules + have been added, so you can't just copy the older version, but + have to reconcile the two versions. - * Users have to figure out where the needed libraries, such as - zlib, are installed. 
+* Users have to figure out where the needed libraries, such as + ``zlib``, are installed. Proposal +======== - Use the Distutils to build the modules that come with Python. +Use the Distutils to build the modules that come with Python. - The changes can be broken up into several pieces: +The changes can be broken up into several pieces: - 1. The Distutils needs some Python modules to be able to build - modules. Currently I believe the minimal list is posix, _sre, - and string. +1. The Distutils needs some Python modules to be able to build + modules. Currently I believe the minimal list is posix, _sre, + and string. - These modules will have to be built before the Distutils can be - used, so they'll simply be hardwired into Modules/Makefile and - be automatically built. + These modules will have to be built before the Distutils can be + used, so they'll simply be hardwired into Modules/Makefile and + be automatically built. - 2. A top-level setup.py script will be written that checks the - libraries installed on the system and compiles as many modules - as possible. +2. A top-level setup.py script will be written that checks the + libraries installed on the system and compiles as many modules + as possible. - 3. Modules/Setup will be kept and settings in it will override - setup.py's usual behavior, so you can disable a module known - to be buggy, or specify particular compilation or linker flags. - However, in the common case where setup.py works correctly, - everything in Setup will remain commented out. The other - Setup.* become unnecessary, since nothing will be generating - Setup automatically. +3. Modules/Setup will be kept and settings in it will override + setup.py's usual behavior, so you can disable a module known + to be buggy, or specify particular compilation or linker flags. + However, in the common case where setup.py works correctly, + everything in Setup will remain commented out. 
The other + Setup.* become unnecessary, since nothing will be generating + Setup automatically. - The patch was checked in for Python 2.1, and has been subsequently - modified. +The patch was checked in for Python 2.1, and has been subsequently +modified. Implementation +============== - Patch #102588 on SourceForge contains the proposed patch. - Currently the patch tries to be conservative and to change as few - files as possible, in order to simplify backing out the patch. - For example, no attempt is made to rip out the existing build - mechanisms. Such simplifications can wait for later in the beta - cycle, when we're certain the patch will be left in, or they can - wait for Python 2.2. - - The patch makes the following changes: +Patch #102588 on SourceForge contains the proposed patch. +Currently the patch tries to be conservative and to change as few +files as possible, in order to simplify backing out the patch. +For example, no attempt is made to rip out the existing build +mechanisms. Such simplifications can wait for later in the beta +cycle, when we're certain the patch will be left in, or they can +wait for Python 2.2. - * Makes some required changes to distutils/sysconfig (these will - be checked in separately) +The patch makes the following changes: - * In the top-level Makefile.in, the "sharedmods" target simply - runs "./python setup.py build", and "sharedinstall" runs - "./python setup.py install". The "clobber" target also deletes - the build/ subdirectory where Distutils puts its output. +* Makes some required changes to distutils/sysconfig (these will + be checked in separately) - * Modules/Setup.config.in only contains entries for the gc and thread - modules; the readline, curses, and db modules are removed because - it's now setup.py's job to handle them. +* In the top-level Makefile.in, the "sharedmods" target simply + runs "./python setup.py build", and "sharedinstall" runs + "./python setup.py install". 
The "clobber" target also deletes + the build/ subdirectory where Distutils puts its output. - * Modules/Setup.dist now contains entries for only 3 modules -- - _sre, posix, and strop. +* Modules/Setup.config.in only contains entries for the gc and thread + modules; the readline, curses, and db modules are removed because + it's now setup.py's job to handle them. - * The configure script builds setup.cfg from setup.cfg.in. This - is needed for two reasons: to make building in subdirectories - work, and to get the configured installation prefix. +* Modules/Setup.dist now contains entries for only 3 modules -- + _sre, posix, and strop. - * Adds setup.py to the top directory of the source tree. setup.py - is the largest piece of the puzzle, though not the most - complicated. setup.py contains a subclass of the BuildExt - class, and extends it with a detect_modules() method that does - the work of figuring out when modules can be compiled, and adding - them to the 'exts' list. +* The configure script builds setup.cfg from setup.cfg.in. This + is needed for two reasons: to make building in subdirectories + work, and to get the configured installation prefix. + +* Adds setup.py to the top directory of the source tree. setup.py + is the largest piece of the puzzle, though not the most + complicated. setup.py contains a subclass of the BuildExt + class, and extends it with a detect_modules() method that does + the work of figuring out when modules can be compiled, and adding + them to the 'exts' list. Unresolved Issues - - Do we need to make it possible to disable the 3 hard-wired modules - without manually hacking the Makefiles? [Answer: No.] +================= + +Do we need to make it possible to disable the 3 hard-wired modules +without manually hacking the Makefiles? [Answer: No.] - The Distutils always compile modules as shared libraries. How do - we support compiling them statically into the resulting Python - binary? 
+The Distutils always compile modules as shared libraries. How do +we support compiling them statically into the resulting Python +binary? - [Answer: building a Python binary with the Distutils should be - feasible, though no one has implemented it yet. This should be - done someday, but isn't a pressing priority as messing around with - the top-level Makefile.pre.in is good enough.] +[Answer: building a Python binary with the Distutils should be +feasible, though no one has implemented it yet. This should be +done someday, but isn't a pressing priority as messing around with +the top-level Makefile.pre.in is good enough.] Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0240.txt b/pep-0240.txt index 5b8fdcd47ea..3faa6b5e428 100644 --- a/pep-0240.txt +++ b/pep-0240.txt @@ -2,94 +2,108 @@ PEP: 240 Title: Adding a Rational Literal to Python Version: $Revision$ Last-Modified: $Date$ -Author: Christopher A. Craig , - Moshe Zadka +Author: Christopher A. Craig , Moshe Zadka Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 11-Mar-2001 Python-Version: 2.2 Post-History: 16-Mar-2001 Abstract +======== + +A different PEP [1]_ suggests adding a builtin rational type to +Python. This PEP suggests changing the ddd.ddd float literal to a +rational in Python, and modifying non-integer division to return +it. - A different PEP[1] suggests adding a builtin rational type to - Python. This PEP suggests changing the ddd.ddd float literal to a - rational in Python, and modifying non-integer division to return - it. BDFL Pronouncement +================== + +This PEP is rejected. The needs outlined in the rationale section +have been addressed to some extent by the acceptance of PEP 327 +for decimal arithmetic. 
Guido also noted, "Rational arithmetic +was the default 'exact' arithmetic in ABC and it did not work out as +expected". See the python-dev discussion on 17 June 2005 [2]_. - This PEP is rejected. The needs outlined in the rationale section - have been addressed to some extent by the acceptance of PEP 327 - for decimal arithmetic. Guido also noted, "Rational arithmetic - was the default 'exact' arithmetic in ABC and it did not work out as - expected". See the python-dev discussion on 17 June 2005. Rationale +========= - Rational numbers are useful for exact and unsurprising arithmetic. - They give the correct results people have been taught in various - math classes. Making the "obvious" non-integer type one with more - predictable semantics will surprise new programmers less than - using floating point numbers. As quite a few posts on c.l.py and - on tutor@python.org have shown, people often get bit by strange - semantics of floating point numbers: for example, round(0.98, 2) - still gives 0.97999999999999998. +Rational numbers are useful for exact and unsurprising arithmetic. +They give the correct results people have been taught in various +math classes. Making the "obvious" non-integer type one with more +predictable semantics will surprise new programmers less than +using floating point numbers. As quite a few posts on c.l.py and +on tutor@python.org have shown, people often get bit by strange +semantics of floating point numbers: for example, round(0.98, 2) +still gives 0.97999999999999998. Proposal +======== - Literals conforming to the regular expression '\d*.\d*' will be - rational numbers. +Literals conforming to the regular expression '\d*.\d*' will be +rational numbers. Backwards Compatibility +======================= - The only backwards compatible issue is the type of literals - mentioned above. The following migration is suggested: +The only backwards compatible issue is the type of literals +mentioned above. 
The following migration is suggested: - 1. The next Python after approval will allow - "from __future__ import rational_literals" - to cause all such literals to be treated as rational numbers. +1. The next Python after approval will allow + ``from __future__ import rational_literals`` + to cause all such literals to be treated as rational numbers. - 2. Python 3.0 will have a warning, turned on by default, about - such literals in the absence of a __future__ statement. The - warning message will contain information about the __future__ - statement, and indicate that to get floating point literals, - they should be suffixed with "e0". +2. Python 3.0 will have a warning, turned on by default, about + such literals in the absence of a `` __future__`` statement. The + warning message will contain information about the ``__future__`` + statement, and indicate that to get floating point literals, + they should be suffixed with "e0". - 3. Python 3.1 will have the warning turned off by default. This - warning will stay in place for 24 months, at which time the - literals will be rationals and the warning will be removed. +3. Python 3.1 will have the warning turned off by default. This + warning will stay in place for 24 months, at which time the + literals will be rationals and the warning will be removed. Common Objections +================= - Rationals are slow and memory intensive! - (Relax, I'm not taking floats away, I'm just adding two more characters. - 1e0 will still be a float) +Rationals are slow and memory intensive! +(Relax, I'm not taking floats away, I'm just adding two more characters. +1e0 will still be a float) + +Rationals must present themselves as a decimal float or they will be +horrible for users expecting decimals (i.e. ``str(.5)`` should return '.5' and +not '1/2'). This means that many rationals must be truncated at some +point, which gives us a new loss of precision. 
- Rationals must present themselves as a decimal float or they will be - horrible for users expecting decimals (i.e. str(.5) should return '.5' and - not '1/2'). This means that many rationals must be truncated at some - point, which gives us a new loss of precision. - References +========== - [1] PEP 239, Adding a Rational Type to Python, Zadka, - http://www.python.org/dev/peps/pep-0239/ +.. [1] PEP 239, Adding a Rational Type to Python, Zadka, + http://www.python.org/dev/peps/pep-0239/ +.. [2] Raymond Hettinger, Propose rejection of PEPs 239 and 240 -- a builtin + rational type and rational literals + https://mail.python.org/pipermail/python-dev/2005-June/054281.html Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0277.txt b/pep-0277.txt index 047ba629e59..6bea69995ce 100644 --- a/pep-0277.txt +++ b/pep-0277.txt @@ -5,114 +5,121 @@ Last-Modified: $Date$ Author: neilh@scintilla.org (Neil Hodgson) Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 11-Jan-2002 Python-Version: 2.3 Post-History: Abstract +======== - This PEP discusses supporting access to all files possible on - Windows NT by passing Unicode file names directly to the system's - wide-character functions. +This PEP discusses supporting access to all files possible on +Windows NT by passing Unicode file names directly to the system's +wide-character functions. Rationale - - Python 2.2 on Win32 platforms converts Unicode file names passed - to open and to functions in the os module into the 'mbcs' encoding - before passing the result to the operating system. This is often - successful in the common case where the script is operating with - the locale set to the same value as when the file was created. 
- Most machines are set up as one locale and rarely if ever changed - from this locale. For some users, locale is changed more often - and on servers there are often files saved by users using - different locales. - - On Windows NT and descendent operating systems, including Windows - 2000 and Windows XP, wide-character APIs are available that - provide direct access to all file names, including those that are - not representable using the current locale. The purpose of this - proposal is to provide access to these wide-character APIs through - the standard Python file object and posix module and so provide - access to all files on Windows NT. +========= + +Python 2.2 on Win32 platforms converts Unicode file names passed +to open and to functions in the os module into the 'mbcs' encoding +before passing the result to the operating system. This is often +successful in the common case where the script is operating with +the locale set to the same value as when the file was created. +Most machines are set up as one locale and rarely if ever changed +from this locale. For some users, locale is changed more often +and on servers there are often files saved by users using +different locales. + +On Windows NT and descendent operating systems, including Windows +2000 and Windows XP, wide-character APIs are available that +provide direct access to all file names, including those that are +not representable using the current locale. The purpose of this +proposal is to provide access to these wide-character APIs through +the standard Python file object and posix module and so provide +access to all files on Windows NT. Specification - - On Windows platforms which provide wide-character file APIs, when - Unicode arguments are provided to file APIs, wide-character calls - are made instead of the standard C library and posix calls. - - The Python file object is extended to use a Unicode file name - argument directly rather than converting it. 
This affects the - file object constructor file(filename[, mode[, bufsize]]) and also - the open function which is an alias of this constructor. When a - Unicode filename argument is used here then the name attribute of - the file object will be Unicode. The representation of a file - object, repr(f) will display Unicode file names as an escaped - string in a similar manner to the representation of Unicode - strings. - - The posix module contains functions that take file or directory - names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat, - and _getfullpathname. These will use Unicode arguments directly - rather than converting them. For the rename function, this - behaviour is triggered when either of the arguments is Unicode and - the other argument converted to Unicode using the default - encoding. - - The listdir function currently returns a list of strings. Under - this proposal, it will return a list of Unicode strings when its - path argument is Unicode. +============= + +On Windows platforms which provide wide-character file APIs, when +Unicode arguments are provided to file APIs, wide-character calls +are made instead of the standard C library and posix calls. + +The Python file object is extended to use a Unicode file name +argument directly rather than converting it. This affects the +file object constructor ``file(filename[, mode[, bufsize]])`` and also +the open function which is an alias of this constructor. When a +Unicode filename argument is used here then the name attribute of +the file object will be Unicode. The representation of a file +object, ``repr(f)`` will display Unicode file names as an escaped +string in a similar manner to the representation of Unicode +strings. + +The posix module contains functions that take file or directory +names: ``chdir``, ``listdir``, ``mkdir``, ``open``, ``remove``, ``rename``, +``rmdir``, ``stat``, and ``_getfullpathname``. These will use Unicode +arguments directly rather than converting them. 
For the rename function, this +behaviour is triggered when either of the arguments is Unicode and +the other argument converted to Unicode using the default +encoding. + +The ``listdir`` function currently returns a list of strings. Under +this proposal, it will return a list of Unicode strings when its +path argument is Unicode. Restrictions - - On the consumer Windows operating systems, Windows 95, Windows 98, - and Windows ME, there are no wide-character file APIs so behaviour - is unchanged under this proposal. It may be possible in the - future to extend this proposal to cover these operating systems as - the VFAT-32 file system used by them does support Unicode file - names but access is difficult and so implementing this would - require much work. The "Microsoft Layer for Unicode" could be a - starting point for implementing this. - - Python can be compiled with the size of Unicode characters set to - 4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4 byte - type and Py_UNICODE_SIZE to be 4. As the Windows API does not - accept 4 byte characters, the features described in this proposal - will not work in this mode so the implementation falls back to the - current 'mbcs' encoding technique. This restriction could be lifted - in the future by performing extra conversions using - PyUnicode_AsWideChar but for now that would add too much - complexity for a very rarely used feature. +============ + +On the consumer Windows operating systems, Windows 95, Windows 98, +and Windows ME, there are no wide-character file APIs so behaviour +is unchanged under this proposal. It may be possible in the +future to extend this proposal to cover these operating systems as +the VFAT-32 file system used by them does support Unicode file +names but access is difficult and so implementing this would +require much work. The "Microsoft Layer for Unicode" could be a +starting point for implementing this. 
+ +Python can be compiled with the size of Unicode characters set to +4 bytes rather than 2 by defining ``PY_UNICODE_TYPE`` to be a 4 byte +type and ``Py_UNICODE_SIZE`` to be 4. As the Windows API does not +accept 4 byte characters, the features described in this proposal +will not work in this mode so the implementation falls back to the +current 'mbcs' encoding technique. This restriction could be lifted +in the future by performing extra conversions using +``PyUnicode_AsWideChar`` but for now that would add too much +complexity for a very rarely used feature. Reference Implementation +======================== - An experimental implementation is available from - [2] http://scintilla.sourceforge.net/winunichanges.zip - - [3] An updated version is available at - http://python.org/sf/594001 +The implementation is available at [2]_. References +========== + +.. [1] Microsoft Windows APIs + http://msdn.microsoft.com/ - [1] Microsoft Windows APIs - http://msdn.microsoft.com/ +.. [2] http://python.org/sf/594001 Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -fill-column: 70 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: + diff --git a/pep-0286.txt b/pep-0286.txt index 1637eb9ee96..85c3d8918c5 100644 --- a/pep-0286.txt +++ b/pep-0286.txt @@ -5,110 +5,130 @@ Last-Modified: $Date$ Author: martin@v.loewis.de (Martin von Löwis) Status: Deferred Type: Standards Track +Content-Type: text/x-rst Created: 3-Mar-2002 Python-Version: 2.3 -Post-History: +Post-History: Abstract +======== + +``PyArg_ParseTuple`` is confronted with difficult memory management if +an argument converter creates new memory. To deal with these +cases, a specialized argument type is proposed. - PyArg_ParseTuple is confronted with difficult memory management if - an argument converter creates new memory. 
To deal with these - cases, a specialized argument type is proposed. PEP Deferral +============ + +Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the +PEP and collecting and incorporating feedback, and with sufficient +available time to do so effectively. - Further exploration of the concepts covered in this PEP has been deferred - for lack of a current champion interested in promoting the goals of the - PEP and collecting and incorporating feedback, and with sufficient - available time to do so effectively. +The resolution of this PEP may also be affected by the resolution of +PEP 426, which proposes the use of a preprocessing step to generate +some aspects of C API interface code. - The resolution of this PEP may also be affected by the resolution of - PEP 426, which proposes the use of a preprocessing step to generate - some aspects of C API interface code. Problem description +=================== - Today, argument tuples keep references to the function arguments, - which are guaranteed to live as long as the argument tuple exists - which is at least as long as the function call is being executed. +Today, argument tuples keep references to the function arguments, +which are guaranteed to live as long as the argument tuple exists +which is at least as long as the function call is being executed. - In some cases, parsing an argument will allocate new memory, which - is then to be released by the caller. This has two problems: +In some cases, parsing an argument will allocate new memory, which +is then to be released by the caller. This has two problems: - 1. In case of failure, the application cannot know what memory to - release; most callers don't even know that they have the - responsibility to release that memory. Example for this are - the N converter (bug #416288) and the es# converter (bug - #501716). +1. 
In case of failure, the application cannot know what memory to + release; most callers don't even know that they have the + responsibility to release that memory. Example for this are + the N converter (bug #416288 [1]_) and the es# converter (bug + #501716 [2]_). - 2. Even for successful argument parsing, it is still inconvenient - for the caller to be responsible for releasing the memory. In - some cases, this is unnecessarily inefficient. For example, - the es converter copies the conversion result into memory, even - though there already is a string object that has the right - contents. +2. Even for successful argument parsing, it is still inconvenient + for the caller to be responsible for releasing the memory. In + some cases, this is unnecessarily inefficient. For example, + the es converter copies the conversion result into memory, even + though there already is a string object that has the right + contents. Proposed solution +================= - A new type 'argument tuple' is introduced. This type derives from - tuple, adding an __dict__ member (at tp_dictoffset -4). Instances - of this type might get the following attributes: +A new type 'argument tuple' is introduced. This type derives from +tuple, adding an ``__dict__`` member (at ``tp_dictoffset`` -4). 
Instances +of this type might get the following attributes: - - 'failobjects', a list of objects which need to be deallocated - in case of success +- 'failobjects', a list of objects which need to be deallocated + in case of success - - 'okobjects', a list of object which will be released when the - argument tuple is released +- 'okobjects', a list of object which will be released when the + argument tuple is released - To manage this type, the following functions will be added, and - used appropriately in ceval.c and getargs.c: +To manage this type, the following functions will be added, and +used appropriately in ``ceval.c`` and ``getargs.c``: - - PyArgTuple_New(int); - - PyArgTuple_AddFailObject(PyObject*, PyObject*); - - PyArgTuple_AddFailMemory(PyObject*, void*); - - PyArgTuple_AddOkObject(PyObject*, PyObject*); - - PyArgTuple_AddOkMemory(PyObject*, void*); - - PyArgTuple_ClearFailed(PyObject*); +- ``PyArgTuple_New(int);`` +- ``PyArgTuple_AddFailObject(PyObject*, PyObject*);`` +- ``PyArgTuple_AddFailMemory(PyObject*, void*);`` +- ``PyArgTuple_AddOkObject(PyObject*, PyObject*);`` +- ``PyArgTuple_AddOkMemory(PyObject*, void*);`` +- ``PyArgTuple_ClearFailed(PyObject*);`` - When argument parsing fails, all fail objects will be released - through Py_DECREF, and all fail memory will be released through - PyMem_Free. If parsing succeeds, the references to the fail - objects and fail memory are dropped, without releasing anything. +When argument parsing fails, all fail objects will be released +through ``Py_DECREF``, and all fail memory will be released through +``PyMem_Free``. If parsing succeeds, the references to the fail +objects and fail memory are dropped, without releasing anything. - When the argument tuple is released, all ok objects and memory - will be released. +When the argument tuple is released, all ok objects and memory +will be released. 
- If those functions are called with an object of a different type, - a warning is issued and no further action is taken; usage of the - affected converters without using argument tuples is deprecated. +If those functions are called with an object of a different type, +a warning is issued and no further action is taken; usage of the +affected converters without using argument tuples is deprecated. Affected converters +=================== - The following converters will add fail memory and fail objects: N, - es, et, es#, et# (unless memory is passed into the converter) +The following converters will add fail memory and fail objects: N, +es, et, es#, et# (unless memory is passed into the converter) New converters +============== + +To simplify Unicode conversion, the ``e*`` converters are duplicated +as ``E*`` converters (Es, Et, Es#, Et#). The usage of the ``E*`` +converters is identical to that of the ``e*`` converters, except that +the application will not need to manage the resulting memory. +This will be implemented through registration of Ok objects with +the argument tuple. The ``e*`` converters are deprecated. + + +References +========== + +.. [1] infrequent memory leak in pyexpat + (http://bugs.python.org/issue416288) - To simplify Unicode conversion, the e* converters are duplicated - as E* converters (Es, Et, Es#, Et#). The usage of the E* - converters is identical to that of the e* converters, except that - the application will not need to manage the resulting memory. - This will be implemented through registration of Ok objects with - the argument tuple. The e* converters are deprecated. +.. [2] "es#" parser marker leaks memory + (http://bugs.python.org/issue501716) Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -fill-column: 70 -End: +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: diff --git a/pep-0295.txt b/pep-0295.txt index 34cf7ab1267..5396f239349 100644 --- a/pep-0295.txt +++ b/pep-0295.txt @@ -5,118 +5,127 @@ Last-Modified: $Date$ Author: yozh@mx1.ru (Stepan Koltsov) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 22-Jul-2002 Python-Version: 3.0 Post-History: + Abstract +======== - This PEP describes an interpretation of multiline string constants - for Python. It suggests stripping spaces after newlines and - stripping a newline if it is first character after an opening - quotation. +This PEP describes an interpretation of multiline string constants +for Python. It suggests stripping spaces after newlines and +stripping a newline if it is first character after an opening +quotation. Rationale +========= + +This PEP proposes an interpretation of multiline string constants +in Python. Currently, the value of string constant is all the +text between quotations, maybe with escape sequences substituted, +e.g.:: - This PEP proposes an interpretation of multiline string constants - in Python. Currently, the value of string constant is all the - text between quotations, maybe with escape sequences substituted, - e.g.: - - def f(): - """ - la-la-la - limona, banana - """ - - def g(): - return "This is \ - string" - - print repr(f.__doc__) - print repr(g()) - - prints: - - '\n\tla-la-la\n\tlimona, banana\n\t' - 'This is \tstring' - - This PEP suggest two things - - - ignore the first character after opening quotation, if it is - newline - - second: ignore in string constants all spaces and tabs up to - first non-whitespace character, but no more than current - indentation. 
- - After applying this, previous program will print: - - 'la-la-la\nlimona, banana\n' - 'This is string' - - To get this result, previous programs could be rewritten for - current Python as (note, this gives the same result with new - strings meaning): - - def f(): - """\ + def f(): + """ la-la-la limona, banana """ - - def g(): - "This is \ + + def g(): + return "This is \ string" - - Or stripping can be done with library routines at runtime (as - pydoc does), but this decreases program readability. + + print repr(f.__doc__) + print repr(g()) + +prints:: + + '\n\tla-la-la\n\tlimona, banana\n\t' + 'This is \tstring' + +This PEP suggest two things: + +- ignore the first character after opening quotation, if it is + newline + +- ignore in string constants all spaces and tabs up to + first non-whitespace character, but no more than current + indentation. + +After applying this, previous program will print:: + + 'la-la-la\nlimona, banana\n' + 'This is string' + +To get this result, previous programs could be rewritten for +current Python as (note, this gives the same result with new +strings meaning):: + + def f(): + """\ + la-la-la + limona, banana + """ + + def g(): + "This is \ + string" + +Or stripping can be done with library routines at runtime (as +pydoc does), but this decreases program readability. Implementation +============== + +I'll say nothing about CPython, Jython or Python.NET. - I'll say nothing about CPython, Jython or Python.NET. - - In original Python, there is no info about the current indentation - (in spaces) at compile time, so space and tab stripping should be - done at parse time. Currently no flags can be passed to the - parser in program text (like from __future__ import xxx). I - suggest enabling or disabling of this feature at Python compile - time depending of CPP flag Py_PARSE_MULTILINE_STRINGS. 
+In original Python, there is no info about the current indentation +(in spaces) at compile time, so space and tab stripping should be +done at parse time. Currently no flags can be passed to the +parser in program text (like ``from __future__ import xxx``). I +suggest enabling or disabling of this feature at Python compile +time depending of CPP flag ``Py_PARSE_MULTILINE_STRINGS``. Alternatives +============ - New interpretation of string constants can be implemented with flags - 'i' and 'o' to string constants, like - - i""" - SELECT * FROM car - WHERE model = 'i525' - """ is in new style, - - o"""SELECT * FROM employee - WHERE birth < 1982 - """ is in old style, and - - """ - SELECT employee.name, car.name, car.price FROM employee, car - WHERE employee.salary * 36 > car.price - """ is in new style after Python-x.y.z and in old style otherwise. - - Also this feature can be disabled if string is raw, i.e. if flag 'r' - specified. +New interpretation of string constants can be implemented with flags +'i' and 'o' to string constants, like:: + + i""" + SELECT * FROM car + WHERE model = 'i525' + """ is in new style, + + o"""SELECT * FROM employee + WHERE birth < 1982 + """ is in old style, and + + """ + SELECT employee.name, car.name, car.price FROM employee, car + WHERE employee.salary * 36 > car.price + """ is in new style after Python-x.y.z and in old style otherwise. + +Also this feature can be disabled if string is raw, i.e. if flag 'r' +specified. Copyright +========= + +This document has been placed in the Public Domain. - This document has been placed in the Public Domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/pep-0297.txt b/pep-0297.txt index 46e8ef84421..76a36b71d6c 100644 --- a/pep-0297.txt +++ b/pep-0297.txt @@ -5,112 +5,125 @@ Last-Modified: $Date$ Author: mal@lemburg.com (Marc-André Lemburg) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 19-Jul-2001 Python-Version: 2.6 -Post-History: +Post-History: + Rejection Notice +================ - This PEP is rejected for failure to generate significant interest. +This PEP is rejected for failure to generate significant interest. Abstract +======== - This PEP proposes strategies to allow the Python standard library - to be upgraded in parts without having to reinstall the complete - distribution or having to wait for a new patch level release. +This PEP proposes strategies to allow the Python standard library +to be upgraded in parts without having to reinstall the complete +distribution or having to wait for a new patch level release. Problem +======= + +Python currently does not allow overriding modules or packages in +the standard library per default. Even though this is possible by +defining a ``PYTHONPATH`` environment variable (the paths defined in +this variable are prepended to the Python standard library path), +there is no standard way of achieving this without changing the +configuration. - Python currently does not allow overriding modules or packages in - the standard library per default. Even though this is possible by - defining a PYTHONPATH environment variable (the paths defined in - this variable are prepended to the Python standard library path), - there is no standard way of achieving this without changing the - configuration. +Since Python's standard library is starting to host packages which +are also available separately, e.g. 
the distutils, email and PyXML +packages, which can also be installed independently of the Python +distribution, it is desirable to have an option to upgrade these +packages without having to wait for a new patch level release of +the Python interpreter to bring along the changes. - Since Python's standard library is starting to host packages which - are also available separately, e.g. the distutils, email and PyXML - packages, which can also be installed independently of the Python - distribution, it is desirable to have an option to upgrade these - packages without having to wait for a new patch level release of - the Python interpreter to bring along the changes. +On some occasions, it may also be desirable to update modules of +the standard library without going through the whole Python release +cycle, e.g. in order to provide hot-fixes for security problems. - On some occasions, it may also be desirable to update modules of - the standard library without going through the whole Python release - cycle, e.g. in order to provide hot-fixes for security problems. Proposed Solutions +================== - This PEP proposes two different but not necessarily conflicting - solutions: +This PEP proposes two different but not necessarily conflicting +solutions: - 1. Adding a new standard search path to sys.path: - $stdlibpath/system-packages just before the $stdlibpath - entry. This complements the already existing entry for site - add-ons $stdlibpath/site-packages which is appended to the - sys.path at interpreter startup time. +1. Adding a new standard search path to ``sys.path``: + ``$stdlibpath/system-packages`` just before the ``$stdlibpath`` + entry. This complements the already existing entry for site + add-ons ``$stdlibpath/site-packages`` which is appended to the + ``sys.path`` at interpreter startup time. 
- To make use of this new standard location, distutils will need - to grow support for installing certain packages in - $stdlibpath/system-packages rather than the standard location - for third-party packages $stdlibpath/site-packages. + To make use of this new standard location, distutils will need + to grow support for installing certain packages in + ``$stdlibpath/system-packages`` rather than the standard location + for third-party packages ``$stdlibpath/site-packages``. - 2. Tweaking distutils to install directly into $stdlibpath for the - system upgrades rather than into $stdlibpath/site-packages. +2. Tweaking distutils to install directly into ``$stdlibpath`` for the + system upgrades rather than into ``$stdlibpath/site-packages``. - The first solution has a few advantages over the second: +The first solution has a few advantages over the second: - * upgrades can be easily identified (just look in - $stdlibpath/system-packages) +* upgrades can be easily identified (just look in + ``$stdlibpath/system-packages``) - * upgrades can be de-installed without affecting the rest - of the interpreter installation +* upgrades can be de-installed without affecting the rest + of the interpreter installation - * modules can be virtually removed from packages; this is - due to the way Python imports packages: once it finds the - top-level package directory it stay in this directory for - all subsequent package submodule imports +* modules can be virtually removed from packages; this is + due to the way Python imports packages: once it finds the + top-level package directory it stay in this directory for + all subsequent package submodule imports - * the approach has an overall much cleaner design than the - hackish install on top of an existing installation approach +* the approach has an overall much cleaner design than the + hackish install on top of an existing installation approach - The only advantages of the second approach are that the Python - interpreter does not 
have to changed and that it works with - older Python versions. +The only advantages of the second approach are that the Python +interpreter does not have to changed and that it works with +older Python versions. - Both solutions require changes to distutils. These changes can - also be implemented by package authors, but it would be better to - define a standard way of switching on the proposed behaviour. +Both solutions require changes to distutils. These changes can +also be implemented by package authors, but it would be better to +define a standard way of switching on the proposed behaviour. Scope +===== - Solution 1: Python 2.6 and up - Solution 2: all Python versions supported by distutils +Solution 1: Python 2.6 and up + +Solution 2: all Python versions supported by distutils Credits +======= - None +None References +========== - None +None Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: diff --git a/pep-0306.txt b/pep-0306.txt index c6c2cf7dd6c..594d73291ed 100644 --- a/pep-0306.txt +++ b/pep-0306.txt @@ -5,101 +5,111 @@ Last-Modified: $Date$ Author: Michael Hudson , Jack Diederich , Nick Coghlan , Benjamin Peterson Status: Withdrawn Type: Informational -Content-Type: text/plain +Content-Type: text/x-rst Created: 29-Jan-2003 Post-History: 30-Jan-2003 Note +==== - This PEP has been moved to the Python dev guide. +This PEP has been moved to the Python dev guide [1]_. Abstract +======== - There's more to changing Python's grammar than editing - Grammar/Grammar and Python/compile.c. This PEP aims to be a - checklist of places that must also be fixed. 
+There's more to changing Python's grammar than editing +Grammar/Grammar and Python/compile.c. This PEP aims to be a +checklist of places that must also be fixed. - It is probably incomplete. If you see omissions, just add them if - you can -- you are not going to offend the author's sense of - ownership. Otherwise submit a bug or patch and assign it to mwh. +It is probably incomplete. If you see omissions, just add them if +you can -- you are not going to offend the author's sense of +ownership. Otherwise submit a bug or patch and assign it to mwh. - This PEP is not intended to be an instruction manual on Python - grammar hacking, for several reasons. +This PEP is not intended to be an instruction manual on Python +grammar hacking, for several reasons. Rationale +========= - People are getting this wrong all the time; it took well over a - year before someone noticed[1] that adding the floor division - operator (//) broke the parser module. +People are getting this wrong all the time; it took well over a +year before someone noticed [2]_ that adding the floor division +operator (//) broke the parser module. Checklist +========= - __ Grammar/Grammar: OK, you'd probably worked this one out :) +- Grammar/Grammar: OK, you'd probably worked this one out :) - __ Parser/Python.asdl may need changes to match the Grammar. Run - make to regenerate Include/Python-ast.h and - Python/Python-ast.c. +- Parser/Python.asdl may need changes to match the Grammar. Run + make to regenerate Include/Python-ast.h and + Python/Python-ast.c. - __ Python/ast.c will need changes to create the AST objects - involved with the Grammar change. Lib/compiler/ast.py will - need matching changes to the pure-python AST objects. +- Python/ast.c will need changes to create the AST objects + involved with the Grammar change. Lib/compiler/ast.py will + need matching changes to the pure-python AST objects. - __ Parser/pgen needs to be rerun to regenerate Include/graminit.h - and Python/graminit.c. 
(make should handle this for you.) +- Parser/pgen needs to be rerun to regenerate Include/graminit.h + and Python/graminit.c. (make should handle this for you.) - __ Python/symbtable.c: This handles the symbol collection pass - that happens immediately before the compilation pass. +- Python/symbtable.c: This handles the symbol collection pass + that happens immediately before the compilation pass. - __ Python/compile.c: You will need to create or modify the - compiler_* functions to generate opcodes for your productions. +- Python/compile.c: You will need to create or modify the + ``compiler_*`` functions to generate opcodes for your productions. - __ You may need to regenerate Lib/symbol.py and/or Lib/token.py - and/or Lib/keyword.py. +- You may need to regenerate Lib/symbol.py and/or Lib/token.py + and/or Lib/keyword.py. - __ The parser module. Add some of your new syntax to test_parser, - bang on Modules/parsermodule.c until it passes. +- The parser module. Add some of your new syntax to test_parser, + bang on Modules/parsermodule.c until it passes. - __ Add some usage of your new syntax to test_grammar.py +- Add some usage of your new syntax to test_grammar.py - __ The compiler package. A good test is to compile the standard - library and test suite with the compiler package and then check - it runs. Note that this only needs to be done in Python 2.x. +- The compiler package. A good test is to compile the standard + library and test suite with the compiler package and then check + it runs. Note that this only needs to be done in Python 2.x. - __ If you've gone so far as to change the token structure of - Python, then the Lib/tokenizer.py library module will need to - be changed. +- If you've gone so far as to change the token structure of + Python, then the Lib/tokenizer.py library module will need to + be changed. - __ Certain changes may require tweaks to the library module - pyclbr. +- Certain changes may require tweaks to the library module + ``pyclbr``. 
- __ Documentation must be written! +- Documentation must be written! - __ After everything's been checked in, you're likely to see a new - change to Python/Python-ast.c. This is because this - (generated) file contains the SVN version of the source from - which it was generated. There's no way to avoid this; you just - have to submit this file separately. +- After everything's been checked in, you're likely to see a new + change to Python/Python-ast.c. This is because this + (generated) file contains the SVN version of the source from + which it was generated. There's no way to avoid this; you just + have to submit this file separately. References +========== - [1] SF Bug #676521, parser module validation failure - http://www.python.org/sf/676521 +.. [1] CPython Developer's Guide: Changing CPython's Grammar + http://cpython-devguide.readthedocs.io/en/latest/grammar.html + +.. [2] SF Bug #676521, parser module validation failure + http://www.python.org/sf/676521 Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/pep-0341.txt b/pep-0341.txt index 45092cb7103..0c3bfd4cbe8 100644 --- a/pep-0341.txt +++ b/pep-0341.txt @@ -5,46 +5,61 @@ Last-Modified: $Date$ Author: Georg Brandl Status: Final Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 04-May-2005 Post-History: Abstract +======== - This PEP proposes a change in the syntax and semantics of try - statements to allow combined try-except-finally blocks. This - means in short that it would be valid to write +This PEP proposes a change in the syntax and semantics of try +statements to allow combined try-except-finally blocks. 
This +means in short that it would be valid to write:: - try: - - except Exception: - - finally: - + try: + + except Exception: + + finally: + Rationale/Proposal +================== - There are many use cases for the try-except statement and - for the try-finally statement per se; however, often one needs - to catch exceptions and execute some cleanup code afterwards. - It is slightly annoying and not very intelligible that - one has to write +There are many use cases for the try-except statement and +for the try-finally statement per se; however, often one needs +to catch exceptions and execute some cleanup code afterwards. +It is slightly annoying and not very intelligible that +one has to write:: - f = None + f = None + try: try: - try: - f = open(filename) - text = f.read() - except IOError: - print 'An error occurred' - finally: - if f: - f.close() - - So it is proposed that a construction like this - + f = open(filename) + text = f.read() + except IOError: + print 'An error occurred' + finally: + if f: + f.close() + +So it is proposed that a construction like this:: + + try: + + except Ex1: + + + else: + + finally: + + +be exactly the same as the legacy:: + + try: try: except Ex1: @@ -52,72 +67,65 @@ Rationale/Proposal else: - finally: - - - be exactly the same as the legacy + finally: + - try: - try: - - except Ex1: - - - else: - - finally: - - - This is backwards compatible, and every try statement that is - legal today would continue to work. +This is backwards compatible, and every try statement that is +legal today would continue to work. 
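The equivalence claimed above can be checked with a small sketch. It is written for Python 3, where the combined form is accepted, and it is not part of the patch: the helper names ``combined``, ``nested``, ``ok`` and ``fail`` are hypothetical, and ``ValueError`` merely stands in for ``Ex1``.

```python
def combined(action, log):
    """Combined try-except-else-finally form proposed by the PEP."""
    try:
        result = action()
    except ValueError:
        # Runs only when action() raises ValueError.
        log.append("except")
        result = None
    else:
        # Runs only when action() raised nothing.
        log.append("else")
    finally:
        # Runs in every case, after except/else.
        log.append("finally")
    return result


def nested(action, log):
    """Legacy spelling: try-except-else nested inside try-finally."""
    try:
        try:
            result = action()
        except ValueError:
            log.append("except")
            result = None
        else:
            log.append("else")
    finally:
        log.append("finally")
    return result


def ok():
    """Hypothetical action that succeeds."""
    return 42


def fail():
    """Hypothetical action that raises the handled exception."""
    raise ValueError("boom")
```

Calling both functions with the same action and a fresh list should record the same sequence of clauses, which is the behaviour the PEP describes as "exactly the same as the legacy" form.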
Changes to the grammar +====================== - The grammar for the try statement, which is currently +The grammar for the try statement, which is currently:: - try_stmt: ('try' ':' suite (except_clause ':' suite)+ - ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite) + try_stmt: ('try' ':' suite (except_clause ':' suite)+ + ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite) - would have to become +would have to become:: + + try_stmt: 'try' ':' suite + ( + (except_clause ':' suite)+ + ['else' ':' suite] + ['finally' ':' suite] + | + 'finally' ':' suite + ) - try_stmt: 'try' ':' suite - ( - (except_clause ':' suite)+ - ['else' ':' suite] - ['finally' ':' suite] - | - 'finally' ':' suite - ) Implementation +============== - As the PEP author currently does not have sufficient knowledge - of the CPython implementation, he is unfortunately not able - to deliver one. Thomas Lee has submitted a patch[2]. +As the PEP author currently does not have sufficient knowledge +of the CPython implementation, he is unfortunately not able +to deliver one. Thomas Lee has submitted a patch [2]_. - However, according to Guido, it should be a piece of cake to - implement[1] -- at least for a core hacker. +However, according to Guido, it should be a piece of cake to +implement [1]_ -- at least for a core hacker. - This patch was committed 17 December 2005, SVN revision 41740 [3]. +This patch was committed 17 December 2005, SVN revision 41740 [3]_. References +========== - [1] http://mail.python.org/pipermail/python-dev/2005-May/053319.html - [2] http://python.org/sf/1355913 - [3] http://mail.python.org/pipermail/python-checkins/2005-December/048457.html +.. [1] http://mail.python.org/pipermail/python-dev/2005-May/053319.html +.. [2] http://python.org/sf/1355913 +.. [3] http://mail.python.org/pipermail/python-checkins/2005-December/048457.html Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. 
- -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/pep-0666.txt b/pep-0666.txt index e533f53e768..e1c46909ffa 100644 --- a/pep-0666.txt +++ b/pep-0666.txt @@ -5,100 +5,106 @@ Last-Modified: $Date$ Author: lac@strakt.com (Laura Creighton) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 3-Dec-2001 Python-Version: 2.2 Post-History: 5-Dec-2001 Abstract - - Everybody agrees that mixing tabs and spaces is a bad idea. Some - people want more than this. I propose that we let people define - whatever Python behaviour they want, so it will only run the way - they like it, and will not run the way they don't like it. We - will do this with a command line switch. Programs that aren't - formatted the way the programmer wants things will raise - IndentationError: - - Python -TNone will refuse to run when there are any tabs. - Python -Tn will refuse to run when tabs are not exactly n spaces - Python -TOnly will refuse to run when blocks are indented by anything - other than tabs - - People who mix tabs and spaces, naturally, will find that their - programs do not run. Alas, we haven't found a way to give them an - electric shock as from a cattle prod remotely. (Though if somebody - finds out a way to do this, I will be pleased to add this option to - the PEP.) - - -Rationale - - Python-list@python.org (a.k.a. comp.lang.python) is periodically - awash with discussions about tabs and spaces. This is inevitable, - given that indentation is syntactically significant in Python. - This has never solved anything, and just makes various people - frustrated and angry. Eventually they start saying rude things to - each other which is sad for all of us. 
And it is also sad that - they are wasting their valuable time which they could spend - creating something with Python. Moreover, for the Python community - as a whole, from a public relations point of view, this is quite - unfortunate. The people who aren't posting about tabs and spaces, - are, (unsurprisingly) invisible, while the people who are posting - make the rest of us look somewhat foolish. - - The problem is that there is no polite way to say 'Stop wasting - your valuable time and mine.' People who are already in the middle - of a flame war are not well disposed to believe that you are acting - out of compassion for them, and quite rightly insist that their own - time is their own to do with as they please. They are stuck like - flies in treacle in this wretched argument, and it is self-evident - that they cannot disengage or they would have already done so. - - But today I had to spend time cleaning my keyboard because the 'n' - key is sticking. So, in addition to feeling compassion for these - people, I am pretty annoyed. I figure if I make this PEP, we can - then ask Guido to quickly reject it, and then when this argument - next starts up again, we can say 'Guido isn't changing things to - suit the tab-haters or the only-tabbers, so this conversation is a - waste of time.' Then everybody can quietly believe that a) they - are correct and b) other people are fools and c) they are - undeniably fortunate to not have to share a lab with idiots, (which - is something the arguers could do _now_, but apparently have - forgotten). - - And python-list can go back to worrying if it is too smug, rather - than whether it is too hostile for newcomers. Possibly somebody - could get around to explaining to me what is the difference between - __getattr__ and __getattribute__ in non-Classic classes in 2.2, a - question I have foolishly posted in the middle of the current tab - thread. 
I would like to know the answer to that question.[2] - - This proposal, if accepted, will probably mean a heck of a lot of - work for somebody. But since I don't want it accepted, I don't - care. +======== + +Everybody agrees that mixing tabs and spaces is a bad idea. Some +people want more than this. I propose that we let people define +whatever Python behaviour they want, so it will only run the way +they like it, and will not run the way they don't like it. We +will do this with a command line switch. Programs that aren't +formatted the way the programmer wants things will raise +``IndentationError``. + +- ``Python -TNone`` will refuse to run when there are any tabs. +- ``Python -Tn`` will refuse to run when tabs are not exactly n spaces +- ``Python -TOnly`` will refuse to run when blocks are indented by anything + other than tabs + +People who mix tabs and spaces, naturally, will find that their +programs do not run. Alas, we haven't found a way to give them an +electric shock as from a cattle prod remotely. (Though if somebody +finds out a way to do this, I will be pleased to add this option to +the PEP.) + + +Rationale +========= + +Python-list@python.org (a.k.a. comp.lang.python) is periodically +awash with discussions about tabs and spaces. This is inevitable, +given that indentation is syntactically significant in Python. +This has never solved anything, and just makes various people +frustrated and angry. Eventually they start saying rude things to +each other which is sad for all of us. And it is also sad that +they are wasting their valuable time which they could spend +creating something with Python. Moreover, for the Python community +as a whole, from a public relations point of view, this is quite +unfortunate. The people who aren't posting about tabs and spaces, +are, (unsurprisingly) invisible, while the people who are posting +make the rest of us look somewhat foolish. 
+ +The problem is that there is no polite way to say 'Stop wasting +your valuable time and mine.' People who are already in the middle +of a flame war are not well disposed to believe that you are acting +out of compassion for them, and quite rightly insist that their own +time is their own to do with as they please. They are stuck like +flies in treacle in this wretched argument, and it is self-evident +that they cannot disengage or they would have already done so. + +But today I had to spend time cleaning my keyboard because the 'n' +key is sticking. So, in addition to feeling compassion for these +people, I am pretty annoyed. I figure if I make this PEP, we can +then ask Guido to quickly reject it, and then when this argument +next starts up again, we can say 'Guido isn't changing things to +suit the tab-haters or the only-tabbers, so this conversation is a +waste of time.' Then everybody can quietly believe that a) they +are correct and b) other people are fools and c) they are +undeniably fortunate to not have to share a lab with idiots, (which +is something the arguers could do _now_, but apparently have +forgotten). + +And python-list can go back to worrying if it is too smug, rather +than whether it is too hostile for newcomers. Possibly somebody +could get around to explaining to me what is the difference between +``__getattr__`` and ``__getattribute__`` in non-Classic classes in 2.2, a +question I have foolishly posted in the middle of the current tab +thread. I would like to know the answer to that question [2]_. + +This proposal, if accepted, will probably mean a heck of a lot of +work for somebody. But since I don't want it accepted, I don't +care. References +========== - [1] PEP 1, PEP Purpose and Guidelines - http://www.python.org/dev/peps/pep-0001/ +.. [1] PEP 1, PEP Purpose and Guidelines + http://www.python.org/dev/peps/pep-0001/ - [2] Tim Peters already has (private correspondence). 
My early 2.2 - didn't have a __getattribute__, and __getattr__ was - implemented like __getattribute__ now is. This has been - fixed. The important conclusion is that my Decorator Pattern - is safe and all is right with the world. +.. [2] Tim Peters already has (private correspondence). My early 2.2 + didn't have a ``__getattribute__``, and ``__getattr__`` was + implemented like ``__getattribute__`` now is. This has been + fixed. The important conclusion is that my Decorator Pattern + is safe and all is right with the world. Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -fill-column: 70 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: From 08282fe6a52dc81e18eeb7e54435c806f10f3985 Mon Sep 17 00:00:00 2001 From: Mariatta Date: Tue, 10 Jan 2017 11:30:39 -0800 Subject: [PATCH 19/36] Another batch of 10 PEPs converted (#177) --- pep-0215.txt | 197 +++++++++++++++++++++++++++------------------------ pep-0226.txt | 165 ++++++++++++++++++++++-------------------- pep-0239.txt | 168 +++++++++++++++++++++++-------------------- pep-0250.txt | 194 ++++++++++++++++++++++++++------------------------ pep-0259.txt | 153 ++++++++++++++++++++------------------- pep-0264.txt | 172 +++++++++++++++++++++++--------------------- pep-0274.txt | 159 +++++++++++++++++++++-------------------- pep-0313.txt | 146 ++++++++++++++++++++------------------ pep-0336.txt | 158 ++++++++++++++++++++++------------------- pep-3142.txt | 152 ++++++++++++++++++++------------------- 10 files changed, 885 insertions(+), 779 deletions(-) diff --git a/pep-0215.txt b/pep-0215.txt index 6bc2779f6c1..3e577a63c5b 100644 --- a/pep-0215.txt +++ b/pep-0215.txt @@ -5,139 +5,154 @@ Last-Modified: $Date$ Author: ping@zesty.ca (Ka-Ping Yee) Status: Superseded Type: Standards Track +Content-Type: text/x-rst Created: 24-Jul-2000 
Python-Version: 2.1 Post-History: Superseded-By: 292 + Abstract +======== - This document proposes a string interpolation feature for Python - to allow easier string formatting. The suggested syntax change - is the introduction of a '$' prefix that triggers the special - interpretation of the '$' character within a string, in a manner - reminiscent to the variable interpolation found in Unix shells, - awk, Perl, or Tcl. +This document proposes a string interpolation feature for Python +to allow easier string formatting. The suggested syntax change +is the introduction of a '$' prefix that triggers the special +interpretation of the '$' character within a string, in a manner +reminiscent to the variable interpolation found in Unix shells, +awk, Perl, or Tcl. Copyright +========= - This document is in the public domain. +This document is in the public domain. Specification +============= - Strings may be preceded with a '$' prefix that comes before the - leading single or double quotation mark (or triplet) and before - any of the other string prefixes ('r' or 'u'). Such a string is - processed for interpolation after the normal interpretation of - backslash-escapes in its contents. The processing occurs just - before the string is pushed onto the value stack, each time the - string is pushed. In short, Python behaves exactly as if '$' - were a unary operator applied to the string. The operation - performed is as follows: +Strings may be preceded with a '$' prefix that comes before the +leading single or double quotation mark (or triplet) and before +any of the other string prefixes ('r' or 'u'). Such a string is +processed for interpolation after the normal interpretation of +backslash-escapes in its contents. The processing occurs just +before the string is pushed onto the value stack, each time the +string is pushed. In short, Python behaves exactly as if '$' +were a unary operator applied to the string. 
The operation +performed is as follows: - The string is scanned from start to end for the '$' character - (\x24 in 8-bit strings or \u0024 in Unicode strings). If there - are no '$' characters present, the string is returned unchanged. +The string is scanned from start to end for the '$' character +(``\x24`` in 8-bit strings or ``\u0024`` in Unicode strings). If there +are no '$' characters present, the string is returned unchanged. - Any '$' found in the string, followed by one of the two kinds of - expressions described below, is replaced with the value of the - expression as evaluated in the current namespaces. The value is - converted with str() if the containing string is an 8-bit string, - or with unicode() if it is a Unicode string. +Any '$' found in the string, followed by one of the two kinds of +expressions described below, is replaced with the value of the +expression as evaluated in the current namespaces. The value is +converted with ``str()`` if the containing string is an 8-bit string, +or with ``unicode()`` if it is a Unicode string. - 1. A Python identifier optionally followed by any number of - trailers, where a trailer consists of: - - a dot and an identifier, - - an expression enclosed in square brackets, or - - an argument list enclosed in parentheses - (This is exactly the pattern expressed in the Python grammar - by "NAME trailer*", using the definitions in Grammar/Grammar.) +1. A Python identifier optionally followed by any number of + trailers, where a trailer consists of: + - a dot and an identifier, + - an expression enclosed in square brackets, or + - an argument list enclosed in parentheses + (This is exactly the pattern expressed in the Python grammar + by "``NAME`` trailer*", using the definitions in Grammar/Grammar.) - 2. Any complete Python expression enclosed in curly braces. +2. Any complete Python expression enclosed in curly braces. - Two dollar-signs ("$$") are replaced with a single "$". 
+Two dollar-signs ("$$") are replaced with a single "$". Examples - - Here is an example of an interactive session exhibiting the - expected behaviour of this feature. - - >>> a, b = 5, 6 - >>> print $'a = $a, b = $b' - a = 5, b = 6 - >>> $u'uni${a}ode' - u'uni5ode' - >>> print $'\$a' - 5 - >>> print $r'\$a' - \5 - >>> print $'$$$a.$b' - $5.6 - >>> print $'a + b = ${a + b}' - a + b = 11 - >>> import sys - >>> print $'References to $a: $sys.getrefcount(a)' - References to 5: 15 - >>> print $"sys = $sys, sys = $sys.modules['sys']" - sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)> - >>> print $'BDFL = $sys.copyright.split()[4].upper()' - BDFL = GUIDO +======== + +Here is an example of an interactive session exhibiting the +expected behaviour of this feature:: + + >>> a, b = 5, 6 + >>> print $'a = $a, b = $b' + a = 5, b = 6 + >>> $u'uni${a}ode' + u'uni5ode' + >>> print $'\$a' + 5 + >>> print $r'\$a' + \5 + >>> print $'$$$a.$b' + $5.6 + >>> print $'a + b = ${a + b}' + a + b = 11 + >>> import sys + >>> print $'References to $a: $sys.getrefcount(a)' + References to 5: 15 + >>> print $"sys = $sys, sys = $sys.modules['sys']" + sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)> + >>> print $'BDFL = $sys.copyright.split()[4].upper()' + BDFL = GUIDO Discussion +========== - '$' is chosen as the interpolation character within the - string for the sake of familiarity, since it is already used - for this purpose in many other languages and contexts. +'$' is chosen as the interpolation character within the +string for the sake of familiarity, since it is already used +for this purpose in many other languages and contexts. - It is then natural to choose '$' as a prefix, since it is a - mnemonic for the interpolation character. +It is then natural to choose '$' as a prefix, since it is a +mnemonic for the interpolation character.
- Trailers are permitted to give this interpolation mechanism - even more power than the interpolation available in most other - languages, while the expression to be interpolated remains - clearly visible and free of curly braces. +Trailers are permitted to give this interpolation mechanism +even more power than the interpolation available in most other +languages, while the expression to be interpolated remains +clearly visible and free of curly braces. - '$' works like an operator and could be implemented as an - operator, but that prevents the compile-time optimization - and presents security issues. So, it is only allowed as a - string prefix. +'$' works like an operator and could be implemented as an +operator, but that prevents the compile-time optimization +and presents security issues. So, it is only allowed as a +string prefix. Security Issues +=============== - "$" has the power to eval, but only to eval a literal. As - described here (a string prefix rather than an operator), it - introduces no new security issues since the expressions to be - evaluated must be literally present in the code. +"$" has the power to eval, but only to eval a literal. As +described here (a string prefix rather than an operator), it +introduces no new security issues since the expressions to be +evaluated must be literally present in the code. Implementation +============== + +The ``Itpl`` module at [1]_ provides a +prototype of this feature. It uses the tokenize module to find +the end of an expression to be interpolated, then calls ``eval()`` +on the expression each time a value is needed. In the prototype, +the expression is parsed and compiled again each time it is +evaluated. + +As an optimization, interpolated strings could be compiled +directly into the corresponding bytecode; that is:: + + $'a = $a, b = $b' - The Itpl module at http://www.lfw.org/python/Itpl.py provides a - prototype of this feature. 
It uses the tokenize module to find - the end of an expression to be interpolated, then calls eval() - on the expression each time a value is needed. In the prototype, - the expression is parsed and compiled again each time it is - evaluated. +could be compiled as though it were the expression:: - As an optimization, interpolated strings could be compiled - directly into the corresponding bytecode; that is, + ('a = ' + str(a) + ', b = ' + str(b)) - $'a = $a, b = $b' +so that it only needs to be compiled once. - could be compiled as though it were the expression - ('a = ' + str(a) + ', b = ' + str(b)) +References +========== - so that it only needs to be compiled once. +.. [1] http://www.lfw.org/python/Itpl.py - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0226.txt b/pep-0226.txt index b1477a7e1a4..a1a030c0259 100644 --- a/pep-0226.txt +++ b/pep-0226.txt @@ -5,121 +5,132 @@ Last-Modified: $Date$ Author: Jeremy Hylton Status: Final Type: Informational +Content-Type: text/x-rst Created: 16-Oct-2000 Python-Version: 2.1 Post-History: + Abstract +======== - This document describes the post Python 2.0 development and - release schedule. According to this schedule, Python 2.1 will be - released in April of 2001. The schedule primarily concerns - itself with PEP-size items. Small bug fixes and changes will - occur up until the first beta release. +This document describes the post Python 2.0 development and +release schedule. According to this schedule, Python 2.1 will be +released in April of 2001. The schedule primarily concerns +itself with PEP-size items. Small bug fixes and changes will +occur up until the first beta release. 
Release Schedule +================ + +Tentative future release dates - Tentative future release dates +[bugfix release dates go here] - [bugfix release dates go here] +Past release dates: - Past release dates: +- 17-Apr-2001: 2.1 final release +- 15-Apr-2001: 2.1 release candidate 2 +- 13-Apr-2001: 2.1 release candidate 1 +- 23-Mar-2001: Python 2.1 beta 2 release +- 02-Mar-2001: First 2.1 beta release +- 02-Feb-2001: Python 2.1 alpha 2 release +- 22-Jan-2001: Python 2.1 alpha 1 release +- 16-Oct-2000: Python 2.0 final release - 17-Apr-2001: 2.1 final release - 15-Apr-2001: 2.1 release candidate 2 - 13-Apr-2001: 2.1 release candidate 1 - 23-Mar-2001: Python 2.1 beta 2 release - 02-Mar-2001: First 2.1 beta release - 02-Feb-2001: Python 2.1 alpha 2 release - 22-Jan-2001: Python 2.1 alpha 1 release - 16-Oct-2000: Python 2.0 final release Open issues for Python 2.0 beta 2 +================================= + +Add a default unit testing framework to the standard library. - Add a default unit testing framework to the standard library. Guidelines for making changes for Python 2.1 +============================================ - The guidelines and schedule will be revised based on discussion in - the python-dev@python.org mailing list. +The guidelines and schedule will be revised based on discussion in +the python-dev@python.org mailing list. - The PEP system was instituted late in the Python 2.0 development - cycle and many changes did not follow the process described in PEP - 1. The development process for 2.1, however, will follow the PEP - process as documented. +The PEP system was instituted late in the Python 2.0 development +cycle and many changes did not follow the process described in PEP 1. +The development process for 2.1, however, will follow the PEP +process as documented. - The first eight weeks following 2.0 final will be the design and - review phase. By the end of this period, any PEP that is proposed - for 2.1 should be ready for review. 
This means that the PEP is - written and discussion has occurred on the python-dev@python.org - and python-list@python.org mailing lists. +The first eight weeks following 2.0 final will be the design and +review phase. By the end of this period, any PEP that is proposed +for 2.1 should be ready for review. This means that the PEP is +written and discussion has occurred on the python-dev@python.org +and python-list@python.org mailing lists. - The next six weeks will be spent reviewing the PEPs and - implementing and testing the accepted PEPs. When this period - stops, we will end consideration of any incomplete PEPs. Near the - end of this period, there will be a feature freeze where any small - features not worthy of a PEP will not be accepted. +The next six weeks will be spent reviewing the PEPs and +implementing and testing the accepted PEPs. When this period +stops, we will end consideration of any incomplete PEPs. Near the +end of this period, there will be a feature freeze where any small +features not worthy of a PEP will not be accepted. + +Before the final release, we will have six weeks of beta testing +and a release candidate or two. - Before the final release, we will have six weeks of beta testing - and a release candidate or two. General guidelines for submitting patches and making changes +============================================================ + +Use good sense when committing changes. You should know what we +mean by good sense or we wouldn't have given you commit privileges +<0.5 wink>. Some specific examples of good sense include: - Use good sense when committing changes. You should know what we - mean by good sense or we wouldn't have given you commit privileges - <0.5 wink>. Some specific examples of good sense include: +- Do whatever the dictator tells you. - - Do whatever the dictator tells you. +- Discuss any controversial changes on python-dev first. If you + get a lot of +1 votes and no -1 votes, make the change. 
If you + get a some -1 votes, think twice; consider asking Guido what he + thinks. - - Discuss any controversial changes on python-dev first. If you - get a lot of +1 votes and no -1 votes, make the change. If you - get a some -1 votes, think twice; consider asking Guido what he - thinks. +- If the change is to code you contributed, it probably makes + sense for you to fix it. - - If the change is to code you contributed, it probably makes - sense for you to fix it. +- If the change affects code someone else wrote, it probably makes + sense to ask him or her first. - - If the change affects code someone else wrote, it probably makes - sense to ask him or her first. +- You can use the SourceForge (SF) Patch Manager to submit a patch + and assign it to someone for review. - - You can use the SourceForge (SF) Patch Manager to submit a patch - and assign it to someone for review. +Any significant new feature must be described in a PEP and +approved before it is checked in. - Any significant new feature must be described in a PEP and - approved before it is checked in. +Any significant code addition, such as a new module or large +patch, must include test cases for the regression test and +documentation. A patch should not be checked in until the tests +and documentation are ready. - Any significant code addition, such as a new module or large - patch, must include test cases for the regression test and - documentation. A patch should not be checked in until the tests - and documentation are ready. +If you fix a bug, you should write a test case that would have +caught the bug. - If you fix a bug, you should write a test case that would have - caught the bug. +If you commit a patch from the SF Patch Manager or fix a bug from +the Jitterbug database, be sure to reference the patch/bug number +in the CVS log message. Also be sure to change the status in the +patch manager or bug database (if you have access to the bug +database). 
- If you commit a patch from the SF Patch Manager or fix a bug from - the Jitterbug database, be sure to reference the patch/bug number - in the CVS log message. Also be sure to change the status in the - patch manager or bug database (if you have access to the bug - database). +It is not acceptable for any checked in code to cause the +regression test to fail. If a checkin causes a failure, it must +be fixed within 24 hours or it will be backed out. - It is not acceptable for any checked in code to cause the - regression test to fail. If a checkin causes a failure, it must - be fixed within 24 hours or it will be backed out. +All contributed C code must be ANSI C. If possible check it with +two different compilers, e.g. gcc and MSVC. - All contributed C code must be ANSI C. If possible check it with - two different compilers, e.g. gcc and MSVC. +All contributed Python code must follow Guido's Python style +guide. http://www.python.org/doc/essays/styleguide.html - All contributed Python code must follow Guido's Python style - guide. http://www.python.org/doc/essays/styleguide.html +It is understood that any code contributed will be released under +an Open Source license. Do not contribute code if it can't be +released this way. - It is understood that any code contributed will be released under - an Open Source license. Do not contribute code if it can't be - released this way. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0239.txt b/pep-0239.txt index 61d01e15f39..3c705e0511b 100644 --- a/pep-0239.txt +++ b/pep-0239.txt @@ -2,133 +2,147 @@ PEP: 239 Title: Adding a Rational Type to Python Version: $Revision$ Last-Modified: $Date$ -Author: Christopher A. Craig , - Moshe Zadka +Author: Christopher A. 
Craig , Moshe Zadka Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 11-Mar-2001 Python-Version: 2.2 Post-History: 16-Mar-2001 Abstract +======== + +Python has no numeric type with the semantics of an unboundedly +precise rational number. This proposal explains the semantics of +such a type, and suggests builtin functions and literals to +support such a type. This PEP suggests no literals for rational +numbers; that is left for another PEP [1]_. - Python has no numeric type with the semantics of an unboundedly - precise rational number. This proposal explains the semantics of - such a type, and suggests builtin functions and literals to - support such a type. This PEP suggests no literals for rational - numbers; that is left for another PEP[1]. BDFL Pronouncement +================== + +This PEP is rejected. The needs outlined in the rationale section +have been addressed to some extent by the acceptance of PEP 327 +for decimal arithmetic. Guido also noted, "Rational arithmetic +was the default 'exact' arithmetic in ABC and it did not work out as +expected". See the python-dev discussion on 17 June 2005 [2]_. - This PEP is rejected. The needs outlined in the rationale section - have been addressed to some extent by the acceptance of PEP 327 - for decimal arithmetic. Guido also noted, "Rational arithmetic - was the default 'exact' arithmetic in ABC and it did not work out as - expected". See the python-dev discussion on 17 June 2005. +*Postscript:* With the acceptance of PEP 3141, "A Type Hierarchy +for Numbers", a 'Rational' numeric abstract base class was added +with a concrete implementation in the 'fractions' module. - *Postscript:* With the acceptance of PEP 3141, "A Type Hierarchy - for Numbers", a 'Rational' numeric abstract base class was added - with a concrete implementation in the 'fractions' module. 
Rationale +========= - While sometimes slower and more memory intensive (in general, - unboundedly so) rational arithmetic captures more closely the - mathematical ideal of numbers, and tends to have behavior which is - less surprising to newbies. Though many Python implementations of - rational numbers have been written, none of these exist in the - core, or are documented in any way. This has made them much less - accessible to people who are less Python-savvy. +While sometimes slower and more memory intensive (in general, +unboundedly so) rational arithmetic captures more closely the +mathematical ideal of numbers, and tends to have behavior which is +less surprising to newbies. Though many Python implementations of +rational numbers have been written, none of these exist in the +core, or are documented in any way. This has made them much less +accessible to people who are less Python-savvy. RationalType +============ - There will be a new numeric type added called RationalType. Its - unary operators will do the obvious thing. Binary operators will - coerce integers and long integers to rationals, and rationals to - floats and complexes. +There will be a new numeric type added called ``RationalType``. Its +unary operators will do the obvious thing. Binary operators will +coerce integers and long integers to rationals, and rationals to +floats and complexes. - The following attributes will be supported: .numerator and - .denominator. The language definition will promise that +The following attributes will be supported: ``.numerator`` and +``.denominator``. The language definition will promise that:: - r.denominator * r == r.numerator + r.denominator * r == r.numerator - that the GCD of the numerator and the denominator is 1 and that - the denominator is positive. +that the GCD of the numerator and the denominator is 1 and that +the denominator is positive. 
- The method r.trim(max_denominator) will return the closest - rational s to r such that abs(s.denominator) <= max_denominator. +The method ``r.trim(max_denominator)`` will return the closest +rational ``s`` to ``r`` such that ``abs(s.denominator) <= max_denominator``. The rational() Builtin +====================== - This function will have the signature rational(n, d=1). n and d - must both be integers, long integers or rationals. A guarantee is - made that +This function will have the signature ``rational(n, d=1)``. ``n`` and ``d`` +must both be integers, long integers or rationals. A guarantee is +made that:: - rational(n, d) * d == n + rational(n, d) * d == n Open Issues +=========== - - Maybe the type should be called rat instead of rational. - Somebody proposed that we have "abstract" pure mathematical - types named complex, real, rational, integer, and "concrete" - representation types with names like float, rat, long, int. +- Maybe the type should be called rat instead of rational. + Somebody proposed that we have "abstract" pure mathematical + types named complex, real, rational, integer, and "concrete" + representation types with names like float, rat, long, int. - - Should a rational number with an integer value be allowed as a - sequence index? For example, should s[5/3 - 2/3] be equivalent - to s[1]? +- Should a rational number with an integer value be allowed as a + sequence index? For example, should ``s[5/3 - 2/3]`` be equivalent + to ``s[1]``? - - Should shift and mask operators be allowed for rational numbers? - For rational numbers with integer values? +- Should ``shift`` and ``mask`` operators be allowed for rational numbers? + For rational numbers with integer values? 
- - Marcin 'Qrczak' Kowalczyk summarized the arguments for and - against unifying ints with rationals nicely on c.l.py: +- Marcin 'Qrczak' Kowalczyk summarized the arguments for and + against unifying ints with rationals nicely on c.l.py - Arguments for unifying ints with rationals: + Arguments for unifying ints with rationals: - - Since 2 == 2/1 and maybe str(2/1) == '2', it reduces surprises - where objects seem equal but behave differently. + - Since ``2 == 2/1`` and maybe ``str(2/1) == '2'``, it reduces surprises + where objects seem equal but behave differently. - - / can be freely used for integer division when I *know* that - there is no remainder (if I am wrong and there is a remainder, - there will probably be some exception later). + - / can be freely used for integer division when I *know* that + there is no remainder (if I am wrong and there is a remainder, + there will probably be some exception later). - Arguments against: + Arguments against: - - When I use the result of / as a sequence index, it's usually - an error which should not be hidden by making the program - working for some data, since it will break for other data. + - When I use the result of / as a sequence index, it's usually + an error which should not be hidden by making the program + working for some data, since it will break for other data. - - (this assumes that after unification int and rational would be - different types:) Types should rarely depend on values. It's - easier to reason when the type of a variable is known: I know - how I can use it. I can determine that something is an int and - expect that other objects used in this place will be ints too. + - (this assumes that after unification int and rational would be + different types:) Types should rarely depend on values. It's + easier to reason when the type of a variable is known: I know + how I can use it. I can determine that something is an int and + expect that other objects used in this place will be ints too. 
- - (this assumes the same type for them:) Int is a good type in - itself, not to be mixed with rationals. The fact that - something is an integer should be expressible as a statement - about its type. Many operations require ints and don't accept - rationals. It's natural to think about them as about different - types. + - (this assumes the same type for them:) Int is a good type in + itself, not to be mixed with rationals. The fact that + something is an integer should be expressible as a statement + about its type. Many operations require ints and don't accept + rationals. It's natural to think about them as about different + types. References +========== - [1] PEP 240, Adding a Rational Literal to Python, Zadka, - http://www.python.org/dev/peps/pep-0240/ +.. [1] PEP 240, Adding a Rational Literal to Python, Zadka, + http://www.python.org/dev/peps/pep-0240/ +.. [2] Raymond Hettinger, Propose rejection of PEPs 239 and 240 -- a builtin + rational type and rational literals + https://mail.python.org/pipermail/python-dev/2005-June/054281.html Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0250.txt b/pep-0250.txt index 579fe4a734b..618acb4fe84 100644 --- a/pep-0250.txt +++ b/pep-0250.txt @@ -5,134 +5,142 @@ Last-Modified: $Date$ Author: p.f.moore@gmail.com (Paul Moore) Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 30-Mar-2001 Python-Version: 2.2 Post-History: 30-Mar-2001 Abstract +======== - The standard Python distribution includes a directory - Lib/site-packages, which is used on Unix platforms to hold - locally-installed modules and packages. The site.py module - distributed with Python includes support for locating other - modules in the site-packages directory. 
+The standard Python distribution includes a directory +``Lib/site-packages``, which is used on Unix platforms to hold +locally-installed modules and packages. The ``site.py`` module +distributed with Python includes support for locating other +modules in the site-packages directory. - This PEP proposes that the site-packages directory should be used - on the Windows platform in a similar manner. +This PEP proposes that the site-packages directory should be used +on the Windows platform in a similar manner. Motivation - - On Windows platforms, the default setting for sys.path does not - include a directory suitable for users to install locally - developed modules. The "expected" location appears to be the - directory containing the Python executable itself. This is also - the location where distutils (and distutils-generated installers) - installs packages. Including locally developed code in the same - directory as installed executables is not good practice. - - Clearly, users can manipulate sys.path, either in a locally - modified site.py, or in a suitable sitecustomize.py, or even via - .pth files. However, there should be a standard location for such - files, rather than relying on every individual site having to set - their own policy. - - In addition, with distutils becoming more prevalent as a means of - distributing modules, the need for a standard install location for - distributed modules will become more common. It would be better - to define such a standard now, rather than later when more - distutils-based packages exist which will need rebuilding. - - It is relevant to note that prior to Python 2.1, the site-packages - directory was not included in sys.path for Macintosh platforms. - This has been changed in 2.1, and Macintosh includes sys.path now, - leaving Windows as the only major platform with no site-specific - modules directory. 
+========== + +On Windows platforms, the default setting for ``sys.path`` does not +include a directory suitable for users to install locally +developed modules. The "expected" location appears to be the +directory containing the Python executable itself. This is also +the location where distutils (and distutils-generated installers) +installs packages. Including locally developed code in the same +directory as installed executables is not good practice. + +Clearly, users can manipulate ``sys.path``, either in a locally +modified ``site.py``, or in a suitable ``sitecustomize.py``, or even via +``.pth`` files. However, there should be a standard location for such +files, rather than relying on every individual site having to set +their own policy. + +In addition, with distutils becoming more prevalent as a means of +distributing modules, the need for a standard install location for +distributed modules will become more common. It would be better +to define such a standard now, rather than later when more +distutils-based packages exist which will need rebuilding. + +It is relevant to note that prior to Python 2.1, the site-packages +directory was not included in ``sys.path`` for Macintosh platforms. +This has been changed in 2.1, and Macintosh includes ``sys.path`` now, +leaving Windows as the only major platform with no site-specific +modules directory. Implementation +============== - The implementation of this feature is fairly trivial. All that - would be required is a change to site.py, to change the section - setting sitedirs. The Python 2.1 version has +The implementation of this feature is fairly trivial. All that +would be required is a change to ``site.py``, to change the section +setting sitedirs. 
The Python 2.1 version has:: - if os.sep == '/': - sitedirs = [makepath(prefix, - "lib", - "python" + sys.version[:3], - "site-packages"), - makepath(prefix, "lib", "site-python")] - elif os.sep == ':': - sitedirs = [makepath(prefix, "lib", "site-packages")] - else: - sitedirs = [prefix] + if os.sep == '/': + sitedirs = [makepath(prefix, + "lib", + "python" + sys.version[:3], + "site-packages"), + makepath(prefix, "lib", "site-python")] + elif os.sep == ':': + sitedirs = [makepath(prefix, "lib", "site-packages")] + else: + sitedirs = [prefix] - A suitable change would be to simply replace the last 4 lines with +A suitable change would be to simply replace the last 4 lines with:: - else: - sitedirs == [prefix, makepath(prefix, "lib", "site-packages")] + else: + sitedirs == [prefix, makepath(prefix, "lib", "site-packages")] - Changes would also be required to distutils, to reflect this change - in policy. A patch is available on Sourceforge, patch ID 445744, - which implements this change. Note that the patch checks the Python - version and only invokes the new behaviour for Python versions from - 2.2 onwards. This is to ensure that distutils remains compatible - with earlier versions of Python. +Changes would also be required to distutils, to reflect this change +in policy. A patch is available on Sourceforge, patch ID 445744, +which implements this change. Note that the patch checks the Python +version and only invokes the new behaviour for Python versions from +2.2 onwards. This is to ensure that distutils remains compatible +with earlier versions of Python. - Finally, the executable code which implements the Windows installer - used by the bdist_wininst command will need changing to use the new - location. A separate patch is available for this, currently - maintained by Thomas Heller. +Finally, the executable code which implements the Windows installer +used by the ``bdist_wininst`` command will need changing to use the new +location. 
A separate patch is available for this, currently +maintained by Thomas Heller. Notes +===== - - This change does not preclude packages using the current - location -- the change only adds a directory to sys.path, it - does not remove anything. +- This change does not preclude packages using the current + location -- the change only adds a directory to ``sys.path``, it + does not remove anything. - - Both the current location (sys.prefix) and the new directory - (site-packages) are included in sitedirs, so that .pth files - will be recognised in either location. +- Both the current location (``sys.prefix``) and the new directory + (site-packages) are included in sitedirs, so that ``.pth`` files + will be recognised in either location. - - This proposal adds a single additional site-packages directory - to sitedirs. On Unix platforms, two directories are added, one - for version-independent files (Python code) and one for - version-dependent code (C extensions). This is necessary on - Unix, as the sitedirs include a common (across Python versions) - package location, in /usr/local by default. As there is no such - common location available on Windows, there is also no need for - having two separate package directories. +- This proposal adds a single additional site-packages directory + to sitedirs. On Unix platforms, two directories are added, one + for version-independent files (Python code) and one for + version-dependent code (C extensions). This is necessary on + Unix, as the sitedirs include a common (across Python versions) + package location, in ``/usr/local`` by default. As there is no such + common location available on Windows, there is also no need for + having two separate package directories. - - If users want to keep DLLs in a single location on Windows, rather - than keeping them in the package directory, the DLLs subdirectory - of the Python install directory is already available for that - purpose. 
Adding an extra directory solely for DLLs should not be - necessary. +- If users want to keep DLLs in a single location on Windows, rather + than keeping them in the package directory, the DLLs subdirectory + of the Python install directory is already available for that + purpose. Adding an extra directory solely for DLLs should not be + necessary. Open Issues +=========== - - Comments from Unix users indicate that there may be issues with - the current setup on the Unix platform. Rather than become - involved in cross-platform issues, this PEP specifically limits - itself to the Windows platform, leaving changes for other platforms - to be covered inother PEPs. +- Comments from Unix users indicate that there may be issues with + the current setup on the Unix platform. Rather than become + involved in cross-platform issues, this PEP specifically limits + itself to the Windows platform, leaving changes for other platforms + to be covered in other PEPs. - - There could be issues with applications which embed Python. To the - author's knowledge, there should be no problem as a result of this - change. There have been no comments (supportive or otherwise) from - users who embed Python. +- There could be issues with applications which embed Python. To the + author's knowledge, there should be no problem as a result of this + change. There have been no comments (supportive or otherwise) from + users who embed Python. Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0259.txt b/pep-0259.txt index fb9c5635acd..a7c0e6c12bc 100644 --- a/pep-0259.txt +++ b/pep-0259.txt @@ -5,129 +5,140 @@ Last-Modified: $Date$ Author: guido@python.org (Guido van Rossum) Status: Rejected Type: Standards Track +Content-Type: text/x-rst Created: 11-Jun-2001 Python-Version: 2.2 Post-History: 11-Jun-2001 + Abstract +======== - Currently, the print statement always appends a newline, unless a - trailing comma is used. This means that if we want to print data - that already ends in a newline, we get two newlines, unless - special precautions are taken. +Currently, the ``print`` statement always appends a newline, unless a +trailing comma is used. This means that if we want to print data +that already ends in a newline, we get two newlines, unless +special precautions are taken. - I propose to skip printing the newline when it follows a newline - that came from data. +I propose to skip printing the newline when it follows a newline +that came from data. - In order to avoid having to add yet another magic variable to file - objects, I propose to give the existing 'softspace' variable an - extra meaning: a negative value will mean "the last data written - ended in a newline so no space *or* newline is required." +In order to avoid having to add yet another magic variable to file +objects, I propose to give the existing 'softspace' variable an +extra meaning: a negative value will mean "the last data written +ended in a newline so no space *or* newline is required." Problem +======= - When printing data that resembles the lines read from a file using - a simple loop, double-spacing occurs unless special care is taken: +When printing data that resembles the lines read from a file using +a simple loop, double-spacing occurs unless special care is taken:: - >>> for line in open("/etc/passwd").readlines(): - ... print line - ... 
- root:x:0:0:root:/root:/bin/bash + >>> for line in open("/etc/passwd").readlines(): + ... print line + ... + root:x:0:0:root:/root:/bin/bash - bin:x:1:1:bin:/bin: + bin:x:1:1:bin:/bin: - daemon:x:2:2:daemon:/sbin: + daemon:x:2:2:daemon:/sbin: - (etc.) + (etc.) - >>> + >>> - While there are easy work-arounds, this is often noticed only - during testing and requires an extra edit-test roundtrip; the - fixed code is uglier and harder to maintain. +While there are easy work-arounds, this is often noticed only +during testing and requires an extra edit-test roundtrip; the +fixed code is uglier and harder to maintain. Proposed Solution +================= - In the PRINT_ITEM opcode in ceval.c, when a string object is - printed, a check is already made that looks at the last character - of that string. Currently, if that last character is a whitespace - character other than space, the softspace flag is reset to zero; - this suppresses the space between two items if the first item is a - string ending in newline, tab, etc. (but not when it ends in a - space). Otherwise the softspace flag is set to one. +In the ``PRINT_ITEM`` opcode in ``ceval.c``, when a string object is +printed, a check is already made that looks at the last character +of that string. Currently, if that last character is a whitespace +character other than space, the softspace flag is reset to zero; +this suppresses the space between two items if the first item is a +string ending in newline, tab, etc. (but not when it ends in a +space). Otherwise the softspace flag is set to one. 
- The proposal changes this test slightly so that softspace is set - to: +The proposal changes this test slightly so that softspace is set +to: - -1 -- if the last object written is a string ending in a - newline +- ``-1`` -- if the last object written is a string ending in a + newline - 0 -- if the last object written is a string ending in a - whitespace character that's neither space nor newline +- ``0`` -- if the last object written is a string ending in a + whitespace character that's neither space nor newline - 1 -- in all other cases (including the case when the last - object written is an empty string or not a string) +- ``1`` -- in all other cases (including the case when the last + object written is an empty string or not a string) - Then, the PRINT_NEWLINE opcode, printing of the newline is - suppressed if the value of softspace is negative; in any case the - softspace flag is reset to zero. +Then, the ``PRINT_NEWLINE`` opcode, printing of the newline is +suppressed if the value of softspace is negative; in any case the +softspace flag is reset to zero. Scope +===== - This only affects printing of 8-bit strings. It doesn't affect - Unicode, although that could be considered a bug in the Unicode - implementation. It doesn't affect other objects whose string - representation happens to end in a newline character. +This only affects printing of 8-bit strings. It doesn't affect +Unicode, although that could be considered a bug in the Unicode +implementation. It doesn't affect other objects whose string +representation happens to end in a newline character. Risks +===== - This change breaks some existing code. For example: +This change breaks some existing code. For example:: - print "Subject: PEP 259\n" - print message_body + print "Subject: PEP 259\n" + print message_body - In current Python, this produces a blank line separating the - subject from the message body; with the proposed change, the body - begins immediately below the subject. 
This is not very robust - code anyway; it is better written as +In current Python, this produces a blank line separating the +subject from the message body; with the proposed change, the body +begins immediately below the subject. This is not very robust +code anyway; it is better written as:: - print "Subject: PEP 259" - print - print message_body + print "Subject: PEP 259" + print + print message_body - In the test suite, only test_StringIO (which explicitly tests for - this feature) breaks. +In the test suite, only ``test_StringIO`` (which explicitly tests for +this feature) breaks. Implementation +============== - A patch relative to current CVS is here: +A patch relative to current CVS is here:: - http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470 + http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470 Rejected +======== - The user community unanimously rejected this, so I won't pursue - this idea any further. Frequently heard arguments against - included: +The user community unanimously rejected this, so I won't pursue +this idea any further. Frequently heard arguments against +included: - - It is likely to break thousands of CGI scripts. +- It is likely to break thousands of CGI scripts. - - Enough magic already (also: no more tinkering with 'print' - please). +- Enough magic already (also: no more tinkering with 'print' + please). Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0264.txt b/pep-0264.txt index d28577f19f4..2f960a5e654 100644 --- a/pep-0264.txt +++ b/pep-0264.txt @@ -5,6 +5,7 @@ Last-Modified: $Date$ Author: Michael Hudson Status: Final Type: Standards Track +Content-Type: text/x-rst Requires: 236 Created: 30-Jul-2001 Python-Version: 2.2 @@ -12,129 +13,140 @@ Post-History: 30-Jul-2001 Abstract +======== - As noted in PEP 236, there is no clear way for "simulated - interactive shells" to simulate the behaviour of __future__ - statements in "real" interactive shells, i.e. have __future__ - statements' effects last the life of the shell. +As noted in PEP 236, there is no clear way for "simulated +interactive shells" to simulate the behaviour of ``__future__`` +statements in "real" interactive shells, i.e. have ``__future__`` +statements' effects last the life of the shell. - The PEP also takes the opportunity to clean up the other - unresolved issue mentioned in PEP 236, the inability to stop - compile() inheriting the effect of future statements affecting the - code calling compile(). +The PEP also takes the opportunity to clean up the other +unresolved issue mentioned in PEP 236, the inability to stop +``compile()`` inheriting the effect of future statements affecting the +code calling ``compile()``. - This PEP proposes to address the first problem by adding an - optional fourth argument to the builtin function "compile", adding - information to the _Feature instances defined in __future__.py and - adding machinery to the standard library modules "codeop" and - "code" to make the construction of such shells easy. +This PEP proposes to address the first problem by adding an +optional fourth argument to the builtin function "compile", adding +information to the ``_Feature`` instances defined in ``__future__.py`` and +adding machinery to the standard library modules "codeop" and +"code" to make the construction of such shells easy. 
- The second problem is dealt with by simply adding *another* - optional argument to compile(), which if non-zero suppresses the - inheriting of future statements' effects. +The second problem is dealt with by simply adding *another* +optional argument to ``compile()``, which if non-zero suppresses the +inheriting of future statements' effects. Specification +============= - I propose adding a fourth, optional, "flags" argument to the - builtin "compile" function. If this argument is omitted, - there will be no change in behaviour from that of Python 2.1. +I propose adding a fourth, optional, "flags" argument to the +builtin "compile" function. If this argument is omitted, +there will be no change in behaviour from that of Python 2.1. - If it is present it is expected to be an integer, representing - various possible compile time options as a bitfield. The - bitfields will have the same values as the CO_* flags already used - by the C part of Python interpreter to refer to future statements. +If it is present it is expected to be an integer, representing +various possible compile time options as a bitfield. The +bitfields will have the same values as the ``CO_*`` flags already used +by the C part of Python interpreter to refer to future statements. - compile() shall raise a ValueError exception if it does not - recognize any of the bits set in the supplied flags. +``compile()`` shall raise a ``ValueError`` exception if it does not +recognize any of the bits set in the supplied flags. - The flags supplied will be bitwise-"or"ed with the flags that - would be set anyway, unless the new fifth optional argument is a - non-zero intger, in which case the flags supplied will be exactly - the set used. +The flags supplied will be bitwise-"or"ed with the flags that +would be set anyway, unless the new fifth optional argument is a +non-zero intger, in which case the flags supplied will be exactly +the set used. - The above-mentioned flags are not currently exposed to Python. 
I - propose adding .compiler_flag attributes to the _Feature objects - in __future__.py that contain the necessary bits, so one might - write code such as: +The above-mentioned flags are not currently exposed to Python. I +propose adding ``.compiler_flag`` attributes to the ``_Feature`` objects +in ``__future__.py`` that contain the necessary bits, so one might +write code such as:: - import __future__ - def compile_generator(func_def): - return compile(func_def, "", "suite", - __future__.generators.compiler_flag) + import __future__ + def compile_generator(func_def): + return compile(func_def, "", "suite", + __future__.generators.compiler_flag) - A recent change means that these same bits can be used to tell if - a code object was compiled with a given feature; for instance +A recent change means that these same bits can be used to tell if +a code object was compiled with a given feature; for instance:: - codeob.co_flags & __future__.generators.compiler_flag + codeob.co_flags & __future__.generators.compiler_flag - will be non-zero if and only if the code object "codeob" was - compiled in an environment where generators were allowed. +will be non-zero if and only if the code object "codeob" was +compiled in an environment where generators were allowed. - I will also add a .all_feature_flags attribute to the __future__ - module, giving a low-effort way of enumerating all the __future__ - options supported by the running interpreter. +I will also add a ``.all_feature_flags`` attribute to the ``__future__`` +module, giving a low-effort way of enumerating all the ``__future__`` +options supported by the running interpreter. - I also propose adding a pair of classes to the standard library - module codeop. +I also propose adding a pair of classes to the standard library +module codeop.
- One - Compile - will sport a __call__ method which will act much - like the builtin "compile" of 2.1 with the difference that after - it has compiled a __future__ statement, it "remembers" it and - compiles all subsequent code with the __future__ option in effect. +One - Compile - will sport a ``__call__`` method which will act much +like the builtin "compile" of 2.1 with the difference that after +it has compiled a ``__future__`` statement, it "remembers" it and +compiles all subsequent code with the ``__future__`` option in effect. - It will do this by using the new features of the __future__ module - mentioned above. +It will do this by using the new features of the ``__future__`` module +mentioned above. - Objects of the other class added to codeop - CommandCompiler - - will do the job of the existing codeop.compile_command function, - but in a __future__-aware way. +Objects of the other class added to codeop - ``CommandCompiler`` - +will do the job of the existing ``codeop.compile_command`` function, +but in a ``__future__``-aware way. - Finally, I propose to modify the class InteractiveInterpreter in - the standard library module code to use a CommandCompiler to - emulate still more closely the behaviour of the default Python - shell. +Finally, I propose to modify the class ``InteractiveInterpreter`` in +the standard library module code to use a ``CommandCompiler`` to +emulate still more closely the behaviour of the default Python +shell. Backward Compatibility +====================== - Should be very few or none; the changes to compile will make no - difference to existing code, nor will adding new functions or - classes to codeop. Existing code using - code.InteractiveInterpreter may change in behaviour, but only for - the better in that the "real" Python shell will be being better - impersonated. +Should be very few or none; the changes to compile will make no +difference to existing code, nor will adding new functions or +classes to codeop. 
Existing code using +``code.InteractiveInterpreter`` may change in behaviour, but only for +the better in that the "real" Python shell will be being better +impersonated. Forward Compatibility +===================== - The fiddling that needs to be done to Lib/__future__.py when - adding a __future_ feature will be a touch more complicated. - Everything else should just work. +The fiddling that needs to be done to ``Lib/__future__.py`` when +adding a ``__future__`` feature will be a touch more complicated. +Everything else should just work. Issues +====== - I hope the above interface is not too disruptive to implement for - Jython. +I hope the above interface is not too disruptive to implement for +Jython. Implementation +============== - A series of preliminary implementations are at: +A series of preliminary implementations are at [1]_. - http://sourceforge.net/tracker/?func=detail&atid=305470&aid=449043&group_id=5470 +After light massaging by Tim Peters, they have now been checked in. - After light massaging by Tim Peters, they have now been checked in. +References +========== + +.. [1] http://sourceforge.net/tracker/?func=detail&atid=305470&aid=449043&group_id=5470 Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + End: diff --git a/pep-0274.txt b/pep-0274.txt index c98995763dd..1177ad83535 100644 --- a/pep-0274.txt +++ b/pep-0274.txt @@ -5,131 +5,138 @@ Last-Modified: $Date$ Author: Barry Warsaw Status: Final Type: Standards Track +Content-Type: text/x-rst Created: 25-Oct-2001 Python-Version: 2.7, 3.0 (originally 2.3) Post-History: 29-Oct-2001 Abstract +======== - PEP 202 introduces a syntactical extension to Python called the - "list comprehension"[1]. 
This PEP proposes a similar syntactical - extension called the "dictionary comprehension" or "dict - comprehension" for short. You can use dict comprehensions in ways - very similar to list comprehensions, except that they produce - Python dictionary objects instead of list objects. +PEP 202 introduces a syntactical extension to Python called the +"list comprehension". This PEP proposes a similar syntactical +extension called the "dictionary comprehension" or "dict +comprehension" for short. You can use dict comprehensions in ways +very similar to list comprehensions, except that they produce +Python dictionary objects instead of list objects. Resolution +========== - This PEP was originally written for inclusion in Python 2.3. It - was withdrawn after observation that substantially all of its - benefits were subsumed by generator expressions coupled with the - dict() constructor. +This PEP was originally written for inclusion in Python 2.3. It +was withdrawn after observation that substantially all of its +benefits were subsumed by generator expressions coupled with the +``dict()`` constructor. - However, Python 2.7 and 3.0 introduces this exact feature, as well - as the closely related set comprehensions. On 2012-04-09, the PEP - was changed to reflect this reality by updating its Status to - Accepted, and updating the Python-Version field. The Open - Questions section was also removed since these have been long - resolved by the current implementation. +However, Python 2.7 and 3.0 introduces this exact feature, as well +as the closely related set comprehensions. On 2012-04-09, the PEP +was changed to reflect this reality by updating its Status to +Accepted, and updating the Python-Version field. The Open +Questions section was also removed since these have been long +resolved by the current implementation. 
Proposed Solution +================= - Dict comprehensions are just like list comprehensions, except that - you group the expression using curly braces instead of square - braces. Also, the left part before the `for' keyword expresses - both a key and a value, separated by a colon. The notation is - specifically designed to remind you of list comprehensions as - applied to dictionaries. +Dict comprehensions are just like list comprehensions, except that +you group the expression using curly braces instead of square +braces. Also, the left part before the ``for`` keyword expresses +both a key and a value, separated by a colon. The notation is +specifically designed to remind you of list comprehensions as +applied to dictionaries. Rationale +========= - There are times when you have some data arranged as a sequences of - length-2 sequences, and you want to turn that into a dictionary. - In Python 2.2, the dict() constructor accepts an argument that is - a sequence of length-2 sequences, used as (key, value) pairs to - initialize a new dictionary object. +There are times when you have some data arranged as a sequences of +length-2 sequences, and you want to turn that into a dictionary. +In Python 2.2, the ``dict()`` constructor accepts an argument that is +a sequence of length-2 sequences, used as (key, value) pairs to +initialize a new dictionary object. - However, the act of turning some data into a sequence of length-2 - sequences can be inconvenient or inefficient from a memory or - performance standpoint. Also, for some common operations, such as - turning a list of things into a set of things for quick duplicate - removal or set inclusion tests, a better syntax can help code - clarity. +However, the act of turning some data into a sequence of length-2 +sequences can be inconvenient or inefficient from a memory or +performance standpoint. 
Also, for some common operations, such as +turning a list of things into a set of things for quick duplicate +removal or set inclusion tests, a better syntax can help code +clarity. - As with list comprehensions, an explicit for loop can always be - used (and in fact was the only way to do it in earlier versions of - Python). But as with list comprehensions, dict comprehensions can - provide a more syntactically succinct idiom that the traditional - for loop. +As with list comprehensions, an explicit for loop can always be +used (and in fact was the only way to do it in earlier versions of +Python). But as with list comprehensions, dict comprehensions can +provide a more syntactically succinct idiom that the traditional +for loop. Semantics +========= - The semantics of dict comprehensions can actually be demonstrated - in stock Python 2.2, by passing a list comprehension to the - built-in dictionary constructor: +The semantics of dict comprehensions can actually be demonstrated +in stock Python 2.2, by passing a list comprehension to the +built-in dictionary constructor:: >>> dict([(i, chr(65+i)) for i in range(4)]) - is semantically equivalent to +is semantically equivalent to:: >>> {i : chr(65+i) for i in range(4)} - The dictionary constructor approach has two distinct disadvantages - from the proposed syntax though. First, it isn't as legible as a - dict comprehension. Second, it forces the programmer to create an - in-core list object first, which could be expensive. +The dictionary constructor approach has two distinct disadvantages +from the proposed syntax though. First, it isn't as legible as a +dict comprehension. Second, it forces the programmer to create an +in-core list object first, which could be expensive. 
Examples +======== - >>> print {i : chr(65+i) for i in range(4)} - {0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D'} - >>> print {k : v for k, v in someDict.iteritems()} == someDict.copy() - 1 - >>> print {x.lower() : 1 for x in list_of_email_addrs} - {'barry@zope.com' : 1, 'barry@python.org' : 1, 'guido@python.org' : 1} - >>> def invert(d): - ... return {v : k for k, v in d.iteritems()} - ... - >>> d = {0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D'} - >>> print invert(d) - {'A' : 0, 'B' : 1, 'C' : 2, 'D' : 3} +>>> print {i : chr(65+i) for i in range(4)} +{0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D'} - >>> {(k, v): k+v for k in range(4) for v in range(4)} - ... {(3, 3): 6, (3, 2): 5, (3, 1): 4, (0, 1): 1, (2, 1): 3, - (0, 2): 2, (3, 0): 3, (0, 3): 3, (1, 1): 2, (1, 0): 1, - (0, 0): 0, (1, 2): 3, (2, 0): 2, (1, 3): 4, (2, 2): 4, ( - 2, 3): 5} +>>> print {k : v for k, v in someDict.iteritems()} == someDict.copy() +1 +>>> print {x.lower() : 1 for x in list_of_email_addrs} +{'barry@zope.com' : 1, 'barry@python.org' : 1, 'guido@python.org' : 1} -Implementation +>>> def invert(d): +... return {v : k for k, v in d.iteritems()} +... +>>> d = {0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D'} +>>> print invert(d) +{'A' : 0, 'B' : 1, 'C' : 2, 'D' : 3} - All implementation details were resolved in the Python 2.7 and 3.0 - time-frame. +>>> {(k, v): k+v for k in range(4) for v in range(4)} +... {(3, 3): 6, (3, 2): 5, (3, 1): 4, (0, 1): 1, (2, 1): 3, + (0, 2): 2, (3, 0): 3, (0, 3): 3, (1, 1): 2, (1, 0): 1, + (0, 0): 0, (1, 2): 3, (2, 0): 2, (1, 3): 4, (2, 2): 4, ( + 2, 3): 5} -References +Implementation +============== - [1] PEP 202, List Comprehensions - http://www.python.org/dev/peps/pep-0202/ +All implementation details were resolved in the Python 2.7 and 3.0 +time-frame. Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -fill-column: 70 -End: +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + fill-column: 70 + End: diff --git a/pep-0313.txt b/pep-0313.txt index bc7114021f9..20b9ea46870 100644 --- a/pep-0313.txt +++ b/pep-0313.txt @@ -5,124 +5,134 @@ Last-Modified: $Date$ Author: Mike Meyer Status: Rejected Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 01-Apr-2003 Python-Version: 2.4 Post-History: Abstract +======== + +This PEP (also known as PEP CCCXIII) proposes adding Roman +numerals as a literal type. It also proposes the new built-in +function "roman", which converts an object to an integer, then +converts the integer to a string that is the Roman numeral literal +equivalent to the integer. - This PEP (also known as PEP CCCXIII) proposes adding Roman - numerals as a literal type. It also proposes the new built-in - function "roman", which converts an object to an integer, then - converts the integer to a string that is the Roman numeral literal - equivalent to the integer. BDFL Pronouncement +================== - This PEP is rejected. While the majority of Python users deemed this - to be a nice-to-have feature, the community was unable to reach a - consensus on whether nine should be represented as IX, the modern - form, or VIIII, the classic form. Likewise, no agreement was - reached on whether MXM or MCMXC would be considered a well-formed - representation of 1990. A vocal minority of users has also requested - support for lower-cased numerals for use in (i) powerpoint slides, - (ii) academic work, and (iii) Perl documentation. +This PEP is rejected. While the majority of Python users deemed this +to be a nice-to-have feature, the community was unable to reach a +consensus on whether nine should be represented as IX, the modern +form, or VIIII, the classic form. Likewise, no agreement was +reached on whether MXM or MCMXC would be considered a well-formed +representation of 1990. 
A vocal minority of users has also requested +support for lower-cased numerals for use in (i) powerpoint slides, +(ii) academic work, and (iii) Perl documentation. Rationale +========= - Roman numerals are used in a number of areas, and adding them to - Python as literals would make computations in those areas easier. - For instance, Super Bowls are counted with Roman numerals, and many - older movies have copyright dates in Roman numerals. Further, - LISP provides a Roman numerals literal package, so adding Roman - numerals to Python will help ease the LISP-envy sometimes seen in - comp.lang.python. Besides, the author thinks this is the easiest - way to get his name on a PEP. +Roman numerals are used in a number of areas, and adding them to +Python as literals would make computations in those areas easier. +For instance, Super Bowls are counted with Roman numerals, and many +older movies have copyright dates in Roman numerals. Further, +LISP provides a Roman numerals literal package, so adding Roman +numerals to Python will help ease the LISP-envy sometimes seen in +comp.lang.python. Besides, the author thinks this is the easiest +way to get his name on a PEP. Syntax for Roman literals +========================= - Roman numeral literals will consist of the characters M, D, C, L, - X, V and I, and only those characters. They must be in upper - case, and represent an integer with the following rules: +Roman numeral literals will consist of the characters M, D, C, L, +X, V and I, and only those characters. They must be in upper +case, and represent an integer with the following rules: - 1. Except as noted below, they must appear in the order M, D, C, +1. Except as noted below, they must appear in the order M, D, C, L, X, V then I. Each occurrence of each character adds 1000, 500, 100, 50, 10, 5 and 1 to the value of the literal, respectively. - 2. Only one D, V or L may appear in any given literal. +2. Only one D, V or L may appear in any given literal. - 3. 
At most three each of Is, Xs and Cs may appear consecutively +3. At most three each of Is, Xs and Cs may appear consecutively in any given literal. - 4. A single I may appear immediately to the left of the single V, +4. A single I may appear immediately to the left of the single V, followed by no Is, and adds 4 to the value of the literal. - 5. A single I may likewise appear before the last X, followed by +5. A single I may likewise appear before the last X, followed by no Is or Vs, and adds 9 to the value. - 6. X is to L and C as I is to V and X, except the values are 40 +6. X is to L and C as I is to V and X, except the values are 40 and 90, respectively. - 7. C is to D and M as I is to V and X, except the values are 400 +7. C is to D and M as I is to V and X, except the values are 400 and 900, respectively. - Any literal composed entirely of M, D, C, L, X, V and I characters - that does not follow this format will raise a syntax error, - because explicit is better than implicit. +Any literal composed entirely of M, D, C, L, X, V and I characters +that does not follow this format will raise a syntax error, +because explicit is better than implicit. Built-In "roman" Function +========================= - The new built-in function "roman" will aide the translation from - integers to Roman numeral literals. It will accept a single - object as an argument, and return a string containing the literal - of the same value. If the argument is not an integer or a - rational (see PEP 239 [1]) it will passed through the existing - built-in "int" to obtain the value. This may cause a loss of - information if the object was a float. If the object is a - rational, then the result will be formatted as a rational literal - (see PEP 240 [2]) with the integers in the string being Roman - numeral literals. +The new built-in function "roman" will aide the translation from +integers to Roman numeral literals. 
It will accept a single +object as an argument, and return a string containing the literal +of the same value. If the argument is not an integer or a +rational (see PEP 239 [1]_) it will passed through the existing +built-in "int" to obtain the value. This may cause a loss of +information if the object was a float. If the object is a +rational, then the result will be formatted as a rational literal +(see PEP 240 [2]_) with the integers in the string being Roman +numeral literals. Compatibility Issues +==================== - No new keywords are introduced by this proposal. Programs that - use variable names that are all upper case and contain only the - characters M, D, C, L, X, V and I will be affected by the new - literals. These programs will now have syntax errors when those - variables are assigned, and either syntax errors or subtle bugs - when those variables are referenced in expressions. Since such - variable names violate PEP 8 [3], the code is already broken, it - just wasn't generating exceptions. This proposal corrects that - oversight in the language. +No new keywords are introduced by this proposal. Programs that +use variable names that are all upper case and contain only the +characters M, D, C, L, X, V and I will be affected by the new +literals. These programs will now have syntax errors when those +variables are assigned, and either syntax errors or subtle bugs +when those variables are referenced in expressions. Since such +variable names violate PEP 8 [3]_, the code is already broken, it +just wasn't generating exceptions. This proposal corrects that +oversight in the language. References +========== - [1] PEP 239, Adding a Rational Type to Python - http://www.python.org/dev/peps/pep-0239/ +.. [1] PEP 239, Adding a Rational Type to Python + http://www.python.org/dev/peps/pep-0239/ - [2] PEP 240, Adding a Rational Literal to Python - http://www.python.org/dev/peps/pep-0240/ +.. 
[2] PEP 240, Adding a Rational Literal to Python + http://www.python.org/dev/peps/pep-0240/ - [3] PEP 8, Style Guide for Python Code - http://www.python.org/dev/peps/pep-0008/ +.. [3] PEP 8, Style Guide for Python Code + http://www.python.org/dev/peps/pep-0008/ Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/pep-0336.txt b/pep-0336.txt index 03047c51a33..072f9874756 100644 --- a/pep-0336.txt +++ b/pep-0336.txt @@ -1,130 +1,142 @@ PEP: 336 Title: Make None Callable Version: $Revision$ -Last-Modified: $Date$ +Last-Modified: $Date$ Author: Andrew McClelland Status: Rejected Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 28-Oct-2004 -Post-History: +Post-History: Abstract +======== + +``None`` should be a callable object that when called with any +arguments has no side effect and returns ``None``. - None should be a callable object that when called with any - arguments has no side effect and returns None. BDFL Pronouncement +================== - This PEP is rejected. It is considered a feature that None raises - an error when called. The proposal falls short in tests for - obviousness, clarity, explictness, and necessity. The provided Switch - example is nice but easily handled by a simple lambda definition. - See python-dev discussion on 17 June 2005. +This PEP is rejected. It is considered a feature that ``None`` raises +an error when called. The proposal falls short in tests for +obviousness, clarity, explictness, and necessity. The provided Switch +example is nice but easily handled by a simple lambda definition. +See python-dev discussion on 17 June 2005 [2]_. 
Motivation +========== - To allow a programming style for selectable actions that is more - in accordance with the minimalistic functional programming goals - of the Python language. +To allow a programming style for selectable actions that is more +in accordance with the minimalistic functional programming goals +of the Python language. Rationale +========= - Allow the use of None in method tables as a universal no effect - rather than either (1) checking a method table entry against None - before calling, or (2) writing a local no effect method with - arguments similar to other functions in the table. +Allow the use of ``None`` in method tables as a universal no effect +rather than either (1) checking a method table entry against ``None`` +before calling, or (2) writing a local no effect method with +arguments similar to other functions in the table. - The semantics would be effectively, +The semantics would be effectively:: - class None: + class None: - def __call__(self, *args): - pass + def __call__(self, *args): + pass How To Use +========== - Before, checking function table entry against None: +Before, checking function table entry against ``None``:: - class Select: + class Select: - def a(self, input): - print 'a' + def a(self, input): + print 'a' - def b(self, input): - print 'b' + def b(self, input): + print 'b' - def c(self, input); - print 'c' + def c(self, input); + print 'c' - def __call__(self, input): - function = { 1 : self.a, - 2 : self.b, - 3 : self.c - }.get(input, None) - if function: return function(input) + def __call__(self, input): + function = { 1 : self.a, + 2 : self.b, + 3 : self.c + }.get(input, None) + if function: return function(input) - Before, using a local no effect method: +Before, using a local no effect method:: - class Select: + class Select: - def a(self, input): - print 'a' + def a(self, input): + print 'a' - def b(self, input): - print 'b' + def b(self, input): + print 'b' - def c(self, input); - print 'c' + def c(self, 
input); + print 'c' - def nop(self, input): - pass + def nop(self, input): + pass - def __call__(self, input): - return { 1 : self.a, - 2 : self.b, - 3 : self.c - }.get(input, self.nop)(input) + def __call__(self, input): + return { 1 : self.a, + 2 : self.b, + 3 : self.c + }.get(input, self.nop)(input) - After: +After:: - class Select: + class Select: - def a(self, input): - print 'a' + def a(self, input): + print 'a' - def b(self, input): - print 'b' + def b(self, input): + print 'b' - def c(self, input); - print 'c' + def c(self, input); + print 'c' - def __call__(self, input): - return { 1 : self.a, - 2 : self.b, - 3 : self.c - }.get(input, None)(input) + def __call__(self, input): + return { 1 : self.a, + 2 : self.b, + 3 : self.c + }.get(input, None)(input) References +========== + +.. [1] Python Reference Manual, Section 3.2, + http://docs.python.org/reference/ - [1] Python Reference Manual, Section 3.2, - http://docs.python.org/reference/ +.. [2] Raymond Hettinger, Propose to reject PEP 336 -- Make None Callable + https://mail.python.org/pipermail/python-dev/2005-June/054280.html Copyright +========= + +This document has been placed in the public domain. - This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -End: +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: diff --git a/pep-3142.txt b/pep-3142.txt index d85c55f30dc..67c9bb4795d 100644 --- a/pep-3142.txt +++ b/pep-3142.txt @@ -5,7 +5,7 @@ Last-Modified: $Date$ Author: Gerald Britton Status: Rejected Type: Standards Track -Content-Type: text/plain +Content-Type: text/x-rst Created: 12-Jan-2009 Python-Version: 3.0 Post-History: @@ -13,116 +13,122 @@ Resolution: http://mail.python.org/pipermail/python-dev/2013-May/126136.html Abstract +======== - This PEP proposes an enhancement to generator expressions, adding a - "while" clause to complement the existing "if" clause. +This PEP proposes an enhancement to generator expressions, adding a +"while" clause to complement the existing "if" clause. Rationale +========= - A generator expression (PEP 289 [1]) is a concise method to serve - dynamically-generated objects to list comprehensions (PEP 202 [2]). - Current generator expressions allow for an "if" clause to filter - the objects that are returned to those meeting some set of - criteria. However, since the "if" clause is evaluated for every - object that may be returned, in some cases it is possible that all - objects would be rejected after a certain point. For example: +A generator expression (PEP 289 [1]_) is a concise method to serve +dynamically-generated objects to list comprehensions (PEP 202 [2]_). +Current generator expressions allow for an "if" clause to filter +the objects that are returned to those meeting some set of +criteria. However, since the "if" clause is evaluated for every +object that may be returned, in some cases it is possible that all +objects would be rejected after a certain point. 
For example:: - g = (n for n in range(100) if n*n < 50) + g = (n for n in range(100) if n*n < 50) - which is equivalent to the using a generator function - (PEP 255 [3]): +which is equivalent to the using a generator function +(PEP 255 [3]_):: - def __gen(exp): - for n in exp: - if n*n < 50: - yield n - g = __gen(iter(range(10))) + def __gen(exp): + for n in exp: + if n*n < 50: + yield n + g = __gen(iter(range(10))) - would yield 0, 1, 2, 3, 4, 5, 6 and 7, but would also consider - the numbers from 8 to 99 and reject them all since n*n >= 50 for - numbers in that range. Allowing for a "while" clause would allow - the redundant tests to be short-circuited: +would yield 0, 1, 2, 3, 4, 5, 6 and 7, but would also consider +the numbers from 8 to 99 and reject them all since ``n*n >= 50`` for +numbers in that range. Allowing for a "while" clause would allow +the redundant tests to be short-circuited:: - g = (n for n in range(100) while n*n < 50) + g = (n for n in range(100) while n*n < 50) - would also yield 0, 1, 2, 3, 4, 5, 6 and 7, but would stop at 8 - since the condition (n*n < 50) is no longer true. This would be - equivalent to the generator function: +would also yield 0, 1, 2, 3, 4, 5, 6 and 7, but would stop at 8 +since the condition (``n*n < 50``) is no longer true. 
This would be +equivalent to the generator function:: - def __gen(exp): - for n in exp: - if n*n < 50: - yield n - else: - break - g = __gen(iter(range(100))) + def __gen(exp): + for n in exp: + if n*n < 50: + yield n + else: + break + g = __gen(iter(range(100))) - Currently, in order to achieve the same result, one would need to - either write a generator function such as the one above or use the - takewhile function from itertools: +Currently, in order to achieve the same result, one would need to +either write a generator function such as the one above or use the +takewhile function from itertools:: - from itertools import takewhile - g = takewhile(lambda n: n*n < 50, range(100)) + from itertools import takewhile + g = takewhile(lambda n: n*n < 50, range(100)) - The takewhile code achieves the same result as the proposed syntax, - albeit in a longer (some would say "less-elegant") fashion. Also, - the takewhile version requires an extra function call (the lambda - in the example above) with the associated performance penalty. - A simple test shows that: +The takewhile code achieves the same result as the proposed syntax, +albeit in a longer (some would say "less-elegant") fashion. Also, +the takewhile version requires an extra function call (the lambda +in the example above) with the associated performance penalty. +A simple test shows that:: - for n in (n for n in range(100) if 1): pass + for n in (n for n in range(100) if 1): pass - performs about 10% better than: +performs about 10% better than:: - for n in takewhile(lambda n: 1, range(100)): pass + for n in takewhile(lambda n: 1, range(100)): pass - though they achieve similar results. (The first example uses a - generator; takewhile is an iterator). If similarly implemented, - a "while" clause should perform about the same as the "if" clause - does today. +though they achieve similar results. (The first example uses a +generator; takewhile is an iterator). 
If similarly implemented, +a "while" clause should perform about the same as the "if" clause +does today. - The reader may ask if the "if" and "while" clauses should be - mutually exclusive. There are good examples that show that there - are times when both may be used to good advantage. For example: +The reader may ask if the "if" and "while" clauses should be +mutually exclusive. There are good examples that show that there +are times when both may be used to good advantage. For example:: - p = (p for p in primes() if p > 100 while p < 1000) + p = (p for p in primes() if p > 100 while p < 1000) - should return prime numbers found between 100 and 1000, assuming - I have a primes() generator that yields prime numbers. +should return prime numbers found between 100 and 1000, assuming +I have a ``primes()`` generator that yields prime numbers. - Adding a "while" clause to generator expressions maintains the - compact form while adding a useful facility for short-circuiting - the expression. +Adding a "while" clause to generator expressions maintains the +compact form while adding a useful facility for short-circuiting +the expression. Acknowledgements +================ - Raymond Hettinger first proposed the concept of generator - expressions in January 2002. +Raymond Hettinger first proposed the concept of generator +expressions in January 2002. References +========== - [1] PEP 289: Generator Expressions +.. [1] PEP 289: Generator Expressions http://www.python.org/dev/peps/pep-0289/ - [2] PEP 202: List Comprehensions +.. [2] PEP 202: List Comprehensions http://www.python.org/dev/peps/pep-0202/ - [3] PEP 255: Simple Generators +.. [3] PEP 255: Simple Generators http://www.python.org/dev/peps/pep-0255/ Copyright +========= - This document has been placed in the public domain. +This document has been placed in the public domain. - -Local Variables: -mode: indented-text -indent-tabs-mode: nil -sentence-end-double-space: t -fill-column: 70 -coding: utf-8 -End: + +.. 
+ Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + coding: utf-8 + End: From 7bdfb0006d0264f7d624d35261ec77b79388aedb Mon Sep 17 00:00:00 2001 From: Nick Nystrom Date: Tue, 10 Jan 2017 16:18:56 -0600 Subject: [PATCH 20/36] Spelling (gneric > generic) (#178) --- pep-0484.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0484.txt b/pep-0484.txt index b2cd6e1478e..a5d55c65f07 100644 --- a/pep-0484.txt +++ b/pep-0484.txt @@ -1025,7 +1025,7 @@ allow all operations on it, and a value of type ``Any`` can be assigned to a variable (or used as a return value) of a more constrained type. A function parameter without an annotation is assumed to be annotated with -``Any``. If a gneric type is used without specifying type parameters, +``Any``. If a generic type is used without specifying type parameters, they assumed to be ``Any``:: from typing import Mapping From d3f6e2e9fd3f2aad3a0c88d0d4bffa062649095e Mon Sep 17 00:00:00 2001 From: Brett Cannon Date: Tue, 10 Jan 2017 16:42:08 -0800 Subject: [PATCH 21/36] Update TODO list (#179) --- pep-0512.txt | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/pep-0512.txt b/pep-0512.txt index 7734d16b5ec..96812caee70 100644 --- a/pep-0512.txt +++ b/pep-0512.txt @@ -714,10 +714,6 @@ Required: * In progress - - `Linking a pull request to an issue`_ - (http://psf.upfronthosting.co.za/roundup/meta/issue589; - testing; - **blocker**) - `Notify the issue if a commit is made`_ (http://psf.upfronthosting.co.za/roundup/meta/issue590; review committal from Ezio Melotti; @@ -745,6 +741,8 @@ Required: (https://github.com/python/devguide/milestone/1) - Update commit hash detection on b.p.o to support 10- and 11-character hashes (http://psf.upfronthosting.co.za/roundup/meta/issue610) + - `Linking a pull request to an issue`_ + (http://psf.upfronthosting.co.za/roundup/meta/issue589) Optional features: From 4ba21969036b2d05a0cf65d1dc786ccc91ba0d6b 
Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Wed, 11 Jan 2017 12:30:36 +0100 Subject: [PATCH 22/36] PEP 540: add link to the implementation --- pep-0540.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/pep-0540.txt b/pep-0540.txt index 82fd024a8aa..771f32df729 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -689,7 +689,9 @@ PEPs: Main Python issues: -* `issue #28180: sys.getfilesystemencoding() should default to utf-8 +* `Issue #29240: Implementation of the PEP 540: Add a new UTF-8 mode + `_ +* `Issue #28180: sys.getfilesystemencoding() should default to utf-8 `_ * `Issue #19977: Use "surrogateescape" error handler for sys.stdin and sys.stdout on UNIX for the C locale From 1b6b889ed66202faef9da27fb6dc0365b5db9e0e Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Wed, 11 Jan 2017 22:08:40 +0100 Subject: [PATCH 23/36] PEP 540 * Strict mode doesn't use strict for OS data anymore: keep surrogateescape, explain why in a new alternative * Define the priority between env vars and cmdline options to choose encodings and error handlers --- pep-0540.txt | 135 +++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 98 insertions(+), 37 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index 771f32df729..d52c2747d79 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -76,8 +76,10 @@ backward compatibility should be preserved whenever possible. Locale and operating system data -------------------------------- -Python uses the ``LC_CTYPE`` locale to decide how to encode and decode -data from/to the operating system: +.. _operating system data: + +Python uses an encoding called the "filesystem encoding" to decide how +to encode and decode data from/to the operating system: * file content * command line arguments: ``sys.argv`` @@ -91,10 +93,15 @@ data from/to the operating system: * etc.
At startup, Python calls ``setlocale(LC_CTYPE, "")`` to use the user -``LC_CTYPE`` locale and then store the locale encoding, -``sys.getfilesystemencoding()``. In the whole lifetime of a Python process, -the same encoding and error handler are used to encode and decode data -from/to the operating system. +``LC_CTYPE`` locale and then store the locale encoding as the +"filesystem encoding". It's possible to get this encoding using +``sys.getfilesystemencoding()``. In the whole lifetime of a Python +process, the same encoding and error handler are used to encode and +decode data from/to the operating system. + +The ``os.fsdecode()`` and ``os.fsencode()`` functions can be used to +decode and encode operating system data. These functions use the +filesystem error handler: ``sys.getfilesystemencodeerrors()``. .. note:: In some corner case, the *current* ``LC_CTYPE`` locale must be used @@ -137,7 +144,7 @@ really uses the ASCII encoding if the the ``LC_CTYPE`` uses the the POSIX locale and ``nl_langinfo(CODESET)`` returns ``"ASCII"`` (or an alias to ASCII). If not (the effective encoding is not ASCII), Python uses its own ASCII codec instead of using ``mbstowcs()`` and -``wcstombs()`` functions for operating system data. +``wcstombs()`` functions for `operating system data`_. See the `POSIX locale (2016 Edition) `_. @@ -163,8 +170,8 @@ it by mistake. Examples: C.UTF-8 and C.utf8 locales -------------------------- -Some UNIX operating systems provide a variant of the POSIX locale using the -UTF-8 encoding: +Some UNIX operating systems provide a variant of the POSIX locale using +the UTF-8 encoding: * Fedora 25: ``"C.utf8"`` or ``"C.UTF-8"`` * Debian (eglibc 2.13-1, 2011), Ubuntu: ``"C.UTF-8"`` @@ -182,7 +189,7 @@ Popularity of the UTF-8 encoding Python 3 uses UTF-8 by default for Python source files. On Mac OS X, Windows and Android, Python always use UTF-8 for operating -system data.
For Windows, see the `PEP 529`_: "Change Windows filesystem encoding to UTF-8". On Linux, UTF-8 became the de facto standard encoding, @@ -215,15 +222,15 @@ filesystem with filenames encoded to ISO 8859-1. The Linux kernel and the libc don't decode filenames: a filename is used as a raw array of bytes. The common solution to support any filename is -to store filenames as bytes and don't try to decode them. When displayed to -stdout, mojibake is displayed if the filename and the terminal don't use -the same encoding. +to store filenames as bytes and don't try to decode them. When displayed +to stdout, mojibake is displayed if the filename and the terminal don't +use the same encoding. Python 3 promotes Unicode everywhere including filenames. A solution to support filenames not decodable from the locale encoding was found: the -``surrogateescape`` error handler (PEP 383), store undecodable bytes +``surrogateescape`` error handler (`PEP 383`_), store undecodable bytes as surrogate characters. This error handler is used by default for -operating system data, by ``os.fsdecode()`` and ``os.fsencode()`` for +`operating system data`_, by ``os.fsdecode()`` and ``os.fsencode()`` for example (except on Windows which uses the ``strict`` error handler). @@ -239,16 +246,17 @@ Unicode encode error when displaying non-ASCII text. It is especially useful when the POSIX locale is used, because this locale usually uses the ASCII encoding. -The problem is that operating system data like filenames are decoded -using the ``surrogateescape`` error handler (PEP 383). Displaying a +The problem is that `operating system data`_ like filenames are decoded +using the ``surrogateescape`` error handler (`PEP 383`_). Displaying a filename to stdout raises a Unicode encode error if the filename contains an undecoded byte stored as a surrogate character. Python 3.6 now uses ``surrogateescape`` for stdin and stdout if the -POSIX locale is used: `issue #19977 `_. 
The -idea is to passthrough operating system data even if it means mojibake, because -most UNIX applications work like that. Most UNIX applications store filenames -as bytes, usually simply because bytes are first-citizen class in the used +POSIX locale is used: `issue #19977 +`_. The idea is to passthrough +`operating system data`_ even if it means mojibake, because most UNIX +applications work like that. Most UNIX applications store filenames as +bytes, usually simply because bytes are first-citizen class in the used programming language, whereas Unicode is badly supported. .. note:: @@ -280,6 +288,10 @@ by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``. +The ``-X utf8`` has the priority on the ``PYTHONUTF8`` environment +variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the +UTF-8 mode. + Encoding and error handler -------------------------- @@ -291,7 +303,7 @@ sys.stderr: Function Default UTF-8 or POSIX locale UTF-8 Strict ============================ ======================= ========================== ========================== open() locale/strict **UTF-8/surrogateescape** **UTF-8**/strict -os.fsdecode(), os.fsencode() locale/surrogateescape **UTF-8**/surrogateescape **UTF-8/strict** +os.fsdecode(), os.fsencode() locale/surrogateescape **UTF-8**/surrogateescape **UTF-8**/surrogateescape sys.stdin, sys.stdout locale/strict **UTF-8/surrogateescape** **UTF-8**/strict sys.stderr locale/backslashreplace **UTF-8**/backslashreplace **UTF-8**/backslashreplace ============================ ======================= ========================== ========================== @@ -311,11 +323,22 @@ The UTF-8 mode uses the ``surrogateescape`` error handler instead of the strict mode for convenience: the idea is that data not encoded to UTF-8 are passed through "Python" without being modified, as raw bytes. 
+The ``PYTHONIOENCODING`` environment variable has the priority on the +UTF-8 mode for standard streams. For example, ``PYTHONIOENCODING=latin1 +python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr. + +Encodings used by ``open()``, highest priority first: + +* *encoding* and *errors* parameters (if set) +* UTF-8 mode +* os.device_encoding(fd) +* locale.getpreferredencoding(False) + Rationale --------- The UTF-8 mode is disabled by default to keep hard Unicode errors when -encoding or decoding operating system data failed, and to keep the +encoding or decoding `operating system data`_ failed, and to keep the backward compatibility. The user is responsible to enable explicitly the UTF-8 mode, and so is better prepared for mojibake than if the UTF-8 mode would be enabled *by default*. @@ -325,7 +348,7 @@ UTF-8 where most applications speak UTF-8. It prevents Unicode errors if the user overrides a locale *by mistake* or if a Python program is started with no locale configured (and so with the POSIX locale). -Most UNIX applications handle operating system data as bytes, so +Most UNIX applications handle `operating system data`_ as bytes, so ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables have a limited impact on how these data are handled by the application. @@ -336,7 +359,8 @@ everywhere and that users *expect* UTF-8. Ignoring ``LC_ALL``, ``LC_CTYPE`` and ``LANG`` environment variables in Python is more convenient, since they are more commonly misconfigured *by mistake* (configured to use an encoding different than UTF-8, -whereas the system uses UTF-8), rather than being misconfigured by intent. +whereas the system uses UTF-8), rather than being misconfigured by +intent. Expected mojibake and surrogate character issues ------------------------------------------------ @@ -648,8 +672,9 @@ Don't modify the encoding of the POSIX locale A first version of the PEP did not change the encoding and error handler used of the POSIX locale.
-The problem is that adding a command line option or setting an environment -variable is not possible in some cases, or at least not convenient. +The problem is that adding a command line option or setting an +environment variable is not possible in some cases, or at least not +convenient. Moreover, many users simply expect that Python 3 behaves as Python 2: don't bother them with encodings and "just works" in all cases. These @@ -660,12 +685,12 @@ complex documents using multiple incompatibles encodings. Always use UTF-8 ---------------- -Python already always use the UTF-8 encoding on Mac OS X, Android and Windows. -Since UTF-8 became the de facto encoding, it makes sense to always use it on all -platforms with any locale. +Python already always use the UTF-8 encoding on Mac OS X, Android and +Windows. Since UTF-8 became the de facto encoding, it makes sense to +always use it on all platforms with any locale. -The risk is to introduce mojibake if the locale uses a different encoding, -especially for locales other than the POSIX locale. +The risk is to introduce mojibake if the locale uses a different +encoding, especially for locales other than the POSIX locale. Force UTF-8 for the POSIX locale @@ -674,8 +699,39 @@ Force UTF-8 for the POSIX locale An alternative to always using UTF-8 in any case is to only use UTF-8 when the ``LC_CTYPE`` locale is the POSIX locale. -The PEP 538 "Coercing the legacy C locale to C.UTF-8" of Nick Coghlan -proposes to implement that using the ``C.UTF-8`` locale. +The `PEP 538`_ "Coercing the legacy C locale to C.UTF-8" of Nick +Coghlan proposes to implement that using the ``C.UTF-8`` locale. + + +Use the strict error handler for operating system data +------------------------------------------------------ + +Using the ``surrogateescape`` error handler for `operating system data`_ +creates surprising surrogate characters. 
No Python codec (except
+``utf-7``) accepts surrogates, and so encoding text coming from the
+operating system is likely to raise an error. The problem is that
+the error comes late, very far from where the data was read.
+
+The ``strict`` error handler can be used instead to decode
+(``os.fsdecode()``) and encode (``os.fsencode()``) operating system
+data, to raise encoding errors as soon as possible. It helps to find
+bugs more quickly.
+
+The main drawback of this strategy is that it doesn't work in practice.
+Python 3 is designed on top of Unicode strings. Most functions expect
+Unicode and produce Unicode. Even if many operating system functions
+have two flavors, bytes and Unicode, the Unicode flavor is used in most
+cases. There are good reasons for that: Unicode is more convenient in
+Python 3 and using Unicode helps to support the full Unicode Character
+Set (UCS) on Windows (even if Python now uses UTF-8 since Python 3.6,
+see the `PEP 528`_ and the `PEP 529`_).
+
+For example, if ``os.fsdecode()`` uses ``utf8/strict``,
+``os.listdir(str)`` fails to list filenames of a directory if a single
+filename is not decodable from UTF-8. As a consequence,
+``shutil.rmtree(str)`` fails to remove a directory. Undecodable
+filenames, environment variables, etc. are simply too common to make
+this alternative viable.
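The trade-off between the two error handlers can be sketched as follows. This is an illustration only: ``raw_name`` is a hypothetical undecodable filename, and the plain ``decode()``/``encode()`` calls stand in for what ``os.fsdecode()`` and a strict UTF-8 output stream would do.

```python
raw_name = b"xxx\xff"  # hypothetical filename, not valid UTF-8

# strict: the error is raised at once, where the data is read.
early_error = None
try:
    raw_name.decode("utf-8", "strict")
except UnicodeDecodeError as exc:
    early_error = exc

# surrogateescape: decoding succeeds and yields a lone surrogate...
name = raw_name.decode("utf-8", "surrogateescape")

# ...and the error only appears later, when the text is encoded with the
# strict handler (for example when it is written to a strict stream).
late_error = None
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    late_error = exc

print(type(early_error).__name__)
print(type(late_error).__name__)
```

With ``strict`` the failure happens as soon as the data is read, which makes a whole directory listing fail; with ``surrogateescape`` the listing works, but any later strict encoding of the name fails.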
Links @@ -683,9 +739,14 @@ Links PEPs: -* PEP 538 "Coercing the legacy C locale to C.UTF-8" -* PEP 529: "Change Windows filesystem encoding to UTF-8" -* PEP 383: "Non-decodable Bytes in System Character Interfaces" +* `PEP 538 `_: + "Coercing the legacy C locale to C.UTF-8" +* `PEP 529 `_: + "Change Windows filesystem encoding to UTF-8" +* `PEP 528 `_: + "Change Windows console encoding to UTF-8" +* `PEP 383 `_: + "Non-decodable Bytes in System Character Interfaces" Main Python issues: From 0e107f280cec26b7d213517f1523da84b5686f19 Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Wed, 11 Jan 2017 22:32:24 +0100 Subject: [PATCH 24/36] PEP 540 Add examples for the "List a directory into stdout" use case. --- pep-0540.txt | 60 ++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 47 insertions(+), 13 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index d52c2747d79..56295d5c56f 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -278,17 +278,20 @@ handler, instead using the locale encoding (with ``strict`` or Basically, the UTF-8 mode behaves as Python 2: it "just works" and don't bother users with encodings, but it can produce mojibake. It can be configured as strict to prevent mojibake: the UTF-8 encoding is used -with the ``strict`` error handler in this case. +with the ``strict`` error handler for inputs and outputs, but the +``surrogateescape`` error handler is still used for `operating system +data`_. New ``-X utf8`` command line option and ``PYTHONUTF8`` environment variable are added to control the UTF-8 mode. The UTF-8 mode is enabled by ``-X utf8`` or ``PYTHONUTF8=1``. The UTF-8 is configured as strict -by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. +by ``-X utf8=strict`` or ``PYTHONUTF8=strict``. Other option values fail +with an error. The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``. 
-The ``-X utf8`` has the priority on the ``PYTHONUTF8`` environment +The ``-X utf8`` has the priority over the ``PYTHONUTF8`` environment variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode. @@ -389,7 +392,7 @@ code. The UTF-8 mode can produce mojibake since Python and external code don't both of invalid bytes, but it's a deliberate choice. The UTF-8 mode can be configured as strict to prevent mojibake and be fail early when data -is not decodable from UTF-8. +is not decodable from UTF-8 or not encodable to UTF-8. External code using text ^^^^^^^^^^^^^^^^^^^^^^^^ @@ -441,6 +444,38 @@ To be able to always work, the program must be able to produce mojibake. Mojibake is more user friendly than an error with a truncated or empty output. +Example with a directory which contains the file called ``b'xxx\xff'`` +(the byte ``0xFF`` is invalid in UTF-8). + +Default and UTF-8 Strict mode fail on ``print()`` with an encode error:: + + $ python3.7 ../ls.py + Traceback (most recent call last): + File "../ls.py", line 5, in + print(name) + UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ... + + $ python3.7 -X utf8=strict ../ls.py + Traceback (most recent call last): + File "../ls.py", line 5, in + print(name) + UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' ... + +The UTF-8 mode, POSIX locale, Python 2 and the UNIX ``ls`` command work +but display mojibake:: + + $ python3.7 -X utf8 ../ls.py + xxx� + + $ LC_ALL=C /python3.6 ../ls.py + xxx� + + $ python2 ../ls.py + xxx� + + $ ls + 'xxx'$'\377' + List a directory into a text file --------------------------------- @@ -647,9 +682,9 @@ Backward Compatibility ====================== The main backward incompatible change is that the UTF-8 encoding is now -used if the locale is POSIX. Since the UTF-8 encoding is used with the -``surrogateescape`` error handler, ecoding errors should not occur and -so the change should not break applications. 
+used by default if the locale is POSIX. Since the UTF-8 encoding is used +with the ``surrogateescape`` error handler, encoding errors should not +occur and so the change should not break applications. The more likely source of trouble comes from external libraries. Python can decode successfully data from UTF-8, but a library using the locale @@ -658,9 +693,8 @@ encoding text in a library is a rare operation. Very few libraries expect text, most libraries expect bytes and even manipulate bytes internally. -If the locale is not POSIX, the PEP has no impact on the backward -compatibility since the UTF-8 mode is disabled by default in this case, -it must be enabled explicitly. +The PEP only changes the default behaviour if the locale is POSIX. For +other locales, the *default* behaviour is unchanged. Alternatives @@ -672,9 +706,9 @@ Don't modify the encoding of the POSIX locale A first version of the PEP did not change the encoding and error handler used of the POSIX locale. -The problem is that adding a command line option or setting an -environment variable is not possible in some cases, or at least not -convenient. +The problem is that adding the ``-X utf8`` command line option or +setting the ``PYTHONUTF8`` environment variable is not possible in some +cases, or at least not convenient. Moreover, many users simply expect that Python 3 behaves as Python 2: don't bother them with encodings and "just works" in all cases. These From dc6b4a07f414a610cf1f051c545fa2877d61df0e Mon Sep 17 00:00:00 2001 From: Lukasz Langa Date: Thu, 12 Jan 2017 00:34:26 -0800 Subject: [PATCH 25/36] PEP 541: Package Index Name Retention Work in progress. Number placeholder and problem statement. 
--- pep-0541.txt | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 96 insertions(+) create mode 100644 pep-0541.txt diff --git a/pep-0541.txt b/pep-0541.txt new file mode 100644 index 00000000000..f84332e299d --- /dev/null +++ b/pep-0541.txt @@ -0,0 +1,96 @@ +PEP: 541 +Title: Package Index Name Retention +Version: $Revision$ +Last-Modified: $Date$ +Author: Łukasz Langa +Status: Draft +Type: Process +Content-Type: text/x-rst +Created: 12-January-2017 + + +Abstract +======== + +This PEP proposes an extension to the Terms of Use [1]_ of the Package +Index [2]_, clarifying expectations of package owners regarding +ownership of a package name on the Package Index, specifically with +regards to conflict resolution. + +Existing package repositories such as CPAN [3]_, NPM [4]_, and +GitHub [5]_ will be investigated as prior art in this field. + + +Rationale +========= + +Given that package names on the Index are sharing a single flat +namespace, a unique name is a finite resource. + + +Specification +============= + +TBD. + + +Implementation +============== + +TBD. + + +Rejected Proposals +================== + +The original approach was to hope for the best and solve issues as they +arise without written policy. This is not sustainable. The lack of +generally available guidelines in writing on package name conflict +resolution is causing unnecessary tensions. From the perspective of +users, decisions made by the Package Index maintainers without written +guidelines may appear arbitrary. From the perspective of the Package +Index maintainers, solving name conflicts is a stressful task due to +risk of unintentional harm due to lack of defined policy. + +TBD. + + +References +========== + +.. [1] Terms of Use of the Python Package Index + (https://pypi.org/policy/terms-of-use/) + +.. [2] The Python Package Index + (https://pypi.python.org/) + +.. [3] The Comprehensive Perl Archive Network + (http://www.cpan.org/) + +.. 
[4] Node Package Manager + (https://www.npmjs.com/package/left-pad) + +.. [5] GitHub + (https://github.com/) + + +Copyright +========= + +This document has been placed in the public domain. + + +Acknowledgements +================ + +The many participants of the Distutils and Catalog SIGs for their +ideas over the years. + + +.. + Local Variables: + mode: indented-text + indent-tabs-mode: nil + sentence-end-double-space: t + fill-column: 70 + End: From b9a2a993fe95e3f1f754179bfaab427f165a21c3 Mon Sep 17 00:00:00 2001 From: Victor Stinner Date: Thu, 12 Jan 2017 13:26:21 +0100 Subject: [PATCH 26/36] Update 540 for Windows Describe encodings and error handlers used on Windows and the priority of PYTHONLEGACYWINDOWSFSENCODING. --- pep-0540.txt | 69 ++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 59 insertions(+), 10 deletions(-) diff --git a/pep-0540.txt b/pep-0540.txt index 56295d5c56f..a0f6c937ecc 100644 --- a/pep-0540.txt +++ b/pep-0540.txt @@ -291,9 +291,24 @@ with an error. The POSIX locale enables the UTF-8 mode. In this case, the UTF-8 mode can be explicitly disabled by ``-X utf8=0`` or ``PYTHONUTF8=0``. -The ``-X utf8`` has the priority over the ``PYTHONUTF8`` environment -variable. For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the -UTF-8 mode. +Options priority for the UTF-8 mode: + +* ``PYTHONLEGACYWINDOWSFSENCODING`` +* ``-X utf8`` +* ``PYTHONUTF8`` +* POSIX locale + +For example, ``PYTHONUTF8=0 python3 -X utf8`` enables the UTF-8 mode, +whereas ``LC_ALL=C python3.7 -X utf8=0`` disables the UTF-8 mode and so +use the encoding of the POSIX locale. 
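The priority list above can be read as a small decision procedure. The sketch below is one possible reading, not CPython's implementation: the function names, parameters and the ``"off"``/``"on"``/``"strict"`` return values are hypothetical. It also assumes that enabling ``PYTHONLEGACYWINDOWSFSENCODING`` forces the mode off, and that ``-X utf8`` without a value is passed as ``True``.

```python
def parse_utf8_option(value):
    """Map an option value to "on", "off" or "strict" (hypothetical helper)."""
    if value in (True, "1"):
        return "on"
    if value == "0":
        return "off"
    if value == "strict":
        return "strict"
    # The draft says that other option values fail with an error.
    raise ValueError(f"invalid UTF-8 mode value: {value!r}")


def resolve_utf8_mode(legacy_windows_fs=False, x_utf8=None,
                      env_utf8=None, posix_locale=False):
    """Pick the UTF-8 mode, highest priority first (illustrative sketch)."""
    if legacy_windows_fs:          # PYTHONLEGACYWINDOWSFSENCODING
        return "off"
    if x_utf8 is not None:         # -X utf8[=value]
        return parse_utf8_option(x_utf8)
    if env_utf8 is not None:       # PYTHONUTF8
        return parse_utf8_option(env_utf8)
    return "on" if posix_locale else "off"   # POSIX locale default


# PYTHONUTF8=0 python3 -X utf8
print(resolve_utf8_mode(x_utf8=True, env_utf8="0"))
# LC_ALL=C python3.7 -X utf8=0
print(resolve_utf8_mode(x_utf8="0", posix_locale=True))
```

The two calls at the end mirror the two command lines quoted in the text: the command line option wins over the environment variable, and an explicit ``0`` wins over the POSIX locale default.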
+ +Encodings used by ``open()``, highest priority first: + +* *encoding* and *errors* parameters (if set) +* UTF-8 mode +* os.device_encoding(fd) +* os.getpreferredencoding(False) + Encoding and error handler -------------------------- @@ -303,7 +318,7 @@ open(), os.fsdecode(), os.fsencode(), sys.stdin, sys.stdout and sys.stderr: ============================ ======================= ========================== ========================== -Function Default UTF-8 or POSIX locale UTF-8 Strict +Function Default UTF-8 mode or POSIX locale UTF-8 Strict mode ============================ ======================= ========================== ========================== open() locale/strict **UTF-8/surrogateescape** **UTF-8**/strict os.fsdecode(), os.fsencode() locale/surrogateescape **UTF-8**/surrogateescape **UTF-8**/surrogateescape @@ -326,16 +341,50 @@ The UTF-8 mode uses the ``surrogateescape`` error handler instead of the strict mode for convenience: the idea is that data not encoded to UTF-8 are passed through "Python" without being modified, as raw bytes. -The ``PYTHONIOENCODING`` environment variable has the priority on the +The ``PYTHONIOENCODING`` environment variable has the priority over the UTF-8 mode for standard streams. For example, ``PYTHONIOENCODING=latin1 python3 -X utf8`` uses the Latin1 encoding for stdin, stdout and stderr. 
-Encodings used by ``open()``, highest priority first:
+Encoding and error handler on Windows
+-------------------------------------
+
+On Windows, the encodings and error handlers are different:
+
+============================ ======================= ========================== ========================== ==========================
+Function                     Default                 Legacy Windows FS encoding UTF-8 mode                 UTF-8 Strict mode
+============================ ======================= ========================== ========================== ==========================
+open()                       mbcs/strict             mbcs/strict                **UTF-8/surrogateescape**  **UTF-8**/strict
+os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**           UTF-8/surrogatepass        UTF-8/surrogatepass
+sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape      UTF-8/surrogateescape      **UTF-8/strict**
+sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace     UTF-8/backslashreplace     UTF-8/backslashreplace
+============================ ======================= ========================== ========================== ==========================
+
+By comparison, Python 3.6 uses:
+
+============================ ======================= ==========================
+Function                     Default                 Legacy Windows FS encoding
+============================ ======================= ==========================
+open()                       mbcs/strict             mbcs/strict
+os.fsdecode(), os.fsencode() UTF-8/surrogatepass     **mbcs/replace**
+sys.stdin, sys.stdout        UTF-8/surrogateescape   UTF-8/surrogateescape
+sys.stderr                   UTF-8/backslashreplace  UTF-8/backslashreplace
+============================ ======================= ==========================
+
+The "Legacy Windows FS encoding" is enabled by setting the
+``PYTHONLEGACYWINDOWSFSENCODING`` environment variable to ``1``, see the
+`PEP 529`_.
+
+Enabling the legacy Windows filesystem encoding disables the UTF-8 mode
+(as ``-X utf8=0``).
+
+If stdin and/or stdout is redirected to a pipe, sys.stdin and/or
+sys.stdout uses ``mbcs`` encoding by default, rather than UTF-8.
But +with the UTF-8 mode, sys.stdin and sys.stdout always use the UTF-8 +encoding. + +There is no POSIX locale on Windows. The ANSI code page is used as the +locale encoding, and this code page never uses the ASCII encoding. -* *encoding* and *errors* parameters (if set) -* UTF-8 mode -* os.device_encoding(fd) -* os.getpreferredencoding(False) Rationale --------- From 3f3304ff972010c7585badd74e8a4f66775e4a4a Mon Sep 17 00:00:00 2001 From: Lukasz Langa Date: Thu, 12 Jan 2017 14:26:46 -0800 Subject: [PATCH 27/36] First complete draft of content. --- pep-0541.txt | 189 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 184 insertions(+), 5 deletions(-) diff --git a/pep-0541.txt b/pep-0541.txt index f84332e299d..9c6276184a9 100644 --- a/pep-0541.txt +++ b/pep-0541.txt @@ -25,19 +25,194 @@ Rationale ========= Given that package names on the Index are sharing a single flat -namespace, a unique name is a finite resource. +namespace, a unique name is a finite resource. The growing age of +the Package Index causes a constant rise of situations of conflict +between the current use of the name and a different suggested use of +the same name. + +This document aims to provide general guidelines for solving the +most typical cases of such conflicts. Specification ============= -TBD. +The main idea behind this document is that the Package Index serves the +community. Every user is invited to upload content to the Package Index +under the Terms of Use, understanding that it is at the sole risk of +the user. + +While the Package Index is not a backup service, the maintainers of the +Package Index do their best to keep that content accessible indefinitely +in its published form. However, in certain edge cases the greater +community's needs might overweigh the individual's expectation of +ownership of a package name.
+ +The use cases covered by this document are: + +* Abandoned projects: + + * continued maintenance by a different set of users; or + * removal from the Index for use with a different project. + +* Active projects: + + * resolving disputes over a name. + +* Invalid projects. Implementation ============== -TBD. +Reachability +------------ + +The user of the Package Index is solely responsible for being reachable +by the Package Index maintainers for matters concerning projects that +the user owns. In every case where contacting the user is necessary, +the maintainers will try to do so at least three times, using the +following means of contact: + +* the e-mail address on file in the user's profile on the Package Index; +* the e-mail address listed in the Author field for a given project + uploaded to the Index; and +* any e-mail addresses found in the given project's documentation + on the Index or on the listed Home Page. + +The maintainers stop trying to reach the user after six weeks. + + +Abandoned projects +------------------ + +A project is considered *abandoned* when ALL of the following are met: + +* owner not reachable (see Reachability above); +* no releases within the past twelve months; and +* no activity from the owner on the project's home page (or no + home page listed). + +All other projects are considered *active*. 
+ + +Continued maintenance of an abandoned project +--------------------------------------------- + +If a candidate appears willing to continue maintenance on an *abandoned* +project, ownership of the name is transferred when ALL of the following +are met: + +* the project has been determined *abandoned* by the rules described + above; +* the candidate is able to prove failed attempts to contact the + existing owner; +* the candidate is able to prove skin in the game (improvements made + on the candidate's own fork of the project); +* the candidate is able to prove why a fork under a different name is + not an acceptable workaround; and +* the maintainers of the Package Index don't have any additional + reservations. + + +Removal of an abandoned project +------------------------------- + +Projects are never removed from the Package Index solely on the basis +of abandonment. Artifacts uploaded to the Package Index hold inherent +historical value. + +An *abandoned* project can be transferred to a new owner for purposes +of reusing the name when ALL of the following are met: + +* the project has been determined *abandoned* by the rules described + above; +* the candidate is able to prove failed attempts to contact the + existing owner; +* the candidate is able to prove skin in the game (the other project + suggested to reuse the name already exists and meets notability + requirements); +* the candidate is able to prove why a fork under a different name is + not an acceptable workaround; +* download statistics on the Package Index for the existing package + indicate project is not being used; and +* the maintainers of the Package Index don't have any additional + reservations. + + +Name conflict resolution for active projects +-------------------------------------------- + +The maintainers of the Package Index are not arbiters in disputes +around *active* projects. 
There are many possible scenarios here, +a non-exclusive list describing some real-world examples is presented +below. None of the following qualify for package name ownership +transfer: + +1. User A and User B share project X. After some time they part ways + and each of them wants to continue the project under name X. +2. User A owns a project X outside the Package Index. User B creates + a package under the name X on the Index. After some time, User A + wants to publish project X on the Index but realizes name is taken. + This is true even if User A's project X gains notability and the + User B's project X is not notable. +3. User A publishes project X to the Package Index. After some time + User B proposes bug fixes to the project but no new release is + published by User A. This is true even if User A agrees to publish + a new version and later doesn't, even if User B's changes are merged + to the source code repository for project X. + +Again, the list above is not exclusive. The maintainers of the Package +Index recommend users to get in touch with each other and solve the +issue by respectful communication (see the PSF Code of Conduct [6]_). + + +Invalid projects +---------------- + +A project published on the Package Index meeting ANY of the following +is considered invalid and will be removed from the Index: + +* project does not conform to Terms of Use; +* project is malware (designed to exploit or harm systems or users); +* project contains illegal content; +* project violates copyright or licenses; +* project is name squatting (package has no functionality or is + empty); +* project name, description, or content violates the Code of Conduct; + or +* project is abusing the Package Index for purposes it was not + intended. + +If you find a project that might be considered invalid, create +a support request [7]_. + + +Prior art +========= + +NPM contains a separate section linked from the front page called +`Package Name Disputes `_. 
+It is described as a "living document", as of January 2017 its +contents might be summarized as follows: + +* package name squatting is prohibited; +* users wanting to reuse a project name are required to contact the + existing author, with cc to support@npmjs.com; +* all contact must conform to the NPM Code of Conduct; +* in case of no resolution after a few weeks, npm inc. holds the right + to the final decision in the matter. + +CPAN lets any user upload modules with the same name. PAUSE, a related +index, only lists modules uploaded by the primary maintainer or listed +co-maintainers. CPAN documentation doesn't address disputes otherwise. + +GitHub's terms of service contain an exhaustive list of behavior +not meeting general conditions of use. While not codified anywhere, +GitHub does agree for users to reclaim abandoned account names by +archiving the abandoned account and letting the other user or +organization rename their account. This is done on a case-by-case +basis. Rejected Proposals @@ -52,8 +227,6 @@ guidelines may appear arbitrary. From the perspective of the Package Index maintainers, solving name conflicts is a stressful task due to risk of unintentional harm due to lack of defined policy. -TBD. - References ========== @@ -73,6 +246,12 @@ References .. [5] GitHub (https://github.com/) +.. [6] Python Community Code of Conduct + (https://www.python.org/psf/codeofconduct/) + +.. 
[7] PyPI Support Requests + (https://sourceforge.net/p/pypi/support-requests/) + Copyright ========= From aed99f80cf2732c674f20f3f896af9515fc818ba Mon Sep 17 00:00:00 2001 From: Lukasz Langa Date: Thu, 12 Jan 2017 14:39:55 -0800 Subject: [PATCH 28/36] s/prove/demonstrate/ --- pep-0541.txt | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/pep-0541.txt b/pep-0541.txt index 9c6276184a9..c30eeff1fc9 100644 --- a/pep-0541.txt +++ b/pep-0541.txt @@ -105,12 +105,12 @@ are met: * the project has been determined *abandoned* by the rules described above; -* the candidate is able to prove failed attempts to contact the - existing owner; -* the candidate is able to prove skin in the game (improvements made - on the candidate's own fork of the project); -* the candidate is able to prove why a fork under a different name is - not an acceptable workaround; and +* the candidate is able to demonstrate own failed attempts to contact + the existing owner; +* the candidate is able to demonstrate skin in the game (improvements + made on the candidate's own fork of the project); +* the candidate is able to demonstrate why a fork under a different name + is not an acceptable workaround; and * the maintainers of the Package Index don't have any additional reservations. 
@@ -127,13 +127,13 @@ of reusing the name when ALL of the following are met: * the project has been determined *abandoned* by the rules described above; -* the candidate is able to prove failed attempts to contact the - existing owner; -* the candidate is able to prove skin in the game (the other project - suggested to reuse the name already exists and meets notability - requirements); -* the candidate is able to prove why a fork under a different name is - not an acceptable workaround; +* the candidate is able to demonstrate own failed attempts to contact + the existing owner; +* the candidate is able to demonstrate skin in the game (the other + project suggested to reuse the name already exists and meets + notability requirements); +* the candidate is able to demonstrate why a fork under a different name + is not an acceptable workaround; and * download statistics on the Package Index for the existing package indicate project is not being used; and * the maintainers of the Package Index don't have any additional From d1d519218f8cd09920f63f344ee8fac385d664b4 Mon Sep 17 00:00:00 2001 From: Lukasz Langa Date: Thu, 12 Jan 2017 14:40:41 -0800 Subject: [PATCH 29/36] Remove spurious "and" --- pep-0541.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pep-0541.txt b/pep-0541.txt index c30eeff1fc9..b1bf129dae1 100644 --- a/pep-0541.txt +++ b/pep-0541.txt @@ -133,7 +133,7 @@ of reusing the name when ALL of the following are met: project suggested to reuse the name already exists and meets notability requirements); * the candidate is able to demonstrate why a fork under a different name - is not an acceptable workaround; and + is not an acceptable workaround; * download statistics on the Package Index for the existing package indicate project is not being used; and * the maintainers of the Package Index don't have any additional From 58da4863a78cb145567cad1e5b203d0f2086cbff Mon Sep 17 00:00:00 2001 From: Lukasz Langa Date: Thu, 12 Jan 2017 15:00:59 -0800 
Subject: [PATCH 30/36] Clarify project reassigning under reachable owner --- pep-0541.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/pep-0541.txt b/pep-0541.txt index b1bf129dae1..b4d56e8b382 100644 --- a/pep-0541.txt +++ b/pep-0541.txt @@ -114,6 +114,9 @@ are met: * the maintainers of the Package Index don't have any additional reservations. +Under no circumstances will a name be reassigned against the wishes of +a reachable owner. + Removal of an abandoned project ------------------------------- From 27b2cd11cd3a7852e40b9e8582f41e03196e8f30 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Sat, 14 Jan 2017 10:45:59 -0800 Subject: [PATCH 31/36] [pep541] Specify where the ToU extension should go. --- pep-0541.txt | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/pep-0541.txt b/pep-0541.txt index b4d56e8b382..68a0c703233 100644 --- a/pep-0541.txt +++ b/pep-0541.txt @@ -61,6 +61,11 @@ The use cases covered by this document are: * Invalid projects. +The proposed extension to the Terms of Use, as expressed in the +Implementation section, will be published as a separate document on the +Package Index, linked next to existing Terms of Use in the front page +footer. 
+ Implementation ============== From 595d7b67ae8690e7ee2a75e9a6f4bd88f225a36f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Sat, 14 Jan 2017 10:51:07 -0800 Subject: [PATCH 32/36] [pep541] Remove skin from the game (slang) --- pep-0541.txt | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/pep-0541.txt b/pep-0541.txt index 68a0c703233..ec1991ec82d 100644 --- a/pep-0541.txt +++ b/pep-0541.txt @@ -112,8 +112,8 @@ are met: above; * the candidate is able to demonstrate own failed attempts to contact the existing owner; -* the candidate is able to demonstrate skin in the game (improvements - made on the candidate's own fork of the project); +* the candidate is able to demonstrate improvements made on the + candidate's own fork of the project; * the candidate is able to demonstrate why a fork under a different name is not an acceptable workaround; and * the maintainers of the Package Index don't have any additional @@ -137,9 +137,8 @@ of reusing the name when ALL of the following are met: above; * the candidate is able to demonstrate own failed attempts to contact the existing owner; -* the candidate is able to demonstrate skin in the game (the other - project suggested to reuse the name already exists and meets - notability requirements); +* the candidate is able to demonstrate that the project suggested to + reuse the name already exists and meets notability requirements; * the candidate is able to demonstrate why a fork under a different name is not an acceptable workaround; * download statistics on the Package Index for the existing package From 8e9969eb244042cf8fcf8adda7f895b7e2afc912 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C5=81ukasz=20Langa?= Date: Sat, 14 Jan 2017 11:37:35 -0800 Subject: [PATCH 33/36] [pep541] Cover legal matters --- pep-0541.txt | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/pep-0541.txt b/pep-0541.txt index ec1991ec82d..916805c62c4 100644 --- 
a/pep-0541.txt
+++ b/pep-0541.txt
@@ -183,7 +183,7 @@ is considered invalid and will be removed from the Index:
 * project does not conform to Terms of Use;
 * project is malware (designed to exploit or harm systems or users);
 * project contains illegal content;
-* project violates copyright or licenses;
+* project violates copyright, trademarks, patents, or licenses;
 * project is name squatting (package has no functionality or is
   empty);
 * project name, description, or content violates the Code of Conduct;
@@ -191,8 +191,30 @@ is considered invalid and will be removed from the Index:
 * project is abusing the Package Index for purposes it was not
   intended.
 
-If you find a project that might be considered invalid, create
-a support request [7]_.
+The Package Index maintainers pre-emptively declare certain package
+names as unavailable for security reasons.
+
+If you find a project that you think might be considered invalid, create
+a support request [7]_. Maintainers of the Package Index will review
+the case.
+
+
+The role of the Python Software Foundation
+------------------------------------------
+
+The Python Software Foundation [8]_ is the non-profit legal entity that
+provides the Package Index as a community service.
+
+The Package Index maintainers can escalate issues covered by this
+document for resolution by the PSF Board if the matter is not clear
+enough. Some decisions *require* additional judgement by the Board,
+especially in cases of Code of Conduct violations or legal claims.
+Decisions made by the Board are published as Resolutions [9]_.
+
+The Board has the final say in any disputes covered by this document and
+can decide to reassign or remove a project from the Package Index after
+careful consideration even when not all requirements listed
+here are met.
 
 
 Prior art
@@ -259,6 +281,12 @@ References
 .. [7] PyPI Support Requests
    (https://sourceforge.net/p/pypi/support-requests/)
 
+.. [8] Python Software Foundation
+   (https://www.python.org/psf/)
+
+.. [9] PSF Board Resolutions
+   (https://www.python.org/psf/records/board/resolutions/)
+
 
 Copyright
 =========

From e4c14d04c3bf35d4992f1f6cfc698e9b5e088dc8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C5=81ukasz=20Langa?=
Date: Sat, 14 Jan 2017 11:38:41 -0800
Subject: [PATCH 34/36] [pep541] Punctuation.

---
 pep-0541.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/pep-0541.txt b/pep-0541.txt
index 916805c62c4..cd367d96af1 100644
--- a/pep-0541.txt
+++ b/pep-0541.txt
@@ -44,7 +44,7 @@ the user.
 
 While the Package Index is not a backup service, the maintainers of the
 Package Index do their best to keep that content accessible indefinitely
-in its published form. However, in certain edge cases the greater
+in its published form. However, in certain edge cases the greater
 community's needs might overweigh the individual's expectation of
 ownership of a package name.
 
@@ -153,10 +153,10 @@ Name conflict resolution for active projects
 The maintainers of the Package Index are not arbiters in disputes around
 *active* projects. There are many possible scenarios here, a
 non-exclusive list describing some real-world examples is presented
-below. None of the following qualify for package name ownership
+below. None of the following qualify for package name ownership
 transfer:
 
-1. User A and User B share project X. After some time they part ways
+1. User A and User B share project X. After some time they part ways
    and each of them wants to continue the project under name X.
 2. User A owns a project X outside the Package Index. User B creates
    a package under the name X on the Index. After some time, User A
@@ -232,7 +232,7 @@ contents might be summarized as follows:
 * in case of no resolution after a few weeks, npm inc. holds the right
   to the final decision in the matter.
 
-CPAN lets any user upload modules with the same name. PAUSE, a related
+CPAN lets any user upload modules with the same name. PAUSE, a related
 index, only lists modules uploaded by the primary maintainer or listed
 co-maintainers. CPAN documentation doesn't address disputes otherwise.
 

From aa29454969d4f64451faec3d38709d591eec7aba Mon Sep 17 00:00:00 2001
From: Yuri Broze
Date: Mon, 16 Jan 2017 12:25:59 -0500
Subject: [PATCH 35/36] Incredibly minor typo fix in pep-0001 (#181)

"reference" -> "references"
---
 pep-0001.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0001.txt b/pep-0001.txt
index 3f54e016b00..3a6903630e2 100644
--- a/pep-0001.txt
+++ b/pep-0001.txt
@@ -68,7 +68,7 @@ PEP Workflow
 Python's BDFL
 -------------
 
-There are several reference in this PEP to the "BDFL". This acronym stands
+There are several references in this PEP to the "BDFL". This acronym stands
 for "Benevolent Dictator for Life" and refers to Guido van Rossum, the
 original creator of, and the final design authority for, the Python
 programming language.

From 8c3faaf5f9e10ee586ae2fc4a10a0d0bc29bc0dc Mon Sep 17 00:00:00 2001
From: Donald Stufft
Date: Mon, 16 Jan 2017 14:45:55 -0500
Subject: [PATCH 36/36] Add BDFL-Delegate and Discusssions-To header to PEP 541

---
 pep-0541.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pep-0541.txt b/pep-0541.txt
index cd367d96af1..c2f2c976f40 100644
--- a/pep-0541.txt
+++ b/pep-0541.txt
@@ -3,6 +3,8 @@ Title: Package Index Name Retention
 Version: $Revision$
 Last-Modified: $Date$
 Author: Łukasz Langa
+BDFL-Delegate: Donald Stufft
+Discussions-To: distutils-sig
 Status: Draft
 Type: Process
 Content-Type: text/x-rst