---
title: Python's Preprocessor
date: 2024-06-10T23:36:29+00:00
categories: [Python]
tags: [Python, Hackery]
author: Che
bokeh: true
---

Every now and then you hear outrageous claims such as "Python has no preprocessor". This is simply not true. In fact, Python has the best preprocessor of all languages: it quite literally allows us to do whatever we want, and a lot more. It's just a little tricky to (ab)use.

# Python Source Code Encoding
According to [PEP 263](https://peps.python.org/pep-0263/#defining-the-encoding), a source code encoding can be declared by placing a magic comment on one of the first two lines of a file. If the declaration is on the second line, the first line must also be comment-only. Each of the following lines declares the encoding `utf8`:
```py
# coding=utf8
# -*- coding: utf8 -*-
# vim: set fileencoding=utf8 :
```

To be precise, the line must match the regular expression `^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)`. Naturally we can also use our own encodings, as long as their names match `[-_.a-zA-Z0-9]+`.
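
To see which encoding name Python would pick up from a given line, the PEP 263 pattern can be tried directly with `re`. The helper `declared_encoding` is a hypothetical name used only for this illustration. Note that a C preprocessor line such as `#define CODEC "coding:pydong"` matches as well, because to Python it is just a comment.

```python
import re
from typing import Optional

# The encoding-declaration pattern from PEP 263
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line: str) -> Optional[str]:
    """Return the encoding name declared by `line`, or None (hypothetical helper)."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


for line in [
    "# coding=utf8",
    "# -*- coding: utf8 -*-",
    "# vim: set fileencoding=utf8 :",
    '#define CODEC "coding:pydong"',  # a C macro is a comment to Python
    "x = 1  # coding: utf8",          # no match: the line must start with '#'
]:
    print(repr(line), "->", declared_encoding(line))
```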
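
# Defining Custom Codecs
Custom codecs are registered through the [`codecs`](https://docs.python.org/3/library/codecs.html) module: `codecs.register()` takes a search function that maps an encoding name to a `codecs.CodecInfo` object.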
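
Here is a minimal sketch of such a codec, assuming the name `pydong` (used for the polyglot examples in this post) and a hypothetical `preprocess()` hook that receives the decoded source text. It decodes the bytes as UTF-8 and then hands the text to `preprocess()`. CPython calls the plain `decode` function when it compiles source given as bytes but, as far as I can tell, reads source files through a text stream that uses the incremental decoder, so the sketch provides both. The incremental decoder buffers everything until the final chunk so that `preprocess()` always sees the complete text. Details may differ between CPython versions.

```python
import codecs
import encodings.utf_8


def preprocess(text: str) -> str:
    """Hypothetical hook: rewrite the decoded source before Python parses it."""
    return text  # placeholder, plug the actual transformation in here


def decode(data, errors="strict"):
    # Decode as UTF-8 first, then run the preprocessor over the result.
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return preprocess(text), consumed


class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    # Keep buffering until the final chunk arrives, so that preprocess()
    # receives the complete text instead of arbitrary fragments.
    def _buffer_decode(self, data, errors, final):
        if final:
            return decode(data, errors)
        return "", 0


def search(name):
    # The codec registry calls this with a normalized, lower-case name.
    if name != "pydong":
        return None
    return codecs.CodecInfo(
        name="pydong",
        encode=codecs.utf_8_encode,
        decode=decode,
        incrementalencoder=encodings.utf_8.IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
    )


codecs.register(search)

# compile() honours the magic comment when it is given bytes
namespace = {}
exec(compile(b"# coding: pydong\nanswer = 40 + 2\n", "<pydong>", "exec"), namespace)
print(namespace["answer"])  # 42
```

For `python foo.cpp` to work, the codec must already be registered when the interpreter starts reading the file. One common way to arrange that is a `.pth` file in `site-packages` containing a line that starts with `import` (for example `import pydong_codec`, a hypothetical module holding the code above), since the `site` module executes such lines at startup.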

# Extending Python
## Unary increment and decrement
See [dankeyy/incdec.py](https://github.com/dankeyy/incdec.py).

In Python, the postfix operator `x++` can be written as `((x, x := x+1)[0])`, and `x--` therefore as `((x, x := x-1)[0])`. Using the same expression, we can write the prefix operators `++x` as `((x, x := x+1)[1])` and `--x` as `((x, x := x-1)[1])`. This works because the tuple is evaluated left to right: its first element is the value of `x` before the assignment expression runs, its second element is the updated value, and the index picks whichever one the operator should return. Since `:=` only accepts a plain name as its target, this covers `x++` but not `obj.attr++` or `a[i]++`.
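
A quick way to convince yourself that the tuple trick behaves like the C operators is to apply all four expressions to the same variable in sequence (`:=` requires Python 3.8 or newer):

```python
x = 5

a = ((x, x := x + 1)[0])  # x++ : a == 5, x == 6
b = ((x, x := x + 1)[1])  # ++x : b == 7, x == 7
c = ((x, x := x - 1)[0])  # x-- : c == 7, x == 6
d = ((x, x := x - 1)[1])  # --x : d == 5, x == 5

print(a, b, c, d, x)
```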

Other than that, it's simply text replacement.
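
As a rough illustration of that text replacement, here is a regex-based sketch. `rewrite_incdec` is a hypothetical helper written for this example, not code from incdec.py. It only handles plain names, does not skip string literals or comments, and would also mangle valid Python such as `a--b` (which means `a - (-b)`), so a sturdier version would operate on the token stream from the `tokenize` module instead.

```python
import re

# identifier followed by ++ or -- (postfix), and ++ or -- followed by an identifier (prefix)
_POSTFIX = re.compile(r"\b([A-Za-z_]\w*)(\+\+|--)")
_PREFIX = re.compile(r"(\+\+|--)([A-Za-z_]\w*)\b")


def _op(token: str) -> str:
    # "++" becomes "+", "--" becomes "-"
    return "+" if token == "++" else "-"


def rewrite_incdec(source: str) -> str:
    """Rewrite x++, x--, ++x and --x into walrus expressions (naive, hypothetical)."""
    # Postfix first: the tuple's first element is the old value.
    source = _POSTFIX.sub(
        lambda m: f"(({m[1]}, {m[1]} := {m[1]}{_op(m[2])}1)[0])", source
    )
    # Then prefix: the tuple's second element is the updated value.
    source = _PREFIX.sub(
        lambda m: f"(({m[2]}, {m[2]} := {m[2]}{_op(m[1])}1)[1])", source
    )
    return source


print(rewrite_incdec("y = x++"))  # y = ((x, x := x+1)[0])
```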

# Polyglotting

Instead of extending Python, why not teach the Python interpreter a few more tricks? After all, there are all kinds of cool languages it could interpret!

## C and C++
The easiest way to smuggle the magic line into C and C++ sources is to define a macro such as
```c
#define CODEC "coding:pydong"
```
To Python, a line starting with `#` is a comment, so this preprocessor directive doubles as the magic comment. Great, we can now trigger the `pydong` decoder with a valid C or C++ source file. To actually get the Python interpreter to run this C or C++ code for us, we can use the excellent package `cppyy`. In essence, `cppyy` uses `cling` under the hood to interpret our code and generates Python bindings for it.

After our decoder is done with the input file, its output should look something like
```py
import cppyy

# interpret the input source code
cppyy.cppdef("<input source file content>")

# find the main function
from cppyy.gbl import main

if __name__ == "__main__":
    # call C/C++ main
    main()
```

Now we can run `python foo.cpp`, provided the `pydong` codec is registered and `foo.cpp` begins with the magic line `#define CODEC "coding:pydong"` or similar.
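
For C and C++, the decoder's job therefore boils down to pasting the original source into that template. A minimal sketch, where `cpp_wrapper` is a hypothetical name; `repr()` takes care of escaping quotes, backslashes and newlines so the C++ code survives as a valid Python string literal:

```python
# Template for the decoder output; {source!r} is filled in with repr(source).
_CPPYY_TEMPLATE = '''\
import cppyy

# interpret the input source code
cppyy.cppdef({source!r})

# find the main function
from cppyy.gbl import main

if __name__ == "__main__":
    # call C/C++ main
    main()
'''


def cpp_wrapper(source: str) -> str:
    """Embed C/C++ source into the cppyy template (hypothetical helper)."""
    return _CPPYY_TEMPLATE.format(source=source)


print(cpp_wrapper('#define CODEC "coding:pydong"\nint main() { return 0; }\n'))
```

Only the template's `{source!r}` field is interpreted by `str.format`, so braces inside the C++ code need no special treatment.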
| 60 | + |
| 61 | +## Shell script |
| 62 | +Shell script comments start with `#`, hence we don't need to do anything special for the magic line. |
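
Python can't interpret shell code in-process, so here the decoder could instead emit a small Python program that hands the original source to the language's own interpreter. This sketch makes a few assumptions: the interpreter (here `sh`) is on `PATH`, the script is written to a temporary file first, and command-line arguments are passed through. `external_wrapper` is a hypothetical name. In principle the same helper could serve other `#`-comment languages by swapping the command, e.g. `["cmake", "-P"]` for CMake's script mode.

```python
# Template for the decoder output: run SOURCE with COMMAND via a temporary file.
_EXTERNAL_TEMPLATE = '''\
import os
import subprocess
import sys
import tempfile

SOURCE = {source!r}
COMMAND = {command!r}

if __name__ == "__main__":
    # closing the file flushes it before the interpreter reads it
    with tempfile.NamedTemporaryFile("w", suffix={suffix!r}, delete=False) as f:
        f.write(SOURCE)
    try:
        result = subprocess.run([*COMMAND, f.name, *sys.argv[1:]])
    finally:
        os.unlink(f.name)
    sys.exit(result.returncode)
'''


def external_wrapper(source: str, command: list, suffix: str = "") -> str:
    """Return Python code that runs `source` with `command` (hypothetical helper)."""
    return _EXTERNAL_TEMPLATE.format(source=source, command=command, suffix=suffix)


print(external_wrapper('echo "hello from sh"\n', ["sh"], ".sh"))
```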
| 63 | +
|
| 64 | +## CMake |
| 65 | +Just like Shell scripts, CMake uses `#` for comments. |
| 66 | +
|
| 67 | +## PHP |
| 68 | +PHP allows `#` comments, for example for the shebang. |
| 69 | +
|
| 70 | +## Ruby |
| 71 | +Ruby uses `#` for single-line comments. This means just like PHP, we can simply use a comment for the magic line. |
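
One catch: Ruby reads `coding:` magic comments itself and, as far as I know, refuses to run a file that names an encoding it doesn't recognise, such as `pydong`. A decoder that hands the file to `ruby` might therefore blank out the magic line first. The sketch below (hypothetical helper `strip_magic_comment`) replaces it with a bare `#`, so line numbers in Ruby's error messages stay unchanged.

```python
import re

# The encoding-declaration pattern from PEP 263
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def strip_magic_comment(source: str) -> str:
    """Replace a magic comment in the first two lines with '#' (hypothetical helper)."""
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines[:2]):
        if CODING_RE.match(line):
            # keep the original line ending so the line count does not change
            ending = line[len(line.rstrip("\r\n")):]
            lines[i] = "#" + ending
            break
    return "".join(lines)


print(strip_magic_comment("#!/usr/bin/env ruby\n# coding: pydong\nputs 'hi'\n"))
```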