Commit 4a1ec59

start writing python preprocessor post
1 parent 5f32d65 commit 4a1ec59

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
title: Python's Preprocessor
date: 2024-06-10T23:36:29+00:00
categories: [Python]
tags: [Python, Hackery]
author: Che
bokeh: true
---

Every now and then you hear outrageous claims such as "Python has no preprocessor". This is simply not true. In fact, Python has the best preprocessor of all languages - it quite literally allows us to do whatever we want, and a lot more. It's just a little tricky to (ab)use.

# Python Source Code Encoding
According to [PEP 263](https://peps.python.org/pep-0263/#defining-the-encoding), it is possible to define a source code encoding by placing a magic comment on the first or second line of the file. The following lines would all be interpreted as setting the encoding to `utf8`:
```py
# coding=utf8
# -*- coding: utf8 -*-
# vim: set fileencoding=utf8 :
```

To be precise, the line must match the regular expression `^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)`. Naturally we can use our own encodings, but their names must match `[-_.a-zA-Z0-9]+`.
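
To see what that regular expression accepts, here is a small sketch that applies it to a few candidate lines. The helper name `declared_encoding` is made up for illustration; only the pattern itself comes from the PEP.

```py
import re

# Regular expression quoted from PEP 263.
MAGIC_COMMENT = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line):
    """Return the encoding name declared on `line`, or None if there is none."""
    match = MAGIC_COMMENT.match(line)
    return match.group(1) if match else None


print(declared_encoding("# coding=utf8"))                   # utf8
print(declared_encoding("# -*- coding: utf8 -*-"))          # utf8
print(declared_encoding("# vim: set fileencoding=utf8 :"))  # utf8
print(declared_encoding('#define CODEC "coding:pydong"'))   # pydong
print(declared_encoding("x = 1  # coding=utf8"))            # None
```

The last call returns `None` because the pattern only allows whitespace before the `#`, so a trailing comment after code does not count.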

# Defining Custom Codecs
https://docs.python.org/3/library/codecs.html
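
As a rough idea of what registering a codec involves, the sketch below uses `codecs.register` with a search function that returns a `codecs.CodecInfo`. The codec name `pydong`, the `preprocess` step, and the `HELLO` token are placeholders chosen for illustration, not a finished design.

```py
import codecs


def preprocess(source):
    """Hypothetical transformation; a real preprocessor would rewrite more."""
    return source.replace("HELLO", "print('hello')")


def decode(data, errors="strict"):
    """Decode the raw bytes as UTF-8, then run the preprocessing step."""
    text, consumed = codecs.utf_8_decode(bytes(data), errors, True)
    return preprocess(text), consumed


def search(name):
    """Codec search function: answer only for our made-up codec name."""
    if name != "pydong":
        return None
    return codecs.CodecInfo(
        name="pydong",
        encode=codecs.utf_8_encode,
        decode=decode,
    )


codecs.register(search)

print(b"HELLO\n".decode("pydong"))
```

This only exercises the codec through `bytes.decode`. A codec meant for real source files will most likely also need an incremental decoder, and it has to be registered before the interpreter reads the file.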

# Extending Python
## Unary increment and decrement
https://github.com/dankeyy/incdec.py

In Python, the postfix operator `x++` can be written as `((x, x := x+1)[0])`, and `x--` is therefore `((x, x := x-1)[0])`. Using the same expression, we can write the prefix operators `++x` as `((x, x := x+1)[1])` and `--x` as `((x, x := x-1)[1])`. The expression builds a tuple of the old value and the updated value, then indexes out whichever of the two the operator should return.
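
A quick check of those four expressions, written out by hand rather than generated, shows the old and new values coming out as expected:

```py
x = 5

# postfix x++: yields the old value, then x is incremented
old = ((x, x := x + 1)[0])
print(old, x)  # 5 6

# prefix ++x: x is incremented first, the new value is yielded
new = ((x, x := x + 1)[1])
print(new, x)  # 7 7

# postfix x--: yields the old value, then x is decremented
old = ((x, x := x - 1)[0])
print(old, x)  # 7 6

# prefix --x: x is decremented first, the new value is yielded
new = ((x, x := x - 1)[1])
print(new, x)  # 5 5
```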

Other than that, it's simply text replacement.
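
A deliberately naive sketch of that replacement step could use two regular expressions. The patterns and the helper names `expand` and `rewrite` are assumptions for illustration: they only handle bare identifiers, do not protect strings or comments, and would mangle valid Python such as `a--b`, so a more careful version would work on tokens instead.

```py
import re

# Hypothetical patterns: an identifier directly next to "++" or "--".
IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
POSTFIX = re.compile(rf"({IDENT})(\+\+|--)")
PREFIX = re.compile(rf"(\+\+|--)({IDENT})")


def expand(op, name, index):
    """Build the tuple expression for one operator occurrence."""
    sign = "+" if op == "++" else "-"
    return f"(({name}, {name} := {name}{sign}1)[{index}])"


def rewrite(source):
    """Replace postfix forms first, then prefix forms."""
    source = POSTFIX.sub(lambda m: expand(m.group(2), m.group(1), 0), source)
    return PREFIX.sub(lambda m: expand(m.group(1), m.group(2), 1), source)


print(rewrite("y = x++"))  # y = ((x, x := x+1)[0])
print(rewrite("y = --x"))  # y = ((x, x := x-1)[1])
```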

# Polyglotting

Instead of expanding Python, why not teach the Python interpreter a few more tricks? After all, there are all kinds of cool languages it could interpret!

## C and C++
The easiest way to smuggle the magic line into C and C++ sources is by defining a macro like this:
```c
#define CODEC "coding:pydong"
```
Great, we can now trigger the `pydong` decoder with a valid C or C++ source file. To actually get the Python interpreter to interpret this C or C++ code for us, we can use the excellent package `cppyy`. In essence, `cppyy` uses `cling` under the hood to interpret our code and generates Python bindings so that we can call it.

After our decoder is done with the input file, the output should look something like this:
```py
import cppyy

# interpret the input source code
cppyy.cppdef("<input source file content>")

# find the main function
from cppyy.gbl import main

if __name__ == "__main__":
    # call C/C++ main
    main()
```

Now we can run `python foo.cpp` if `foo.cpp` begins with the magic line `#define CODEC "coding:pydong"` or similar.
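
One way the decoder could produce that output is plain string templating: embed the original file content as a Python string literal inside the wrapper. The names `TEMPLATE` and `wrap_cpp` below are invented for this sketch, and the generated code is only syntax-checked here, so `cppyy` does not need to be installed to try it.

```py
TEMPLATE = '''import cppyy

# interpret the input source code
cppyy.cppdef({source!r})

# find the main function
from cppyy.gbl import main

if __name__ == "__main__":
    # call C/C++ main
    main()
'''


def wrap_cpp(source):
    """Return Python source that hands `source` to cppyy and calls main()."""
    return TEMPLATE.format(source=source)


cpp = '#define CODEC "coding:pydong"\nint main() { return 0; }\n'
generated = wrap_cpp(cpp)

# Syntax check only: compiling does not execute the cppyy import.
compile(generated, "<generated>", "exec")
print(generated)
```

Using `repr` via `{source!r}` takes care of quotes and newlines in the C or C++ source, so the embedded literal stays valid Python.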

## Shell script
Shell script comments start with `#`, hence we don't need to do anything special for the magic line.

## CMake
Just like shell scripts, CMake uses `#` for comments.

## PHP
PHP allows `#` comments, for example for the shebang.

## Ruby
Ruby uses `#` for single-line comments. This means that, just like in PHP, we can simply use a comment for the magic line.
