gh-115999: Specialize `LOAD_GLOBAL` in free-threaded builds #126607

mpage · 2024-11-09T00:59:56Z

Enable specialization of LOAD_GLOBAL in free-threaded builds.

Thread-safety of specialization in free-threaded builds is provided by the following:

- A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
- Generation of new keys versions is made atomic in free-threaded builds.
- Existing helpers are used to atomically modify the opcode.
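
As a rough illustration of the second point, the sketch below shows one way a version counter can hand out unique values without holding a lock. It uses plain C11 atomics rather than CPython's internal atomic helpers, and `next_keys_version` and `new_keys_version` are hypothetical names chosen for this sketch, not the functions touched by this PR.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter. 0 is reserved to mean "no version assigned". */
static _Atomic uint32_t next_keys_version = 1;

/* Return a fresh non-zero version, or 0 once the counter is exhausted.
 * The compare-exchange loop guarantees that two threads never receive
 * the same value, and that the counter stays at 0 after wrapping. */
static uint32_t
new_keys_version(void)
{
    uint32_t v = atomic_load_explicit(&next_keys_version,
                                      memory_order_relaxed);
    do {
        if (v == 0) {
            return 0;  /* exhausted: callers must not specialize */
        }
        /* On failure, v is reloaded with the current counter value. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &next_keys_version, &v, v + 1,
                 memory_order_relaxed, memory_order_relaxed));
    return v;
}
```

A caller would treat a return value of 0 as "no version available" and leave the instruction unspecialized.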

Thread-safety of specialized instructions in free-threaded builds is provided by the following:

- Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races, as the dict keys versions are read without holding the dictionary's per-object lock in version guards.
- Dict keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset at which it is stored is never reused for a different key. Once the version guard passes, we know that we are reading from the correct offset.
- The dictionary read fast-path is used to read values from the dictionary once we know the correct offset.
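
The guard-then-load pattern described above can be sketched with C11 atomics. This is a toy model and not CPython's dict layout: `toy_keys`, `guarded_load`, and the fixed-size slot array are assumptions made purely for illustration, and the acquire ordering on the slot load is a choice made for this sketch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NSLOTS 4

/* Toy stand-in for a keys object: a version plus a few value slots. */
typedef struct {
    _Atomic uint32_t version;            /* 0 means "no version assigned" */
    _Atomic intptr_t slots[TOY_NSLOTS];  /* stand-in for stored values */
} toy_keys;

/* Version guard followed by a lock-free read from the cached offset.
 * Returns 1 and writes *out on success, or 0 to signal a deopt. */
static int
guarded_load(toy_keys *keys, uint32_t cached_version,
             size_t cached_index, intptr_t *out)
{
    if (cached_index >= TOY_NSLOTS) {
        return 0;
    }
    /* Relaxed load: the version is read without taking any lock. */
    uint32_t v = atomic_load_explicit(&keys->version, memory_order_relaxed);
    if (v != cached_version) {
        return 0;  /* guard failed: fall back to the generic path */
    }
    /* The same keys pointer is reused for the load, so the cached
     * index is interpreted against the object the guard checked. */
    *out = atomic_load_explicit(&keys->slots[cached_index],
                                memory_order_acquire);
    return 1;
}
```

Passing the same `keys` pointer to both the guard and the load mirrors the idea of handing the keys object from the guard to the downstream uop, rather than re-reading it from the dictionary.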

Performance-wise, this looks like a 3-4% improvement on free-threaded builds on pyperformance.

Issue: Make the specializing interpreter thread-safe in --disable-gil builds #115999

Thread-safety of specialization in free-threaded builds:

- A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization.
- Generation of new keys versions is made atomic in free-threaded builds.
- We use helpers to safely modify the bytecode.

Thread-safety of specialized instructions in free-threaded builds:

- Dict keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset at which it is stored will never be reused for a different key.
- The dictionary read fast-path is used to read values from the dictionary.
- Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races, as the dict keys versions may be read without holding the dictionary's per-object lock while the instructions are executing.

Fidget-Spinner

This was less complicated than I initially anticipated. Nice!

LGTM overall, but I'll withhold approval to let others review it first.

colesbury

Nice! A few comments below.

Objects/dictobject.c

Python/specialize.c

Python/bytecodes.c

Keys are freed using QSBR. Nothing occurs in between loading the keys and loading the value that would cause the QSBR version to advance, ensuring that the keys object cannot be freed and reused.

Objects/dictobject.c

Include/internal/pycore_dict.h

mpage · 2024-11-21T00:54:34Z

!buildbot nogil refleak

bedevere-bot · 2024-11-21T00:54:37Z

🤖 New build scheduled with the buildbot fleet by @mpage for commit 01f4143 🤖

The command will test the builders whose names match the following regular expression: nogil refleak

The builders matched are:

AMD64 Fedora Rawhide NoGIL refleaks PR
PPC64LE Fedora Rawhide NoGIL refleaks PR
aarch64 Fedora Rawhide NoGIL refleaks PR
AMD64 CentOS9 NoGIL Refleaks PR

mpage added 6 commits November 8, 2024 12:47

_Py_GetExecutor needs a CS in free-threaded builds

5af633c

This handles the case where another thread is instrumenting the bytecode.

Undo workaround for cases generator bug

7b3df6f

Fix unused function warning

8f6239d

Double check that keys are still valid

49ec70a

Fix formatting

0e50451

bedevere-app bot mentioned this pull request Nov 9, 2024

Make the specializing interpreter thread-safe in --disable-gil builds #115999

Closed

mpage added the skip news label Nov 9, 2024

mpage added 2 commits November 8, 2024 17:07

Fix linkage

72519ca

_Py_TryIncrefCompare can't escape

d55397b

mpage marked this pull request as ready for review November 9, 2024 02:00

mpage requested review from markshannon and methane as code owners November 9, 2024 02:00

bedevere-app bot added the awaiting core review label Nov 9, 2024

mpage requested review from colesbury and Yhg1s November 9, 2024 02:00

Fidget-Spinner reviewed Nov 9, 2024

colesbury reviewed Nov 11, 2024

Objects/dictobject.c (resolved)

Python/specialize.c (resolved)

Python/bytecodes.c (outdated, resolved)

Python/bytecodes.c (outdated, resolved)

mpage added 5 commits November 16, 2024 10:36

Use atomics for setting keys version for split keys

ad9a9d0

Ensure dict is shared when assigning keys version

5b7e65c

Don't check keys after loading them

129300b

Keys are freed using QSBR. Nothing occurs in between loading the keys and loading the value that would cause the QSBR version to advance, ensuring that the keys object cannot be freed and reused.

Support deferred refcounting

3440429

Merge branch 'main' into pythongh-115999-tlbc-load-global

01041a4

mpage requested a review from colesbury November 19, 2024 01:10

colesbury reviewed Nov 19, 2024

Objects/dictobject.c (outdated, resolved)

Objects/dictobject.c (outdated, resolved)

Include/internal/pycore_dict.h (outdated, resolved)

mpage added 3 commits November 18, 2024 21:19

Fix param name

bfe1c93

Always mark keys shared

d5258e0

Update comment

852be38

mpage requested a review from colesbury November 19, 2024 06:24

colesbury approved these changes Nov 19, 2024

bedevere-app bot added awaiting merge and removed awaiting core review labels Nov 19, 2024

mpage added 2 commits November 20, 2024 15:00

Merge branch 'main' into pythongh-115999-tlbc-load-global

e11e0ae

Don't pass reason to unspecialize

01f4143

Merge branch 'main' into pythongh-115999-tlbc-load-global

b922dfe

mpage merged commit 09c240f into python:main Nov 21, 2024
65 checks passed

bedevere-app bot removed the awaiting merge label Nov 21, 2024

mpage deleted the gh-115999-tlbc-load-global branch November 21, 2024 19:22
