-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115999: Specialize LOAD_GLOBAL
in free-threaded builds
#126607
base: main
Are you sure you want to change the base?
Conversation
Thread-safety of specialization in free-threaded builds: - A critical section is held on both the globals and builtins objects during specialization. This ensures we get an atomic view of both builtins and globals during specialization. - Generation of new keys versions is made atomic in free-threaded builds. - We use helpers safely modify the bytecode. Thread-safety of specialized instructions in free-threaded builds: - Dicts keys objects are passed from keys version guards to the downstream uops. This ensures that we are loading from the correct offset in the keys object. Once a unicode key has been stored in a keys object for a combined dictionary in free-threaded builds, the offset that it is stored in will never be reaused for a different key. - The dictionary read fast-path is used to read values from the dictionary. - Relaxed atomics are used when loading and storing dict keys versions. This avoids potential data races as the dict keys versions may be read without holding the dictionary's per-object lock while the instructions are executing.
This handles the case where another thread is instrumenting the bytecode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was less complicated than I initially anticipated. Nice!
LGTM overall, but I'll withold from approval to let others review it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! A few comments below.
@@ -4498,7 +4498,7 @@ dict_popitem_impl(PyDictObject *self) | |||
return NULL; | |||
} | |||
} | |||
self->ma_keys->dk_version = 0; | |||
FT_ATOMIC_STORE_UINT32_RELAXED(self->ma_keys->dk_version, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one more assignment to dk_version
in insert_split_key
that should use this pattern.
void | ||
_Py_Specialize_LoadGlobal( | ||
static void | ||
specialize_load_global_lock_held( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should mark the dicts as "shared" (_PyObject_GC_SET_SHARED
/ SET_DICT_SHARED
) when we specialize for load global. That ensures that the keys object will not be immediately freed when the dictionary is resized, which is important if another thread is trying to read the keys without holding a lock.
The dict code enforces this with ensure_shared_on_read()
, but I think we should have the version check subsume this: if the dict has a valid version, then it's marked as shared (and we can attempt the non-locking read without checking the owning thread).
So maybe ensure the dict is shared in _PyDictKeys_GetVersionForCurrentState
.
if (globals_keys != _Py_atomic_load_ptr_acquire(&dict->ma_keys)) { | ||
Py_DECREF(res_o); | ||
DEOPT_IF(true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these checks that the keys haven't changed are not necessary. We have them in dictobject.c
too, and I think they're not necessary there either.
I think the checks were inherited from the nogil forks, which used a different scheme for dictionary keys.
(Also the use of global_keys
here seems to contradict the DEAD(globals_keys)
above.)
DEAD(globals_keys); | ||
SYNC_SP(); | ||
DEOPT_IF(res_o == NULL); | ||
#if Py_GIL_DISABLED | ||
int increfed = _Py_TryIncrefCompare(&entries[index].me_value, res_o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to support deferred reference counting here. Loading functions to call them will use LOAD_GLOBAL_MODULE
and LOAD_GLOBAL_BUILTINS
.
I don't think we have the equivalent of _Py_TryIncrefCompare
for _PyStackRef
yet, it needs to be written.
Enable specialization of
LOAD_GLOBAL
in free-threaded builds.Thread-safety of specialization in free-threaded builds is provided by the following:
Thread-safety of specialized instructions in free-threaded builds is provided by the following:
Performance-wise, this looks like a 3-4% improvement on free-threaded builds on pyperformance.
--disable-gil
builds #115999