
[BUG]: race condition in init with gil_scoped_release causes pybind11_object_dealloc(): Tried to deallocate unregistered instance! #5473

Open

Labels: triage (New bug, unverified)

d4l3k opened this issue Dec 19, 2024 · 1 comment

d4l3k commented Dec 19, 2024

Required prerequisites

What version (or hash if on master) of pybind11 are you using?

2.13.6, 2.12.1, 741d86f

Problem description

Hi, I recently found an issue while stress testing pybind11-related code in PyTorch, but I have been able to reproduce it with pure pybind11 code (d4l3k@4988742).

The issue seems to be that using gil_scoped_release as a call guard on an init() function with multiple threads causes memory corruption in the internal object registration, which then crashes when the bound object is deleted:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  pybind11_object_dealloc(): Tried to deallocate unregistered instance!
```

I suspect that gil_scoped_release leaves the internal object registration unprotected, resulting in memory corruption: by the time the object is deleted, its registry entry has been overwritten and can no longer be found.
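The suspected mechanism can be sketched with a simplified Python analogy (illustrative only, not pybind11's real internals; `registry`, `gil`, `construct_fixed`, and `destruct` are made-up names). A lock stands in for the GIL and a dict for the internal instance map; the slow constructor work runs without the lock, but the registration bookkeeping re-acquires it, which is exactly the protection that releasing the GIL for the whole call would remove:

```python
# Simplified analogy of the suspected race (NOT pybind11's real internals;
# registry/gil/construct_fixed/destruct are made-up names).
import threading
import time

registry = {}              # stands in for pybind11's registered-instance map
gil = threading.Lock()     # stands in for the GIL

def construct_fixed(key):
    # Analogy of releasing the GIL only inside the init body:
    # the slow constructor work runs without the lock ...
    time.sleep(0.001)
    # ... but the registration bookkeeping re-acquires it.
    with gil:
        registry[key] = object()

def destruct(key):
    with gil:
        if key not in registry:
            raise RuntimeError(
                "pybind11_object_dealloc(): Tried to deallocate unregistered instance!"
            )
        del registry[key]

threads = [
    threading.Thread(target=lambda k=k: (construct_fixed(k), destruct(k)))
    for k in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert not registry  # every instance was registered and deregistered cleanly
```

With the bookkeeping under the lock, concurrent construct/destruct pairs never observe a missing registry entry, which matches the fix of moving the release inside the init body.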

C++ stack trace (originally discovered in PyTorch)

```
#0  0x00007ffff7c8bacc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7c3e686 in raise () from /lib64/libc.so.6
#2  0x00007ffff7c28833 in abort () from /lib64/libc.so.6
#3  0x00007ffff4cd4f00 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#4  0x00007ffff4cd343c in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffff4cd348e in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#6  0x00007ffff4cd3680 in __cxxabiv1::__cxa_throw (obj=0x7fdd2805b700, tinfo=0x7ffff4df0560 <typeinfo for std::runtime_error>, dest=0x7ffff4ce0272 <std::runtime_error::~runtime_error()>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:98
#7  0x00007fffeec78c82 in pybind11::pybind11_fail(char const*) () from /home/tristanr/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#8  0x00007fffeeec93f7 in pybind11::detail::clear_instance(_object*) () from /home/tristanr/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#9  0x00007fffeeec9c71 in pybind11_object_dealloc () from /home/tristanr/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#10 0x00000000004f10a7 in _Py_DECREF (op=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.10.13/Include/object.h:500
```

Python stack trace (gdb `py-bt`)

```
  File "/home/tristanr/pybind11/tests/test_init_race.py", line 17, in run
    del store
  File "/home/tristanr/.conda/envs/pybind-3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/tristanr/.conda/envs/pybind-3.10/lib/python3.10/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/home/tristanr/.conda/envs/pybind-3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/tristanr/.conda/envs/pybind-3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/tristanr/.conda/envs/pybind-3.10/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
```

Reproducible example code

Full repro: d4l3k@4988742

Invoked with:

```
PYTHONPATH=./build/tests/ python tests/test_init_race.py
PYTHONPATH=./build/tests/ gdb -q -ex=r --args $(which python) tests/test_init_race.py
```
C++:

```cpp
TEST_SUBMODULE(init_race, m) {
    class Simple {};
    py::class_<Simple, std::unique_ptr<Simple>>(m, "Simple")
        .def(py::init([]() {
                 std::this_thread::sleep_for(std::chrono::milliseconds(50));
                 return std::make_unique<Simple>();
             }),
             py::call_guard<py::gil_scoped_release>());
}
```

Python:

```py
import time
from concurrent.futures import ThreadPoolExecutor
import env  # noqa: F401
import pytest
from pybind11_tests import init_race as m

def run():
    store = m.Simple()
    # this sleep is required to trigger the crash
    time.sleep(0.1)
    del store

futures = []
with ThreadPoolExecutor(
    max_workers=100,
) as executor:
    for i in range(100000):
        print(i)
        futures.append(executor.submit(run))
        if len(futures) > 100:
            futures.pop(0).result()
```

Using Python 3.10 installed via conda.

Is this a regression? Put the last known working version here if it is.

Not a regression

d4l3k (Author) commented Dec 19, 2024

Moving the gil_scoped_release inside the init function seems to fix the issue:

```cpp
py::class_<Simple, std::unique_ptr<Simple>>(m, "Simple").def(py::init([]() {
    py::gil_scoped_release no_gil{};
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return std::make_unique<Simple>();
}));
```

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this issue Dec 20, 2024
`py::call_guard<py::gil_scoped_release>` is not safe when using multiple threads. This instead moves it into the init function which is safe.

For more details see #143593

pybind/pybind11#5473

Test plan:

```
python setup.py develop
```

CI

```py
import time
from concurrent.futures import ThreadPoolExecutor
from torch import distributed as dist

def run():
    store = dist.TCPStore(
        host_name="localhost",
        port=0,
        is_master=True,
        wait_for_workers=False,
    )

    # this sleep is required to trigger the crash
    time.sleep(0.1)
    del store

futures = []
with ThreadPoolExecutor(
    max_workers=100,
) as executor:
    for i in range(100000):
        print(i)
        futures.append(executor.submit(run))
        if len(futures) > 100:
            futures.pop(0).result()
```

Pull Request resolved: #143598
Approved by: https://github.com/c-p-i-o