Segmentation fault when integrated with Ray #181

Closed
mobicham opened this issue Sep 13, 2024 · 4 comments

@mobicham

BitBLAS throws a segmentation fault when used in an environment that runs Ray. Could this be related to the order in which bitblas is loaded? Thank you very much in advance!

*** SIGSEGV received at time=1726233628 on cpu 111 ***
PC: @     0x7f7e3d482e42  (unknown)  (unknown)
    @     0x7f7e3d31e520  (unknown)  (unknown)
[2024-09-13 13:20:28,314 E 120351 121131] logging.cc:440: *** SIGSEGV received at time=1726233628 on cpu 111 ***
[2024-09-13 13:20:28,315 E 120351 121131] logging.cc:440: PC: @     0x7f7e3d482e42  (unknown)  (unknown)
[2024-09-13 13:20:28,315 E 120351 121131] logging.cc:440:     @     0x7f7e3d31e520  (unknown)  (unknown)
Fatal Python error: Segmentation fault

Stack (most recent call first):
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 252 in __init_handle_by_constructor__
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/3rdparty/tvm/python/tvm/_ffi/_ctypes/object.py", line 145 in __init_handle_by_constructor__
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/3rdparty/tvm/python/tvm/runtime/object.py", line 101 in __setstate__
  File "/usr/lib/python3.10/copy.py", line 273 in _reconstruct
  File "/usr/lib/python3.10/copy.py", line 172 in deepcopy
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/ops/operator.py", line 141 in apply_default_schedule
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/ops/operator.py", line 158 in _build_default_module
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/ops/general_matmul/__init__.py", line 257 in dispatch_tir
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/bitblas/ops/general_matmul/__init__.py", line 243 in __init__
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/hqq/backends/bitblas.py", line 109 in __init__
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/hqq/backends/bitblas.py", line 190 in patch_hqq_to_bitblas
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/hqq/models/base.py", line 154 in patch_linearlayers
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/hqq/utils/patching.py", line 25 in patch_linearlayers
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/hqq/utils/patching.py", line 113 in prepare_for_inference
  File "/root/aana_sdk/aana/deployments/hqq_deployment.py", line 141 in apply_config
  File "/root/aana_sdk/aana/deployments/base_deployment.py", line 23 in reconfigure
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 872 in _call_func_or_gen
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 959 in call_reconfigure
  File "/root/.cache/pypoetry/virtualenvs/aana-XDlPP_xZ-py3.10/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 795 in _run_user_code_event_loop
  File "/usr/lib/python3.10/threading.py", line 953 in run
  File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap
@LeiWang1999
Contributor

Hi @mobicham, thanks for reporting! Would you mind providing a script so that we can reproduce this?

@mobicham
Author

Sorry for the delay, @LeiWang1999. We fixed the issue by importing bitblas first, before anything else.
Is there a reason why the import order matters for bitblas?
Thank you!
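
For readers who want to apply the same workaround, here is a minimal sketch of checking the import order at the top of an entry point. The helper `import_first` is hypothetical (it is not part of bitblas, hqq, or Ray), and the list of modules that must come after bitblas is an assumption; the thread only establishes that importing bitblas before everything else avoided the crash.

```python
import importlib
import sys


def import_first(name, must_precede):
    """Import `name` and report which modules in `must_precede` were loaded before it.

    Hypothetical helper: it does not fix anything by itself, it only makes an
    unexpected import order visible instead of leaving it to a later crash.
    """
    # Modules already present in sys.modules were imported before `name`.
    already_loaded = [m for m in must_precede if m in sys.modules]
    module = importlib.import_module(name)
    return module, already_loaded


# Intended use at the very top of the entry point (module names are assumptions):
# bitblas, loaded_too_early = import_first("bitblas", ["tvm", "torch", "ray", "hqq"])
# if loaded_too_early:
#     print("imported before bitblas:", loaded_too_early)
```

Since the stack trace shows the crash inside a Ray Serve replica, the check probably needs to sit in the module that the replica process itself imports, not only in the driver script.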

@LeiWang1999
Contributor

LeiWang1999 commented Sep 17, 2024

@mobicham That's interesting. I ran into some problems when working with mlc, for example:

import tvm  # upstream

relax_mod = relax_transform(relax_mod)

import welder
relax_mod = welder.tune(relax_mod)
# something bad happened

The problem was that importing welder also imports its own bundled version of TVM, which then invokes load_dlls (for example, to load libcutlass). This ends up overwriting the upstream cutlass library and leads to bugs.

I suspect there is a similar cause behind both cases.
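
One way to test this hypothesis is to look for two copies of the same shared library mapped into the process. The sketch below is a diagnostic only, assuming Linux (it parses the text of `/proc/self/maps`); the function names and the keywords `"tvm"` and `"cutlass"` are illustrative choices, not anything provided by TVM or bitblas.

```python
import os
from collections import defaultdict


def find_loaded_libs(maps_text, keywords):
    """Group mapped shared-object paths by file name, keeping names that contain a keyword."""
    found = defaultdict(set)
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (pathname is optional).
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping without a backing file
        path = parts[5].strip()
        base = os.path.basename(path)
        if ".so" in base and any(k in base for k in keywords):
            found[base].add(path)
    return {name: sorted(paths) for name, paths in found.items()}


def duplicated_libs(maps_text, keywords):
    """Return only the libraries that are mapped from more than one path."""
    libs = find_loaded_libs(maps_text, keywords)
    return {name: paths for name, paths in libs.items() if len(paths) > 1}


# Intended use, after all imports have run (Linux only):
# with open("/proc/self/maps") as f:
#     print(duplicated_libs(f.read(), ["tvm", "cutlass"]))
```

If the same file name shows up under two different directories, for example one from an upstream TVM install and one from a bundled copy, that would be consistent with the explanation above, though it would not prove it.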

@mobicham
Author

Thanks! Yeah, for the moment the trick is to experiment with different import orders and pick the one that doesn't throw an error.
Closing this issue, thank you again!
