[Issue]: load_unload_reload_test test fails: "Option 'disassemble' registered more than once" #109

Open
AngryLoki opened this issue Dec 21, 2024 · 4 comments

Comments

@AngryLoki

AngryLoki commented Dec 21, 2024

Problem Description

Hi,

I'm trying to achieve a 100% test pass rate on roctracer, but one test fails.

Components involved:

  • roctracer from rocm-6.3.0 release
  • libamd_comgr and libhsa-runtime64 from rocm-6.3.0 release
  • LLVM 19.1.4

This code fails on the second iteration, inside hsa_init:

for (int i = 0; i < 2; ++i) {
  hsa_init();
  CHECK(hsa_iterate_agents(
      [](hsa_agent_t agent, void*) {
        hsa_device_type_t type;
        return hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
      },
      nullptr));
  hsa_shut_down();
}

with the error "CommandLine Error: Option 'disassemble' registered more than once!" and this backtrace:

#0  llvm::report_fatal_error (Reason=0x7ffccf88702b "inconsistency in registered CommandLine options", GenCrashDiag=true)
    at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/lib/Support/ErrorHandling.cpp:83
#1  0x00007ffcd2bc3c9d in (anonymous namespace)::CommandLineParser::addOption (this=<optimized out>, O=<optimized out>, SC=0x555555581fc0)
    at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/lib/Support/CommandLine.cpp:241
#2  0x00007ffcd2bb13d3 in (anonymous namespace)::CommandLineParser::addOption(llvm::cl::Option*, bool)::{lambda(llvm::cl::SubCommand&)#1}::operator()(llvm::cl::SubCommand&) const (
    SC=..., this=<optimized out>) at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/lib/Support/CommandLine.cpp:249
#3  llvm::function_ref<void (llvm::cl::SubCommand&)>::callback_fn<(anonymous namespace)::CommandLineParser::addOption(llvm::cl::Option*, bool)::{lambda(llvm::cl::SubCommand&)#1}>(long, llvm::cl::SubCommand&) (params=..., callable=<optimized out>) at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/include/llvm/ADT/STLFunctionalExtras.h:45
#4  llvm::function_ref<void(llvm::cl::SubCommand&)>::operator() (this=<optimized out>, params=...)
    at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/include/llvm/ADT/STLFunctionalExtras.h:68
#5  0x00007ffcd2bb13d3 in (anonymous namespace)::CommandLineParser::forEachSubCommand (this=0x555555581e60, Opt=..., Action=...) from /usr/lib/llvm/19/lib64/libLLVM.so.19.1+libcxx
#6  (anonymous namespace)::CommandLineParser::addOption (this=0x555555581e60, O=0x7ffde25a0530 <(anonymous namespace)::Disassemble>, ProcessDefaultOption=false)
    at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/lib/Support/CommandLine.cpp:249
#7  llvm::cl::Option::addArgument (this=0x7ffde25a0530 <(anonymous namespace)::Disassemble>) at /usr/src/debug/llvm-core/llvm-19.1.6/llvm/lib/Support/CommandLine.cpp:416
#8  0x00007ffde2593cce in llvm::cl::opt<bool, false, llvm::cl::parser<bool> >::done (this=0x7ffde25a0530 <(anonymous namespace)::Disassemble>)
    at /usr/lib/llvm/19/include/llvm/Support/CommandLine.h:1477
#9  llvm::cl::opt<bool, false, llvm::cl::parser<bool> >::opt<char [12], llvm::cl::desc> (this=0x7ffde25a0530 <(anonymous namespace)::Disassemble>, Ms=..., Ms=...) at /usr/lib/llvm/19/include/llvm/Support/CommandLine.h:1501
#10 __cxx_global_var_init () at /usr/src/debug/dev-libs/rocm-comgr-6.3.0/llvm-project-rocm-6.3.0/amd/comgr/src/comgr-objdump.cpp:96
#11 0x00007ffde2593cce in _GLOBAL__sub_I_comgr_objdump.cpp () from /usr/lib64/libamd_comgr.so.2
#12 0x00007ffff7fcd67f in call_init () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff7fcd76c in _dl_init () from /lib64/ld-linux-x86-64.so.2
#14 0x00007ffff7fca5d9 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
#15 0x00007ffff7fd4592 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#16 0x00007ffff7fca544 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
#17 0x00007ffff7fd494e in _dl_open () from /lib64/ld-linux-x86-64.so.2
#18 0x00007ffff7ac58dc in ?? () from /usr/lib64/libc.so.6
#19 0x00007ffff7fca544 in _dl_catch_exception () from /lib64/ld-linux-x86-64.so.2
#20 0x00007ffff7fca674 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#21 0x00007ffff7ac531b in ?? () from /usr/lib64/libc.so.6
#22 0x00007ffff7ac59a9 in dlopen () from /usr/lib64/libc.so.6
#23 0x00007ffff7fb67d5 in (anonymous namespace)::roctracer_plugin_t::roctracer_plugin_t (plugin_path=..., this=<optimized out>)
    at /var/tmp/portage/dev-util/roctracer-6.3.0/work/roctracer-rocm-6.3.0/src/tracer_tool/tracer_tool.cpp:143
#24 std::__1::__construct_at[abi:se190106]<(anonymous namespace)::roctracer_plugin_t, std::__1::__fs::filesystem::path&, (anonymous namespace)::roctracer_plugin_t*>((anonymous namespace)::roctracer_plugin_t*, std::__1::__fs::filesystem::path&) (__args=..., __location=<optimized out>) at /usr/include/c++/v1/__memory/construct_at.h:52
#25 std::__1::__optional_storage_base<(anonymous namespace)::roctracer_plugin_t, false>::__construct[abi:se190106]<std::__1::__fs::filesystem::path&>(std::__1::__fs::filesystem::path&) (__args=..., this=<optimized out>) at /usr/include/c++/v1/optional:363
#26 std::__1::optional<(anonymous namespace)::roctracer_plugin_t>::emplace[abi:se190106]<std::__1::__fs::filesystem::path&, void>(std::__1::__fs::filesystem::path&) (__args=..., 
    this=<optimized out>) at /usr/include/c++/v1/optional:759
#27 OnLoad (table=0x7ffff7eca368 <rocr::core::hsa_api_table()::table>, runtime_version=<optimized out>, failed_tool_count=<optimized out>, failed_tool_names=<optimized out>)
    at /var/tmp/portage/dev-util/roctracer-6.3.0/work/roctracer-rocm-6.3.0/src/tracer_tool/tracer_tool.cpp:699
#28 0x00007ffff7cd3acc in rocr::AMD::callback_t<bool (*)(HsaApiTable*, unsigned long, unsigned long, char const* const*)>::operator() (args=0x0, args=0x0, args=0x0, args=0x0, 
    this=<optimized out>) at /usr/src/debug/dev-libs/rocr-runtime-6.3.0/ROCR-Runtime-rocm-6.3.0/runtime/hsa-runtime/core/inc/exceptions.h:88
#29 rocr::core::Runtime::LoadTools (this=0x55555556d3d0) at /usr/src/debug/dev-libs/rocr-runtime-6.3.0/ROCR-Runtime-rocm-6.3.0/runtime/hsa-runtime/core/runtime/runtime.cpp:2353
#30 0x00007ffff7ccae41 in rocr::core::Runtime::Load (this=0x55555556d3d0)
    at /usr/src/debug/dev-libs/rocr-runtime-6.3.0/ROCR-Runtime-rocm-6.3.0/runtime/hsa-runtime/core/runtime/runtime.cpp:2010
#31 0x00007ffff7ccac67 in rocr::core::Runtime::Acquire () at /usr/src/debug/dev-libs/rocr-runtime-6.3.0/ROCR-Runtime-rocm-6.3.0/runtime/hsa-runtime/core/runtime/runtime.cpp:150
#32 0x00007ffff7caea4a in rocr::HSA::hsa_init () at /usr/src/debug/dev-libs/rocr-runtime-6.3.0/ROCR-Runtime-rocm-6.3.0/runtime/hsa-runtime/core/runtime/hsa.cpp:206
#33 0x0000555555555a26 in main () at /var/tmp/portage/dev-util/roctracer-6.3.0/work/roctracer-rocm-6.3.0/test/hsa/load_unload_reload.cpp:3

What happens is that comgr-objdump.cpp registers command-line options through global variables, so an attempt to reload comgr fails because these options are never unregistered.
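To illustrate the mechanism described above, here is a minimal sketch that does not use LLVM at all. OptionRegistry, Option, simulate_library_load and second_load_fails are hypothetical names invented for this example; the registry stands in for a process-wide option table that outlives the library, and the exception stands in for the fatal error seen in frame #0 of the backtrace.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a process-wide option table that stays alive
// while the library that registered into it is unloaded and reloaded.
class OptionRegistry {
public:
    void add(const std::string& name) {
        // insert().second is false when the name is already present.
        if (!names_.insert(name).second)
            throw std::runtime_error("Option '" + name + "' registered more than once!");
    }
    bool contains(const std::string& name) const { return names_.count(name) != 0; }

private:
    std::set<std::string> names_;
};

// Hypothetical option object: registers itself on construction and,
// like the behaviour described in this issue, never unregisters.
struct Option {
    Option(OptionRegistry& registry, std::string name) { registry.add(name); }
};

// Models one load of the library: the "global initializer" constructs the
// option, and the "unload" destroys it at scope exit without unregistering.
inline void simulate_library_load(OptionRegistry& registry) {
    Option disassemble(registry, "disassemble");
}

// Returns true if the second simulated load hits the duplicate registration.
inline bool second_load_fails() {
    OptionRegistry registry;
    simulate_library_load(registry);  // first load: registers "disassemble"
    try {
        simulate_library_load(registry);  // reload: name is still registered
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

In this model the first load succeeds and the second one throws, which mirrors the first and second loop iterations of the failing test. In the real case the process aborts instead of throwing.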

While searching for similar reports, I found that MIOpen hit a similar problem a long time ago (even without reloading); as a workaround, they disabled the second hipInit: ROCm/ROCm-CompilerSupport#30.

The issue does not reproduce without LD_PRELOAD=./libroctracer_tool.so, so it seems to be caused by the dlopen-ed libraries.
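For reference, the dlopen pattern in question can be sketched as follows. This is only an illustration: load_unload_reload and its parameters are hypothetical and are not taken from roctracer. The extra_flags parameter is there so that RTLD_NODELETE can be tried; according to the dlopen man page, that flag keeps the object mapped across dlclose, so its global variables are not reinitialized on a later dlopen. Whether that is an acceptable fix here is an open question, not something this issue establishes.

```cpp
#include <dlfcn.h>

// Loads, unloads, and reloads a shared library.
// Returns true if both dlopen calls succeed, false otherwise.
// `path` and `extra_flags` are illustrative parameters (assumptions).
inline bool load_unload_reload(const char* path, int extra_flags = 0) {
    for (int i = 0; i < 2; ++i) {
        void* handle = dlopen(path, RTLD_NOW | extra_flags);
        if (handle == nullptr) {
            return false;  // dlerror() would describe the failure
        }
        // Without RTLD_NODELETE, dlclose may unload the library, and the
        // next dlopen runs its global initializers again.
        dlclose(handle);
    }
    return true;
}
```

A call such as load_unload_reload(path, RTLD_NODELETE) keeps the library resident between iterations. Note that a fatal error raised inside a library initializer, as in the backtrace above, terminates the process rather than making dlopen return a null handle, so the false return only covers ordinary load failures.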

Could you provide a fix for this test or remove it (if it is obsolete)? Thanks!

Operating System

Gentoo

CPU

GPU

ROCm Version

ROCm 6.3.0

ROCm Component

roctracer

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@ppanchad-amd

Hi @AngryLoki. An internal ticket has been created to investigate your issue. Thanks!

@tcgu-amd
Contributor

tcgu-amd commented Jan 3, 2025

Hi @AngryLoki, sorry for the late reply. We have been investigating this issue. Just to clarify, are you able to pass the test without LD_PRELOAD=./libroctracer_tool.so? Thanks!

@AngryLoki
Author

@tcgu-amd, yes, without LD_PRELOAD the executable works fine. I updated the backtrace above with more symbols to show how tracer_tool triggers llvm::report_fatal_error via dlopen (without dlopen, everything works fine).

@tcgu-amd
Contributor

tcgu-amd commented Jan 8, 2025

Hi @AngryLoki, it seems we are not able to reproduce the error on Ubuntu. Based on your log, could there be a build mismatch between your libamd_comgr and libhsa-runtime64, since roctracer seems to come from Portage? Also, I noticed that you are using LLVM 19, but the official ROCm uses LLVM 18, which might widen the disparity. Could you provide more context about how you obtained and set up your roctracer, libamd_comgr and libhsa-runtime64? Thanks!
