[Bug] InitCCLPerWorker Fails when using AMD GPU Bridge #317

Closed

TNT3530 opened this issue Apr 15, 2024 · 1 comment

TNT3530 commented Apr 15, 2024

Expected behavior

MLC-LLM should load the sharded model across all 4 AMD Instinct MI100 GPUs and start inferring.
The issue occurs only with the bridge enabled: adding amdgpu.use_xgmi_p2p=0 to the GRUB config makes it stop with no other changes, though that disables XGMI P2P and falls back to PCIe P2P only.
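
For reference, a minimal sketch of that workaround on Ubuntu 22.04, assuming the stock GRUB setup (the existing contents of GRUB_CMDLINE_LINUX_DEFAULT will vary per machine):

    # /etc/default/grub – append the module parameter to the kernel command line
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.use_xgmi_p2p=0"

    # then regenerate the GRUB config and reboot for it to take effect
    sudo update-grub
    sudo reboot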

Here is the output when attempting to run with NCCL_DEBUG=INFO
screenlog.txt
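
(A sketch of how that log was captured, assuming the repro code below is saved as a hypothetical repro.py:)

    NCCL_DEBUG=INFO python repro.py 2>&1 | tee screenlog.txt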

Actual behavior

/src/extlibs/rccl/build/hipify/src/transport/p2p.cc:287 NCCL WARN Cuda failure 'invalid argument'
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [02:18:19] /workspace/tvm/src/runtime/disco/nccl/nccl.cc:196: rcclErrror: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
Stack trace:
  0: _ZN3tvm7runtime6deta
  1: tvm::runtime::nccl::InitCCLPerWorker(tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  2: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<void (*)(tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(void (*)(tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  3: tvm::runtime::DiscoWorker::Impl::CallPacked(tvm::runtime::DiscoWorker*, long, tvm::runtime::PackedFunc, tvm::runtime::TVMArgs const&)
  4: tvm::runtime::DiscoWorker::Impl::MainLoop(tvm::runtime::DiscoWorker*)
  5: 0x00007ff61c0dc252
  6: start_thread
        at ./nptl/pthread_create.c:442
  7: 0x00007ff64cd2665f
        at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  8: 0xffffffffffffffff

Environment

Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm 6.0
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 4x AMD Instinct MI100
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source): pip
Python version (e.g. 3.10): 3.10.12
TVM Unity Hash Tag: unity.txt

Steps to reproduce

1. Install MLC-LLM
2. Run the following Python code to start loading and inferring:

# imports assume the mlc_chat package name in use at the time (later renamed mlc_llm)
from mlc_chat import ChatModule, ChatConfig
from mlc_chat.callback import StreamToStdout

# shard the 120B model across all 4 GPUs via tensor_parallel_shards=4
cm = ChatModule(model="goliath-120b-q4f16_1", chat_config=ChatConfig(
    max_gen_len=4096,
    conv_template="LM",
    temperature=0.75,
    repetition_penalty=1.1,
    top_p=0.9,
    tensor_parallel_shards=4,
    context_window_size=4096
))

# stream generated tokens to stdout as they arrive
output = cm.generate(
    prompt="What is the meaning of life?",
    progress_callback=StreamToStdout(callback_interval=2),
)

TNT3530 commented Aug 28, 2024

Note: This has been fixed as of mlc 0.15.dev544
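
(A sketch of the upgrade path, assuming the nightly ROCm wheels from the mlc.ai index; the exact package suffix depends on your ROCm version, so check the install docs:)

    python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-rocm60 mlc-ai-nightly-rocm60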

TNT3530 closed this as completed Aug 28, 2024