[python-package] [CUDA] installing CUDA version with 'pip' hangs, consume all the memory (32 G) and swap space #6824
Comments
Maybe there is an infinite loop in the build script? Since the full command of |
Thanks for using LightGBM. There is no way that `pip install lightgbm` should be generating "hundreds" or "millions" of processes. Have you confirmed that those processes are directly related to this? Can you share more details please?
|
I'm pretty sure, it's caused by
Python 3.9.1 (default, Dec 11 2020, 14:32:07)
Ubuntu 22.04.5 LTS \n \l Linux 5.15.0-126-generic #136-Ubuntu SMP Wed Nov 6 10:38:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
pip 25.0
No. |
So you don't see any of those processes immediately after interrupting
I don't think we support that card in LightGBM's default configuration. From https://en.wikipedia.org/wiki/CUDA#GPUs_supported, it looks like that's a Maxwell GPU that requires CUDA Compute Capability 5.2. The oldest compute capability LightGBM's build supports is for Pascal (6.x). Line 226 in d24260f
Could you try adding Tell me what happens and please share all the logs.
git clone --recursive https://github.com/microsoft/LightGBM.git
cd ./LightGBM
git fetch origin --tags
git checkout v4.5.0
# (manually modify that line I asked you to modify)
cmake -B build -S . -DUSE_CUDA=ON
cmake --build build --target _lightgbm
sh build-python.sh --precompile |
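The compute capability comparison described in the comment above can be sketched as follows. This is only an illustration, not LightGBM code: the helper names are hypothetical, and the 6.0 floor is taken from the statement above that the oldest supported capability is Pascal (6.x).

```python
from typing import Tuple

# Assumption: the floor is 6.0, per the comment above (Pascal, 6.x).
MIN_SUPPORTED: Tuple[int, int] = (6, 0)


def parse_compute_capability(text: str) -> Tuple[int, int]:
    """Turn a string such as "5.2" into a comparable (major, minor) tuple."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def is_supported_by_default(cc: str, minimum: Tuple[int, int] = MIN_SUPPORTED) -> bool:
    """Return True if the capability is at or above the assumed floor."""
    return parse_compute_capability(cc) >= minimum
```

Under that assumption, `is_supported_by_default("5.2")` is false, which matches the Maxwell card discussed here, while any 6.x value passes.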
error out:
while:
Is there a way to set '-allow-unsupported-compiler' on the command line? |
You can add flags to the environment variable Line 222 in d24260f
If it's possible, it would be better to downgrade to an older |
I removed the system default ninja:
Now the first problem, the ninja hang, is solved. I'm still struggling with gcc errors like this one: #5089. Mine: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0. I'm wondering what the best GCC version is to compile lightgbm? |
I'm sorry, but I'm really struggling to understand this report. You said you were compiling with But I'm glad to hear that you're no longer seeing Ninja issues (even if I still don't understand what the original problem was that you said was causing "millions" of processes to be spawned). Please, can you share the full logs as I asked above? Exactly like this report did: #5089 (comment) |
Sorry for the confusion. I'm trying two things:
After I removed ninja-build 1.10.1-1, pip no longer hangs, but I got the GCC error from #5089.
I updated CMakeLists.txt:
However, that flag does not seem to be picked up; I'm still seeing this error:
|
Also, is there a way to pass the '-allow-unsupported-compiler' flag to the pip command:
? |
Try
But if we're going to continue with this, please... when I ask for something, provide it or explain why you can't. I've asked twice now for the "full" logs. Those contain lots of useful information that would help us make more debugging progress. Please run these commands on your checkout of LightGBM (with the changes we've discussed above, adding the Maxwell compute capabilities).
cmake -B build -S . -DUSE_CUDA=ON
cmake --build build --target _lightgbm
sh build-python.sh --precompile
And share ALL of the logs that that produces (not only error messages), like in #5089 (comment). |
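One way to collect the complete output of those three commands in a single file is sketched below. It is a minimal sketch, not part of LightGBM: the `run_and_log` helper and the `build-full.log` file name are hypothetical, and only standard-library calls are used.

```python
import subprocess
import sys


def run_and_log(commands, log_path):
    """Run commands in order, echoing and saving combined stdout and stderr.

    Stops at the first failing command and returns its exit code.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        for cmd in commands:
            header = "$ " + " ".join(cmd) + "\n"
            sys.stdout.write(header)
            log.write(header)
            proc = subprocess.Popen(
                cmd,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # merge stderr so warnings keep their order
                text=True,
            )
            for line in proc.stdout:
                sys.stdout.write(line)
                log.write(line)
            code = proc.wait()
            if code != 0:
                log.write(f"[exit code {code}]\n")
                return code
    return 0


# Example call (hypothetical log file name), run from the LightGBM checkout:
# run_and_log(
#     [
#         ["cmake", "-B", "build", "-S", ".", "-DUSE_CUDA=ON"],
#         ["cmake", "--build", "build", "--target", "_lightgbm"],
#         ["sh", "build-python.sh", "--precompile"],
#     ],
#     "build-full.log",
# )
```

Merging stderr into stdout keeps CMake warnings next to the step that produced them, which is the kind of context the full logs are being requested for.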
I tried both double
Still the same error message. I'm just an ordinary ML user (I need to keep my clang version for other tasks), and I found BTW, I found the pre-built package size is really small:
I'm just wondering if you can just provide a Thanks for all the help. |
That's a good idea, and yes we may try to support that in the future. I've opened #6828 to document that, you might want to click "subscribe" there to be notified about discussions there. Even if we did, it wouldn't have helped you in this case. It's unlikely we'd add support for Maxwell GPUs to any pre-built wheels that we distribute. So even if such a package existed, you still would have probably had to build from source.
Sorry that is still not working for you, I must be getting the syntax wrong. I don't have access to a Maxwell or similar GPU to test. There may be warnings in CMake's logs that help us with that, but since you're not sharing the logs I don't know.
Ok, we'll close this. If you come back to open issues here in the future, we're happy to help, but when the people helping you ask for things, please don't ignore those requests. |
Hangs, consumes all the memory (32 G) and swap space.
No other error message.
top shows there are hundreds of <my Python virtual environment ...>/ninja --version commands being executed. Have to kill the install.
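To confirm a report like this with a number instead of an impression from top, the process table can be counted directly. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it relies on the Linux `ps -eo pid,args` output format with one header row.

```python
import subprocess


def count_matching_processes(ps_output: str, needle: str) -> int:
    """Count process-table rows whose command line contains `needle`."""
    rows = ps_output.splitlines()[1:]  # skip the "PID COMMAND" header row
    return sum(1 for row in rows if needle in row)


def snapshot_ps() -> str:
    """Return the current process table as text (PID plus full command line)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example: how many "ninja --version" invocations exist right now?
# print(count_matching_processes(snapshot_ps(), "ninja --version"))
```

Running the count once during the hang and once after interrupting the install would show whether the processes belong to the install.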