[BUG] Windows Server Unexpectedly Shuts Down When Using Nvitop to Monitor GPU Usage #136
Open · Label: bug (Something isn't working)
Comments
@NI-MingCheng Thanks for the report. I wonder whether the R515 driver works with CUDA 11.7 on Windows. The latest production driver I found for the RTX A5000 on Windows Server 2022 is the R550 driver: NVIDIA RTX Server Driver Release 550 R550 U10 (553.24) | Windows Server 2022 (https://www.nvidia.com/en-us/drivers/details/233143). It would be helpful if you could run the following Python code manually in a REPL (e.g. ipython, or just type python in the terminal):
>>> from nvitop import CudaDevice
>>> cuda0 = CudaDevice(0)
>>> print(cuda0.as_snapshot())
>>> cuda1 = CudaDevice(1)
>>> print(cuda1.as_snapshot())
>>> cuda2 = CudaDevice(2)
>>> print(cuda2.as_snapshot())
>>> cuda3 = CudaDevice(3)
>>> print(cuda3.as_snapshot())
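The per-device calls above can also be written as a loop. The sketch below is a hypothetical helper, not part of nvitop: it assumes only nvitop's `CudaDevice.all()` and `as_snapshot()`, and the names `snapshot_devices`, `main` and the log path `nvitop_snapshot.log` are illustrative. Before each query it appends a line to a log file and asks the OS to write it to disk, so if the machine powers off mid-query, the last line on disk should name the device that was being queried.

```python
import os
from typing import Iterable, List


def snapshot_devices(devices: Iterable, log_path: str) -> List[str]:
    """Print each device's snapshot, leaving an on-disk breadcrumb first.

    `devices` may be any iterable of objects with an `as_snapshot()` method
    (e.g. nvitop CudaDevice instances). Returns the snapshot strings.
    """
    results = []
    with open(log_path, "a", encoding="utf-8") as log:
        for position, device in enumerate(devices):
            # Record the step *before* the query and fsync it, so the
            # breadcrumb is likely to survive if the query takes the machine down.
            log.write(f"querying device #{position}\n")
            log.flush()
            os.fsync(log.fileno())

            snapshot = str(device.as_snapshot())
            print(snapshot)
            results.append(snapshot)

            # Mark the query as completed.
            log.write(f"device #{position} ok\n")
            log.flush()
            os.fsync(log.fileno())
    return results


def main(log_path: str = "nvitop_snapshot.log") -> None:
    # Imported lazily so the helper above can be reused without nvitop installed.
    from nvitop import CudaDevice

    snapshot_devices(CudaDevice.all(), log_path)
```

Call `main()` from the REPL. If the system shuts down, a final `querying device #N` line without a matching `ok` line points at the query that was in flight.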
The results are as follows.
(In reply to Xuehai Pan's comment of Wednesday, 23 October 2024, 16:14.)
倪明成, by email on Wednesday, 23 October 2024, 17:23: The results are as follows.
Required prerequisites
What version of nvitop are you using?
1.3.2
Operating system and version
Windows Server 2022 Datacenter
NVIDIA driver version
516.01
NVIDIA-SMI
Python environment
python -m pip freeze
Problem description
On Windows Server, the system shuts down unexpectedly while nvitop is being used to monitor GPU usage. The cause appears to be a compatibility conflict between nvitop and Windows Server's hardware monitoring system.
Steps to Reproduce
1. Start a deep learning training job on the GPU.
2. While it is running, use nvitop to view GPU usage.
3. The system shuts down unexpectedly.
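To record what the GPUs were reporting just before a shutdown during step 2, one option is a small polling logger that appends one line per device per interval and fsyncs each round. This is a hypothetical sketch, not an nvitop feature: it assumes the nvitop `Device` methods `temperature()`, `power_usage()` (milliwatts), `gpu_utilization()` and `memory_used()` (bytes) plus `Device.all()`, and the function names, interval and log format are illustrative.

```python
import os
import time
from datetime import datetime
from typing import Optional, Sequence


def format_reading(device) -> str:
    """Format one log line from an object exposing nvitop-style query methods."""
    return (
        f"{datetime.now().isoformat(timespec='seconds')} "
        f"gpu={device.index} temp={device.temperature()}C "
        f"power={device.power_usage()}mW util={device.gpu_utilization()}% "
        f"mem_used={device.memory_used()}B"
    )


def poll_to_file(
    devices: Sequence,
    log_path: str,
    interval_s: float = 1.0,
    iterations: Optional[int] = None,
) -> None:
    """Append one line per device every `interval_s` seconds.

    Runs until interrupted when `iterations` is None. Each round is flushed
    and fsync-ed so the last readings are more likely to survive a power-off.
    """
    done = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while iterations is None or done < iterations:
            for device in devices:
                log.write(format_reading(device) + "\n")
            log.flush()
            os.fsync(log.fileno())
            done += 1
            # Skip the sleep after the final round of a bounded run.
            if iterations is None or done < iterations:
                time.sleep(interval_s)


def main(log_path: str = "gpu_poll.log", interval_s: float = 1.0) -> None:
    # Lazy import: only this entry point needs nvitop and an NVIDIA driver.
    from nvitop import Device

    poll_to_file(Device.all(), log_path, interval_s)
```

Running `main()` in a separate terminal during training, then reading the tail of `gpu_poll.log` after a reboot, shows the last readings captured before the shutdown.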
Traceback
Logs
Expected behavior
None
Additional context
None