[Bug]: NVIDIA Tesla P40 Card Issues #161

Open
nphil opened this issue Aug 21, 2024 · 13 comments
Labels
status:awaiting-triage type:bug Something isn't working

Comments

@nphil

nphil commented Aug 21, 2024

Describe the Bug

There appears to be a problem between the newer NVIDIA drivers and this Pascal-based workstation card. These cards are available fairly cheaply on eBay (100-150 USD) and, with 24GB of VRAM, are excellent for AI tasks, so I assume they will be fairly popular with homelabbers.

These cards have no display outputs. Even so, I was able to run games just fine until recently. Now Steam goes haywire and Sunshine doesn't work at all (no display detected). See the screenshot of the web UI below:

image

Mikec92117 on Discord advised that the last working driver version was 535.183.01; however, I do not see an option to download it on unRAID with the Nvidia plugin.

Is it possible to manually select this driver through steam-headless or ideally, what could be done to fix the issue on the latest NVIDIA drivers?

PS: The latest driver works fine for compute tasks, I use Ollama, Plex transcoding etc. and everything works.

Screenshots

image

Version

steam-headless:latest

Platform

Slackware - 15.0
6.1.99-Unraid
NVIDIA-SMI 550.107.02

Relevant log output

No response

@nphil nphil added status:awaiting-triage type:bug Something isn't working labels Aug 21, 2024
@karl0ss

karl0ss commented Aug 25, 2024

I see the same on my NVIDIA Tesla M60.

I can see in nvidia-smi that Steam is listed with type G and appears to be using the card, but I get blank windows like the ones in your screenshot.

@nphil
Author

nphil commented Aug 25, 2024

I went back to driver 470.x via unRAID and everything works fine. Would rather figure out the issue though as I really want to keep the drivers updated.

@karl0ss

karl0ss commented Aug 25, 2024

So you had to roll the drivers back on your actual host? That's a bit of a shame... hmmm

@nphil
Author

nphil commented Aug 25, 2024

Yup, but I was wondering if there was a way to make the Docker image pull the older version of the driver during first run. It should technically be possible to modify the script, but that's beyond my ability.

@karl0ss

karl0ss commented Aug 26, 2024

I can confirm that rolling my host back to 535.183.01 has fixed this for my Tesla M60 :)
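
For anyone who wants to check where their host stands relative to that version, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and that `sort` supports `-V` (GNU coreutils); the helper names `driver_version` and `version_gt` are made up for illustration.

```sh
#!/usr/bin/env sh
# Last driver version reported working in this thread.
LAST_GOOD="535.183.01"

# Print the host driver version as reported by nvidia-smi (first GPU only).
driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Succeed if version $1 is strictly newer than version $2.
# Relies on sort -V (version sort).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Usage:
#   if version_gt "$(driver_version)" "$LAST_GOOD"; then
#       echo "host driver is newer than $LAST_GOOD"
#   fi
```

The comparison only says whether the host is past the last version reported working here; it does not by itself prove the driver is the cause.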

@aa889788

Could you try Forza Horizon 4 or RDR2 in DX12 mode? My Tesla P4 failed to launch these games in DX12 mode. Is this a common issue with Pascal cards?

@karl0ss

karl0ss commented Aug 28, 2024

@aa889788 how can we play Forza? can we install the xbox for windows app or something?

@aa889788

> @aa889788 how can we play Forza? can we install the xbox for windows app or something?

The Steam version, but I've just fixed it by deleting TargetHardwareProfiler.dll.

@eivanov-c

eivanov-c commented Aug 31, 2024

Hi.
I have the same error.
image
Big Picture mode doesn't work either.
The host is Arch Linux, and nvidia-smi on the host shows the processes:
image
Could you help me understand what's wrong?

@karl0ss

karl0ss commented Aug 31, 2024

Not sure, but that is what I saw with the incompatible driver. I know for sure that 535.183.01 worked, as that is what I downgraded to, and it all started working...

Things like Sunshine should show as C+G for the type when it's working...
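
A quick way to pull just the type column out of nvidia-smi, as a rough sketch. It assumes the Processes table has the columns GPU, GI, CI, PID, Type, Process name, GPU Memory Usage, which may differ on other driver versions; the function name `gpu_process_types` is made up for illustration.

```sh
#!/usr/bin/env sh
# Read nvidia-smi's default output on stdin and print "PID TYPE NAME"
# for each row of the Processes table.
# Assumed row layout (may differ between driver versions):
#   | GPU  GI  CI  PID  Type  Process name  GPU Memory Usage |
# Only the first word of the process name is printed.
gpu_process_types() {
    awk '$1 == "|" && $5 ~ /^[0-9]+$/ && $6 ~ /^(C|G|C\+G)$/ { print $5, $6, $7 }'
}

# Usage: nvidia-smi | gpu_process_types
```

Rows of type G are graphics-only, C are compute-only, and C+G use both.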

@eivanov-c

I have tried several versions of drivers
nvidia
nvidia-dkms
nvidia-470
nvidia-535.183.01
but each time I have the same error

@hannemann

My NVIDIA card stopped rendering the video for Remote Play. This seems to be related to a change in the NVIDIA driver. Maybe my workaround helps with your problem. The issue seems to be related to the removal of older video rendering presets from the driver.

@see: ValveSoftware/steam-for-linux#10984

I worked around the issue by letting my Intel iGPU do the rendering via VAAPI. I achieved this by installing the 32-bit version of the i965 driver using a script in $CONTAINER_HOME/init.d:

#!/usr/bin/env sh

sudo apt install -y i965-va-driver:i386

I then created symlinks to the Intel card and renderD devices in the /dev/dri/by-path directory on the host, since the /dev/dri/(card|render) device nodes can change their names after a reboot.

sudo ln -s /dev/dri/by-path/pci-0000\:00\:02.0-card /path/to/steam-headless/home/dev_dri_intel_card
sudo ln -s /dev/dri/by-path/pci-0000\:00\:02.0-render /path/to/steam-headless/home/dev_dri_intel_render
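
The PCI address in those paths (0000:00:02.0) will likely differ on other machines. A small sketch for finding yours, assuming `readlink -f` is available; the function name `list_dri_by_path` is made up for illustration.

```sh
#!/usr/bin/env sh
# Print each entry under a by-path directory next to the device node it
# currently resolves to. Defaults to /dev/dri/by-path.
list_dri_by_path() {
    dir="${1:-/dev/dri/by-path}"
    for link in "$dir"/*; do
        [ -e "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

# Usage: list_dri_by_path
```

Matching the listed addresses against `lspci` output should tell you which entry belongs to the iGPU.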

In the next step I added the links to the devices section of my docker-compose.yml. This is necessary because Docker does not accept colons within device paths (the colon separates the host path from the container path).

      - /path/to/steam-headless/home/dev_dri_intel_card:/dev/dri/card1
      - /path/to/steam-headless/home/dev_dri_intel_render:/dev/dri/renderD128

One could improve this by creating udev rules that match these devices.
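
A rough, untested sketch of what such rules might look like. Everything named here is an assumption for illustration: the function `dri_udev_rules`, the symlink names, and the rules file name. The PCI address is the one from the symlink commands in this comment.

```sh
#!/usr/bin/env sh
# Print udev rules that add stable /dev/dri/<prefix>_card and
# /dev/dri/<prefix>_render symlinks for the DRM nodes of one PCI device.
#   $1  PCI address of the GPU, e.g. 0000:00:02.0
#   $2  prefix for the symlink names (hypothetical; anything colon-free)
dri_udev_rules() {
    pci="$1"
    prefix="$2"
    cat <<EOF
SUBSYSTEM=="drm", KERNEL=="card[0-9]*", ENV{DEVTYPE}=="drm_minor", KERNELS=="$pci", SYMLINK+="dri/${prefix}_card"
SUBSYSTEM=="drm", KERNEL=="renderD[0-9]*", KERNELS=="$pci", SYMLINK+="dri/${prefix}_render"
EOF
}

# Usage (as root), then reload and re-trigger udev:
#   dri_udev_rules 0000:00:02.0 intel > /etc/udev/rules.d/99-dri-intel.rules
#   udevadm control --reload && udevadm trigger --subsystem-match=drm
```

With rules like these in place, the compose file could point at /dev/dri/intel_card and /dev/dri/intel_render on the host instead of hand-made symlinks.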

Finally, I disabled NVIDIA rendering in Steam's Remote Play settings. One can verify that VAAPI is working by tailing $CONTAINER_HOME/.steam/debian-installation/logs/streaming_log.txt
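
A small sketch for that check. The exact wording of the log lines is not given here, so it only does a case-insensitive search for "vaapi"; the function name `vaapi_log_lines` is made up for illustration.

```sh
#!/usr/bin/env sh
# Print the most recent lines (up to 20) of a Steam streaming log that
# mention VAAPI. Matching is case-insensitive because the exact log
# wording is an unknown here.
vaapi_log_lines() {
    log="$1"
    grep -i 'vaapi' "$log" | tail -n 20
}

# Usage:
#   vaapi_log_lines "$CONTAINER_HOME/.steam/debian-installation/logs/streaming_log.txt"
# To follow the log live instead (--line-buffered is a GNU grep option):
#   tail -f "$CONTAINER_HOME/.steam/debian-installation/logs/streaming_log.txt" | grep -i --line-buffered vaapi
```

If nothing is printed during a stream, Steam is probably not using the VAAPI path.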

@nphil
Author

nphil commented Nov 1, 2024

For those using the official unraid container from Community apps:

If anyone else is trying to set an old driver version manually via an environment variable, you must remove --runtime='nvidia' from your extra parameters and replace it with --device='/dev/dri'. You will then need to add an extra variable, NVIDIA_DRIVER_VERSION, and give it a value of 535.183.01.
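
For those not on unRAID, the same change expressed as a plain docker command line might look roughly like this. It is only a sketch: the function name `pinned_driver_cmd` is made up, the image reference is whatever you already use, and every other option from your own setup (volumes, ports, and so on) is left out.

```sh
#!/usr/bin/env sh
# Print (not run) a docker command line with the changes described above:
# no --runtime=nvidia, /dev/dri passed as a device, and the driver version
# pinned through NVIDIA_DRIVER_VERSION.
#   $1  image reference
#   $2  driver version (defaults to the one reported working in this thread)
pinned_driver_cmd() {
    image="$1"
    version="${2:-535.183.01}"
    printf 'docker run --device=/dev/dri -e NVIDIA_DRIVER_VERSION=%s %s\n' \
        "$version" "$image"
}

# Usage: pinned_driver_cmd steam-headless:latest
```

Printing the command rather than running it makes it easy to compare against what the unRAID template generates.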

Not quite done yet, as I've hit another issue: nvidia-smi reports "Failed to initialize NVML: Unknown Error".
