I915-23.10.54 GPU hangs on Ubuntu 22.04/Kernel 6.5 with Multi-ARC770 #193

Open
qiyuangong opened this issue Sep 9, 2024 · 4 comments

@qiyuangong

OS Ubuntu 22.04
Kernel 6.5.0-35-generic

Installed version

[    4.226457] Loading modules backported from I915-23.10.54
[    4.226463] Backport generated by backports.git I915_23.10.54_PSB_231129.55

Error message

Sep  8 21:20:10 ws-arc-002 systemd[1]: Started libcontainer container 7c6b96e3f0d651ca2147dffae00d2b6ad336456cf0a47794c8d8fc8c8a389509.
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945436] BUG: kernel NULL pointer dereference, address: 00000000000000c8
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945460] #PF: supervisor read access in kernel mode
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945468] #PF: error_code(0x0000) - not-present page
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945474] PGD 339269067 P4D 33926a067 PUD 33926b067 PMD 0
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945483] Oops: 0000 [#1] PREEMPT SMP NOPTI
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945491] CPU: 28 PID: 38003 Comm: python Tainted: G           OE      6.5.0-35-generic #35~22.04.1-Ubuntu
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945502] Hardware name: Supermicro Super Server/X13SWA-TF, BIOS 2.1b 05/28/2024
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945509] RIP: 0010:lru_gen_eviction+0x10f/0x1d0
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945524] Code: d2 48 09 c2 48 83 fe 04 0f 87 a5 00 00 00 45 0f b6 e4 4b 8d 84 a0 95 00 00 00 f0 4d 01 bc c5 88 00 00 00 85 db 0f 95 c0 66 90 <0f> b7 89 c8 00 00 00 48 c1 e2 10 0f b6 c0 48 be 00 00 ff ff ff ff
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945540] RSP: 0000:ff5610a90fe7f5d0 EFLAGS: 00010046
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945547] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945554] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945561] RBP: ff5610a90fe7f610 R08: 0000000000000000 R09: 0000000000000000
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945568] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945575] R13: ff263a68c0152000 R14: ff263aa83ffd4000 R15: 0000000000000001
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945582] FS:  000078fa497f8640(0000) GS:ff263aa740100000(0000) knlGS:0000000000000000
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945590] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945596] CR2: 00000000000000c8 CR3: 0000002abee86002 CR4: 0000000000771ee0
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945603] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945610] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
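
For triage, the key fields of the Oops above can be pulled out with a short script. This is a minimal sketch, not part of the original report: the function name `summarize_oops` and the `SAMPLE` string (a few lines copied from the log above) are illustrative assumptions, and the regular expressions only cover the line shapes seen in this excerpt.

```python
import re

# A few lines copied verbatim from the log above (syslog prefix + kernel timestamp).
SAMPLE = """\
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945436] BUG: kernel NULL pointer dereference, address: 00000000000000c8
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945491] CPU: 28 PID: 38003 Comm: python Tainted: G           OE      6.5.0-35-generic #35~22.04.1-Ubuntu
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945509] RIP: 0010:lru_gen_eviction+0x10f/0x1d0
Sep  8 21:46:02 ws-arc-002 kernel: [ 2744.945596] CR2: 00000000000000c8 CR3: 0000002abee86002 CR4: 0000000000771ee0
"""

# Hypothetical helper: one regex per field, first match wins.
PATTERNS = {
    "fault_address": r"NULL pointer dereference, address: ([0-9a-f]+)",
    "pid": r"PID: (\d+)",
    "comm": r"Comm: (\S+)",
    "rip_symbol": r"RIP: [0-9a-f]+:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+",
    "rip_offset": r"RIP: [0-9a-f]+:[\w.]+\+(0x[0-9a-f]+)/",
    "cr2": r"CR2: ([0-9a-f]+)",
}


def summarize_oops(text):
    """Return a dict of key Oops fields; missing fields map to None."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        summary[key] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    for key, value in summarize_oops(SAMPLE).items():
        print(f"{key}: {value}")
```

On this excerpt the faulting address and CR2 agree (0xc8), and the faulting symbol is `lru_gen_eviction`, which may help when searching for similar reports.
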
@smuqthya

@qiyuangong Please share the output of "dmesg -r" and "dkms status".
This issue was observed while executing on the iGPU but not on the dGPU. Can you please confirm?

@qiyuangong
Author

dkms status

AUXILIARY_BUS is enabled for 6.5.0-35-generic.
intel-i915-dkms/1.23.10.54.231129.55, 6.5.0-35-generic, x86_64: installed
AUXILIARY_BUS is enabled for 6.5.0-35-generic.

dmesg -r
dmesg.log

@smuqthya

@qiyuangong May I know what use case you are targeting?

Note:
https://github.com/intel-gpu/intel-gpu-i915-backports?tab=readme-ov-file#intel-graphics-driver-backports-for-linux-os-intel-gpu-i915-backports

For Alchemist discrete Graphics cards, support is provided without display. This repo can be used for the features like GPU debug functionality. For normal cases, please use upstream 6.2 or later kernel version.

@qiyuangong
Author

> @qiyuangong May I know what use case you are targeting?
>
> Note: https://github.com/intel-gpu/intel-gpu-i915-backports?tab=readme-ov-file#intel-graphics-driver-backports-for-linux-os-intel-gpu-i915-backports
>
> For Alchemist discrete Graphics cards, support is provided without display. This repo can be used for the features like GPU debug functionality. For normal cases, please use upstream 6.2 or later kernel version.

We use this driver for LLM-related debugging and benchmarking. The out-of-tree (OOT) driver performs better than the upstream driver in kernels 6.2 through 6.5.
