[Bug]: GPU Hang on decoding stream #1818

Open
akhilxavi opened this issue Jun 19, 2024 · 4 comments
Labels: Need Info (Need more information from submitter)

Comments

akhilxavi commented Jun 19, 2024

Which component impacted?

Decode

Is it regression? Good in old configuration?

None

What happened?

On my Alder Lake-N platform running Ubuntu Noble with the 6.8.0.35-generic kernel, the GPU hangs during HEVC decode of an RTSP stream.
My vainfo output is:

libva info: VA-API version 1.22.0
libva info: User environment variable requested driver 'iHD'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_22
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.22 (libva 2.22.0.pre1)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.2.3 (90a02f6)

Stream decode command:
ffmpeg -hide_banner -c:v hevc_qsv -probesize 32 -analyzeduration 0 -fflags nobuffer -fflags discardcorrupt -avioflags direct -flags low_delay -async_depth 1 -r 60 -load_plugin hevc_hw -i rtsp://localhost:8554/live.stream -f null -

Encode command:
ffmpeg -hide_banner -pixel_format nv12 -video_size 3840x2160 -r 60 -stream_loop 10000 -i single-frame-nv12-4k.yuv -an -c:v hevc_qsv -async_depth 1 -profile:v main -tier 0 -int_ref_type horizontal -int_ref_cycle_size 6 -int_ref_qp_delta 0 -int_ref_cycle_dist 120 -bf 0 -low_power 1 -skip_frame 2 -low_delay_brc 1 -preset veryfast -b:v 200M -g 120 -slices 12 -f rtsp -rtsp_transport tcp rtsp://localhost:8554/live.stream

If I use the same decode command to decode a local file, I do not see the issue:

Local file decode:
ffmpeg -hide_banner -c:v hevc_qsv -probesize 32 -analyzeduration 0 -fflags nobuffer -fflags discardcorrupt -avioflags direct -flags low_delay -async_depth 1 -r 60 -load_plugin hevc_hw -i nv12-5000.h265 -f null -

gpu-hang.txt

If I set the GOP size to a higher value, the hang takes longer to occur.

What's the usage scenario when you are seeing the problem?

Transcode for media delivery

What impacted?

No response

Debug Information

  1. What's libva/libva-utils/gmmlib/media-driver version?
    VA-API version 1.22.0
    libva 2.22.0.pre1
    Intel iHD driver for Intel(R) Gen Graphics - 24.2.3
    intel-gmmlib-22.3.19

  2. Could you confirm whether the GPU hardware exists by running ls /dev/dri?
    ls /dev/dri/
    by-path card0 renderD128

  3. Could you provide the GPU hardware information by running lspci -nn |grep -Ei 'VGA|DISPLAY'?
    00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-N [UHD Graphics] [8086:46d1]

4, 5, 6
gpu-hang.txt
hang-dmesg.txt
hang-vainfo.txt

No libva trace logs printed
"vpl-inspect" is working

Do you want to contribute a patch to fix the issue?

None

akhilxavi commented Jun 19, 2024

I can also reproduce the issue with the driver version below.
Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.1.5 (8068c2e)

libvatrace:
libva_trace.zip
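
For anyone trying to capture a comparable trace: libva writes trace logs when the LIBVA_TRACE environment variable is set to a file path prefix. A minimal sketch follows, assuming a POSIX shell; the helper name with_libva_trace and the /tmp path are illustrative and are not taken from this report, so the attached trace may have been produced differently.

```shell
# with_libva_trace is a hypothetical helper: it runs a command with
# LIBVA_TRACE set, so that libva writes its trace next to the given prefix.
with_libva_trace() {
    prefix="$1"
    shift
    # The variable is set only for the command being run.
    LIBVA_TRACE="$prefix" "$@"
}

# Usage (assumed path; substitute the real decode command):
# with_libva_trace /tmp/libva_trace.log ffmpeg -hide_banner -c:v hevc_qsv -i rtsp://localhost:8554/live.stream -f null -
```

Scoping the variable to one command keeps unrelated VA-API clients from writing traces as well.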

vpl-inspect:
hang-vpl.txt

akhilxavi commented

Further update: mediamtx is used as the RTSP server. When TCP is used as the transport protocol, the hang does not happen. The hang happens only when UDP is used as the transport protocol.
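
The decode command in the original report does not set a transport, so one way to compare the two cases is to pass ffmpeg's -rtsp_transport input option explicitly. The sketch below only prints the command. build_decode_cmd is a hypothetical helper, and every other flag is copied unchanged from the reported decode command. Forcing TCP is a way to narrow the problem down, not a fix for the hang.

```shell
# build_decode_cmd is a hypothetical helper that prints the reported decode
# command with an explicit RTSP transport ("tcp" or "udp").
build_decode_cmd() {
    transport="$1"
    url="$2"
    case "$transport" in
        tcp|udp) ;;
        *) echo "transport must be tcp or udp" >&2; return 1 ;;
    esac
    # -rtsp_transport is an input option, so it has to appear before -i.
    printf 'ffmpeg -hide_banner -c:v hevc_qsv -rtsp_transport %s -probesize 32 -analyzeduration 0 -fflags nobuffer -fflags discardcorrupt -avioflags direct -flags low_delay -async_depth 1 -r 60 -load_plugin hevc_hw -i %s -f null -\n' "$transport" "$url"
}

# Usage: print both variants, then run each one and compare the behavior.
# build_decode_cmd tcp rtsp://localhost:8554/live.stream
# build_decode_cmd udp rtsp://localhost:8554/live.stream
```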

Jexu commented Jun 26, 2024

Please capture the i915_error_state log from /sys/class/drm/card0/error after the hang happens.
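
A minimal sketch of capturing that log, assuming a POSIX shell. save_error_state is a hypothetical helper and the output file name is only an example. Reading the sysfs file usually requires root, and it should be read soon after the hang, before the state is cleared or the machine is rebooted.

```shell
# save_error_state is a hypothetical helper: it copies the i915 error state
# from sysfs (or another source file) into a regular file for attaching.
save_error_state() {
    src="${1:-/sys/class/drm/card0/error}"
    dest="${2:-i915_error_state.txt}"
    if [ ! -r "$src" ]; then
        echo "cannot read $src (try running as root)" >&2
        return 1
    fi
    cat "$src" > "$dest"
}

# Usage, after the hang has been reproduced:
# save_error_state                       # defaults shown above
# save_error_state /sys/class/drm/card0/error /tmp/i915_error_state.txt
```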

Jexu added the Need Info (Need more information from submitter) label Jul 17, 2024

vl-80 commented Aug 15, 2024

I also experience GPU hangs under similar circumstances when playing a UDP RTSP stream (H265) with mpv. The error never happens when using TCP.

OS:
Linux Ubuntu LTS 22.04

Linux Kernel:
5.15.149

ls /dev/dri/
by-path card0 renderD128

lspci -nn:
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:4555] (rev 01) (prog-if 00 [VGA controller])

vainfo:

libva info: VA-API version 1.14.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_14
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.14 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 22.3.1 ()
vainfo: Supported profile and entrypoints
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileNone                   : VAEntrypointStats
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSliceLP
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSliceLP
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile1            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileVP9Profile3            : VAEntrypointVLD
      VAProfileHEVCMain422_10         : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointVLD
      VAProfileHEVCMain444            : VAEntrypointEncSliceLP
      VAProfileHEVCMain444_10         : VAEntrypointVLD
      VAProfileHEVCMain444_10         : VAEntrypointEncSliceLP

dmesg:

Aug 15 09:41:50 test-comp-2 kernel: i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
Aug 15 09:41:50 test-comp-2 kernel: i915 0000:00:02.0: [drm] mpv[705] context reset due to GPU hang
Aug 15 09:41:50 test-comp-2 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:a8fffffd, in mpv [705]
Aug 15 09:42:00 test-comp-2 kernel: i915 0000:00:02.0: [drm] Resetting vcs0 for preemption time out
Aug 15 09:42:00 test-comp-2 kernel: i915 0000:00:02.0: [drm] mpv[705] context reset due to GPU hang
Aug 15 09:42:00 test-comp-2 kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 11:4:a8fffffd, in mpv [705]
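
To check whether repeated hangs share the same error code, log lines like the ones above can be summarized with standard tools. This is a rough sketch. hang_summary is a hypothetical helper, and the pattern assumes the "GPU HANG: ecode ..." wording shown in this log.

```shell
# hang_summary is a hypothetical helper: it reads kernel log text on stdin
# and prints each distinct "GPU HANG: ecode ..." string with its count.
hang_summary() {
    grep -o 'GPU HANG: ecode [0-9a-fA-F:]*' | sort | uniq -c
}

# Usage:
# sudo dmesg | hang_summary
# journalctl -k | hang_summary
```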

/sys/class/drm/card0/error:
i915_error_state.txt

MPV log:
mpv was run with -v -v -v. The log was trimmed to remove a middle section in which no problem occurs. The problem starts around line 22808 ([ffmpeg/demuxer] rtsp: max delay reached. need to consume packet).

Around line 23534 ([vd] Falling back to software decoding.), playback resumes with software decoding.
mpv.log.zip
