With some drivers, grim stalls on the primary output when wl-mirror is active #43

Open
zboszor opened this issue Jul 22, 2024 · 11 comments
Labels
bug (Something isn't working), upstream-bug

Comments

@zboszor

zboszor commented Jul 22, 2024

Hi,

I observed that on the device below, taking a screenshot with grim stalls on the first output but finishes quickly on the second output:

# lspci -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 0e)
# lspci -n -s 00:02.0
00:02.0 0300: 8086:0f31 (rev 0e)

The machine in question is a Flytech POS335 with a built-in 1024x768 monitor. The external monitor is 1600x900. Outputs: eDP-1 and VGA-1.

However, on different hardware (Flytech POS457), grim finishes quickly on both outputs (eDP-1 and DP-1) using the same software environment:

# lspci -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Elkhart Lake [UHD Graphics Gen11 16EU] (rev 01)
# lspci -n -s 00:02.0
00:02.0 0300: 8086:4555 (rev 01)

Both devices are driven by the i915 kernel driver. Kernel version is 6.9.10. Mesa is 24.0.7. wl-mirror is the latest 0.16.5.

What can cause this difference in behaviour?

@Ferdi265
Owner

Hi!

Thanks for the report!

This looks at first glance like either a bug in grim or a bug in the compositor (I assume you are using Sway?). Grim is using the wlr-screencopy protocol, and wl-mirror is either using wlr-export-dmabuf or wlr-screencopy. AFAIK there shouldn't be a problem with multiple programs using these protocols simultaneously, but I didn't test that extensively.

Can you try running wl-mirror with -b dmabuf or -b screencopy explicitly and see if it changes the behaviour? You can also add extra logging with --verbose. Grim doesn't appear to have debug logging, although it would be interesting to see where exactly it hangs.

Ferdi265 added the bug and upstream-bug labels Jul 22, 2024
@zboszor
Author

zboszor commented Jul 22, 2024

Indeed, I am using Sway. Version 1.9 to be exact. I will try to add logging. Thanks.

@zboszor
Author

zboszor commented Jul 22, 2024

Apart from many occurrences of the message below, nothing in the log tells me anything useful.

[ERROR] [wlr] [/usr/src/debug/wlroots/0.17.4-r0/backend/drm/atomic.c:73] connector VGA-1: Atomic commit failed: Device or resource busy

Occasionally

2024-07-22 09:28:01 - [/usr/src/debug/swaybg/1.2.0-r0/main.c:582] wl_display_roundtrip failed

On this particular machine, taking a screenshot stalls with both wl-mirror -b dmabuf and wl-mirror -b screencopy.

(More testing...) But it's inconclusive. After restarting Sway, anywhere from a couple to a little more than 30 screenshots succeed just fine, quite quickly. Then grim starts stalling. After killing grim with Ctrl-C, subsequent runs also stall.

swaymsg doesn't show grim as a "stuck" client.

$ swaymsg -t get_tree
#1: root "root"
  #2147483647: output "__i3"
    #2147483646: workspace "__i3_scratch"
  #3: output "eDP-1"
    #4: workspace "1"
      #8: con "SICOM Chef - Chromium" (xdg_shell, pid: 28000, app_id: "chromium-browser (/home/sicom/.config/chromium)")
  #5: output "VGA-1"
    #6: workspace "2"
      #7: con "Wayland Output Mirror for eDP-1" (xdg_shell, pid: 28032, app_id: "at.yrlf.wl_mirror")

Without screen mirroring, screenshots do work quickly and for over 100 attempts at a time.
As I wrote above, with mirroring active it usually takes (far) fewer than 30 attempts to trigger the problem.
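
For anyone trying to reproduce this, the "N screenshots succeed, then it stalls" observation can be turned into a number with a small loop. This is only a rough sketch: the function name `first_stall`, the 5-second timeout, and the output name and file path in the usage comment are assumptions, not something taken from this report.

```python
import subprocess


def first_stall(cmd, attempts=100, timeout_s=5.0):
    """Run cmd up to `attempts` times.

    Returns the 1-based index of the first run that does not finish within
    timeout_s seconds, or None if every run finished in time.
    """
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(
                cmd,
                timeout=timeout_s,
                check=False,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before re-raising.
            return attempt
    return None


# Hypothetical usage (output name and path are placeholders):
#   first_stall(["grim", "-o", "eDP-1", "/tmp/capture.png"])
```

Running it once with wl-mirror active and once without should give comparable counts for the two cases described above.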

@Ferdi265
Owner

Interesting... I will try and see if I can reproduce this on one of my machines in the next few days.

Can you try running grim with WAYLAND_DEBUG=1? This should tell us which wayland events were delivered, which requests were sent, and thus where grim is stalled.

@zboszor
Author

zboszor commented Jul 23, 2024

Here it is:

$ grim -o HDMI-A-1 -t png capture-0.png
[3583248.091]  -> wl_display@1.get_registry(new id wl_registry@2)
[3583248.198]  -> wl_display@1.sync(new id wl_callback@3)
[3583248.682] wl_display@1.delete_id(3)
[3583248.720] wl_registry@2.global(1, "wl_shm", 1)
[3583248.738]  -> wl_registry@2.bind(1, "wl_shm", 1, new id [unknown]@4)
[3583248.748] wl_registry@2.global(2, "wl_drm", 2)
[3583248.755] wl_registry@2.global(3, "zwp_linux_dmabuf_v1", 4)
[3583248.762] wl_registry@2.global(4, "wl_compositor", 6)
[3583248.779] wl_registry@2.global(5, "wl_subcompositor", 1)
[3583248.795] wl_registry@2.global(6, "wl_data_device_manager", 3)
[3583248.803] wl_registry@2.global(7, "zwlr_gamma_control_manager_v1", 1)
[3583248.809] wl_registry@2.global(8, "zxdg_output_manager_v1", 3)
[3583248.817]  -> wl_registry@2.bind(8, "zxdg_output_manager_v1", 2, new id [unknown]@5)
[3583248.824] wl_registry@2.global(9, "ext_idle_notifier_v1", 1)
[3583248.831] wl_registry@2.global(10, "zwp_idle_inhibit_manager_v1", 1)
[3583248.838] wl_registry@2.global(11, "zwlr_layer_shell_v1", 4)
[3583248.845] wl_registry@2.global(12, "xdg_wm_base", 2)
[3583248.852] wl_registry@2.global(13, "zwp_tablet_manager_v2", 1)
[3583248.859] wl_registry@2.global(14, "org_kde_kwin_server_decoration_manager", 1)
[3583248.867] wl_registry@2.global(15, "zxdg_decoration_manager_v1", 1)
[3583248.882] wl_registry@2.global(16, "zwp_relative_pointer_manager_v1", 1)
[3583248.890] wl_registry@2.global(17, "zwp_pointer_constraints_v1", 1)
[3583248.896] wl_registry@2.global(18, "wp_presentation", 1)
[3583248.903] wl_registry@2.global(19, "zwlr_output_manager_v1", 4)
[3583248.911] wl_registry@2.global(20, "zwlr_output_power_manager_v1", 1)
[3583248.917] wl_registry@2.global(21, "zwp_input_method_manager_v2", 1)
[3583248.934] wl_registry@2.global(22, "zwp_text_input_manager_v3", 1)
[3583248.943] wl_registry@2.global(23, "zwlr_foreign_toplevel_manager_v1", 3)
[3583248.958] wl_registry@2.global(24, "ext_session_lock_manager_v1", 1)
[3583248.973] wl_registry@2.global(25, "wp_drm_lease_device_v1", 1)
[3583248.981] wl_registry@2.global(26, "zwlr_export_dmabuf_manager_v1", 1)
[3583248.988] wl_registry@2.global(27, "zwlr_screencopy_manager_v1", 3)
[3583249.029]  -> wl_registry@2.bind(27, "zwlr_screencopy_manager_v1", 1, new id [unknown]@6)
[3583249.045] wl_registry@2.global(28, "zwlr_data_control_manager_v1", 2)
[3583249.101] wl_registry@2.global(29, "wp_security_context_manager_v1", 1)
[3583249.124] wl_registry@2.global(30, "wp_viewporter", 1)
[3583249.137] wl_registry@2.global(31, "wp_single_pixel_buffer_manager_v1", 1)
[3583249.155] wl_registry@2.global(32, "wp_content_type_manager_v1", 1)
[3583249.168] wl_registry@2.global(33, "wp_fractional_scale_manager_v1", 1)
[3583249.181] wl_registry@2.global(34, "zxdg_exporter_v1", 1)
[3583249.189] wl_registry@2.global(35, "zxdg_importer_v1", 1)
[3583249.218] wl_registry@2.global(36, "zxdg_exporter_v2", 1)
[3583249.240] wl_registry@2.global(37, "zxdg_importer_v2", 1)
[3583249.253] wl_registry@2.global(38, "xdg_activation_v1", 1)
[3583249.270] wl_registry@2.global(39, "wp_cursor_shape_manager_v1", 1)
[3583249.283] wl_registry@2.global(40, "zwp_virtual_keyboard_manager_v1", 1)
[3583249.295] wl_registry@2.global(41, "zwlr_virtual_pointer_manager_v1", 2)
[3583249.308] wl_registry@2.global(42, "zwlr_input_inhibit_manager_v1", 1)
[3583249.320] wl_registry@2.global(43, "zwp_keyboard_shortcuts_inhibit_manager_v1", 1)
[3583249.327] wl_registry@2.global(44, "zwp_pointer_gestures_v1", 3)
[3583249.359] wl_registry@2.global(45, "wl_seat", 8)
[3583249.367] wl_registry@2.global(46, "zwp_primary_selection_device_manager_v1", 1)
[3583249.383] wl_registry@2.global(47, "wl_output", 4)
[3583249.399]  -> wl_registry@2.bind(47, "wl_output", 3, new id [unknown]@7)
[3583249.413] wl_registry@2.global(48, "wl_output", 4)
[3583249.428]  -> wl_registry@2.bind(48, "wl_output", 3, new id [unknown]@8)
[3583249.441] wl_callback@3.done(35)
[3583249.474]  -> zxdg_output_manager_v1@5.get_xdg_output(new id zxdg_output_v1@3, wl_output@8)
[3583249.498]  -> zxdg_output_manager_v1@5.get_xdg_output(new id zxdg_output_v1@9, wl_output@7)
[3583249.513]  -> wl_display@1.sync(new id wl_callback@10)
[3583249.767] wl_display@1.delete_id(10)
[3583249.794] wl_output@7.geometry(0, 0, 430, 240, 0, "Acer Technologies", "V206HQL", 0)
[3583249.804] wl_output@7.mode(1, 1600, 900, 60000)
[3583249.812] wl_output@7.scale(1)
[3583249.818] wl_output@7.done()
[3583249.876] wl_output@8.geometry(0, 0, 510, 290, 0, "Dell Inc.", "DELL U2312HM", 0)
[3583249.887] wl_output@8.mode(1, 1920, 1080, 60000)
[3583249.895] wl_output@8.scale(1)
[3583249.901] wl_output@8.done()
[3583249.906] zxdg_output_v1@3.name("HDMI-A-2")
[3583249.913] zxdg_output_v1@3.description("Dell Inc. DELL U2312HM KF87Y39L100S (HDMI-A-2)")
[3583249.926] zxdg_output_v1@3.logical_position(1600, 0)
[3583249.939] zxdg_output_v1@3.logical_size(1920, 1080)
[3583249.967] zxdg_output_v1@3.done()
[3583249.975] zxdg_output_v1@9.name("HDMI-A-1")
[3583249.989] zxdg_output_v1@9.description("Acer Technologies V206HQL LY6AA01A85GL (HDMI-A-1)")
[3583249.996] zxdg_output_v1@9.logical_position(0, 0)
[3583250.018] zxdg_output_v1@9.logical_size(1600, 900)
[3583250.025] zxdg_output_v1@9.done()
[3583250.037] wl_callback@10.done(35)
[3583250.050]  -> zwlr_screencopy_manager_v1@6.capture_output(new id zwlr_screencopy_frame_v1@10, 0, wl_output@7)
[3583250.299] zwlr_screencopy_frame_v1@10.buffer(875709016, 1600, 900, 6400)
[3583250.485]  -> wl_shm@4.create_pool(new id wl_shm_pool@11, fd 5, 5760000)
[3583250.505]  -> wl_shm_pool@11.create_buffer(new id wl_buffer@12, 0, 1600, 900, 6400, 875709016)
[3583250.514]  -> wl_shm_pool@11.destroy()
[3583250.523]  -> zwlr_screencopy_frame_v1@10.copy(wl_buffer@12)

This is from a third machine with two HDMI outputs, also Intel-based. The software environment is the same as previously described.

# lspci -s 00:02.0 
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 500 (rev 0b)
# lspci -n -s 00:02.0 
00:02.0 0300: 8086:5a85 (rev 0b)
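
As a sanity check on the last lines of the trace, the numbers the compositor advertises and the numbers grim uses for its shared-memory buffer can be compared directly. The sketch below only uses values that appear in the trace; the helper name `fourcc_to_str` is made up for illustration. The format code decodes to the FourCC `XB24`, which as far as I know is XBGR8888 in `drm_fourcc.h`.

```python
def fourcc_to_str(code):
    """Decode a little-endian FourCC integer into its four ASCII characters."""
    return code.to_bytes(4, "little").decode("ascii")


# Values copied from the trace: the frame's buffer event and the pool size
# grim passes when it creates the wl_shm pool.
fmt, width, height, stride = 875709016, 1600, 900, 6400
pool_size = 5760000

print(fourcc_to_str(fmt))            # XB24
print(stride // width)               # 4 (bytes per pixel)
print(stride * height == pool_size)  # True
```

So the buffer grim hands to the compositor matches what was requested in format and size, which suggests the stall is not caused by a mismatched buffer.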

@Ferdi265
Owner

Thanks for the log! It looks like grim correctly calls copy() on the screencopy frame, but the compositor never finishes and doesn't send the ready() event. This looks like it's a bug in either Sway or wlroots. I'm gonna look at the code later and see if I can find something that looks like the issue.
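
The same check can be automated for longer traces. The following is a minimal sketch, assuming the line format that libwayland prints with WAYLAND_DEBUG=1 (`->` marks a request sent by the client, and object ids are written as `interface@id`, or `interface#id` in some newer libwayland versions). The function name `pending_screencopy_frames` is hypothetical.

```python
import re

# Matches e.g. "[3583250.523]  -> zwlr_screencopy_frame_v1@10.copy(wl_buffer@12)".
# An optional "{queue name}" prefix, printed by some libwayland versions, is skipped.
LINE_RE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?:\{[^}]*\}\s*)?(?P<req>->\s*)?"
    r"(?P<iface>\w+)[@#](?P<id>\d+)\.(?P<msg>\w+)\("
)


def pending_screencopy_frames(trace):
    """Return ids of screencopy frames that got a copy request but no answer."""
    pending = []
    for line in trace.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or m["iface"] != "zwlr_screencopy_frame_v1":
            continue
        frame_id = int(m["id"])
        if m["req"] and m["msg"] in ("copy", "copy_with_damage"):
            pending.append(frame_id)
        elif not m["req"] and m["msg"] in ("ready", "failed"):
            # The compositor answered this frame, one way or the other.
            if frame_id in pending:
                pending.remove(frame_id)
    return pending
```

Fed the stderr output of a stalled grim run, this should report the frame id that is still waiting for `ready` or `failed`; an empty list would mean every copy request was answered.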

@zboszor
Author

zboszor commented Jul 23, 2024

Thanks in advance. I am using wlroots 0.17.4 and sway 1.9.

Originally, grim was built from just one commit past the 1.4.0 tag, i.e. https://git.sr.ht/~emersion/grim/commit/89e02e663fabc534b7e7039514f60a8c5d70070d

Built from the latest commit, https://git.sr.ht/~emersion/grim/commit/7dbb0f39cd79841bd0dc07ac4a7183facf34350e, grim also stalls.

@zboszor
Author

zboszor commented Aug 6, 2024

@Ferdi265 Any news?

@Ferdi265
Owner

Ferdi265 commented Aug 6, 2024

Hi! Sorry, I didn't get around to looking at this in detail. I didn't find anything obvious in the wlroots codebase at that commit, and I also wasn't able to reproduce the issue, but I only had time to look at it for an hour or so.

@Ferdi265
Owner

Ferdi265 commented Aug 6, 2024

I'd recommend opening an issue with wlroots, since the screencopy ready event is never sent.

@zboszor
Author

zboszor commented Aug 6, 2024

Thank you. FWIW, I am using Yocto for a custom-tailored distro.
