This repository has been archived by the owner on Jun 18, 2024. It is now read-only.
[PATCH] SCX: scx_ops_bypass() can't grab a mutex #198
Merged
This fixes a bug introduced by 34c6075 ("scx: Fix scx_ops_bypass_depth going out of sync"), which added mutex_lock(scx_ops_enable_mutex) to scx_ops_bypass(). That can obviously deadlock if scx_ops_bypass() is invoked from the disable path while someone else is holding the mutex and gets scheduled out:
[ 246.906433] INFO: task sched_ext_ops_h:1531 blocked for more than 122 seconds.
[ 246.907862] Not tainted 6.9.0-rc5-work-02067-g7f2a15fc2eb2-dirty #34
[ 246.909213] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.910973] task:sched_ext_ops_h state:D stack:0 pid:1531 tgid:1531 ppid:2 flags:0x00004000
[ 246.913004] Sched_ext: qmap (enabled+all)
[ 246.913007] Call Trace:
[ 246.914486] <TASK>
[ 246.914956] __schedule+0x750/0xcd0
[ 246.915770] schedule+0x41/0xb0
[ 246.916503] schedule_preempt_disabled+0x15/0x20
[ 246.917542] __mutex_lock+0x511/0xb00
[ 246.918379] ? scx_ops_bypass+0x54/0x1a0
[ 246.919232] mutex_lock_nested+0x1b/0x20
[ 246.920110] scx_ops_bypass+0x54/0x1a0
[ 246.920961] ? irqentry_exit+0x59/0x80
[ 246.921814] ? sysvec_irq_work+0x41/0x80
[ 246.922705] scx_ops_disable_workfn+0xd1/0x920
[ 246.923714] ? lock_acquire+0xcb/0x230
[ 246.924567] ? kthread_worker_fn+0x80/0x2a0
[ 246.925504] ? lock_release+0xd8/0x2d0
[ 246.926355] ? kthread_worker_fn+0xbe/0x2a0
[ 246.927262] kthread_worker_fn+0x10a/0x2a0
[ 246.928182] ? scx_dump_task+0x2f0/0x2f0
[ 246.929080] kthread+0xec/0x110
[ 246.929814] ? __kthread_init_worker+0x100/0x100
[ 246.930855] ? kthread_blkcg+0x40/0x40
[ 246.931718] ret_from_fork+0x36/0x40
[ 246.932545] ? kthread_blkcg+0x40/0x40
[ 246.933411] ret_from_fork_asm+0x11/0x20
[ 246.934311] </TASK>
[ 246.934798]
[ 246.934798] Showing all locks held in the system:
[ 246.936169] 1 lock held by khungtaskd/60:
[ 246.937081] #0: ffffffff8335b128 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x30/0x190
[ 246.939051] 1 lock held by scx_qmap/1529:
[ 246.939963] 1 lock held by sched_ext_ops_h/1531:
[ 246.940999] #0: ffffffff83272e28 (scx_ops_enable_mutex){+.+.}-{3:3}, at: scx_ops_bypass+0x54/0x1a0
There's no reason to grab the mutex there. We can just test scx_enabled() without holding the mutex. Drop the broken locking.