Skip to content
This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

scx: Fix hotplug #217

Merged
merged 3 commits into from
Jun 6, 2024
Merged

scx: Fix hotplug #217

merged 3 commits into from
Jun 6, 2024

Conversation

Byte-Lab
Copy link
Collaborator

@Byte-Lab Byte-Lab commented Jun 5, 2024

The rq_online() and rq_offline() sched_class callbacks may be invoked on either
the domain creation path, or on the hotplug path. ext.c will only
invoke its hotplug logic when those callbacks are invoked on the hotplug path.
This turned out to be a bit brittle, as hotplug could break for sched_ext if we
first invoked rq_online() from the domain path, as it would cause us to skip
invoking it on the hotplug path due to hotplug implementation details. This in
fact happened after rebasing onto v6.10.

To avoid this frailty (and fix a hotplug regression for us after merging
v6.10), let's instead just call directly into ext.c on the hotplug path. This
could be made into a sched_class callback if that's what schedulers want, but
that's kind of what rq_online() and rq_offline() are supposed to be, so it's a
bit awkward as is. It would have been ideal if we could have just used a
hotplug event notifier, but as far as I can tell that doesn't exist.

@Byte-Lab Byte-Lab requested a review from htejun June 5, 2024 02:51
The existing hotplug code is a bit brittle in that the sched_class->{online,
offline} callbacks can be invoked both from hotplug context, or from domain
rebuild context. A callback is only invoked in one of the two contexts, with
the context that runs the callback setting rq->online accordingly to avoid
doing the callback more times than necessary. Unfortuntaly, this causes commit
2125c00 ("cgroup/cpuset: Make cpuset hotplug processing synchronous") to
break hotplug for us because it makes the topology rq->online event happen
before the cpu hotplug rq->online event; thus preventing us from invoking the
cpu online callback in the scheduler.

This integration is fragile and hacky, so we'll instead call directly into
ext.c on the hotplug path. This will be added in a subsequent commit.

Signed-off-by: David Vernet <[email protected]>
Copy link
Collaborator

@htejun htejun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one inline comment and I think the hotplug ops invocations would need to be updated to use KF_SLEEPABLE.

kernel/sched/core.c Outdated Show resolved Hide resolved
The rq_online() and rq_offline() sched_class callbacks may be invoked on either
the domain creation path, or on the hotplug path. ext.c would previously only
invoke its hotplug logic when those callbacks are invoked on the hotplug path.
This turned out to be a bit brittle, as hotplug could break for sched_ext if we
first invoked rq_online() from the domain path, as it would cause us to skip
invoking it on the hotplug path due to hotplug implementation details.

To avoid this frailty (and fix a hotplug regression for us after merging
v6.10), let's instead just call directly into ext.c on the hotplug path. In
doing so, we can also move the ops.cpu_online() and ops.cpu_offline() callbacks
outside of the rq lock region, thus making them sleepable. A test for this will
be added in a subsequent patch.

Signed-off-by: David Vernet <[email protected]>
Now that ops.cpu_online() and ops.cpu_offline() can be sleepable,
let's update their test BPF progs to be sleepable to validate.

Signed-off-by: David Vernet <[email protected]>
@Byte-Lab
Copy link
Collaborator Author

Byte-Lab commented Jun 6, 2024

Left one inline comment and I think the hotplug ops invocations would need to be updated to use KF_SLEEPABLE.

Good, sorry for the oversight. I've added a call to a blockable kfunc in the testcase which would've noticed that.

@Byte-Lab Byte-Lab merged commit 9ee6343 into sched_ext Jun 6, 2024
1 check failed
@Byte-Lab Byte-Lab deleted the fix_hotplug branch June 6, 2024 18:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants