Kernel: Start reducing the amount of RecursiveSpinlocks
#25931
base: master
Conversation
Just one thing that stuck out to me
(Also I should revive #20619, as that plays into this)
Two more things on the second read-through, otherwise looks fine by me.
Here is my first review. I still need to go through all the "Prefer non-recursive Spinlocks" commits to check whether it's safe to use non-recursive spinlocks in all those cases.
Kernel/Heap/kmalloc.cpp (Outdated)
-static RecursiveSpinlock<LockRank::None> s_lock {}; // needs to be recursive because of dump_backtrace()
+static Spinlock<LockRank::None> s_lock {};

 static void* kmalloc_impl(size_t size, size_t alignment, CallerWillInitializeMemory caller_will_initialize_memory, bool acquire_lock);
Won't this break if we allocate memory during a panic in kmalloc.cpp?
The call to `critical_dmesgln` could theoretically allocate memory. But usually the inline capacity of the `StringBuilder` should be enough.
Ideally we should get rid of all memory allocations during panics of course.
In theory we could set a global variable during panic to disable the lock, as we are going down anyway, but that's likely a thing for another day.
I guess force-disabling the lock during a panic would work indeed, but in the end we should really be avoiding allocation if we're already panicking.
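For illustration, a rough sketch of what that force-disable could look like (purely hypothetical, not part of this PR; `g_in_panic`, the enum value names, and the default alignment are made up here):

```cpp
// Hypothetical: PANIC() would set this before doing any printing/allocation.
static Atomic<bool> g_in_panic { false };

void* kmalloc(size_t size)
{
    if (g_in_panic.load(AK::MemoryOrder::memory_order_relaxed)) {
        // We may already hold s_lock on this CPU and we're going down anyway,
        // so skip taking it again and call straight into the implementation.
        return kmalloc_impl(size, alignof(void*), CallerWillInitializeMemory::No, CallerHasAcquiredLock::Yes);
    }
    SpinlockLocker lock(s_lock);
    return kmalloc_impl(size, alignof(void*), CallerWillInitializeMemory::No, CallerHasAcquiredLock::Yes);
}
```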
@@ -44,7 +44,7 @@ UNMAP_AFTER_INIT void Device::initialize_base_devices()
     s_all_details->base_devices = move(base_devices);
 }

-RefPtr<Device> Device::acquire_by_type_and_major_minor_numbers(DeviceNodeType type, MajorNumber major, MinorNumber minor)
+void Device::run_by_type_and_major_minor_numbers(DeviceNodeType type, MajorNumber major, MinorNumber minor, Function<void(RefPtr<Device>)> callback)
I don't really like this name, but I can't think of a good one either. Maybe something like `for_device_by_type_and_major_minor_numbers`?
Hmm, it's not really a `for` either though, because the callback should only ever get called once, since the underlying map can contain just one instance of a given major/minor number pair.
Hello! One or more of the commit messages in this PR do not match the SerenityOS code submission policy, please check the
That should fix the deadlock that's been haunting CI... (it was just an instance of using
This eliminates the possibility of panicking due to encountering a page fault while trying access the cache inside an interrupt handler.
This makes it possible to (just barely) keep up when enabling kmalloc stack dumping (sysctl -w kmalloc_stacks=1) while running in a hardware accelerated aarch64 VM.
Previously, list nodes would keep the loop device itself alive until the node was manually removed from the list. This patch breaks that reference cycle, and allows normal reference counting to take care of the lifetimes of loop devices (although we do have to leak a reference on creation because nothing truly owns a loop device).
Previously, you'd occasionally see this sort of construct (simplified):

```
return FooDevice::all_instances().with([&](auto& list) {
    for (auto& device : list) {
        if (/* search condition */)
            continue;
        return do_something_that_reacquires(device.index());
    }
    return ENOENT;
});
```

This would only work if the relevant `all_instances()` used a recursive spinlock, since reacquiring the device would lock that again. To get around this, a new helper has been added in `Device` through which devices can be accessed whilst locking the device list, removing the need to loop through the device type's `all_instances()`.
This assumption doesn't hold without the current Custody caching mechanism.
This is nigh-impossible to reconcile with non-recursive Spinlocks.
Just turned
Second round, I still need to look at the last 4 commits
@@ -15,12 +15,14 @@ constexpr static AK::Duration refresh_interval = AK::Duration::from_milliseconds
 NonnullLockRefPtr<Console> Console::initialize(VirtIODisplayConnector& parent_display_connector)
 {
     auto current_resolution = parent_display_connector.current_mode_setting();
-    return adopt_lock_ref(*new Console(parent_display_connector, current_resolution));
+    auto refresh_timer = adopt_nonnull_ref_or_enomem(new (nothrow) Timer()).release_value_but_fixme_should_propagate_errors();
+    return adopt_lock_ref(*new Console(parent_display_connector, current_resolution, *refresh_timer));
nit: unnecessary *
        PRINT_LINE(message.characters_without_null_termination(), message.length());
    } else {
        constexpr auto buffer_size = 255;
        constexpr auto maximum_printable_name_length = buffer_size - (sizeof("Kernel + 0x +\n") + sizeof(FlatPtr) * 4);
`sizeof()` includes the null terminator, which is probably not what you want. And you also forgot to include the second `0x` in the calculation.
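For illustration (counting the literal from the diff above):

```cpp
// sizeof on a string literal counts the trailing '\0' as well:
static_assert(sizeof("Kernel + 0x +\n") == 15); // 14 characters plus the null terminator
```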
-static RecursiveSpinlock<LockRank::None> s_lock {}; // needs to be recursive because of dump_backtrace()
+static Spinlock<LockRank::None> s_lock {};

 static void* kmalloc_impl(size_t size, size_t alignment, CallerWillInitializeMemory caller_will_initialize_memory, CallerHasAcquiredLock acquire_lock);
The variable should probably be called `caller_has_acquired_lock` now.
@@ -44,7 +44,7 @@ UNMAP_AFTER_INIT void Device::initialize_base_devices()
     s_all_details->base_devices = move(base_devices);
 }

-RefPtr<Device> Device::acquire_by_type_and_major_minor_numbers(DeviceNodeType type, MajorNumber major, MinorNumber minor)
+void Device::run_by_type_and_major_minor_numbers(DeviceNodeType type, MajorNumber major, MinorNumber minor, Function<void(RefPtr<Device>)> callback)
A `Function` isn't cheap to copy, so please use a const reference here.
An rvalue reference should also work, I think.
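i.e. either of these would avoid the copy (parameter sketch only; the rest of the declaration as in the diff above):

```cpp
void run_by_type_and_major_minor_numbers(DeviceNodeType, MajorNumber, MinorNumber, Function<void(RefPtr<Device>)> const& callback);
// or, if the callee wants to take ownership of the callback:
void run_by_type_and_major_minor_numbers(DeviceNodeType, MajorNumber, MinorNumber, Function<void(RefPtr<Device>)>&& callback);
```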
@@ -38,11 +38,18 @@ VMObject::~VMObject()
     VERIFY(m_regions.is_empty());
 }

+void VMObject::remap_regions_locked()
Shouldn't `remap_regions_locked()` also be used in `InodeVMObject::try_release_clean_pages()` and `AnonymousVMObject::purge()`?
@@ -87,7 +87,7 @@ class SharedFramebufferVMObject final : public VMObject {
     LockRefPtr<FakeWritesFramebufferVMObject> m_fake_writes_framebuffer_vmobject;
     LockRefPtr<RealWritesFramebufferVMObject> m_real_writes_framebuffer_vmobject;
     bool m_writes_are_faked { false };
-    mutable RecursiveSpinlock<LockRank::None> m_writes_state_lock {};
+    mutable Spinlock<LockRank::None> m_writes_state_lock {};
This will break when switching TTYs:
SharedFramebufferVMObject::switch_to_fake_sink_framebuffer_writes() locks this lock
-> VMObject::remap_regions()
-> VMObject::remap_regions_locked()
-> Region::remap_with_locked_vmobject()
-> Region::remap_impl()
-> Region::map(PageDirectory& page_directory, ShouldLockVMObject should_lock_vmobject, ShouldFlushTLB should_flush_tlb)
-> Region::map_individual_page_impl(size_t page_index, ShouldLockVMObject should_lock_vmobject)
-> Region::physical_page_locked(size_t index) const
-> SharedFramebufferVMObject::physical_pages() const, which also locks this lock
bool Region::should_dirty_on_write_impl(size_t page_index) const
{
    VERIFY(vmobject().m_lock.is_locked());
    return !static_cast<InodeVMObject const&>(vmobject()).is_page_dirty(first_page_index() + page_index);
}
Please make this a lambda in `should_dirty_on_write` instead.
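Something along these lines, presumably (a sketch only; the exact locking semantics of the `ShouldLockVMObject` parameter are assumed from the surrounding diff):

```cpp
bool Region::should_dirty_on_write(size_t page_index, ShouldLockVMObject should_lock_vmobject) const
{
    auto page_needs_dirtying = [&]() {
        VERIFY(vmobject().m_lock.is_locked());
        return !static_cast<InodeVMObject const&>(vmobject()).is_page_dirty(first_page_index() + page_index);
    };
    if (should_lock_vmobject == ShouldLockVMObject::No)
        return page_needs_dirtying();
    SpinlockLocker locker(vmobject().m_lock);
    return page_needs_dirtying();
}
```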
@@ -32,6 +32,11 @@ enum class ShouldFlushTLB {
     Yes,
 };

+enum class ShouldLockVMObject {
No user outside of `Region` ever uses `ShouldLockVMObject::No`, so I would instead keep the non-locking versions of these functions private, add `_with_locked_vmobject` versions and keep this enum private.
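In header terms, roughly something like this (a sketch; everything apart from `remap_with_locked_vmobject`, `remap_impl` and the enum itself is paraphrased for illustration):

```cpp
class Region final {
public:
    // Public entry point: takes the VMObject lock itself.
    void remap();

    // For callers that already hold the VMObject lock, e.g. VMObject::remap_regions_locked().
    void remap_with_locked_vmobject();

private:
    // Kept private so no user outside of Region can pass ShouldLockVMObject::No.
    enum class ShouldLockVMObject {
        No,
        Yes,
    };

    // Both public entry points funnel into this.
    void remap_impl(ShouldLockVMObject);
};
```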
@@ -179,7 +185,7 @@ class Region final

     [[nodiscard]] size_t cow_pages() const;

-    [[nodiscard]] bool should_dirty_on_write(size_t page_index) const;
+    [[nodiscard]] bool should_dirty_on_write(size_t page_index, ShouldLockVMObject should_lock_vmobject) const;
Not really related to your changes, but this function seems like it should be private
     void remap_regions();
     bool remap_regions_one_page(size_t page_index, NonnullRefPtr<PhysicalRAMPage> page);

     IntrusiveListNode<VMObject> m_list_node;
     FixedArray<RefPtr<PhysicalRAMPage>> m_physical_pages;

-    mutable RecursiveSpinlock<LockRank::None> m_lock {};
+    mutable Spinlock<LockRank::None> m_lock {};
This lock is used in so many places with quite complicated code, so I can't really check if this is a safe change.
I just hope this doesn't introduce some new deadlock in some weird edge case. In my testing your PR worked fine so far and I hope you tested this as well.
It would be nice if there was some automated way to verify with static analysis that no spinlock can be locked twice. That can't cover page faults, but I hope we don't cause page faults while this lock is already taken.
This lock was made recursive in 082ed6f. The commit description doesn't give any reason why, though.
The Linux kernel doesn't have any recursive spinlocks at all (well, at least not kernel-wide, IDK about random subsystems), so we certainly don't need to have these either. Primarily, these just lead to rather sloppy code (just see all the refactoring in this PR, which could have been fairly easily avoided if we hadn't used these to begin with) and are very SMP-unfriendly (a crash here is both commonplace and nigh-impossible to debug).
Historically, the proliferation of `RecursiveSpinlock`s has also been brought forward by the fact that `SpinlockProtected` (since its introduction in 019ad8a, part of #8851) used a `RecursiveSpinlock` under the hood, which was only recently corrected (in a264a83), though this naturally left a ton of instances of `RecursiveSpinlockProtected` all over the place.

This PR is based off of a branch where I removed `RecursiveSpinlock` entirely, but that didn't really pan out for SMP systems either, because when you put a proper `Spinlock` on the scheduler, only a single thread can run at a time. In the long term, it might prove favorable to rework the scheduling in an xv6-style way, where each CPU has its own instance of the scheduler (at least that would make the locking less of a nightmare).

In conclusion, this PR shouldn't have much of an impact on the behavior of the system, but it should be a good step towards getting rid of `RecursiveSpinlock`s entirely. This probably fixes #22585 because nothing in `Kernel/FileSystem` uses `RecursiveSpinlock`s anymore, but otherwise this shouldn't affect the stability of SMP systems in one way or another.
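For anyone unfamiliar with the distinction being discussed here, the failure mode that all of this refactoring is designed around is simply this (toy example, not code from the PR):

```cpp
static Spinlock<LockRank::None> s_example_lock {};

static void inner()
{
    // With a RecursiveSpinlock, re-locking on the same CPU just bumps a counter.
    // With a plain Spinlock, this spins forever on a lock this CPU already holds.
    SpinlockLocker locker(s_example_lock);
}

static void outer()
{
    SpinlockLocker locker(s_example_lock);
    inner();
}
```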