-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement Interruptible #417
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR! I need a little bit of time to review it and I'm on-call this week, so I hope to get to it next week 🙏
Thanks again, @hms! Would you mind elaborating a bit more on:
What where the connectivity issues you found? Was this only in tests, or did you observe these in a production environment? |
Hi Rosa, I don't run MySQL in production. The issues I've come across are strictly in a testing environment. What I get are random disconnections (and miles long backtraces) at the driver level that are frequently paired with broken pipe errors. Extra rescues on the pipes side didn't help. It's always more database errors than pipe errors, leading me to assume the root cause was the database driver. Since you folks are MySQL based and haven't seen any of these issues, I'm sure this is some weird combination of Mac, my local builds via brew, and/or how I'm configured to build. I did some digging, and the mysql2 gem had a window of time with similar issues some years ago right after an undocumented MySQL upgrade on the Oracle side, so maybe I'm a cannery in the coal mine? Relative to a justification for the change, I do think using a pure Ruby solution as long as it's 100% in compliance with the requirements is worth considering. My understanding for why the self-pipe is to avoid signal processing races that can be triggered by the anything using one of the signal aggregation APIs (database drivers) and signal traps. The Thread::Queue makes those guarantees with the same performance profile and a very simply to understand implementation. hms |
Yes, exactly! Thanks for the extra justification; that's super helpful. I do like this implementation much more, so I'm going to run this in production for a couple of days and see if we run into anything, and then I'll merge afterwards. Thank you! |
Oops, would you mind rebasing against the last |
No problem. But I won't be able to do for a couple of hours. Just about to step out. hms |
Oh, of course, no rush at all! 🙏 |
* Adds to RVM .ruby-gemset to .gitignore for RVM and gemset holdouts. * Updates Gemfile.lock to include arm64-darwin-24 (weird that it's missing) * Adds 4 development dependencies to quiet deprecation warnings * Updates rubocop to allow Ruby 3.3 syntax, matching .ruby-version
SolidQueue has excellent built-in error reporting. While this is fantastic for SQ users, it is less than ideal for testing SolidQueue because any test that deliberately uses or triggers an exception produces voluminous error reporting. This error reporting is hugely valuable when the exception is not expected, but distracting and of limited value for expected use-cases, especially when the test confirms the correct outcomes via assertions. This commit adds: * A generic test-specific Exception class: ExpectedTestError This allows testing for specific exceptions while retaining all error reporting infrastructure for unexpected exceptions. * Two helper methods for silencing on_thread_error output These methods accept an Exception or Array(Exception) and simply does not call the output mechanism if the exception passed to on_thread_error matches. This way, any unexpected error during test still reports in a highly visible manner while the exceptions being tested are validated via assertions. * Replaces the stock on_thread_error with one that ignores ExpectedTextError. Updated several tests from using the ruby stock RuntimeError to ExpectedTestError. * Configures tests to run with YJIT enabled This is to test under likely production deployment configuration, not for performance reasons. Note: With the very recent reporting on M4's crashing on Mac's with YJIT enabled, we might want to either defer this change or add a conditional to opt in until the problem is resolved.
* Replaces a little Unix cleverness with a standard Ruby class. This pushes the responsibiity for meeting the SQ requirements from SQ to stock Ruby * Delivers equivelent performance, identical API, and API behaviors with the original implementation (see note below on Futures) * Mostly fixes a *platform / version dependent* issue with MySQL (see below) * Meets 100% of SQ's functional requirements: * interruptible_sleep: a potentially blocking operation interruptible via either a "wake_event" (possibly requested prior to entering interruptible_sleep) or blocking until a timeout. * wake_up / interrupt: a Signal#trap and thread-safe method that does not require user-level synchronization (with the risk of not fully understanding all of the complexities required) code that either interrupts an inflight-interruptible_sleep or enqueues the event to processed in the invocation of interruptible_sleep * Interruptible's API is trivially reproduceable via Thread::Queue * interruptible_sleep => Queue.pop(timeout:) where pushing anything into the queue acts as the interrupt event and timeout is reliable without any extra code or exception handling. * wake_up / interrupt => Queue.push(Anything) is thread, fiber, and Signal.trap safe (can be called from anywhere) and captures all wake_up events whenever requested, automaticall caching any "event" not processed by a currently executing interruptible_sleep matching existing functionality exactly. Why the Future in #interruptible_sleep? While Thread::Queue micro benchmarks as having the same performance on the main thread Vs. any form of a sub-thread (or Fiber) and self-pipe, when running the SQ test suite we see a 35% slow down Vs. the original self-pipe implenentation. One assumes this slowdown would manifest in production. By moving the just the Queue#pop into a separate thread via Concurrent::Promises.future we get +/- identical performance to the original self-pipe implementation. I'm assuming this root causes to Ruby main-thread only housekeeping and/or possibly triggering a fast/slow path issue. Why a Future Vs. Thread#new for each interruptible_sleep call? Every other threaded operation in SQ is implemented using Concurrent Ruby. Using a Future is for code and architectual consistency. There is no difference in performance or functionality between the two. MySQL *only* issues: There seems to be a *platform specific* or *version specific* problem with MySQL database connectivity and/or broken self-pipes leading to randomly failing tests and a stream of distracting backtraces *even with successful* tests. Adding to the complexity sometimes, the lost database connection can self-heal -- HOWEVER -- this takes time and given how much of the test suite has time based assertions, leads to additional random test failures. These, or similar, issues have been observed in the past when changes to the MySQL client library forced changes in the mysql2 gem. With the Thread::Queue based implementation of the Interruptible concern, the random failures and amount of spurious output are dramatically improved (but not eliminated).
Rebased. |
This is working great so far, so I'm going to merge it. Thanks a lot @hms! |
Oops, so sorry, conflicts again! It needs rebasing again 🙈 Also, I realised I missed a comment for a small change, that I'll add now. |
* Streamlined a comment in the code * Removed unneeded gemspec developer_dependencies that we get handled via the upgrade to Rails 7.1.4.1 * Added a gemspec version dependency required for keyword argument support * Rebase to main
Rebased. Thank you for all of your feedback and help to get this finished. |
Thank you! |
Details for each of the commits are in the separate commit messages