
Solid Queue Integration #199

Merged
merged 9 commits into main from ca-solid-queue
Apr 25, 2024

Conversation

Member

@carlosantoniodasilva commented Apr 22, 2024

This adds an adapter to integrate with SolidQueue, reporting job queue time and busy metrics (if enabled) to Judoscale for autoscaling.

SolidQueue is currently on v0.3, still pretty early on, and there are still some things being figured out, but there's early adoption and we expect more as it becomes a Rails recommendation / default in the future. It only works with Rails 7.1+ and Ruby 2.7+, so that's what this adapter will support initially.

We'll be collecting queue time / latency via the "ready executions" table, and busy via the "claimed executions" table.

SolidQueue moves jobs between different tables as they change "status". In other words, while every job has a representation in the main "jobs" table, it also gets a record in an associated table that represents what's happening to it: when it's ready to be picked up for work, it goes to "ready executions"; when it's claimed by a worker process to be performed, it goes to "claimed executions"; and if there's a failure (that's not retried by Active Job), it goes to "failed executions". If it's scheduled to run in the future, it goes to "scheduled executions" (the same happens when it's being retried by Active Job, which essentially re-schedules it in the future until it succeeds, or gives up retrying and blows up back to SolidQueue.)

When a job finishes successfully, it's flagged with a "finished_at" value in the main "jobs" table. As a job moves from one "execution" status to the next in the workflow, its previous record is destroyed, so there should only ever be one of those "execution" representations at any point in time. (i.e. a job is either scheduled, ready, claimed, or failed)
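
To make the lifecycle concrete, here's a minimal sketch of how a job's current state maps to those tables, assuming the `has_one` associations SolidQueue defines for each execution type (the exact association names are worth double-checking against the SolidQueue source):

    # Sketch only: infer a job's state from which execution record exists.
    job = ::SolidQueue::Job.last

    state =
      if job.finished_at.present?
        :finished   # done; only the "jobs" row remains until cleanup
      elsif job.claimed_execution
        :claimed    # currently being worked by a worker process
      elsif job.ready_execution
        :ready      # waiting to be picked up
      elsif job.scheduled_execution
        :scheduled  # due in the future (includes Active Job retries)
      elsif job.failed_execution
        :failed     # errored outside of Active Job's retries
      end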

There's also the concept of recurring executions, which are created via config (a cron-like setup) and eventually get added to "ready executions" on every recurrence.

And finally, there's one thing I need to look into a bit more: blocked executions. It seems you can add a concurrency limit to jobs, which may lock certain jobs from running (if they're concurrency-limited by a certain condition) and will move them to a separate "blocked executions" table. I'd like to test this more, because I'm wondering whether we need to check this table as well when calculating queue time.

Todo / Questions

  • Investigate "blocked executions" / concurrency limits, to determine whether they should be added to the queue time / latency.
    • I've been playing with this some, and it works as you'd expect: you can set up a job with a concurrency limit, i.e. run only one job at a time, or one job with a given set of arguments, or up to X jobs concurrently, etc. If more jobs are enqueued, instead of going to "ready" they go to "blocked". When jobs finish, they check for blocked jobs to unblock, and there's also an additional dispatcher that checks for blocked jobs on a schedule. (See the sketch after the sample query below.)
    • While initially I thought it'd make sense to consider these for the latency calculation, the more I thought about it and played with it, the more I realized that having a big list of blocked jobs doesn't mean a need to autoscale: you might simply be limiting the concurrency of those jobs to the point where many get enqueued at certain moments but only a few get processed, due to the limits imposed. This can cause the blocked executions table to grow temporarily, giving those blocked jobs an "increased latency", but autoscaling up would be wrong in this case, since more processing power won't make those jobs complete any faster -- they're still bound by their concurrency setup. In other words, I'm thinking that autoscaling should only look at jobs in the "ready executions" table initially.
Sample query I was playing with, for reference

          ::SolidQueue::Job
            .left_joins(:blocked_execution, :ready_execution)
            .where("#{::SolidQueue::BlockedExecution.table_name}.id IS NOT NULL OR #{::SolidQueue::ReadyExecution.table_name}.id IS NOT NULL")
            .group("#{::SolidQueue::Job.table_name}.queue_name")
            .minimum("coalesce(#{::SolidQueue::BlockedExecution.table_name}.created_at, #{::SolidQueue::ReadyExecution.table_name}.created_at)")
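
For reference, here's a sketch of the kind of concurrency-limited job described above, using SolidQueue's `limits_concurrency` macro (the job class, arguments and limits are made up for illustration):

    # Sketch: at most one of these jobs per account runs at a time; extra
    # enqueued jobs land in "blocked executions" until the lock is released.
    class DeliverNotificationJob < ApplicationJob
      queue_as :default

      limits_concurrency to: 1, key: ->(account) { account.id }, duration: 5.minutes

      def perform(account)
        # ...
      end
    end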

It was pointing to a non-existent `good_job_active_record`; the library
is `good_job`, which is currently available on v3+.
@carlosantoniodasilva force-pushed the ca-solid-queue branch 2 times, most recently from e276263 to 7916eab on April 22, 2024 at 20:23
This is a copy of the other sample apps, but upgraded to Rails v7.1 in
order to actually install Solid Queue.
The sample setup helps ensure the reporting works.

We can install mission control manually if we want to inspect jobs, but
I've left it commented out because right now it relies on Rails and
sprockets and we don't really need those dependencies most of the time.
@carlosantoniodasilva changed the title from "[WIP] Solid Queue Integration" to "Solid Queue Integration" on Apr 23, 2024
@carlosantoniodasilva marked this pull request as ready for review on April 23, 2024 at 21:23
Member Author

@carlosantoniodasilva left a comment

@adamlogic the SolidQueue integration seems to be working well so far, sending it your way for an initial look.

gemfile:
- Gemfile
ruby:
- "2.7"
Member Author

SolidQueue only works with Rails 7.1+, and so only Ruby 2.7+.

@@ -10,6 +10,7 @@
require "action_controller"

class TestRailsApp < Rails::Application
config.load_defaults "#{Rails::VERSION::MAJOR}.#{Rails::VERSION::MINOR}"
Member Author

Setting defaults from the current Rails version eliminates a warning about the old cache format version.

spec.required_ruby_version = ">= 2.7.0"

spec.add_dependency "judoscale-ruby", Judoscale::SolidQueue::VERSION
spec.add_dependency "solid_queue", ">= 0.3"
Member Author

0.3 is the latest version at the moment; I think it's best to use it as the requirement.

super

queue_names = run_silently do
::SolidQueue::Job.distinct.pluck(:queue_name)
Member Author

Querying from the job model will include queues from already finished or failed jobs -- basically all known SolidQueue jobs that haven't been deleted yet. (They have documented a way to run a cleanup task that deletes finished jobs after a day, but it isn't run automatically yet.)
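
If we ever want to narrow that down, a sketch of a query that skips finished jobs (not what the adapter does today) would be:

    # Finished jobs keep their row in the jobs table (with finished_at set)
    # until the cleanup task deletes them, so filter on finished_at.
    ::SolidQueue::Job.where(finished_at: nil).distinct.pluck(:queue_name)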

time = Time.now.utc

oldest_execution_time_by_queue = run_silently do
::SolidQueue::ReadyExecution.group(:queue_name).minimum(:created_at)
Member Author

Jobs move to the "ready execution" table when they're ready to be picked up by a worker and processed. (So if you enqueue a job right now, it creates both a "job" and a "ready execution" record, but if you schedule one in the future, it creates a "job" and a "scheduled execution" instead -- which later gets moved to "ready execution" when it's time to run it.)
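
For illustration, a sketch of how that per-queue minimum becomes a queue-time value in seconds (variable names mirror the adapter code above; the exact rounding and reporting are elided):

    time = Time.now.utc
    oldest_execution_time_by_queue = ::SolidQueue::ReadyExecution.group(:queue_name).minimum(:created_at)

    latency_by_queue = oldest_execution_time_by_queue.transform_values do |oldest_created_at|
      time - oldest_created_at  # seconds the oldest ready job has been waiting
    end
    # e.g. { "default" => 12.3, "mailers" => 0.4 }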


if track_busy_jobs?
busy_count_by_queue = run_silently do
::SolidQueue::Job.joins(:claimed_execution).group(:queue_name).count
Member Author

Jobs move from "ready execution" to "claimed execution" when they're picked up by a worker, which deletes the "ready" record. Once finished, the "job" is tagged with a "finished_at" value and the "claimed" record is deleted. If it failed, a "failed execution" record is also created.
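
A small sketch of how the busy counts can be combined with the known queue names, so queues with nothing claimed still report zero (how exactly the adapter merges these is left out here):

    queue_names = ::SolidQueue::Job.distinct.pluck(:queue_name)
    busy_count_by_queue = ::SolidQueue::Job.joins(:claimed_execution).group(:queue_name).count

    queue_names.each do |queue_name|
      busy = busy_count_by_queue.fetch(queue_name, 0)
      # ...report `busy` for `queue_name` to Judoscale
    end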

# It seems we can't only set it on `DatabaseTasks` as expected, need to set on the `Migrator` directly instead.
ActiveRecord::Migrator.migrations_paths += SolidQueue::Engine.config.paths["db/migrate"].existent
# ActiveRecord::Tasks::DatabaseTasks.migrations_paths += SolidQueue::Engine.config.paths["db/migrate"].existent
ActiveRecord::Tasks::DatabaseTasks.migrate
Member Author

I was playing with a way to get the migrations to automatically run up to the latest version and came up with this... not great, but it seems better than copying & pasting the whole migration. (We can replicate it to the other sample apps later, if this doesn't cause any trouble.)
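
If we do replicate it later, it could live in a tiny shared helper along these lines (the helper name is hypothetical):

    # Append an engine's migrations to the migration paths and migrate to latest.
    def migrate_engine(engine)
      ActiveRecord::Migrator.migrations_paths += engine.config.paths["db/migrate"].existent
      ActiveRecord::Tasks::DatabaseTasks.migrate
    end

    migrate_engine(::SolidQueue::Engine)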

# (A `/jobs` route is added via config/routes.rb if `MissionControl` is detected.)
# Note: mission control requires assets, so we also need sprockets-rails here for now.
# gem "mission_control-jobs"
# gem "sprockets-rails"
Member Author

I opted to leave Mission Control commented out; it's nice, but it's not necessary to test the sample app, and it adds the whole rails gem and sprockets as dependencies.
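
For context, the conditional route mentioned in the diff above looks roughly like this (the `defined?` guard is an assumption about how the detection is done):

    # config/routes.rb (sketch)
    Rails.application.routes.draw do
      mount MissionControl::Jobs::Engine, at: "/jobs" if defined?(MissionControl::Jobs)
    end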


# Require only the frameworks we currently use instead of loading everything.
%w(activerecord actionpack actionview railties activejob activemodel).each { |rails_gem|
gem rails_gem, "~> 7.1.0"
Member Author

We can upgrade the other sample apps to 7.1 later. (they're on 7.0)

Collaborator

@adamlogic left a comment

This is working great! I noted one small annoyance in the sample app setup, but I'm not sure how to get around it. Let's not get too hung up on it if you don't have any immediate ideas.

I created a couple of small PRs from this one to consider:

Comment on lines +14 to +20
create_schema "_timescaledb_cache"
create_schema "_timescaledb_catalog"
create_schema "_timescaledb_config"
create_schema "_timescaledb_internal"
create_schema "timescaledb_experimental"
create_schema "timescaledb_information"
create_schema "toolkit_experimental"
Collaborator

These were causing me trouble locally when running db:prepare (via bin/setup):

solid_queue-sample $ bin/rails db:prepare
   (1.5ms)  CREATE SCHEMA "_timescaledb_cache"
bin/rails aborted!
ActiveRecord::StatementInvalid: PG::DuplicateSchema: ERROR:  schema "_timescaledb_cache" already exists (ActiveRecord::StatementInvalid)
/Users/adam/Projects/judoscale-ruby/sample-apps/solid_queue-sample/db/schema.rb:14:in `block in <top (required)>'
/Users/adam/Projects/judoscale-ruby/sample-apps/solid_queue-sample/db/schema.rb:13:in `<top (required)>'

Caused by:
PG::DuplicateSchema: ERROR:  schema "_timescaledb_cache" already exists (PG::DuplicateSchema)
/Users/adam/Projects/judoscale-ruby/sample-apps/solid_queue-sample/db/schema.rb:14:in `block in <top (required)>'
/Users/adam/Projects/judoscale-ruby/sample-apps/solid_queue-sample/db/schema.rb:13:in `<top (required)>'
Tasks: TOP => db:prepare
(See full trace by running task with --trace)

I deleted these lines and ran db:prepare successfully, but the lines were automatically added back to schema.rb.

This is only a problem on first-time setup, but it's annoying. It's also Timescale-specific, which our sample apps don't require at all.

I'm not really sure of any way around it, though. If you try to set up a sample app while connected to Postgres with Timescale enabled, you'll get these lines in your schema. 🤔

Member Author

I think I saw some Timescale stuff dumped into the schema before; some output was also in the good_job sample app and I didn't pay much attention, but now I can see that create_schema only shows up in this one... it turns out it's something that was added in Rails 7.1:

So when recreating the DB, it will try to recreate these schemas and fail... it works with enable_extension because I believe that adds an "IF NOT EXISTS", but create_schema apparently does not do anything like that... maybe it should.

There are some potentially related changes:

Other than monkey-patching Rails / schema dumper, I can't really think of an option right now 🤔
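
For completeness, the kind of monkey-patch I mean would look something like this; not proposing we ship it, and it assumes the PostgreSQL adapter's `quote_schema_name` helper and the `active_record_postgresqladapter` load hook:

    # Sketch: make create_schema calls from schema.rb tolerate existing schemas.
    module CreateSchemaIfNotExists
      def create_schema(schema_name, **options)
        execute("CREATE SCHEMA IF NOT EXISTS #{quote_schema_name(schema_name)}")
      end
    end

    ActiveSupport.on_load(:active_record_postgresqladapter) do
      prepend CreateSchemaIfNotExists
    end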

Collaborator

Thanks for the digging! Let's just leave it for now. I think we're the only ones using these sample apps.

@adamlogic
Collaborator

While initially I thought it'd make sense to consider these for the latency calculation, the more I thought about it and played with it, the more I realized that having a big list of blocked jobs doesn't mean a need to

I agree with your analysis of the scheduled executions table.

I guess one thing we'll need to consider is that if someone scales their workers down to zero, their scheduled/blocked executions will never run. That's more of an app UX consideration... just thinking aloud here.

This enables the "/jobs" endpoint for viewing more details about jobs.
This maps the queue priority to the queues we use for jobs in the sample app.
@carlosantoniodasilva
Member Author

I guess one thing we'll need to consider is that if someone scales their workers down to zero, their scheduled/blocked executions will never run. That's more of an app UX consideration... just thinking aloud here.

Makes sense, but I think that's a general consideration for jobs/workers that's not specific to SolidQueue; one could argue the same is true for Sidekiq, for example, and its unique enterprise features.

@carlosantoniodasilva merged commit 30d95b9 into main on Apr 25, 2024
120 checks passed
@carlosantoniodasilva deleted the ca-solid-queue branch on April 25, 2024 at 18:46