Jobs are not being fetched for certain queues #65

Open
motymichaely opened this issue Oct 20, 2021 · 15 comments
Comments

@motymichaely
Contributor

We have an issue in which jobs are not being processed.

We are running a Faktory worker in Ruby 2.7.2 using a configuration file:

$ bundle exec faktory-worker -C config/faktory/worker.yml

The worker.yml file looks like this:

:queues:
  - jobs
:concurrency: 10
:timeout: 25

An example JobWorker class:

class JobWorker
  include Faktory::Job

  def perform(arg1, arg2)
    puts "Arg1: #{arg1}"
    puts "Arg2: #{arg2}"
  end
end

Pushing a job to Faktory:

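require "securerandom"

# Generate a random 24-character job ID and push the job onto the "jobs" queue.
jid = SecureRandom.hex(12)
Faktory::Client.new.push(
  jid: jid,
  queue: "jobs",
  jobtype: "JobWorker",
  args: ["arg1", "arg2"]
)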

The job is stuck in the jobs queue. I would expect the job to be processed by the worker.

I suspect it's related to the fact that JobWorker doesn't have a queue option. However, our requirement is to have JobWorker jobs processed by several processes, each fetching from a different queue.

What am I missing?
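
For reference, the queue in the push call above is chosen per job in the pushed payload, so the same job type can be routed to different queues without a queue option on the class. A minimal sketch, assuming a hypothetical `build_job` helper and a hypothetical second queue name `other_jobs` (neither comes from the report); it only builds the hashes and does not contact a Faktory server:

```ruby
require "securerandom"

# Hypothetical helper (not part of faktory_worker_ruby): builds the job hash
# with the same fields that are passed to Faktory::Client#push above.
def build_job(jobtype, args, queue:)
  {
    jid: SecureRandom.hex(12),
    queue: queue,
    jobtype: jobtype,
    args: args
  }
end

# The same JobWorker type routed to two queues; "other_jobs" is an assumed name.
jobs = %w[jobs other_jobs].map do |queue|
  build_job("JobWorker", ["arg1", "arg2"], queue: queue)
end

jobs.each { |job| puts "#{job[:jobtype]} -> #{job[:queue]} (#{job[:jid]})" }
```

Each hash could then be handed to `Faktory::Client.new.push`, and a worker process would list the queue it serves under `:queues:` in its own worker.yml.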

@mperham
Contributor

mperham commented Oct 20, 2021

I don't see anything obviously wrong. 🤷🏼‍♂️

@motymichaely
Contributor Author

@mperham - How can I debug this? Is there any way to get a list of jobs per queue?

@mperham
Contributor

mperham commented Oct 20, 2021

I'm not sure I understand? You can go to the Queues tab and click the name of the queue to view the currently enqueued jobs.

You haven't said which Faktory or FWR version you are using - have you checked the changelogs and/or upgraded to the latest to see if that fixes your issue?

@motymichaely
Contributor Author

@mperham, the FWR version is 1.1.0 (the latest) and the Faktory server version is Faktory Enterprise 1.6.0. Both are the latest versions.

In the Faktory server Web UI I can see the queue and the jobs, but I suspect the issue is with the client not fetching the jobs.

I tried calling get_track for one of the jobs in the queue, and its state is reported as unknown even though I can see the job in the queue through the Web UI:

Faktory::Client.new(debug: true).get_track("1ef1c6777b9be3585acaf187")
< +HI {"v":2,"i":4545,"s":"7aa6160a43951fff"}
> HELLO {"wid":"","hostname":"f61e7a94-387c-40a6-8fae-3c50edcd009b","pid":3,"labels":["ruby-2.7.2"],"username":"test","v":2,"pwdhash":"redacted"}
< +OK
> TRACK GET 1ef1c6777b9be3585acaf187
< $68
=> {"jid"=>"1ef1c6777b9be3585acaf187", "state"=>"unknown", "updated_at"=>""}

image

@motymichaely
Contributor Author

@mperham - Any chance it's related to throttles? The queue has a throttle with a concurrency of 5, and I see 0 Free and 0 Taken in the UI.

Is there any way to detect if there are some orphan throttled workers?

@mperham
Contributor

mperham commented Oct 21, 2021

Absolutely it could be throttles. You didn't mention that the queue in question had a throttle. 0 Free and 0 Taken is a bad sign, I believe those two numbers should always add up to 5 in your case. It's possible the throttle tokens are leaking somehow. What version of Faktory are you running?
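
The rule of thumb above (Free plus Taken should equal the configured concurrency) can be written as a small check. This is only an illustrative sketch: `throttle_status` is a hypothetical helper, not a Faktory API, and the counts are assumed to be read by hand from the Web UI.

```ruby
# Hypothetical check (not a Faktory API): compares the Free/Taken counts
# shown in the Web UI against the configured throttle concurrency.
def throttle_status(free:, taken:, concurrency:)
  total = free + taken
  return :ok if total == concurrency
  return :tokens_missing if total < concurrency

  :too_many_tokens
end

# The situation reported in this issue: 0 Free and 0 Taken with a limit of 5.
puts throttle_status(free: 0, taken: 0, concurrency: 5)
# A healthy throttle for comparison.
puts throttle_status(free: 3, taken: 2, concurrency: 5)
```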

@motymichaely
Contributor Author

motymichaely commented Oct 21, 2021 via email

@mperham
Contributor

mperham commented Oct 22, 2021

The latest release is 1.5.5 so I'm confused.

To investigate, you'll need to open up the underlying Redis dataset and inspect the structures. If you have a throttle for queue "foo", the relevant keys are:

List free:foo (available tokens)
ZSet taken:foo (tokens in use)
Hash throttle:foo (metrics)

Those counts tell me that free and taken are both empty, meaning the tokens within them are gone somehow. If you restart Faktory, it should reset the throttle and restore the missing tokens.
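
As a convenience, the three key names listed above can be turned into redis-cli commands for any queue. A minimal sketch: the helper name `throttle_inspection_commands` is hypothetical, the key layout is taken from the list above, and `LLEN`, `ZCARD`, and `HGETALL` are standard Redis commands for the respective types. It only prints the commands and does not connect to Redis.

```ruby
# Hypothetical helper: builds redis-cli commands for the throttle keys
# described above (List free:<queue>, ZSet taken:<queue>, Hash throttle:<queue>).
def throttle_inspection_commands(queue)
  [
    "LLEN free:#{queue}",       # number of available tokens (List)
    "ZCARD taken:#{queue}",     # number of tokens in use (ZSet)
    "HGETALL throttle:#{queue}" # metrics (Hash)
  ]
end

puts throttle_inspection_commands("foo")
```

For a healthy throttle, the first two counts would be expected to add up to the configured concurrency.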

Can you show me your throttle configuration?

@motymichaely
Contributor Author

motymichaely commented Oct 24, 2021

@mperham - Thanks.

About the release, you can see the /debug output is Faktory Enterprise 1.6.0:
image

Not sure if I did it right, but I wasn't able to get any data for those keys for one of our throttled queues named mq_api:

redis /var/lib/faktory/db/redis.sock> LRANGE "free:mq_api" 0 -1
(empty array)
redis /var/lib/faktory/db/redis.sock> ZRANGE "taken:mq_api" 0 -1
(empty array)
redis /var/lib/faktory/db/redis.sock> HKEYS throttle:mq_api
(empty array)

Here's a snippet of our throttle configuration:

[throttles.mq_api]
concurrency = 5
timeout = 3600

I will now restart Faktory to see if that solves the issue.

@motymichaely
Contributor Author

Restarting the Faktory server solved the issue. Now it's worth asking what would cause throttle tokens to go missing.

@mperham
Contributor

mperham commented Oct 24, 2021 via email

@mperham
Contributor

mperham commented Oct 25, 2021

How busy is this queue and throttle? Are you hitting it all the time or will it sit idle for weeks? It's possible for the lock array to expire under you but that requires it to be idle for 30 days...

@motymichaely
Contributor Author

The queue doesn't seem to be busy at all. Yes, it's likely to be idle for 30 days. Any way to control it?

@mperham
Contributor

mperham commented Oct 26, 2021

Ok, that's good news and an easy fix. I will extend the TTL to 365 days in the next version.
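
To illustrate why a longer TTL helps a rarely used queue, here is a toy model of idle expiry. It is not Faktory's implementation: the `ExpiringKey` class is hypothetical, and it assumes that each use of the key pushes its expiry forward by the full TTL. Only the 30-day and 365-day figures come from this thread.

```ruby
# Toy model, NOT Faktory's implementation: a key whose expiry is pushed
# forward each time it is used, and which is gone once idle past its TTL.
DAY = 24 * 60 * 60

class ExpiringKey
  def initialize(ttl_days, now)
    @ttl = ttl_days * DAY
    @expires_at = now + @ttl
  end

  # Using the key refreshes its TTL, but only if it has not expired yet.
  def touch(now)
    return false if expired?(now)

    @expires_at = now + @ttl
    true
  end

  def expired?(now)
    now >= @expires_at
  end
end

start = 0
idle_for = 45 * DAY # an assumed idle period for a quiet queue

puts ExpiringKey.new(30, start).expired?(start + idle_for)  # 30-day TTL
puts ExpiringKey.new(365, start).expired?(start + idle_for) # 365-day TTL
```

Under these assumptions, a queue left idle for 45 days loses the key with a 30-day TTL but keeps it with a 365-day TTL.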

@motymichaely
Contributor Author

Great. Thank you for looking into it!
