Add ability to ensure only 1 version of a task is running at a time, but still actually run subsequent jobs: Simultaneous Execution Prevention #68

Open
winhamwr opened this issue Sep 22, 2016 · 3 comments


Sometimes we might want multiple very similar tasks to run (we can't just drop them or use the result from the first task), but we don't want them running at the same time. Jobtastic can't currently help with this type of synchronization.

Support job types where we want to prevent simultaneous execution, but still want the other jobs to run later.

Simultaneous Execution Prevention

  • Add a simultaneous_execution_prevention_timeout option that defaults to 0 (off)
  • When a job with that setting on hits a worker, it tries to acquire a cache lock
  • If it gets the lock, it goes on its merry way, making sure to release the lock when it's done or if it crashes
  • If it doesn't get the lock, it immediately retries the task with a short delay. Can we figure out how to separate an "I'm waiting on the simultaneous lock" retry from a "the actual task needs a retry" retry, so that a user's max_retry settings are actually respected? We want to keep retrying indefinitely while something else holds the simultaneous execution lock, since we can rely on that cache timeout as a global timeout for all potential executions of this type of task. (See the sketch after this list.)
  • If herd_avoidance is >0 (active) or cache_duration is >=0 (active), we should raise an exception if someone also tries to set simultaneous_execution_prevention_timeout to >0 (active). They won't play nicely together, and combining them almost certainly means someone misread the docs.
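A rough sketch of how the acquire/retry/release flow above might look, assuming a Django-style cache (`cache.add` / `cache.delete`) as the lock store. The `run_with_lock` hook, the `lock_retry_delay` knob, and the lock-key handling are all hypothetical illustrations, not existing Jobtastic API:

```python
from celery import Task


class SimultaneousExecutionPreventionMixin(Task):
    abstract = True

    # 0 means "off"; >0 is the lock's cache timeout in seconds
    simultaneous_execution_prevention_timeout = 0
    # Hypothetical knob: how long to wait before retrying a locked task
    lock_retry_delay = 1

    def run_with_lock(self, cache, lock_key, *args, **kwargs):
        timeout = self.simultaneous_execution_prevention_timeout
        if not timeout:
            return self.run(*args, **kwargs)

        # cache.add is atomic on memcached/redis-style backends: it only
        # succeeds if the key doesn't already exist, so it works as a lock.
        got_lock = cache.add(lock_key, 'locked', timeout)
        if not got_lock:
            # Someone else holds the lock. Retry shortly, with max_retries=None
            # so these "waiting on the lock" retries never eat into the user's
            # own max_retries budget and we keep retrying indefinitely.
            raise self.retry(countdown=self.lock_retry_delay, max_retries=None)

        try:
            return self.run(*args, **kwargs)
        finally:
            # Always release, even if the task body raised. If the worker dies
            # hard, the cache timeout eventually expires the lock anyway.
            cache.delete(lock_key)
```

The cache timeout doubles as the safety valve: if a task crashes without releasing the lock, the lock simply expires after simultaneous_execution_prevention_timeout seconds.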

Caveats to users based on countdown/eta/delay

This kind of thing can get you into a deadlock state with your queues. Because of the way worker prefetch_count, retry, and delay/eta/countdown interact, your retry call with a delay could block an entire pool of workers.

Let's say you have one worker pool with a concurrency of 3 and a prefetch_multiplier of 4, and you queue up 13 jobs with simultaneous execution prevention turned on that all match via significant_kwargs. The first one to hit a worker will start running, and the next 12 will get retried with a delay. Those retried tasks will then immediately land back in your worker pool's prefetch buffer. Since the pool only has 12 "slots" for tasks (3 concurrency times 4 prefetch_multiplier), and since the delay/eta/countdown wait happens at the worker pool level, the other 2 worker processes in your pool will have nothing to do. Even though you might be queuing up other jobs that those 2 processes could run, they can't get to them, because the pool has already pulled its maximum number of jobs.
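To make the arithmetic concrete, here's that scenario as a few lines of Python (the setting names mirror the Celery worker options; the numbers are just the example's):

```python
# Worker pool from the example: celery worker --concurrency=3 with
# worker_prefetch_multiplier = 4.
concurrency = 3
prefetch_multiplier = 4
prefetch_limit = concurrency * prefetch_multiplier  # 12 "slots" reserved at once

queued_matching_jobs = 13
running = 1                                      # the job that grabbed the lock
waiting_on_eta = queued_matching_jobs - running  # 12 jobs retried with a delay

# All 12 retried jobs sit in the pool's prefetch buffer waiting out their
# ETAs, so the buffer is full and the 2 idle processes can't pull new work.
assert waiting_on_eta == prefetch_limit
```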

Could we mitigate this?

  • Maybe the delay should be really fast, since retry does actually send things back to the broker? We'd potentially churn through a lot of jobs that immediately retry after failing to acquire the lock, but that would at least let other jobs slip in between (something like the snippet below).
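A minimal, self-contained sketch of that mitigation: fail fast on lock contention and bounce the task straight back to the broker with a tiny countdown. The task name, cache usage, and the 1-second countdown are all illustrative, not proposed API:

```python
from celery import shared_task
from django.core.cache import cache


@shared_task(bind=True, max_retries=None)
def locked_job(self, lock_key, payload):
    # Fail fast on contention: a short countdown means the retried task only
    # occupies a prefetch slot briefly before going back to the broker, which
    # lets other queued jobs slip in between.
    if not cache.add(lock_key, 'locked', 300):
        raise self.retry(countdown=1)
    try:
        return payload  # stand-in for the real task body
    finally:
        cache.delete(lock_key)
```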
@thenewguy commented Sep 23, 2016

Just curious, how will you implement the locking strategy without making the project require a specific backend? I partially copied a task mixin that implements locking for Django tasks to achieve a similar purpose: #57 (comment)
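That mixin isn't reproduced in this thread, but the usual Django-cache approach looks roughly like the sketch below; the key scheme and timeout are illustrative only, not the code from #57:

```python
from django.core.cache import cache


class CacheLockMixin(object):
    lock_timeout = 60 * 10  # seconds before an abandoned lock expires

    def lock_key(self, *args, **kwargs):
        # Real code would hash the significant arguments so that only
        # "the same" job contends for the same lock; self.name comes from
        # the Celery Task this gets mixed into.
        return 'lock:%s' % self.name

    def acquire_lock(self, *args, **kwargs):
        # cache.add only succeeds if the key is absent, which makes it a
        # workable lock on memcached or Redis cache backends.
        return cache.add(self.lock_key(*args, **kwargs), 'locked',
                         self.lock_timeout)

    def release_lock(self, *args, **kwargs):
        cache.delete(self.lock_key(*args, **kwargs))
```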

@winhamwr (Contributor, Author)

@thenewguy thanks to #63, we now have a pluggable cache backend. The goal is for anyone using memcached or redis to have out of the box support for the locking strategy. Others might need to write a different cache backend, though.

@thenewguy

Issue #83 is helpful here
