Add ability to ensure only 1 version of a task is running at a time, but still actually run subsequent jobs: Simultaneous Execution Prevention #68

Open
winhamwr opened this issue Sep 22, 2016 · 3 comments


Sometimes we might want multiple very similar tasks to run (we can't just drop them or use the result from the first task), but we don't want them running at the same time. Jobtastic can't currently help with this type of synchronization.

Support job types where we want to prevent simultaneous execution, but still want the other jobs to run later.

Simultaneous Execution Prevention

  • Add a simultaneous_execution_prevention_timeout option that defaults to 0 (off)
  • When a job with that setting on hits a worker, it tries to acquire a cache lock
  • If it gets the lock, it goes on its merry way, making sure to release the lock when it's done or if it crashes
  • If it doesn't get the lock, it immediately retries the task with a short delay. Can we figure out how to separate an "I'm waiting on the simultaneous lock" retry from a "the actual task needs a retry" retry, so that a user's max_retry settings are actually respected? We want to keep retrying indefinitely while something else holds the simultaneous execution lock, since we can rely on that cache timeout as a global timeout for all potential executions of this type of task. (See the sketch after this list.)
  • If herd_avoidance is >0 (active) or cache_duration is >=0 (active), we should raise an exception if someone also tries to set simultaneous_execution_prevention_timeout to >0 (active). They won't play nicely together, and combining them almost certainly means someone misread the docs.
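A rough sketch of how the acquire/retry/release flow above might look, assuming a Django-style cache (`cache.add` / `cache.delete`) as the lock store. The `run_with_lock` hook, the `lock_retry_delay` knob, and the lock-key handling are all hypothetical illustrations, not existing Jobtastic API:

```python
from celery import Task


class SimultaneousExecutionPreventionMixin(Task):
    abstract = True

    # 0 means "off"; >0 is the lock's cache timeout in seconds
    simultaneous_execution_prevention_timeout = 0
    # Hypothetical knob: how long to wait before retrying a locked task
    lock_retry_delay = 1

    def run_with_lock(self, cache, lock_key, *args, **kwargs):
        timeout = self.simultaneous_execution_prevention_timeout
        if not timeout:
            return self.run(*args, **kwargs)

        # cache.add is atomic on memcached/redis-style backends: it only
        # succeeds if the key doesn't already exist, so it works as a lock.
        got_lock = cache.add(lock_key, 'locked', timeout)
        if not got_lock:
            # Someone else holds the lock. Retry shortly, with max_retries=None
            # so these "waiting on the lock" retries never eat into the user's
            # own max_retries budget and we keep retrying indefinitely.
            raise self.retry(countdown=self.lock_retry_delay, max_retries=None)

        try:
            return self.run(*args, **kwargs)
        finally:
            # Always release, even if the task body raised. If the worker dies
            # hard, the cache timeout eventually expires the lock anyway.
            cache.delete(lock_key)
```

The cache timeout doubles as the safety valve: if a task crashes without releasing the lock, the lock simply expires after simultaneous_execution_prevention_timeout seconds.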

Caveats to users based on countdown/eta/delay

This kind of thing can get you into a deadlock state with your queues. Because of the way worker prefetch_count, retry, and delay/eta/countdown interact, your retry call with a delay could block an entire pool of workers.

Let's say you have one worker pool with a concurrency of 3 and a prefetch_multiplier of 4, and you queue up 13 jobs with simultaneous execution prevention turned on that all match via significant_kwargs. The first one to hit a worker will start running, and the next 12 will get retried with a delay. Those retried tasks will then immediately land back in your worker pool's prefetch buffer. Since the pool only has 12 "slots" for tasks (3 concurrency times 4 prefetch_multiplier), and since the delay/eta/countdown wait happens at the worker pool level, the other 2 worker processes in your pool will have nothing to do. Even though you might be queuing up other jobs that those 2 processes could run, they can't get to them, because the pool has already pulled its maximum number of jobs.
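To make the arithmetic concrete, here's that scenario as a few lines of Python (the setting names mirror the Celery worker options; the numbers are just the example's):

```python
# Worker pool from the example: celery worker --concurrency=3 with
# worker_prefetch_multiplier = 4.
concurrency = 3
prefetch_multiplier = 4
prefetch_limit = concurrency * prefetch_multiplier  # 12 "slots" reserved at once

queued_matching_jobs = 13
running = 1                                      # the job that grabbed the lock
waiting_on_eta = queued_matching_jobs - running  # 12 jobs retried with a delay

# All 12 retried jobs sit in the pool's prefetch buffer waiting out their
# ETAs, so the buffer is full and the 2 idle processes can't pull new work.
assert waiting_on_eta == prefetch_limit
```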

Could we mitigate this?

  • Maybe the delay should be really fast, since retry does actually send things back to the broker? We'd potentially churn through a lot of jobs that immediately retry after failing to acquire the lock, but that would at least let other jobs slip in between (something like the snippet below).
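A minimal, self-contained sketch of that mitigation: fail fast on lock contention and bounce the task straight back to the broker with a tiny countdown. The task name, cache usage, and the 1-second countdown are all illustrative, not proposed API:

```python
from celery import shared_task
from django.core.cache import cache


@shared_task(bind=True, max_retries=None)
def locked_job(self, lock_key, payload):
    # Fail fast on contention: a short countdown means the retried task only
    # occupies a prefetch slot briefly before going back to the broker, which
    # lets other queued jobs slip in between.
    if not cache.add(lock_key, 'locked', 300):
        raise self.retry(countdown=1)
    try:
        return payload  # stand-in for the real task body
    finally:
        cache.delete(lock_key)
```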
@thenewguy commented Sep 23, 2016

Just curious, how will you implement the locking strategy without making the project require a specific backend? I partially copied a task mixin that implements locking for Django tasks to achieve a similar purpose: #57 (comment)
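That mixin isn't reproduced in this thread, but the usual Django-cache approach looks roughly like the sketch below; the key scheme and timeout are illustrative only, not the code from #57:

```python
from django.core.cache import cache


class CacheLockMixin(object):
    lock_timeout = 60 * 10  # seconds before an abandoned lock expires

    def lock_key(self, *args, **kwargs):
        # Real code would hash the significant arguments so that only
        # "the same" job contends for the same lock; self.name comes from
        # the Celery Task this gets mixed into.
        return 'lock:%s' % self.name

    def acquire_lock(self, *args, **kwargs):
        # cache.add only succeeds if the key is absent, which makes it a
        # workable lock on memcached or Redis cache backends.
        return cache.add(self.lock_key(*args, **kwargs), 'locked',
                         self.lock_timeout)

    def release_lock(self, *args, **kwargs):
        cache.delete(self.lock_key(*args, **kwargs))
```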

@winhamwr (Contributor, Author)

@thenewguy thanks to #63, we now have a pluggable cache backend. The goal is for anyone using memcached or redis to have out of the box support for the locking strategy. Others might need to write a different cache backend, though.

@thenewguy

Issue #83 is helpful here
