Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: Redis Streams for immediate task delivery #492

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

RB387
Copy link
Contributor

@RB387 RB387 commented Dec 17, 2024

This pull request introduces a proof of concept for using Redis Streams to improve immediate task delivery in ARQ. The current ARQ implementation faces significant scaling challenges, especially in high-load, distributed environments.

Current Problem:

When handling large volumes of small, quick tasks (e.g., 300 tasks/sec) across multiple workers (10-30+), ARQ’s task distribution model struggles with:

  1. Locking Mechanism: On each poll iteration, ARQ attempts to acquire locks for all tasks in the queue, leading to a huge number of Redis calls.
  2. High Redis Load: Excessive polling generates unnecessary load on Redis, impacting both performance and scalability.
  3. Non-Linear Scaling: Compared to Celery, which can handle similar workloads using only ~10% CPU, ARQ’s CPU usage spikes to ~70% while still hitting scaling limits.

Proposed Solution:

This pull request explores how Redis Streams can be used for immediate task delivery to address these inefficiencies:

  • Race Condition Handling: Redis Streams naturally solve race conditions out of the box, eliminating the need for ARQ’s current locking mechanism.
  • Reduced Redis Calls: Task acquisition becomes significantly more efficient, reducing the load on Redis and CPU utilization.
  • Near-Instant Task Fetching: By avoiding polling, tasks are fetched and delivered almost immediately as they are published.

Backwards Compatibility (almost):

  • This implementation only applies to tasks that are ready for immediate execution.
  • Delayed tasks will continue to use the existing sorted sets approach.
  • The same Redis keys are reused, ensuring smooth upgrades and compatibility with existing ARQ deployments.
  • Warning: This PR breaks compatibility with Redis v5 since it uses the XAUTOCLAIM command, which significantly simplifies implementation by avoiding the addition of locks. This command was introduced in Redis 6.2.0. https://redis.io/docs/latest/commands/xautoclaim/ (but potentially it can be replaced with LUA script that does XPENDING + XCLAIM)

Key Benefits:

  • Increased Performance and Scalability: Benchmarks demonstrate significant improvements in task delivery performance and overall system scalability.
  • Lower CPU Utilization: By minimizing redundant Redis calls, CPU usage is greatly reduced under high-load scenarios on average.
  • Immediate Task Delivery: Tasks are fetched in real-time without relying on polling.
  • Aligned with ARQ’s Future Vision: This approach integrates seamlessly with ARQ’s roadmap for performance optimization. It implements what was mention in "Future plan for Arq" document Future plan for Arq #437 (2. Redis Streams 🚀, 3. Avoid sorted set for immediate jobs, 4. Avoid polling)

This PoC PR that demonstrates that this approach indeed possible and works. After sync with maintainers, this PR can be finished.

Benchmark results

They are not quite reliable as it was tested on a laptop with Redis in a Docker container. But here are some numbers:

Tasks with no delay

Current implementation

===============
Published tasks in 2.71 seconds
===============
Starting 1 workers with 25 max jobs
Done 20000 tasks in 80.73 seconds
===============

Published tasks in 2.75 seconds
===============
Starting 10 workers with 25 max jobs
Done 20000 tasks in 80.40 seconds
===============

Published tasks in 2.53 seconds
===============
Starting 20 workers with 25 max jobs
Done 20000 tasks in 91.04 seconds
===============

Published tasks in 3.50 seconds
===============
Starting 40 workers with 25 max jobs
Done 20000 tasks in 133.40 seconds
===============

Redis streams implementation

===============
Published tasks in 3.13 seconds
===============
Starting 1 workers with 25 max jobs
Done 20000 tasks in 31.66 seconds
===============

Published tasks in 2.65 seconds
===============
Starting 10 workers with 25 max jobs
Done 20000 tasks in 7.49 seconds
===============

Published tasks in 2.74 seconds
===============
Starting 20 workers with 25 max jobs
Done 20000 tasks in 6.54 seconds
===============

Published tasks in 2.65 seconds
===============
Starting 40 workers with 25 max jobs
Done 20000 tasks in 6.24 seconds
===============

Tasks with 1 second delay

Current implementation

===============
Published tasks in 2.51 seconds
===============
Starting 1 workers with 25 max jobs
Done 20000 tasks in 80.25 seconds
===============

Published tasks in 2.64 seconds
===============
Starting 10 workers with 25 max jobs
Done 20000 tasks in 79.59 seconds
===============

Published tasks in 2.56 seconds
===============
Starting 20 workers with 25 max jobs
Done 20000 tasks in 85.00 seconds
===============

Published tasks in 2.48 seconds
===============
Starting 40 workers with 25 max jobs
Done 20000 tasks in 127.51 seconds
===============

Redis streams implementation

===============
Published tasks in 2.50 seconds
===============
Starting 1 workers with 25 max jobs
Done 20000 tasks in 80.49 seconds
===============

Published tasks in 2.75 seconds
===============
Starting 10 workers with 25 max jobs
Done 20000 tasks in 26.39 seconds
===============

Published tasks in 2.52 seconds
===============
Starting 20 workers with 25 max jobs
Done 20000 tasks in 20.30 seconds
===============

Published tasks in 2.71 seconds
===============
Starting 40 workers with 25 max jobs
Done 20000 tasks in 15.98 seconds
===============

@codecov-commenter
Copy link

codecov-commenter commented Dec 17, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 91.13924% with 14 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
arq/worker.py 90.47% 6 Missing and 4 partials ⚠️
arq/connections.py 84.61% 2 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   96.27%   94.67%   -1.61%     
==========================================
  Files          11       12       +1     
  Lines        1074     1202     +128     
  Branches      209      218       +9     
==========================================
+ Hits         1034     1138     +104     
- Misses         19       36      +17     
- Partials       21       28       +7     
Files with missing lines Coverage Δ
arq/constants.py 100.00% <100.00%> (ø)
arq/jobs.py 97.19% <100.00%> (-0.97%) ⬇️
arq/lua.py 100.00% <100.00%> (ø)
arq/connections.py 86.51% <84.61%> (-3.55%) ⬇️
arq/worker.py 94.99% <90.47%> (-2.19%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f752e2...8e76b58. Read the comment docs.

rossmacarthur pushed a commit to hunchdata/arq that referenced this pull request Dec 21, 2024
@rossmacarthur
Copy link

rossmacarthur commented Dec 22, 2024

We tried this out in production 😬 😅 and we were able to reduce wait time of jobs massively. We recently scaled to many thousandsof jobs per minute and found that there was fairly significant time between when a job was queued and when it would be started, up to 15 seconds! Here you can see the changed after using this PR.

image

@RB387
Copy link
Contributor Author

RB387 commented Dec 23, 2024

@samuelcolvin @chrisguidry
hey hey folks 👋
It seems that performance improvements are highly requested by Arq users. What do you think about separating pull requests for performance improvements and Arq refactoring to finish and merge this branch sooner, then separately focus on Arq refactoring? This way, we could dedicate more time to Arq refactoring while also addressing critical performance issues important to some users

@Object905
Copy link

There seems to be something wrong with connection retrying/handling with this PR.

When I restart redis (docker restart dev-redis-1 for example) I'm getting redis.exceptions.ConnectionError: Connection closed by server.
I'm also getting same exception if I specify --watch for auto reload during development - arq restarts anyway, but logs this error.

@Graeme22
Copy link

@samuelcolvin @chrisguidry hey hey folks 👋 It seems that performance improvements are highly requested by Arq users. What do you think about separating pull requests for performance improvements and Arq refactoring to finish and merge this branch sooner, then separately focus on Arq refactoring? This way, we could dedicate more time to Arq refactoring while also addressing critical performance issues important to some users

Would be fantastic to get a performance-related release to be able to access these major improvements right away, even if the rewrite takes a while!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants