Concerns About RabbitMQ Storage Capacity for Large Volumes of Persistent Messages #13112

sergio-aguilar-tuhh · 2025-01-21T10:17:08Z


Community Support Policy

I have read RabbitMQ's Community Support Policy
I run RabbitMQ 4.x, the only series currently covered by community support
I promise to provide all relevant information (versions, logs from all nodes, rabbitmq-diagnostics output, detailed reproduction steps)

RabbitMQ version used

4.0.4

Erlang version used

26.2.x

Operating system (distribution) used

Linux-based

How is RabbitMQ deployed?

Community Docker image


What problem are you trying to solve?

I am planning to use RabbitMQ to handle a large volume of data (many GBs), and data durability is critical to my use case.

I have some concerns about RabbitMQ's storage capacity and its ability to handle this volume effectively while maintaining performance.

Questions

What is the best way to configure RabbitMQ to handle a large number of persistent messages without losing data or severely impacting performance?
Are there any tests or studies conducted on RabbitMQ that measure the storage capacity of a broker and compare it to its performance?

Thank you in advance for your help!

Answered by mkuratczyk

Jan 21, 2025


mkuratczyk · 2025-01-21T10:31:32Z · Maintainer

Some key questions that immediately come to mind:

is the assumption here that those GBs of data will be stored in RabbitMQ for an extended period (e.g. in queues without consumers, or in streams), or do you mean you will publish a lot of data but also expect to consume it quickly?
are these GBs of data due to a large message size or is it because you have that many, relatively small messages?
is all that data in a single queue/stream or is that spread across many queues/streams?

In general:

Quorum queues offer the highest data safety guarantees (see https://www.rabbitmq.com/blog/2025/01/17/how-are-the-messages-stored) but are not best suited for large messages or very long message backlogs (it's impossible to say exactly where the thresholds are; it depends on the message size, queue length, hardware, replication factor and so on).

Streams are better suited for large messages and long backlogs.
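For reference, both options are chosen at declaration time through the `x-queue-type` argument. The sketch below only builds the declaration keyword arguments; the helper name `declare_kwargs` and the queue names are hypothetical, and passing the result to a client such as pika's `channel.queue_declare(...)` is an assumed usage, not something from this thread.

```python
from typing import Any, Dict

# The two queue types discussed above.
SUPPORTED_TYPES = ("quorum", "stream")


def declare_kwargs(name: str, queue_type: str) -> Dict[str, Any]:
    """Build queue.declare keyword arguments for a quorum queue or a stream.

    Hypothetical helper: the result is meant to be unpacked into a client
    call such as pika's channel.queue_declare(**kwargs).
    """
    if queue_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported queue type: {queue_type!r}")
    return {
        "queue": name,
        # Quorum queues and streams must be durable and non-exclusive.
        "durable": True,
        "exclusive": False,
        "auto_delete": False,
        "arguments": {"x-queue-type": queue_type},
    }


if __name__ == "__main__":
    print(declare_kwargs("sensor.readings", "quorum"))
    print(declare_kwargs("sensor.stream", "stream"))
```

With pika this would be used as `channel.queue_declare(**declare_kwargs("sensor.readings", "quorum"))`.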

4 replies

sergio-aguilar-tuhh Jan 21, 2025
Author

First of all, @mkuratczyk thank you for your response!

To clarify the use case:

Message Size and Quantity:

The messages will primarily consist of sensor activity, so they will be relatively small in size.
The large volume of data (several GBs) is due to the high number of these small messages.

Storage Duration:

The data will be stored in RabbitMQ for an extended period, likely several hours, before being consumed by another broker or service.

Distribution:

The data will be spread across many queues, rather than being concentrated in a single queue.
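To put "several GBs held for several hours" into numbers, a back-of-the-envelope estimate is rate × message size × retention time, multiplied by the number of replicas for total disk. The workload figures below (5,000 readings/s of about 200 bytes, kept for 4 hours, 3 replicas) are purely hypothetical placeholders, and the estimate ignores per-message storage overhead, so treat it as a lower bound.

```python
def backlog_bytes(msgs_per_second: float, avg_message_bytes: float,
                  retention_hours: float) -> float:
    """Rough payload size of a backlog held for `retention_hours`, per replica.

    Ignores headers, per-message overhead and index/segment files,
    so the result is a lower bound.
    """
    return msgs_per_second * avg_message_bytes * retention_hours * 3600


def cluster_disk_bytes(backlog: float, replicas: int) -> float:
    """Each replica keeps its own copy, so disk use scales with the replica count."""
    return backlog * replicas


if __name__ == "__main__":
    # Hypothetical workload: 5,000 sensor readings/s of ~200 bytes, kept 4 hours.
    per_replica = backlog_bytes(5_000, 200, 4)
    print(f"{per_replica / 1e9:.1f} GB per replica")  # 14.4 GB per replica
    total = cluster_disk_bytes(per_replica, 3)
    print(f"{total / 1e9:.1f} GB across 3 replicas")  # 43.2 GB across 3 replicas
```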

Follow-Up Questions

Are quorum queues compatible with the Federation plugin for transferring messages to another broker or cluster? If not, what would be the best way to ensure data from quorum queues can be reliably forwarded to another broker or system?
Given that these GBs of messages will need to persist for hours until consumed by another service, would quorum queues still be a suitable option?

mkuratczyk Jan 21, 2025
Maintainer

GBs of data across many QQs should not be a problem in general. Keep in mind that QQs keep some per-message metadata in memory so the more messages you store, the more RAM will be used (again, it's metadata only, not messages themselves, as hopefully explained in the aforementioned blog post).
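Since that in-memory metadata grows with the number of stored messages, one option is to cap queue length so RAM use has a ceiling. A minimal sketch, assuming you would rather have publishers refused (with publisher confirms, they receive a nack) than have stored messages dropped; the helper name and the limit value are hypothetical, and the limit would need sizing against your own RAM.

```python
from typing import Any, Dict


def bounded_quorum_arguments(max_messages: int,
                             overflow: str = "reject-publish") -> Dict[str, Any]:
    """x-arguments for a quorum queue whose length is limited to `max_messages`.

    'reject-publish' refuses new messages once the limit is reached, so nothing
    already stored is discarded. 'drop-head' would instead discard the oldest
    messages, which works against the durability goal in this thread.
    """
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    if overflow not in ("reject-publish", "drop-head"):
        raise ValueError(f"unsupported overflow mode: {overflow!r}")
    return {
        "x-queue-type": "quorum",
        "x-max-length": max_messages,
        "x-overflow": overflow,
    }


if __name__ == "__main__":
    # Hypothetical cap of 5 million messages per queue.
    print(bounded_quorum_arguments(5_000_000))
```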

This, however, introduces another dimension: how many quorum queues? If they are relatively idle, a few thousand should be fine, but tens of thousands of queues would be a challenge.
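If a large queue count would come from having one queue per sensor, one way to stay in the "few thousand" range is to hash sensors into a fixed number of queues; messages from one sensor then always land in the same queue, keeping their relative order. The function name and the `sensors.N` naming scheme are hypothetical. RabbitMQ's consistent hash exchange plugin (`rabbitmq_consistent_hash_exchange`) does similar hash-based routing on the broker side.

```python
import zlib


def queue_for_sensor(sensor_id: str, queue_count: int, prefix: str = "sensors") -> str:
    """Map a sensor ID to one of `queue_count` queues, stable across restarts.

    zlib.crc32 is used instead of hash(), because hash() of a str is
    randomized per Python process.
    """
    if queue_count <= 0:
        raise ValueError("queue_count must be positive")
    bucket = zlib.crc32(sensor_id.encode("utf-8")) % queue_count
    return f"{prefix}.{bucket}"


if __name__ == "__main__":
    for sid in ("sensor-1", "sensor-2", "sensor-42"):
        print(sid, "->", queue_for_sensor(sid, 1_000))
```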

You can use federation with QQs, but it sounds like a shovel is closer to what you need. Or you can develop your own app: there's nothing special about a shovel, it's just an AMQP client that consumes messages and publishes them somewhere else.
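The core of such a forwarding app is the order of operations: republish, wait for the destination's publisher confirm, and only then acknowledge the source. That is the idea behind Shovel's default `on-confirm` ack mode; a crash between publish and ack can produce a duplicate but not a loss. This is a minimal sketch with the broker connections abstracted into callables; `Delivery`, `forward` and the fake publisher are hypothetical stand-ins for a real client.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Delivery:
    delivery_tag: int
    routing_key: str
    body: bytes


def forward(deliveries: Iterable[Delivery],
            publish_and_wait_for_confirm: Callable[[str, bytes], bool],
            ack: Callable[[int], None],
            nack_requeue: Callable[[int], None]) -> int:
    """Republish each delivery and ack the source only after the destination confirms.

    Returns the number of messages forwarded.
    """
    forwarded = 0
    for d in deliveries:
        if publish_and_wait_for_confirm(d.routing_key, d.body):
            ack(d.delivery_tag)
            forwarded += 1
        else:
            # Destination refused or timed out: return the message to the source and
            # stop. Other unacked deliveries are redelivered when the channel closes.
            nack_requeue(d.delivery_tag)
            break
    return forwarded


if __name__ == "__main__":
    acked, requeued = [], []

    def fake_publish(routing_key: str, body: bytes) -> bool:
        # Pretend the destination confirms everything except b"bad".
        return body != b"bad"

    msgs = [Delivery(1, "s.1", b"a"), Delivery(2, "s.2", b"bad"), Delivery(3, "s.3", b"c")]
    print(forward(msgs, fake_publish, acked.append, requeued.append))  # 1
    print(acked, requeued)  # [1] [2]
```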

It sounds like you could also consider a single replicated stream. The main difference in terms of data safety is that QQs use fsync and streams don't. However, if you replicate the stream and have proper production-grade high availability at the infrastructure level (including power), you can likely assume you won't suddenly lose multiple cluster members at once, which is pretty much the only way you could lose messages with streams. If the goal is to store and forward the messages, a single stream will have much lower overhead (no per-message in-memory metadata, and perhaps only one stream instead of many queues). If you want the data split between queues, that can happen as part of forwarding: you consume from the stream but publish to an exchange that sorts messages into different queues.
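A stream forwarder also needs to know where to resume after a restart. A minimal sketch, assuming an AMQP 0.9.1 client: the consumer passes the `x-stream-offset` consume argument (e.g. via pika's `basic_consume(..., arguments=...)`, together with a `basic_qos` prefetch and manual acks, which stream consumers require), and the app records each message's offset once its republish is confirmed. Reading that offset from the `x-stream-offset` header of each delivery is an assumption to verify against the streams docs; `OffsetTracker` and `stream_consume_arguments` are hypothetical names, and persisting the tracker is left to the app.

```python
from typing import Dict, Optional, Union


class OffsetTracker:
    """Remembers the highest stream offset whose republish was confirmed."""

    def __init__(self) -> None:
        self.last_forwarded: Optional[int] = None

    def confirmed(self, offset: int) -> None:
        if self.last_forwarded is None or offset > self.last_forwarded:
            self.last_forwarded = offset


def stream_consume_arguments(last_forwarded: Optional[int]) -> Dict[str, Union[str, int]]:
    """Consume arguments for (re)attaching the forwarder to the stream.

    With no saved position, start at the beginning of the retained log;
    otherwise resume right after the last confirmed offset.
    """
    if last_forwarded is None:
        return {"x-stream-offset": "first"}
    if last_forwarded < 0:
        raise ValueError("stream offsets are non-negative")
    return {"x-stream-offset": last_forwarded + 1}


if __name__ == "__main__":
    tracker = OffsetTracker()
    print(stream_consume_arguments(tracker.last_forwarded))  # {'x-stream-offset': 'first'}
    tracker.confirmed(41)
    print(stream_consume_arguments(tracker.last_forwarded))  # {'x-stream-offset': 42}
```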

Answer selected by michaelklishin

sergio-aguilar-tuhh Jan 21, 2025
Author

@mkuratczyk Is there a way to explicitly clear or truncate a RabbitMQ stream after consuming the messages?

michaelklishin Jan 21, 2025
Maintainer

@sergio-aguilar-tuhh from the docs:

Streams model an append-only log of messages that can be repeatedly read until they expire

Streams allow for repeated reads exactly because consumption is non-destructive and expiration is used for data deletion.

You can always delete and re-declare the stream.
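Since consuming never removes data from a stream, disk usage is bounded by retention settings, set at declaration time (or via a policy). A minimal sketch that builds the `x-max-age` and `x-max-length-bytes` arguments; the helper name and the example values (12 hours, 50 GB) are hypothetical. Retention is applied per segment file, so data is removed a whole segment at a time and disk use can sit somewhat above the configured limit.

```python
import re
from typing import Dict, Optional, Union

# x-max-age takes a number followed by one unit: Y, M, D, h, m or s (e.g. "12h").
_MAX_AGE = re.compile(r"[1-9][0-9]*[YMDhms]")


def stream_retention_arguments(max_age: Optional[str] = None,
                               max_length_bytes: Optional[int] = None
                               ) -> Dict[str, Union[str, int]]:
    """x-arguments limiting how long, and how much, a stream retains."""
    args: Dict[str, Union[str, int]] = {"x-queue-type": "stream"}
    if max_age is not None:
        if not _MAX_AGE.fullmatch(max_age):
            raise ValueError(f"invalid x-max-age value: {max_age!r}")
        args["x-max-age"] = max_age
    if max_length_bytes is not None:
        if max_length_bytes <= 0:
            raise ValueError("max_length_bytes must be positive")
        args["x-max-length-bytes"] = max_length_bytes
    return args


if __name__ == "__main__":
    # Hypothetical: keep at most 12 hours or 50 GB, whichever limit is hit first.
    print(stream_retention_arguments("12h", 50_000_000_000))
```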
