Mnesia: binding tables can become inconsistent when a node hosting transient queues is restarted #13030

ccp-sandor · 2025-01-07T09:58:03Z

ccp-sandor
Jan 7, 2025

Describe the bug

TLDR: We had a mismatch of data between rabbit_db_topic_exchange trie records and rabbit_db_binding records after a rolling rabbitmq server update (we saw some flow controlled channels so we decided to use a slightly higher instance size for our nodes).
This resulted in rabbitmqctl listing correct bindings, web UI displaying all correct bindings, but messages got routed in an unexpected way.

Rabbitmq server version used: RabbitMQ 3.13.7 Erlang 26.2.5.2

We have the following exchange setup for handling incoming requests:

1: Exchange eve-requests-v2
a: This has bindings to queues with the request 'type'. So each service we have will start it's own queue, and will bind it to eve-requests-v2 with all the requests 'types' it can handle.
b: this exchange also has a '#' wildcard binding to the eve-v3-requests exchange.

2: Exchange eve-v3-requests
a: has an alternate exchange, directing unmatched traffic to a third exchange 'unroutable-requests'
b: This has bindings to queues with the request 'type'. So each service we have will start it's own queue, and will bind it to eve-requests-v2 with all the requests 'types' it can handle.

3: Exchange unroutable-requests
a: a queue will have a '#' binding on this exchange, the purpose is that we wanted to have an alert if we have incoming messages that services do not expect.

So each one of our service creates it's own queue, and will bind it to both eve-requests-v2 and eve-v3-requests.
We're relying on rabbitmq to collapse and de-duplicate the queue destination when routing messages.

We started to observe some flow control as our traffic was growing (namely channels and connections being flow controlled, but now queues themselves) so our guess was that some queue processes can't keep up with traffic, so we wanted to change instance types in AWS to give more cpu resources (specifically more single core performance) to nodes.
We then did a rolling update on our rabbitmq cluster (of 3 nodes): we brought a new node up, joined it to the cluster, let things stabilize for a few minutes, then take an old smaller node out. Then rinse and repeat until we only have new nodes in the cluster.

When verifying the update, we checked with rabbitmqctl if all the queues and bindings that we expect are in place, and everything seemed good.
Then soon after this, we saw our system alerting us to some unroutable messages that should have been routed correctly. So we started digging.

Our expected case, with a service that was working after the rolling update:

rabbit_db_topic_exchange:match({resource,<<"eve-live">>,exchange,<<"eve-requests-v2">>}, <<"eve_assetholding_asset_api_spawnrequest.tranquility">>, Opts).
[{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
 {resource,<<"eve-live">>,queue,
           <<"esg-eve-requests-eve-asset-holding-esgc_erh">>}]`

`rabbit_db_topic_exchange:match({resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}, <<"eve_assetholding_asset_api_spawnrequest.tranquility">>, Opts).
[{resource,<<"eve-live">>,queue,
           <<"esg-eve-requests-eve-asset-holding-esgc_erh">>}]

So the same queue has the same binding on both eve-requests-v2 and eve-v3-requests exchanges.

A scenario that caused unroutable message alert in our system:

rabbit_db_topic_exchange:match({resource,<<"eve-live">>,exchange,<<"eve-requests-v2">>}, <<"eve_cosmetic_market_skin_listing_api_getallrequest.tranquility">>, Opts).
[{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
 {resource,<<"eve-live">>,queue,
           <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>}]`

`rabbit_db_topic_exchange:match({resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}, <<"eve_cosmetic_market_skin_listing_api_getallrequest.tranquility">>, Opts).
[]

Since we had no match on eve-v3-requests the message got handed off to the alternate exchange, which causes our unroutable message alerts to be emitted.

But at the same time what we saw via the web UI or rabbitmqctl is this:

root@:~# rabbitmqctl list_bindings --vhost eve-live | grep eve-v3-requests | grep esg-eve-requests-eve-cosmetic-market

eve-v3-requests exchange        esg-eve-requests-eve-cosmetic-market-esgc_erh   queue   eve_cosmetic_market_component_listing_api_getallrequest.#       []

eve-v3-requests exchange        esg-eve-requests-eve-cosmetic-market-esgc_erh   queue   eve_cosmetic_market_skin_listing_api_admin_getalllistedbycharacterrequest.#     []

eve-v3-requests exchange        esg-eve-requests-eve-cosmetic-market-esgc_erh   queue   eve_cosmetic_market_skin_listing_api_admin_setexpirationrequest.#       []

eve-v3-requests exchange        esg-eve-requests-eve-cosmetic-market-esgc_erh   queue   eve_cosmetic_market_skin_listing_api_getallrequest.#    []

eve-v3-requests exchange        esg-eve-requests-eve-cosmetic-market-esgc_erh   queue   eve_cosmetic_market_transaction_api_getallrequest.#     []

eve-v3-requests exchange        esg-eve-requests-eve-cosmetic-market-esgc_erh   queue   eve_cosmetic_market_transaction_api_getrequest.#        []

root@:~# rabbitmqctl list_bindings --vhost eve-live | grep eve-v3-requests | grep esg-eve-requests-eve-cosmetic-market | wc -l
6

If I queried the bindings information from a shell attached to the rabbitmq cluster from rabbit_db_bindings, I got the same information:

io:format("~p", [
                           lists:filter(fun({binding, {resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}, _, {resource,<<"eve-live">>,queue,<<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>}, _}) -> true; (_) -> false end,
                           rabbit_db_binding:get_all_for_source({resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}))
                           ]).
                           
[{binding,{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
          <<"eve_cosmetic_market_component_listing_api_getallrequest.#">>,
          {resource,<<"eve-live">>,queue,
                    <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
          []},
          
 {binding,{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
          <<"eve_cosmetic_market_skin_listing_api_admin_getalllistedbycharacterrequest.#">>,
          {resource,<<"eve-live">>,queue,
                    <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
          []},
          
 {binding,{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
          <<"eve_cosmetic_market_skin_listing_api_admin_setexpirationrequest.#">>,
          {resource,<<"eve-live">>,queue,
                    <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
          []},
          
 {binding,{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
          <<"eve_cosmetic_market_skin_listing_api_getallrequest.#">>,
          {resource,<<"eve-live">>,queue,
                    <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
          []},
          
 {binding,{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
          <<"eve_cosmetic_market_transaction_api_getallrequest.#">>,
          {resource,<<"eve-live">>,queue,
                    <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
          []},
          
 {binding,{resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>},
          <<"eve_cosmetic_market_transaction_api_getrequest.#">>,
          {resource,<<"eve-live">>,queue,
                    <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
          []}]
ok

If I queried the trie records though:

lists:map(fun(BK) -> rabbit_db_topic_exchange:trie_records_to_key(mnesia:dirty_read(rabbit_topic_trie_binding, BK)) end, lists:filter(fun({trie_binding, {resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}, _, {resource,<<"eve-live">>,queue,<<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>}, _}) -> true; (_) -> false end, mnesia:dirty_all_keys(rabbit_topic_trie_binding))).

[[{#trie_binding{exchange_name = #resource{virtual_host = <<"eve-live">>,
                                           kind = exchange,name = <<"eve-v3-requests">>},
                 node_id = <<202,152,109,239,157,189,40,151,231,221,58,199,
                             62,85,232,60>>,
                 destination = #resource{virtual_host = <<"eve-live">>,
                                         kind = queue,
                                         name = <<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>},
                 arguments = []},
   ["eve_cosmetic_market_skin_listing_api_admin_getalllistedbycharacterrequest",
    "#"]}]]

I only found trie records for one binding.

So it seems binding information stored in these two places got desynced during the rolling update process.

(Trie information for the working service completely matches up with the other binding information, I can provide them if needed.)

Reproduction steps

We cannot consistently reproduce this issue. But when it happened:

Do a rolling update on a cluster of 3 nodes, bringing a new node up, waiting a few minutes, bringing an old node down.
Compare rabbit_db_bindings and rabbit_db_topic_exchange records, they sometimes diverge

Expected behavior

Expected bindings not just to be displayed correctly, but to route incoming messages correctly.

Additional context

Solved the temporary issue by executing this code snippet:
lists:map(fun(B) -> rabbit_db_topic_exchange:set(B) end, lists:filter(fun({binding, {resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}, _, {resource,<<"eve-live">>,queue,<<"esg-eve-requests-eve-cosmetic-market-esgc_erh">>}, _}) -> true; (_) -> false end, rabbit_db_binding:get_all_for_source({resource,<<"eve-live">>,exchange,<<"eve-v3-requests">>}))).
This reconstructed the trie records from the bindings stored and solved the issue we were having.

However we're no closer to understanding why did this happen in the first place (apart form this likely being an mnesia synchronization error).

Some of our services still rely on classic mirrored queues, so currently we can't update to 4.x and use khepri just yet.

Answered by michaelklishin

Feb 12, 2025

I have added a note to the 4.0 release notes that explain the fundamental problem, how Khepri fundamentally addresses it, and what those who are still on Mnesia or 3.x versions can do to work around it.

View full answer

michaelklishin · 2025-01-07T13:25:10Z

michaelklishin
Jan 7, 2025
Maintainer

@ccp-sandor RabbitMQ 3.13.7 is out of community support. Our team won't spend any meaningful amount of time on Mnesia-related issues going forward, at least for non-paying users. Mnesia will be removed entirely in a future version of RabbitMQ.

Binding inconsistency of that kind is a very well known problem with transient and semi-transient bindings (when the exchange is durable but the queue is transient). Transient queues won't exist in a future version of RabbitMQ, and those who use only durable entities won't run into this class of problems.

Upgrade to 4.0.5 and switch to Khepri.

0 replies

michaelklishin · 2025-01-07T13:34:53Z

michaelklishin
Jan 7, 2025
Maintainer

Here is a somewhat similar problem that was immediately resolved by moving to Khepri #12927. The only difference is how it manifests itself:

When a node that hosts transient queues is stopped, all of the transient queues are deleted, and so are its bindings
If there are many bindings (or such queues), with Mnesia it results in very high metadata store contention and N transactions performed because one massive transaction contents even more (as expected)
Because N transactions are used, if the same node comes back and redeclares the same queues and binds them, you will end up with overlapping transactions and potentially a subset of bindings not being recovered (or rather, being deleted concurrently with their redeclaration).

Hopefully this explains why durable queues (and exchanges, and their bindings) won't be affected: when a node hosting a durable queue is restarted, durable queue won't be deleted, and thus neither will be their bindings.

Khepri uses a completely different data model and won't exhibit the same behavior.

0 replies

michaelklishin · 2025-02-12T17:39:46Z

michaelklishin
Feb 12, 2025
Maintainer

I have added a note to the 4.0 release notes that explain the fundamental problem, how Khepri fundamentally addresses it, and what those who are still on Mnesia or 3.x versions can do to work around it.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Mnesia: binding tables can become inconsistent when a node hosting transient queues is restarted #13030

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 3 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Mnesia: binding tables can become inconsistent when a node hosting transient queues is restarted #13030

Uh oh!

Uh oh!

ccp-sandor Jan 7, 2025

Describe the bug

Reproduction steps

Expected behavior

Additional context

Replies: 3 comments

Uh oh!

Uh oh!

michaelklishin Jan 7, 2025 Maintainer

Uh oh!

Uh oh!

michaelklishin Jan 7, 2025 Maintainer

Uh oh!

michaelklishin Feb 12, 2025 Maintainer

ccp-sandor
Jan 7, 2025

michaelklishin
Jan 7, 2025
Maintainer

michaelklishin
Jan 7, 2025
Maintainer

michaelklishin
Feb 12, 2025
Maintainer