[bug]: LND falls out of sync when Bitcoin Core's IP address changes #9353

kallerosenbaum · 2024-12-12T13:20:41Z

Background

We run two LND nodes in kubernetes, and after restarting the backing Bitcoin Core node, we notice that LND falls out of sync with the blockchain.

This happens because, in our kubernetes environment, the IP address of Bitcoin Core changes when it is restarted. synced_to_chain will become false and no new blocks will be received.

Your environment

version of lnd: v0.18.2-beta
which operating system (uname -a on *Nix):
Linux lnd-routing-0 6.8.0-1018-aws #19~22.04.1-Ubuntu SMP Wed Oct 9 17:10:38 UTC 2024 aarch64 Linux
and Linux 9db991b293cb 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 Linux
version of btcd, bitcoind, or other backend: Bitcoin Core 27.0
any other relevant environment details: We run our stack in kubernetes

Steps to reproduce

I'll show how I reproduce it in regtest, but we get the same issue in production (running in kubernetes) too.

We run LND with the following config in docker-compose:

        --listen=0.0.0.0:9735
        --externalip=lnd-0
        --rpclisten=0.0.0.0:10009
        --bitcoin.active
        --bitcoin.node=bitcoind
        --bitcoin.regtest
        --bitcoind.rpcuser=test
        --bitcoind.rpcpass=password
        --bitcoind.rpchost=bitcoin:18443
        --bitcoind.zmqpubrawblock=tcp://bitcoin:18501
        --bitcoind.zmqpubrawtx=tcp://bitcoin:18502
        --norest
        --protocol.wumbo-channels

When running this, bitcoin resolves to 172.18.0.2.

1. Build some blocks and make sure LND is in sync by running lncli -network=regtest getinfo and checking that synced_to_chain is true.
2. Stop Bitcoin Core and restart it, but this time make sure it gets a new IP address, so that from now on bitcoin resolves to e.g. 172.18.0.6.
3. Build a block.
4. Run lncli -network=regtest getinfo. synced_to_chain will be false, but block_height and block_hash will reflect the most recent block.
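
The getinfo check in these steps could be scripted. Below is a minimal sketch, assuming lncli is on the PATH and prints getinfo as JSON containing the synced_to_chain, block_height and block_hash fields mentioned above. The helper names parse_sync_state and fetch_getinfo are hypothetical and not part of LND.

```python
import json
import subprocess


def parse_sync_state(getinfo_json: str) -> dict:
    """Extract the fields discussed in this issue from `lncli getinfo` output."""
    info = json.loads(getinfo_json)
    return {
        # Treat a missing field as "not synced" to stay on the safe side.
        "synced_to_chain": bool(info.get("synced_to_chain", False)),
        "block_height": info.get("block_height"),
        "block_hash": info.get("block_hash"),
    }


def fetch_getinfo(network: str = "regtest") -> str:
    """Run lncli and return its raw output (assumes lncli is on the PATH)."""
    result = subprocess.run(
        ["lncli", f"--network={network}", "getinfo"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling parse_sync_state(fetch_getinfo()) before and after the restart would show whether synced_to_chain flips to false while block_height still advances.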

After this, LND will not receive any new blocks, but it has apparently reconnected (presumably through RPC) to get the latest block hash. My guess is that ZMQ stops working due to the IP address change.

Expected behaviour

After reconnecting to the node it should eventually show "synced_to_chain": true. Alternatively (if it's a ZMQ connection issue) I'd expect LND to scream pretty loudly in the log.

Actual behaviour

"synced_to_chain": false indefinitely and we see no new logs of type

[INF] NTFN: New block: height=873198, sha=000000000000000000007b48042479e4f07ce2d6ae9a79c2a3ef5223dc78dd5c
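
One way to notice the missing notifications is to watch the log for this line. The sketch below assumes the log format shown above stays the same; the names parse_new_block and last_block_height are hypothetical helpers, not LND tooling.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the "New block" notification line quoted in this issue.
NEW_BLOCK_RE = re.compile(
    r"\[INF\] NTFN: New block: height=(?P<height>\d+), sha=(?P<sha>[0-9a-f]+)"
)


def parse_new_block(line: str) -> Optional[Tuple[int, str]]:
    """Return (height, sha) if the line is a new-block notification, else None."""
    match = NEW_BLOCK_RE.search(line)
    if match is None:
        return None
    return int(match.group("height")), match.group("sha")


def last_block_height(lines: Iterable[str]) -> Optional[int]:
    """Return the height of the last new-block notification seen, if any."""
    height = None
    for line in lines:
        parsed = parse_new_block(line)
        if parsed is not None:
            height = parsed[0]
    return height
```

If last_block_height over recent log lines stops increasing while the backend keeps producing blocks, the notification path is likely the part that broke.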


Roasbeef · 2024-12-12T13:51:11Z

Are you running with the health check system on? It's meant to catch failures like this, then cause a restart of lnd. It seems like you expect that lnd will resolve the bitcoind host again automatically, but atm we do the resolution once, then use the IP from there on.
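
To illustrate the difference between resolving once and resolving on every connection attempt, here is a minimal sketch. It does not reproduce LND's code: the class names are hypothetical, and a dictionary stands in for DNS so the behaviour is easy to follow. In a real client the resolver could be something like socket.gethostbyname.

```python
from typing import Callable

Resolver = Callable[[str], str]


class ResolveOnceClient:
    """Resolves the hostname at construction and reuses the IP from then on."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self.addr = resolve(host)

    def target(self) -> str:
        return self.addr


class ResolveEachTimeClient:
    """Keeps the hostname and resolves it again for every connection attempt."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self.host = host
        self.resolve = resolve

    def target(self) -> str:
        return self.resolve(self.host)


# A dictionary standing in for DNS (illustrative only).
fake_dns = {"bitcoin": "172.18.0.2"}
once = ResolveOnceClient("bitcoin", fake_dns.__getitem__)
each = ResolveEachTimeClient("bitcoin", fake_dns.__getitem__)

# The backend restarts and comes back with a new address.
fake_dns["bitcoin"] = "172.18.0.6"

print(once.target())  # still the old address
print(each.target())  # the new address
```

The first client keeps dialing the stale address after the restart, which matches the behaviour described for the current resolution strategy.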

Here're the health check params I'm referring to:

; The number of times we should attempt to query our chain backend before
; gracefully shutting down. Set this value to 0 to disable this health check.
; healthcheck.chainbackend.attempts=3

; The amount of time we allow a call to our chain backend to take before we fail
; the attempt. This value must be >= 1s.
; healthcheck.chainbackend.timeout=30s

; The amount of time we should backoff between failed attempts to query chain
; backend. This value must be >= 1s.
; healthcheck.chainbackend.backoff=2m

; The amount of time we should wait between chain backend health checks. This
; value must be >= 1m.
; healthcheck.chainbackend.interval=1m
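
As a rough guide to how long a failing backend is tolerated with these values, the sketch below works out an upper bound. It assumes each attempt can take up to the full timeout and that a backoff follows every failed attempt except the last. That is an assumption about the retry loop, not something stated in the config comments, and the function name is hypothetical. The wait for the next check interval is not included.

```python
from typing import Optional


def worst_case_seconds(attempts: int, timeout_s: int, backoff_s: int) -> Optional[int]:
    """Upper bound on the time from the first failing call to shutdown.

    Assumes: every attempt uses the full timeout, and a backoff is taken
    between attempts but not after the final one.
    """
    if attempts <= 0:
        return None  # attempts=0 disables the health check
    return attempts * timeout_s + (attempts - 1) * backoff_s


# Defaults quoted above: attempts=3, timeout=30s, backoff=2m.
print(worst_case_seconds(3, 30, 120))   # 3*30 + 2*120 = 330 seconds
# A larger attempts value stretches this considerably.
print(worst_case_seconds(30, 30, 120))  # 30*30 + 29*120 = 4380 seconds
```

Under these assumptions the defaults allow about five and a half minutes of failed calls before lnd shuts down, while attempts=30 allows roughly 73 minutes.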

kallerosenbaum · 2024-12-12T15:34:49Z

@Roasbeef yes, it's on, and in production we've set

--healthcheck.chainbackend.attempts=30

And we see the following from healthcheck after restart:


2024-12-04 09:55:59.568 [INF] HLCK: Health check: chain backend, call: 1 failed with: invalid http POST response (nil), method: uptime, id: 1215, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
2024-12-04 09:58:22.107 [INF] HLCK: Health check: chain backend, call: 2 failed with: invalid http POST response (nil), method: uptime, id: 1216, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
2024-12-04 10:00:44.648 [INF] HLCK: Health check: chain backend, call: 3 failed with: invalid http POST response (nil), method: uptime, id: 1217, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s

Then it succeeds in connecting to the RPC port (in spite of the IP address change). So at least RPC can handle an IP address change. My guess is that it's the ZMQ connection that stops working, and the health check doesn't verify that connection, so the health check doesn't help here.
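
A check that would cover the notification path could track how long it has been since the last block arrived. The following is a minimal sketch of that idea, not an existing LND feature; BlockStalenessMonitor and its parameters are hypothetical, and the clock is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class BlockStalenessMonitor:
    """Flags the notification path as stale after a period without new blocks."""

    def __init__(self, max_silence_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_silence_s = max_silence_s
        self.clock = clock
        self.last_seen = clock()

    def record_block(self) -> None:
        """Call whenever a block notification arrives."""
        self.last_seen = self.clock()

    def is_stale(self) -> bool:
        """True if no block has been recorded within the allowed silence."""
        return self.clock() - self.last_seen > self.max_silence_s
```

Block intervals on mainnet vary widely, so max_silence_s would need to be generous there. Comparing against the backend's tip over RPC before declaring staleness would reduce false alarms.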

kallerosenbaum added the "bug" (Unintended code behaviour) and "needs triage" labels Dec 12, 2024

saubyk added this to the 0.20.0 milestone Dec 19, 2024

saubyk added the "P1" (MUST be fixed or reviewed) and "P2" (should be fixed if one has time) labels and removed the "needs triage" and "P1" (MUST be fixed or reviewed) labels Dec 19, 2024
