We run two LND nodes in kubernetes, and after restarting the backing Bitcoin Core node, we notice that LND falls out of sync with the blockchain.
This happens because, in our kubernetes environment, the IP address of Bitcoin Core changes when it is restarted. synced_to_chain will become false and no new blocks will be received.
Your environment
version of lnd: v0.18.2-beta
which operating system (uname -a on *Nix): Linux lnd-routing-0 6.8.0-1018-aws #19~22.04.1-Ubuntu SMP Wed Oct 9 17:10:38 UTC 2024 aarch64 Linux
and Linux 9db991b293cb 6.1.0-26-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.112-1 (2024-09-30) x86_64 Linux
version of btcd, bitcoind, or other backend: Bitcoin Core 27.0
any other relevant environment details: We run our stack in kubernetes
Steps to reproduce
I'll show how I reproduce it in regtest, but we get the same issue in production (running in kubernetes) too.
We run LND with the following config in docker-compose:
When running this, bitcoin resolves to 172.18.0.2.
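The compose file referenced above did not survive into this text. A hypothetical minimal equivalent for the described regtest setup (service names, image tags, credentials, and ports are illustrative assumptions, not the original config) would look roughly like:

```yaml
# Illustrative only: a regtest bitcoind + lnd pair on one compose network.
services:
  bitcoin:
    image: bitcoin/bitcoin:27.0
    command:
      - -regtest
      - -server=1
      - -rpcbind=0.0.0.0
      - -rpcallowip=0.0.0.0/0
      - -rpcuser=user
      - -rpcpassword=pass
      - -zmqpubrawblock=tcp://0.0.0.0:28332
      - -zmqpubrawtx=tcp://0.0.0.0:28333

  lnd:
    image: lightninglabs/lnd:v0.18.2-beta
    command:
      - --bitcoin.regtest
      - --bitcoin.node=bitcoind
      - --bitcoind.rpchost=bitcoin        # hostname is resolved once at startup
      - --bitcoind.rpcuser=user
      - --bitcoind.rpcpass=pass
      - --bitcoind.zmqpubrawblock=tcp://bitcoin:28332
      - --bitcoind.zmqpubrawtx=tcp://bitcoin:28333
    depends_on:
      - bitcoin
```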
Build some blocks and make sure LND is in sync by running lncli -network=regtest getinfo and checking that synced_to_chain is true.
Stop Bitcoin Core and restart it, this time making sure it gets a new IP address, so that from now on bitcoin resolves to e.g. 172.18.0.6.
Build a block
Run lncli -network=regtest getinfo. synced_to_chain will be false, but block_height and block_hash will reflect the most recent block.
After this, LND will not receive any new blocks, but it has apparently reconnected (presumably through RPC) to get the latest block hash. My guess is that ZMQ stops working due to the IP address change.
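What makes this easy to miss is that the block fields keep advancing; only synced_to_chain betrays the problem. A hypothetical monitoring helper (the function name and usage are ours, not part of lnd) would have to key off the right field:

```python
import json

def chain_is_synced(getinfo_json: str) -> bool:
    """Parse the output of `lncli getinfo` and return synced_to_chain.

    Note: block_height/block_hash can still look current (lnd appears
    to refetch them over RPC after reconnecting), so checking the
    height alone would miss this failure mode.
    """
    info = json.loads(getinfo_json)
    return bool(info["synced_to_chain"])
```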
Expected behaviour
After reconnecting to the node it should eventually show "synced_to_chain": true. Alternatively (if it's a ZMQ connection issue) I'd expect LND to scream pretty loudly in the log.
Actual behaviour
"synced_to_chain": false indefinitely and we see no new logs of type
[INF] NTFN: New block: height=873198, sha=000000000000000000007b48042479e4f07ce2d6ae9a79c2a3ef5223dc78dd5c
Are you running with the health check system on? It's meant to catch failures like this, then cause a restart of lnd. It seems like you expect that lnd will resolve the bitcoind host again automatically, but atm we do the resolution once, then use the IP from there on.
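For context, the "resolve once, then cache" behaviour described here can be sketched with nothing but the standard library. Here is a hypothetical staleness check (lnd itself is written in Go, and none of these names exist there; this only illustrates the mechanism):

```python
import socket

def current_ips(host: str) -> set:
    """All IPv4 addresses DNS currently returns for `host`."""
    return {info[4][0] for info in socket.getaddrinfo(host, None, socket.AF_INET)}

def cached_ip_is_stale(host: str, cached_ip: str) -> bool:
    """True when the address resolved at startup no longer matches DNS.

    This mirrors the kubernetes failure above: `bitcoin` resolved to
    172.18.0.2 when lnd started, DNS later moved to 172.18.0.6, but
    the cached address kept being used.
    """
    return cached_ip not in current_ips(host)
```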
Here're the health check params I'm referring to:
; The number of times we should attempt to query our chain backend before
; gracefully shutting down. Set this value to 0 to disable this health check.
; healthcheck.chainbackend.attempts=3
; The amount of time we allow a call to our chain backend to take before we fail
; the attempt. This value must be >= 1s.
; healthcheck.chainbackend.timeout=30s
; The amount of time we should backoff between failed attempts to query chain
; backend. This value must be >= 1s.
; healthcheck.chainbackend.backoff=2m
; The amount of time we should wait between chain backend health checks. This
; value must be >= 1m.
; healthcheck.chainbackend.interval=1m
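Taken at face value, those parameters bound how long a backend outage can last before lnd shuts itself down: each attempt may take up to the timeout, with the backoff between consecutive failed attempts. A back-of-the-envelope model (our simplification, not lnd's exact scheduler):

```python
def max_tolerated_outage_s(attempts: int, timeout_s: int, backoff_s: int) -> int:
    """Rough upper bound (seconds) on chain-backend downtime before the
    health check gives up and shuts lnd down: every attempt may run for
    timeout_s, with backoff_s between failed attempts."""
    return attempts * timeout_s + (attempts - 1) * backoff_s

# Defaults quoted above: attempts=3, timeout=30s, backoff=2m -> 330s (5.5 min).
print(max_tolerated_outage_s(3, 30, 120))
```

With the attempts=30 mentioned further down, the same model gives 4380 seconds, roughly 73 minutes of tolerated downtime.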
@Roasbeef yes, it's on, and in production we've set
--healthcheck.chainbackend.attempts=30
And we see the following from the health check after the restart:
2024-12-04 09:55:59.568 [INF] HLCK: Health check: chain backend, call: 1 failed with: invalid http POST response (nil), method: uptime, id: 1215, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
2024-12-04 09:58:22.107 [INF] HLCK: Health check: chain backend, call: 2 failed with: invalid http POST response (nil), method: uptime, id: 1216, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
2024-12-04 10:00:44.648 [INF] HLCK: Health check: chain backend, call: 3 failed with: invalid http POST response (nil), method: uptime, id: 1217, last error=Post "http://bitcoin-0.bitcoin.crypto.svc.cluster.local:8332": dial tcp: lookup bitcoin-0.bitcoin.crypto.svc.cluster.local on 169.254.20.10:53: no such host, backing off for: 2m0s
Then it succeeds in connecting to the RPC port (in spite of the IP address change), so at least RPC can handle an IP address change. My guess is that it's the ZMQ connection that stops working; the health check doesn't verify that connection, so it doesn't help here.
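Since the health check only exercises RPC, one operator-side stopgap is a staleness watchdog on the `NTFN: New block` log line: if no new-block event arrives for longer than some generous threshold (well above the ~10-minute average mainnet block interval), assume the ZMQ feed is dead and restart lnd. A hypothetical sketch (class name, threshold, and wiring are all ours):

```python
import time

class BlockFeedWatchdog:
    """Flag a probably-dead block feed when no new-block event has been
    seen for `max_silence_s` seconds.

    Feed on_new_block() from whatever tails lnd's log for
    "NTFN: New block" lines; poll feed_looks_dead() from the
    orchestrator's liveness probe.
    """

    def __init__(self, max_silence_s: float, clock=time.monotonic):
        self._clock = clock
        self._max_silence_s = max_silence_s
        self._last_block = clock()

    def on_new_block(self) -> None:
        self._last_block = self._clock()

    def feed_looks_dead(self) -> bool:
        return self._clock() - self._last_block > self._max_silence_s
```

An injectable clock keeps the threshold logic testable without waiting out real block intervals.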