
aleph-node logs failed connections to a bootnode that is *not* on the BOOT_NODES list #4

QNODE-CA opened this issue Apr 3, 2023 · 9 comments

@QNODE-CA

QNODE-CA commented Apr 3, 2023

Did you read the documentation and guides?

  • I have inspected the documentation.

Is there an existing issue?

  • I have searched the existing issues.

Description of the problem

My testnet node 1 (QNODE-TEST) started logging some events yesterday (Apr 2, 2023), beginning at 2023-04-02 13:13:07 PDT. The events continued until the node was restarted at 2023-04-02 14:06:08 PDT. A pair of identical events was logged every 5 s or so:

2023-04-02 13:13:07 💔 The bootnode you want to connect to at /ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6 than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.

I posted about this on the discord testnet-validator-general channel: https://discord.com/channels/775720634848116807/1050508965647233055/1092203956324814848

Another validator (Bart | StakeSafe) confirmed that the IP address logged (84.245.33.122) belonged to his ISP (Cambrium IT Services BV). All of the AZF bootnodes seem to be hosted on Amazon, so there was a mismatch of sorts.

The attempted connection used the IP address of one of my "regular" peers together with a peer_id belonging to an actual AZF bootnode (the first on the bootnode list): bootnode-eu-central-1-0.test.azero.dev

bootnode-eu-central-1-0.test.azero.dev resolved to the correct IP address (18.184.79.200) on the same testnet machine.
The problem persisted until I restarted aleph-node; after restarting, the issue disappeared.
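
For context, here is a minimal sketch (plain std Rust, no libp2p dependency; the address and peer IDs are simply the ones from the log line above) of the kind of check behind that warning: the dialed multiaddr carries the expected peer ID after its /p2p/ component, and the node compares it with the peer ID the remote actually presents during the handshake.

```rust
// Minimal sketch of the peer ID check behind the "provided a different peer ID"
// warning. Plain std Rust, no libp2p dependency; the address and peer IDs are
// taken from the log line above and are illustrative only.

/// Pull the expected peer ID out of a multiaddr string such as
/// "/ip4/84.245.33.122/tcp/30333/p2p/12D3KooW...".
fn expected_peer_id(multiaddr: &str) -> Option<&str> {
    multiaddr
        .split("/p2p/")
        .nth(1)
        .map(|id| id.trim_end_matches('/'))
}

fn main() {
    let dialed =
        "/ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L";
    // Peer ID the remote actually reported during the handshake (from the log).
    let observed = "12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6";

    match expected_peer_id(dialed) {
        Some(expected) if expected == observed => println!("peer ID matches"),
        Some(expected) => println!("mismatch: expected {expected}, got {observed}"),
        None => println!("no /p2p/ component in the dialed address"),
    }
}
```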

Information on your setup.

testnet validator ✌️ version 0.10.1-9c6a46952d3 (built from source)
node name: QNODE-TEST
peer_id: 12D3KooWMW86EoNYwBar6e5Gcnmaa13N6JsnzYVPPVGsmvjUddWt
OS: Ubuntu 22.04.2
hardware: VM on ESXi hypervisor, AMD EPYC 7302 (8c/16t dedicated), 64GB ECC RAM, 1×100 GB SSD, 1×1.92 TB SSD NVMe, 1G connectivity, business fibre (static IP)
IP address: 206.116.52.219

Steps to reproduce

Not sure how to reproduce.

backup-aleph-node-testnet-2023-04-02T1406.zip

Did you attach relevant logs?

  • I have attached logs (if relevant).
@Marcin-Radecki

Hi, did anything in particular happen with the network from your validator's perspective around 2023-04-02 13:07:13? It seems that your node was not communicating with other peers (no bytes sent/received), which fixed itself 15 minutes later:

2023-04-02 13:07:12 ✨ Imported #27490656 (0xb6f4…46e7)
2023-04-02 13:07:13 Clique Network status: not maintaining any connections;
2023-04-02 13:07:13 Network status report: authentication connected peers - 99; block sync connected peers - 97;
2023-04-02 13:07:14 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 15.1kiB/s ⬆ 8.4kiB/s
2023-04-02 13:07:19 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s
2023-04-02 13:07:24 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.8kiB/s
2023-04-02 13:07:29 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.5kiB/s
2023-04-02 13:07:31 Justification requester status report: tries - 4; requested block number - 27490655; hash - 0x8004…9965;tries - 4; requested 1 children of finalized block 0x4bc1…6c85;
2023-04-02 13:07:33 Clique Network status: not maintaining any connections;
2023-04-02 13:07:33 Network status report: authentication connected peers - 99; block sync connected peers - 97;
2023-04-02 13:07:34 💤 Idle (26 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s
2023-04-02 13:07:39 💤 Idle (25 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 1.9kiB/s

@QNODE-CA

QNODE-CA commented Apr 4, 2023

Hmm, looks like a brief network outage, let me have a look :)

@QNODE-CA

QNODE-CA commented Apr 4, 2023

OK, good find! Looks like I can reproduce this by disconnecting the network for a few seconds:

On the same node, the NIC was disconnected at 2023-04-04 09:31:16:

2023-04-04 09:31:14 💤 Idle (40 peers), best: #27649905 (0xa896…d21d), finalized #27649901 (0x62ad…198b), ⬇ 107.8kiB/s ⬆ 96.6kiB/s
2023-04-04 09:31:15 ✨ Imported #27649906 (0x6beb…df36)
2023-04-04 09:31:16 ✨ Imported #27649907 (0x5e30…9559)
2023-04-04 09:31:16 📪 No longer listening on /ip4/10.0.50.60/tcp/30334

The NIC was reconnected at 2023-04-04 09:32:19:

2023-04-04 09:32:14 💤 Idle (0 peers), best: #27649907 (0x5e30…9559), finalized #27649901 (0x62ad…198b), ⬇ 0 ⬆ 0
2023-04-04 09:32:19 Justification requester status report: tries - 15; requested block number - 27649906; hash - 0x6beb…df36;tries - 16; requested 1 children of finalized block 0x62ad…198b;
2023-04-04 09:32:19 💤 Idle (0 peers), best: #27649907 (0x5e30…9559), finalized #27649901 (0x62ad…198b), ⬇ 0 ⬆ 0
2023-04-04 09:32:21 💔 The bootnode you want to connect to at /ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6 than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.
2023-04-04 09:32:22 ✨ Imported #27649972 (0xdfba…c3ca)

Still not sure why the node keeps picking on poor 84.245.33.122 (at least it's consistent) and matching it with a peer_id from the BOOT_NODES list.
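
As a sanity check, here is a quick sketch along the same lines (plain Rust; the single BOOT_NODES entry below is an illustrative excerpt, assuming the usual space-separated multiaddr format) confirming that the offending IP does not appear in the configured bootnode list:

```rust
// Quick sanity check: is the IP the node keeps dialing actually present in the
// configured BOOT_NODES multiaddr list? The list below is a hypothetical
// one-entry excerpt; a real BOOT_NODES value is a space-separated list of
// /dns4/.../tcp/30333/p2p/... multiaddrs.

fn main() {
    let boot_nodes = "/dns4/bootnode-eu-central-1-0.test.azero.dev/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L";
    let suspect_ip = "84.245.33.122";

    let listed = boot_nodes
        .split_whitespace()
        .any(|addr| addr.contains(suspect_ip));

    println!("{suspect_ip} listed in BOOT_NODES: {listed}"); // expected: false
}
```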

azero-testnet-1 libp2p bootnode bug.zip

@QNODE-CA

QNODE-CA commented Apr 7, 2023

This happened again on a different testnet node (QNODE-TEST-2). This node is located in a DC and has suffered no loss of connectivity, AFAICT; 40 peers solid throughout. Also, it self-corrected in about half a minute. Log attached.

azero-testnet-2 - 2023-04-06 112056 - odd bootnode IP-peer_id.zip

@Marcin-Radecki

Hi, thanks for the reports. As per the original logs, it does not seem to be an issue affecting the chain. As per the last log provided, there is one moment where the number of peers drops to 39 and then becomes 40 again, so I think there is some network activity going on in the background. It's hard to tell at this point where the issue is, as it needs local reproduction with some log levels set, etc.

@Marcin-Radecki

Hi, please let me know if this issue can be closed. Any more reproductions so far?

@QNODE-CA

Hi, I see this happening occasionally (every couple of weeks or so) on my mainnet node. Last instance was just a couple of days ago:

2023-06-25 03:58:12 💔 The bootnode you want to connect to at /ip4/37.75.153.26/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWSWzDVGzH9UEeQajj1vgWTC1k4Kg31vX65bWiCXJcERK1 than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.

I have not seen any of this on any of my testnet nodes since upgrading to ver. 0.11.x.

@QNODE-CA

QNODE-CA commented Jul 9, 2023

My mainnet node logged another couple of these:

2023-07-08 03:12:34 💔 The bootnode you want to connect to at /ip4/37.75.153.26/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWSWzDVGzH9UEeQajj1vgWTC1k4Kg31vX65bWiCXJcERK1 than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.

I can see no similar entries for testnet nodes. Hopefully this is affecting ver. 0.10.x only.

@QNODE-CA

Today my testnet-2 node also experienced the same issue:

2023-07-18 11:24:32 💔 The bootnode you want to connect to at /dns4/bootnode-us-east-2-0.test.azero.dev/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWD5s2dkifJua69RbLwEREDdJjsNHvavNRGxdCvzhoeaLc than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.

I guess ver. 0.11.3 is not immune to this after all.
