
aleph-node logs failed connections to a bootnode that is *not* on the BOOT_NODES list #4

QNODE-CA opened this issue Apr 3, 2023 · 9 comments

@QNODE-CA

QNODE-CA commented Apr 3, 2023

Did you read the documentation and guides?

  • I have inspected the documentation.

Is there an existing issue?

  • I have searched the existing issues.

Description of the problem

My testnet node 1 (QNODE-TEST) started logging some events yesterday (Apr 2, 2023), beginning at 2023-04-02 13:13:07 PDT. The events continued until the node was restarted at 2023-04-02 14:06:08 PDT. A pair of identical events was logged every 5 s or so:

2023-04-02 13:13:07 💔 The bootnode you want to connect to at /ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6 than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.

I posted about this on the discord testnet-validator-general channel: https://discord.com/channels/775720634848116807/1050508965647233055/1092203956324814848

Another validator (Bart | StakeSafe) confirmed that the IP address logged (84.245.33.122) belonged to his ISP (Cambrium IT Services BV). All of the AZF bootnodes seem to be hosted on Amazon, so there was a mismatch of sorts.

The attempted connection used the IP address of one of my "regular" peers together with a peer_id belonging to an actual AZF bootnode (the first on the bootnode list): bootnode-eu-central-1-0.test.azero.dev

bootnode-eu-central-1-0.test.azero.dev resolved to the correct IP address (18.184.79.200) on the same testnet machine.
The problem persisted until I restarted aleph-node; after restarting, the issue disappeared.
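
For context, here is a minimal sketch (plain std Rust, no libp2p dependency; the address and peer IDs are simply the ones from the log line above) of the kind of check behind that warning: the dialed multiaddr carries the expected peer ID after its /p2p/ component, and the node compares it with the peer ID the remote actually presents during the handshake.

```rust
// Minimal sketch of the peer ID check behind the "provided a different peer ID"
// warning. Plain std Rust, no libp2p dependency; the address and peer IDs are
// taken from the log line above and are illustrative only.

/// Pull the expected peer ID out of a multiaddr string such as
/// "/ip4/84.245.33.122/tcp/30333/p2p/12D3KooW...".
fn expected_peer_id(multiaddr: &str) -> Option<&str> {
    multiaddr
        .split("/p2p/")
        .nth(1)
        .map(|id| id.trim_end_matches('/'))
}

fn main() {
    let dialed =
        "/ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L";
    // Peer ID the remote actually reported during the handshake (from the log).
    let observed = "12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6";

    match expected_peer_id(dialed) {
        Some(expected) if expected == observed => println!("peer ID matches"),
        Some(expected) => println!("mismatch: expected {expected}, got {observed}"),
        None => println!("no /p2p/ component in the dialed address"),
    }
}
```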

Information on your setup.

testnet validator ✌️ version 0.10.1-9c6a46952d3 (built from source)
node name: QNODE-TEST
peer_id: 12D3KooWMW86EoNYwBar6e5Gcnmaa13N6JsnzYVPPVGsmvjUddWt
OS: Ubuntu 22.04.2
hardware: VM on ESXi hypervisor, AMD EPYC 7302 (8c/16t dedicated), 64GB ECC RAM, 1×100 GB SSD, 1×1.92 TB SSD NVMe, 1G connectivity, business fibre (static IP)
IP address: 206.116.52.219

Steps to reproduce

Not sure how to reproduce.

backup-aleph-node-testnet-2023-04-02T1406.zip

Did you attach relevant logs?

  • I have attached logs (if relevant).
@Marcin-Radecki

Hi, did anything in particular happen with the network from your validator's perspective around 2023-04-02 13:07:13? It seems that your node was not communicating with other peers (no bytes sent/received), which fixed itself 15 minutes later:

2023-04-02 13:07:12 ✨ Imported #27490656 (0xb6f4…46e7)
2023-04-02 13:07:13 Clique Network status: not maintaining any connections;
2023-04-02 13:07:13 Network status report: authentication connected peers - 99; block sync connected peers - 97;
2023-04-02 13:07:14 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 15.1kiB/s ⬆ 8.4kiB/s
2023-04-02 13:07:19 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s
2023-04-02 13:07:24 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.8kiB/s
2023-04-02 13:07:29 💤 Idle (33 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.5kiB/s
2023-04-02 13:07:31 Justification requester status report: tries - 4; requested block number - 27490655; hash - 0x8004…9965;tries - 4; requested 1 children of finalized block 0x4bc1…6c85;
2023-04-02 13:07:33 Clique Network status: not maintaining any connections;
2023-04-02 13:07:33 Network status report: authentication connected peers - 99; block sync connected peers - 97;
2023-04-02 13:07:34 💤 Idle (26 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 0.9kiB/s
2023-04-02 13:07:39 💤 Idle (25 peers), best: #27490656 (0xb6f4…46e7), finalized #27490653 (0x4bc1…6c85), ⬇ 0 ⬆ 1.9kiB/s

@QNODE-CA

QNODE-CA commented Apr 4, 2023

Hmm, looks like a brief network outage, let me have a look :)

@QNODE-CA

QNODE-CA commented Apr 4, 2023

OK, good find! Looks like I can reproduce this by disconnecting the network for a few seconds:

On the same node, the NIC was disconnected at 2023-04-04 09:31:16:

2023-04-04 09:31:14 💤 Idle (40 peers), best: #27649905 (0xa896…d21d), finalized #27649901 (0x62ad…198b), ⬇ 107.8kiB/s ⬆ 96.6kiB/s
2023-04-04 09:31:15 ✨ Imported #27649906 (0x6beb…df36)
2023-04-04 09:31:16 ✨ Imported #27649907 (0x5e30…9559)
2023-04-04 09:31:16 📪 No longer listening on /ip4/10.0.50.60/tcp/30334

The NIC was reconnected at 2023-04-04 09:32:19:

2023-04-04 09:32:14 💤 Idle (0 peers), best: #27649907 (0x5e30…9559), finalized #27649901 (0x62ad…198b), ⬇ 0 ⬆ 0
2023-04-04 09:32:19 Justification requester status report: tries - 15; requested block number - 27649906; hash - 0x6beb…df36;tries - 16; requested 1 children of finalized block 0x62ad…198b;
2023-04-04 09:32:19 💤 Idle (0 peers), best: #27649907 (0x5e30…9559), finalized #27649901 (0x62ad…198b), ⬇ 0 ⬆ 0
2023-04-04 09:32:21 💔 The bootnode you want to connect to at /ip4/84.245.33.122/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWJgniH6kTFxLAZUVaP2nE7ohrMVjFMXBbbyph1uGYFDL6 than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.
2023-04-04 09:32:22 ✨ Imported #27649972 (0xdfba…c3ca)

Still not sure why the node keeps picking on poor 84.245.33.122 (at least it's consistent) and matching it with a peer_id from the BOOT_NODES list.
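
As a sanity check, here is a quick sketch along the same lines (plain Rust; the single BOOT_NODES entry below is an illustrative excerpt, assuming the usual space-separated multiaddr format) confirming that the offending IP does not appear in the configured bootnode list:

```rust
// Quick sanity check: is the IP the node keeps dialing actually present in the
// configured BOOT_NODES multiaddr list? The list below is a hypothetical
// one-entry excerpt; a real BOOT_NODES value is a space-separated list of
// /dns4/.../tcp/30333/p2p/... multiaddrs.

fn main() {
    let boot_nodes = "/dns4/bootnode-eu-central-1-0.test.azero.dev/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L";
    let suspect_ip = "84.245.33.122";

    let listed = boot_nodes
        .split_whitespace()
        .any(|addr| addr.contains(suspect_ip));

    println!("{suspect_ip} listed in BOOT_NODES: {listed}"); // expected: false
}
```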

azero-testnet-1 libp2p bootnode bug.zip

@QNODE-CA

QNODE-CA commented Apr 7, 2023

This happened again on a different testnet node (QNODE-TEST-2). This node is located in a DC and has suffered no loss of connectivity, AFAICT; 40 peers solid throughout. Also, it self-corrected in about half a minute. Log attached.

azero-testnet-2 - 2023-04-06 112056 - odd bootnode IP-peer_id.zip

@Marcin-Radecki

Hi, thanks for the reports. As per the original logs, it does not seem to be an issue affecting the chain. As per the last log provided, there is one moment where the number of peers drops to 39 and then becomes 40 again, so I think there is some network activity going on in the background. It's hard to tell at this point where the issue is, as it needs local reproduction with some log levels set, etc.

@Marcin-Radecki

Hi, please let me know if this issue can be closed. Any more reproductions so far?

@QNODE-CA

Hi, I see this happening occasionally (every couple of weeks or so) on my mainnet node. Last instance was just a couple of days ago:

2023-06-25 03:58:12 💔 The bootnode you want to connect to at /ip4/37.75.153.26/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWSWzDVGzH9UEeQajj1vgWTC1k4Kg31vX65bWiCXJcERK1 than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.

I have not seen any of this on any of my testnet nodes since upgrading to ver. 0.11.x.

@QNODE-CA

QNODE-CA commented Jul 9, 2023

My mainnet node logged another couple of these:

2023-07-08 03:12:34 💔 The bootnode you want to connect to at /ip4/37.75.153.26/tcp/30333/p2p/12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB provided a different peer ID 12D3KooWSWzDVGzH9UEeQajj1vgWTC1k4Kg31vX65bWiCXJcERK1 than the one you expect 12D3KooWJnEGVVmnXhVNxV6KWe3EsDPNvPFNcYbQ6amFVGECVAGB.

I can see no similar entries for testnet nodes. Hopefully this is affecting ver. 0.10.x only.

@QNODE-CA

Today my testnet-2 node also experienced the same issue:

2023-07-18 11:24:32 💔 The bootnode you want to connect to at /dns4/bootnode-us-east-2-0.test.azero.dev/tcp/30333/p2p/12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L provided a different peer ID 12D3KooWD5s2dkifJua69RbLwEREDdJjsNHvavNRGxdCvzhoeaLc than the one you expect 12D3KooWRkGLz4YbVmrsWK75VjFTs8NvaBu42xhAmQaP4KeJpw1L.

I guess ver. 0.11.3 is not immune to this after all.
