
CKB v114.0 Full node sync is stopped. #4462

Open
silySuper opened this issue May 15, 2024 · 45 comments
Labels: stale (To be closed due to a lack of activity), t:bug (Type: This doesn't seem right.)


@silySuper commented May 15, 2024

Bug Report

Full node sync has stopped.

Current Behavior

[Screenshot 2024-05-15 10.07.34] [logs 2.zip](https://github.com/nervosnetwork/ckb/files/15315402/logs.2.zip) [Screenshot 2024-05-15 10.09.17]

Environment

  • CKB version: v114.0
  • Chain: testnet

The testnet/data directory was replaced with the contents of https://download.magickbase.com/backup_20240513.tar.gz.

@silySuper silySuper added the t:bug Type: This doesn't seem right. label May 15, 2024
@eval-exec (Collaborator) commented May 15, 2024

(The link you provided, https://download.magickbase.com/backup_20240513.tar.gz, is returning a 404 error.)

When you say "Full node sync is stopped," do you mean that CKB has been syncing for a long time but the block height hasn't increased? How long has it been syncing?

Could you provide the output of:

curl -X POST 127.0.0.1:8114 -H 'Content-Type: application/json' -d '{ "id": 42, "jsonrpc": "2.0", "method": "sync_state", "params": [ ] }'

and

curl -X POST 127.0.0.1:8114 -H 'Content-Type: application/json' -d '{ "id": 42, "jsonrpc": "2.0", "method": "get_peers", "params": [ ] }'
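If these checks need repeating, the two curl calls can be wrapped in one small helper. This is a minimal sketch, assuming the RPC endpoint used in the commands above (127.0.0.1:8114); rpc_payload, ckb_rpc and the CKB_RPC_URL override are hypothetical names introduced here, not part of ckb.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the parameterless CKB JSON-RPC calls used in this thread.
# Assumption: the node's RPC server listens on 127.0.0.1:8114.
# CKB_RPC_URL is a hypothetical override variable; ckb itself does not read it.
CKB_RPC_URL="${CKB_RPC_URL:-http://127.0.0.1:8114}"

# Print the JSON-RPC 2.0 request body for a method that takes no params.
rpc_payload() {
  printf '{"id": 42, "jsonrpc": "2.0", "method": "%s", "params": []}' "$1"
}

# POST the request to the node and print the raw JSON response (needs a running node).
ckb_rpc() {
  curl -s -X POST "$CKB_RPC_URL" \
    -H 'Content-Type: application/json' \
    -d "$(rpc_payload "$1")"
}

# Query each method named on the command line, if any.
for method in "$@"; do
  echo "== $method"
  ckb_rpc "$method"
  echo
done
```

For example, saving it as rpc.sh and running bash rpc.sh sync_state get_peers prints both responses one after the other.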

@eval-exec eval-exec self-assigned this May 15, 2024
@silySuper (Author)

It has been syncing since yesterday afternoon.
[Screenshot 2024-05-15 10.50.51]
[Screenshot 2024-05-15 10.51.09]

@eval-exec (Collaborator) commented May 15, 2024

The get_peers RPC returns an empty result, indicating that the CKB node isn't maintaining network connections with other peers, hence it's unable to synchronize blocks.

  1. Have you made any changes to the default ckb.toml file? If so, how did you modify it? Did you edit the whitelist-related settings in ckb.toml?
  2. Can you share the complete log file (./data/logs/run.log)?
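An empty get_peers result can also be detected mechanically. A minimal sketch, assuming each peer object in the get_peers response carries a "node_id" key (counting that key avoids a jq dependency); peer_count and check_peers are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count peers in a get_peers JSON response read from stdin.
# Assumes each peer object carries one "node_id" key; counting that key
# avoids needing jq.
peer_count() {
  { grep -o '"node_id"' || true; } | wc -l | tr -d ' '
}

# Hypothetical helper: print a one-line verdict for a get_peers response on stdin.
check_peers() {
  local n
  n="$(peer_count)"
  if [ "$n" -eq 0 ]; then
    echo "no peers: the node cannot download blocks"
  else
    echo "$n peer(s) connected"
  fi
}
```

Usage: pipe the get_peers curl command into check_peers, for example curl -s -X POST 127.0.0.1:8114 -H 'Content-Type: application/json' -d '{ "id": 42, "jsonrpc": "2.0", "method": "get_peers", "params": [ ] }' | check_peers.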

@silySuper (Author) commented May 15, 2024

run.log

I did not change ckb.toml. Here is the relevant part of it:

### Whitelist-only mode
# whitelist_only = false
### Whitelist peers connecting from the given IP addresses
# whitelist_peers = []
### Enable `SO_REUSEPORT` feature to reuse port on Linux, not supported on other OS yet
# reuse_port_on_linux = true

max_peers = 125
max_outbound_peers = 8
# 2 minutes
ping_interval_secs = 120
# 20 minutes
ping_timeout_secs = 1200
connect_outbound_interval_secs = 15
# If set to true, try to register upnp
upnp = false
# If set to true, network service will add discovered local address to peer store, it's helpful for private net development
discovery_local_address = false
# If set to true, random cleanup when there are too many inbound nodes
# Ensure that itself can continue to serve as a bootnode node
bootnode_mode = false
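To confirm that the whitelist settings are still at their defaults, it can help to list only the lines that are actually in effect. A minimal sketch; active_settings and whitelist_only_enabled are hypothetical helpers, and it assumes that a commented-out key such as # whitelist_only = false means ckb uses its built-in default for that key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a config file that are in effect,
# skipping comments (lines starting with #) and blank lines.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Hypothetical helper: succeed only if whitelist_only is explicitly set to true.
whitelist_only_enabled() {
  grep -Eq '^[[:space:]]*whitelist_only[[:space:]]*=[[:space:]]*true' "$1"
}
```

For the fragment above, active_settings shows only max_peers, max_outbound_peers, the ping and connection intervals, upnp, discovery_local_address and bootnode_mode, and whitelist_only_enabled returns a non-zero status.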

@eval-exec (Collaborator) commented May 15, 2024

I found some ERROR entries in run.log:

2024-05-14 20:18:58.451 +00:00 log flusher ERROR sled::flusher  failed to fsync from periodic flush thread: Input/output error (os error 5)
  1. What happened at 2024-05-14 20:18:58.451?
  2. Could you provide data/network/peer_store/addr_manager.db and data/network/ban_list.db files?
  3. Could you provide:
curl -X POST 127.0.0.1:8114 -H 'Content-Type: application/json' -d '{ "id": 42, "jsonrpc": "2.0", "method": "get_banned_addresses", "params": [ ] }'

and

curl -X POST 127.0.0.1:8114 -H 'Content-Type: application/json' -d '{ "id": 42, "jsonrpc": "2.0", "method": "get_tip_header", "params": [ ] }'
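CKB's JSON-RPC returns numbers such as the block height as 0x-prefixed hex strings, so the get_tip_header output is easier to compare after decoding. A minimal sketch; hex_to_dec and tip_number are hypothetical helpers, and tip_number assumes compact JSON with no spaces around the colon, as in the log lines later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a 0x-prefixed hex quantity (as returned by
# CKB JSON-RPC) to decimal using bash arithmetic.
hex_to_dec() {
  printf '%d\n' "$(( $1 ))"
}

# Hypothetical helper: pull the block "number" out of a get_tip_header
# response on stdin. Assumes compact JSON ("number":"0x...") with no spaces.
tip_number() {
  grep -o '"number":"0x[0-9a-fA-F]*"' | head -n 1 | cut -d'"' -f4
}
```

For instance, 0xc9ed7d, the tip number that appears in the log excerpt further down, decodes to block 13233533.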

@silySuper (Author)

1. At 2024-05-14 20:18:58.451, the computer was asleep (v114.0 is on my hard disk).
2. addr_manager.db.zip and ban_list.db.zip

[Screenshot 2024-05-15 14.07.12] [Screenshot 2024-05-15 14.11.55]

@silySuper (Author)

[Screenshot 2024-05-15 14.40.28] Now the ckb server shows an error.

@eval-exec (Collaborator) commented May 15, 2024

Is your hard drive malfunctioning?
Could you change [logger].filter to "debug" in ckb.toml, restart the ckb node, and then provide the log file?
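The [logger].filter change can also be scripted. A minimal sketch, assuming ckb.toml has a [logger] section containing a single filter = "..." line; set_log_filter is a hypothetical helper that rewrites only that line and goes through a temporary file, so it behaves the same with GNU and BSD (macOS) tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set [logger].filter in a ckb.toml to the given level.
# Assumes the [logger] section holds one filter = "..." line.
set_log_filter() {
  local file="$1" level="$2" tmp
  tmp="$(mktemp)" || return 1
  awk -v level="$level" '
    /^\[/ { in_logger = ($0 == "[logger]") }   # track which section we are in
    in_logger && /^[ \t]*filter[ \t]*=/ {
      print "filter = \"" level "\""            # replace only the [logger] filter
      next
    }
    { print }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back through the original file so its permissions are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: set_log_filter ckb.toml debug, then restart the node so the new filter takes effect.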

@silySuper (Author)

My hard drive has not thrown errors before. I will find a tool to check whether it is malfunctioning.
logs.zip

@eval-exec (Collaborator) commented May 15, 2024

I suspect there might be an issue with the [network] configuration in your config file.

What's the configuration for support_protocols in your ckb.toml file?

Could you share the complete configuration from your ckb.toml file?

@eval-exec (Collaborator) commented May 15, 2024

I found that your ckb process is busy serving RPC requests.
My guess is that they are the Indexer's get_cells or get_transactions RPCs.
I also noticed that the "id" field varies from request to request. Is the ckb process operating as a public node?

❯ cat logs/run.log | grep -i rpc | head
2024-05-15 07:21:02.362 +00:00 main INFO ckb_rpc::server  Listen HTTP RPCServer on address: 127.0.0.1:8114
2024-05-15 07:21:02.944 +00:00 GlobalRt-7 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0xc9ed7d","id":8362}.
2024-05-15 07:21:02.953 +00:00 GlobalRt-6 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0x10639e0895502b5688a6be8cf69460d76541bfa4821629d86d62ba0aae3f9606","id":8716}.
2024-05-15 07:21:02.964 +00:00 GlobalRt-6 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"alerts":[],"chain":"ckb_testnet","difficulty":"0x1c794aac","epoch":"0x70800e40021fd","is_initial_block_download":true,"median_time":"0x18f70071de4"},"id":2884}.
2024-05-15 07:21:02.971 +00:00 GlobalRt-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0x10639e0895502b5688a6be8cf69460d76541bfa4821629d86d62ba0aae3f9606","id":1528}.
2024-05-15 07:21:02.972 +00:00 GlobalRt-6 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"alerts":[],"chain":"ckb_testnet","difficulty":"0x1c794aac","epoch":"0x70800e40021fd","is_initial_block_download":true,"median_time":"0x18f70071de4"},"id":2529}.
2024-05-15 07:21:02.975 +00:00 GlobalRt-6 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0x10639e0895502b5688a6be8cf69460d76541bfa4821629d86d62ba0aae3f9606","id":3712}.
2024-05-15 07:21:03.926 +00:00 GlobalRt-0 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0xc9ed7d","id":804}.
2024-05-15 07:21:04.931 +00:00 GlobalRt-4 DEBUG rpc  Response: {"jsonrpc":"2.0","result":"0xc9ed7d","id":8224}.
2024-05-15 07:21:05.190 +00:00 GlobalRt-6 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"compact_target":"0x1d08fda0","dao":"0xa58aea41d2aa324d7444ff8c4625280007bc9da5479673060085b8d95c40d408","epoch":"0x70800e40021fd","extra_hash":"0x167c593d80e706b9c2e52c9a6aeebf39fdd08574c55b6deb5df128a1484677cb","hash":"0xf47a17392103d8c423089c6fb42ea3bd15cd44b5a8268c7ae18a72d517906ce2","nonce":"0xc108f9e52219af493fa13978b8cb7429","number":"0xc9ed7d","parent_hash":"0xcdc8fe751ae2731595b674b07f116ab240eefde45b4a911e702b79f58cffc441","proposals_hash":"0xf6d454599fdf28e3c73fb8e7d0d93e3f6128ad650a21b0f745cb9fba8bb0742f","timestamp":"0x18f70094a66","transactions_root":"0x9ff138b4bd1cbb83065343651537ac465209abb26455f0dd603978c6447ea6f4","version":"0x0"},"id":3662}.
❯ cat logs/run.log | grep -i rpc | tail
2024-05-15 07:22:08.718 +00:00 GlobalRt-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":3306}.
2024-05-15 07:22:08.719 +00:00 GlobalRt-8 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":7302}.
2024-05-15 07:22:08.719 +00:00 GlobalRt-8 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":9305}.
2024-05-15 07:22:08.719 +00:00 GlobalRt-8 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":8167}.
2024-05-15 07:22:08.720 +00:00 GlobalRt-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":3193}.
2024-05-15 07:22:08.721 +00:00 GlobalRt-8 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":4394}.
2024-05-15 07:22:08.721 +00:00 GlobalRt-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":3780}.
2024-05-15 07:22:08.721 +00:00 GlobalRt-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":285}.
2024-05-15 07:22:08.722 +00:00 GlobalRt-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":3175}.
2024-05-15 07:22:08.723 +00:00 GlobalRt-1 DEBUG rpc  Response: {"jsonrpc":"2.0","result":{"last_cursor":"0x","objects":[]},"id":8217}.
❯ cat logs/run.log | grep -i rpc | wc -l
116692
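To put a number on "busy", the RPC lines can be bucketed by second. A minimal sketch, assuming the run.log line format shown in the excerpt above; rpc_rate is a hypothetical helper that prints the busiest seconds first, with an optional second argument for how many rows to show (default 5).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count RPC debug lines per second in a ckb run.log.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS.mmm" and contain " DEBUG rpc ".
rpc_rate() {
  # Keep only rpc lines, cut each to "YYYY-MM-DD HH:MM:SS" (dropping
  # milliseconds), count lines per second, and sort busiest seconds first.
  grep ' DEBUG rpc ' "$1" | cut -c1-19 | sort | uniq -c | sort -rn | head -n "${2:-5}"
}
```

Usage: rpc_rate logs/run.log 10 shows the ten busiest seconds with their request counts.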

What are the CPU and I/O loads on the machine running the ckb process?

Which client is making a large number of RPC requests to the ckb node? Could you try turning off the client to see if it makes a difference?
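One way to answer "which client" is to list the processes holding TCP connections to the RPC port. A minimal sketch, assuming lsof is installed (it ships with macOS) and the default port 8114; rpc_clients and client_names are hypothetical helpers. The ckb process itself also appears, because it owns the server side of each connection, and without sudo lsof may only show processes owned by the current user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list processes with established TCP connections on the
# RPC port (default 8114). Requires lsof; -n and -P skip DNS and port-name lookups.
rpc_clients() {
  local port="${1:-8114}"
  lsof -nP -iTCP:"$port" -sTCP:ESTABLISHED | client_names
}

# Hypothetical helper: print unique command names (first column) from lsof
# table output on stdin, skipping the header row. LC_ALL=C makes the sort
# order independent of the locale.
client_names() {
  awk 'NR > 1 { print $1 }' | LC_ALL=C sort -u
}
```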

@silySuper (Author)

[Screenshot 2024-05-15 16.51.22] I did not find any other RPC requests to the ckb node. [Screenshot 2024-05-15 16.52.50]

@eval-exec (Collaborator)

> I did not find any other RPC requests to the ckb node.

How about changing [network].listen_addresses to another port and restarting?

@silySuper (Author)

I changed the port from 8115 to 8117 and restarted.
[Screenshot 2024-05-15 17.27.13]
Same result.

@quake (Member) commented May 15, 2024

> I did not find any other RPC requests to the ckb node.
>
> How about changing [network].listen_addresses to another port and restarting?

I think it should be [rpc].listen_address
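Since the RPC port is set by [rpc].listen_address and the P2P port by [network].listen_addresses, it helps to read the exact values before editing either one. A minimal sketch; toml_get is a hypothetical line-based helper, not a TOML parser, and it assumes each key = value pair sits on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the raw value of a key inside a [section] of a TOML file.
# Line-based only: assumes key = value on a single line, no multi-line arrays.
toml_get() {
  local file="$1" section="$2" key="$3"
  awk -v section="[$section]" -v key="$key" '
    /^\[/ { in_section = ($0 == section); next }
    in_section {
      line = $0
      sub(/^[ \t]+/, "", line)
      # Require the exact key: "listen_address" must not match "listen_addresses".
      if (index(line, key) == 1) {
        rest = substr(line, length(key) + 1)
        if (rest ~ /^[ \t]*=/) {
          sub(/^[ \t]*=[ \t]*/, "", rest)
          print rest
          exit
        }
      }
    }
  ' "$file"
}
```

Usage: toml_get ckb.toml rpc listen_address prints the quoted RPC address (the log later in this thread shows the server listening on 127.0.0.1:8114), while toml_get ckb.toml network listen_addresses prints the P2P address list.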

@silySuper (Author)

Changed from 8114 to 8115.
[Screenshot 2024-05-15 18.36.20]

@eval-exec (Collaborator) commented May 15, 2024

> Changed from 8114 to 8115.

Has the height of the ckb node increased after an hour?
Is the Neuron client currently connected to this ckb node? I suspect that the Neuron client is making a large number of RPC requests to the ckb node. Could you try shutting down the Neuron client and launching the ckb node directly? (Then please provide the debug log.)

@silySuper (Author)

After changing the port, it always shows "connecting", so I changed back to port 8114. I wanted to try v115, but macOS always reports it as unsafe, even after I clicked "Allow" in Privacy and Security. Can we find a faster way to solve this? It has been blocking me for about two weeks.

@silySuper (Author)

Step 1:
[Screenshot 2024-05-16 09.49.36]

Step 2:
[Screenshot 2024-05-16 09.49.48]

Step 3:

[Screenshot 2024-05-16 09.50.50]

@eval-exec (Collaborator) commented May 16, 2024

I'm sorry, I don't have experience with macOS. Would this Apple support article be helpful?
Apple can't check app for malicious software


@silySuper (Author)

I already did that; it had no effect.

@eval-exec (Collaborator)

> After changing the port, it always shows "connecting"

Could you try starting the ckb process after shutting down the Neuron client?

@eval-exec (Collaborator) commented May 16, 2024

> Can we find a faster way to solve this? It has been blocking me for about two weeks.

Do you know the absolute path of the ckb 0.114.0 binary file?
Could you copy the previous data/db file, initialize ckb in a new directory, and try again?

  1. Initialize the ckb configuration in new_dir:
./ckb init -C new_dir --chain testnet
  2. Copy the previous data/db to new_dir/data/:
cp -R previous_dir/data/db new_dir/data/
  3. Start ckb:
./ckb run -C new_dir
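The copy step can be guarded so that it never overwrites existing data. A minimal sketch; copy_chain_db is a hypothetical helper, previous_dir and new_dir are the placeholders from the steps above, and it assumes ./ckb init has already been run for new_dir.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy prev/data/db into new/data/, refusing to overwrite.
copy_chain_db() {
  local prev="$1" new="$2"
  if [ ! -d "$prev/data/db" ]; then
    echo "missing $prev/data/db" >&2
    return 1
  fi
  if [ -e "$new/data/db" ]; then
    echo "$new/data/db already exists; not overwriting" >&2
    return 1
  fi
  # new_dir/data may not exist yet right after `ckb init`.
  mkdir -p "$new/data"
  cp -R "$prev/data/db" "$new/data/"
}
```

Usage: copy_chain_db previous_dir new_dir && ./ckb run -C new_dir.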

@silySuper (Author) commented May 16, 2024

This log is from starting the ckb process on the changed port 8115 after shutting down the Neuron client.
logs.zip

@silySuper (Author)

> Could you copy the previous data/db file, initialize ckb in a new directory, and try again?

OK, I will try this.

@eval-exec (Collaborator) commented May 16, 2024

Don't delete anything in previous_dir just yet; we still need to investigate the cause of the "sync stuck" issue.

If you can sync smoothly after initializing the ckb configuration in a new directory and using the copied data/db, then some other temporary file in previous_dir may be causing the issue.

Then we can investigate further in previous_dir to find out exactly what the problem was.
First, back up the entire previous_dir: cp -R previous_dir previous_dir_backup

  1. Try moving data/network/peer_store/addr_manager.db to a different location, then start ckb with ckb run and see what happens.
  2. Try moving data/network/peer_store/ban_list.db to a different location, then start ckb with ckb run and see what happens.
  3. Try moving data/tx_pool to a different location, then start ckb with ckb run and see what happens.
  4. Try moving data/indexer to a different location, then start ckb with ckb run and see what happens.
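Each elimination step amounts to moving one path aside, running ckb, and moving the path back. A minimal sketch; stash_path and restore_path are hypothetical helpers, and the stash location is any directory you choose outside previous_dir. Run them only after the backup above exists and only while ckb is stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move base/rel to stash_dir/rel, keeping the relative layout.
stash_path() {
  local base="$1" rel="$2" stash_dir="$3"
  mkdir -p "$stash_dir/$(dirname "$rel")"
  mv "$base/$rel" "$stash_dir/$rel"
}

# Hypothetical helper: move stash_dir/rel back to base/rel.
restore_path() {
  local base="$1" rel="$2" stash_dir="$3"
  mkdir -p "$base/$(dirname "$rel")"
  mv "$stash_dir/$rel" "$base/$rel"
}

# Example round for step 1 (stash location is a placeholder):
#   stash_path previous_dir data/network/peer_store/addr_manager.db /tmp/ckb_stash
#   ./ckb run -C previous_dir      # observe the sync, then stop ckb
#   restore_path previous_dir data/network/peer_store/addr_manager.db /tmp/ckb_stash
```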

@silySuper (Author)

The ckb node works fine now, but Neuron cannot sync.
[Screenshot 2024-05-16 10.42.42]

@eval-exec (Collaborator) commented May 16, 2024

> but Neuron cannot sync

What does this mean? What message does Neuron display? Have you tried again with the --indexer argument added?

./ckb run -C new_dir --indexer

@silySuper (Author)

[Screenshot 2024-05-16 10.56.10] OK now.

@eval-exec (Collaborator)

> [Screenshot 2024-05-16 10.56.10] OK now.

It appears that the Neuron client isn't connecting to the ckb process you started in new_dir. In your new_dir, the latest tip block should be higher than 13,245,789, but the Neuron client is showing "Block Synced is 24,584."

@silySuper (Author)

Yes, but my port is the same as in ckb.toml:
./ckb run -C /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork --indexer

@silySuper (Author)

[Screenshot 2024-05-16 11.38.11] This is the latest node log.

@eval-exec (Collaborator) commented May 16, 2024

> [Screenshot 2024-05-16 10.56.10] OK now.

What's Neuron's sync progress now?

@silySuper (Author)

[Screenshot 2024-05-16 11.41.55]

@eval-exec (Collaborator)

> [Screenshot 2024-05-16 11.41.55]

Could you run the command ps -eF | grep ckb to check if there are two ckb nodes running on your local machine? I suspect the ckb node that Neuron is connecting to is not the one you're running in /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork.
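A variant of that check prints each ckb run command line together with its -C directory, which shows directly which data directory each node uses. A minimal sketch; ckb_nodes and filter_ckb_run are hypothetical helpers, and it uses ps -A -o pid=,command=, an option combination accepted by both macOS (BSD) ps and Linux (procps) ps.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show every running "ckb run" process with its full
# command line, including the -C directory.
ckb_nodes() {
  ps -A -o pid=,command= | filter_ckb_run
}

# Keep only lines where a ckb binary is invoked with the run subcommand.
# The pattern requires a space or end of line after "run", so this grep
# does not match its own command line.
filter_ckb_run() {
  grep -E '(^|[ /])ckb run( |$)' || true
}
```

If ckb_nodes prints two lines, two nodes are running, and the -C paths show which one Neuron is likely talking to.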

@silySuper (Author)

[Screenshot 2024-05-16 11.44.40]

@eval-exec (Collaborator) commented May 16, 2024

It appears the sync progress that Neuron displays pertains to the Indexer.

Previously, you only copied data/db from previous_dir to /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork, which doesn't include the Indexer's data.

First, stop the ckb process.

Now, you can move the Indexer data that has already synced to 10.63% in /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork to another location:

mv  /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork/data/indexer  backup_indexer

Then, move data/indexer from previous_dir to /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork/data/indexer:

mv previous_dir/data/indexer /Volumes/My\ Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork/data/

Then start the ckb node and then check the sync progress in Neuron. It should now show over 90%.
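The two mv commands can be combined with checks so that nothing moves unless both sides look right. A minimal sketch; swap_indexer is a hypothetical helper whose three arguments stand for the new directory (here the testnetwork directory), previous_dir, and the backup_indexer destination, and it must only be run while ckb is stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move new/data/indexer aside to backup, then move
# prev/data/indexer into new/data/. All checks run before anything is moved.
swap_indexer() {
  local new="$1" prev="$2" backup="$3"
  if [ -e "$backup" ]; then
    echo "$backup already exists; choose another backup path" >&2
    return 1
  fi
  if [ ! -d "$prev/data/indexer" ]; then
    echo "missing $prev/data/indexer" >&2
    return 1
  fi
  if [ -d "$new/data/indexer" ]; then
    mv "$new/data/indexer" "$backup"
  fi
  mv "$prev/data/indexer" "$new/data/"
}
```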

@silySuper (Author)

[Screenshot 2024-05-16 12.04.37] OK now.

@eval-exec (Collaborator)

> Can we find a faster way to solve this? It has been blocking me for about two weeks.

That's great; you can continue working on this testnet node.

When you have time, shall we continue investigating the root cause of the stuck sync? See #4462 (comment).

@silySuper (Author)

OK, no problem.

@silySuper (Author)

After running for a while, it shows: zsh: Input/output error: ./ckb
[Screenshot 2024-05-16 14.25.01]

@silySuper (Author)

The problem was caused by the hard disk becoming unresponsive: I found that the hard disk appeared empty, and after I disconnected and reconnected it, everything worked again.
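Since the cause was an unresponsive hard disk, a quick pre-start check on the data directory can catch a missing or failing volume. A minimal sketch; probe_volume is a hypothetical helper that confirms the directory exists, writes a small file, calls sync and reads the file back. sync flushes buffers system-wide rather than fsyncing one file, so a pass does not prove the disk is healthy; it only catches a missing or clearly failing volume, such as the Input/output error seen earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a data directory exists and accepts a
# write + flush + read-back before ckb is started on it.
probe_volume() {
  local dir="$1" probe
  if [ ! -d "$dir" ]; then
    echo "not found: $dir (is the drive mounted?)" >&2
    return 1
  fi
  probe="$dir/.ckb_write_probe.$$"
  if ! { echo probe > "$probe" && sync && [ "$(cat "$probe")" = probe ]; }; then
    rm -f "$probe"
    echo "write/flush failed in $dir" >&2
    return 1
  fi
  rm -f "$probe"
  echo "ok: $dir"
}
```

Usage: probe_volume "/Volumes/My Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork/data" && ./ckb run -C "/Volumes/My Passport/ckb_v0.114.0_aarch64-apple-darwin-portable/testnetwork" --indexer.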

@15168316096 (Contributor) commented May 16, 2024

Regarding starting the full node: our previous verification was done on a Mac with a solid-state drive, and no related problems were encountered. Could you provide details of the local environment in which you run the Neuron wallet, such as the Neuron version, the macOS version, and the SSD information? Alternatively, we could have a Meet call to check your local environment directly.

@eval-exec (Collaborator)

> The problem was caused by the hard disk becoming unresponsive: I found that the hard disk appeared empty, and after I disconnected and reconnected it, everything worked again.

Hello. Has this issue not recurred since you reconnected the hard drive? Are you able to reproduce this problem in your original environment again?

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the stale To be closed due to a lack of activity label Jul 26, 2024