601 sentinel isn't working #27
I can confirm that several users have reported the newest sentinels "freezing", and I have experienced it myself. The sentinel regularly ends up stuck, with logs showing a frozen block that no longer advances (the managed verifiers, however, do keep working and tracking blocks).
Update with more tests, since this is happening again and again, including with the newest version. Context:
The issue:
Looking at the logs, they always show the same pattern, whether the sentinel is stuck or has been restarted and become stuck again.
I first thought of a resource issue, since the main failing sentinel was on an HDD machine, but that does not seem to be the cause, so this points to something verifier related. At this stage, I'd look at the network/message level and check whether the block-with-votes response messages could be truncated or otherwise malformed. What more can I do to help troubleshoot?
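As a rough illustration of one way to gather that kind of evidence, the sketch below scans a sentinel log and reports, per height, how many block-with-votes responses arrived and how many carried zero votes; heights that only ever return zero votes would be the ones worth inspecting at the network level. The line format is taken from the log excerpt quoted at the end of this issue and may differ between versions, and the class name, default log path, and output format are made up for the example; this is not part of the sentinel code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of the sentinel: for each height seen in
// "block-with-votes response" log lines, count how many responses arrived and how
// many carried zero votes. The log path is an assumption; the line format follows
// the excerpt quoted in this issue.
public class BlockWithVotesLogScan {

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "sentinel.log";  // assumed default path
        Pattern responsePattern = Pattern.compile(
                "BlockWithVotesResponse\\(block=\\[Block: .*?height=(\\d+).*?votes=(\\d+)\\)");

        Map<Long, Integer> responseCounts = new HashMap<>();
        Map<Long, Integer> zeroVoteCounts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = responsePattern.matcher(line);
                if (matcher.find()) {
                    long height = Long.parseLong(matcher.group(1));
                    int votes = Integer.parseInt(matcher.group(2));
                    responseCounts.merge(height, 1, Integer::sum);
                    if (votes == 0) {
                        zeroVoteCounts.merge(height, 1, Integer::sum);
                    }
                }
            }
        }

        // Heights whose responses never include votes are the ones worth capturing at
        // the network level to see what the queried verifiers actually sent.
        for (Map.Entry<Long, Integer> entry : responseCounts.entrySet()) {
            long height = entry.getKey();
            System.out.println("height " + height + ": " + entry.getValue() + " responses, "
                    + zeroVoteCounts.getOrDefault(height, 0) + " with zero votes");
        }
    }
}
```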
I'm adding a comment on the same issue with my experience, in case it helps: I couldn't get my sentinels to start protecting verifiers, so I wiped the VPS of one sentinel clean (new install of Ubuntu 18.04) and reinstalled the sentinel completely. Everything started off fine until the sentinel stopped tracking after a short while (see below). Now the web listener is back to showing 10s on the managed-verifiers line for the verifiers that are tracking the blockchain and 0s for the others.
Additional info: the log shows "storing new vote, height=9536524, hash=fc05...50b8". Neither the sentinel nor the verifiers tracking the blockchain have a full disk (3%, 63%, and 72% usage respectively), and they seem to have plenty of CPU and RAM capacity available. I will try to downgrade the sentinel version to see if it helps.
Numerous reports of sentinels not working keep coming in on Discord. The sentinel works for a bit, then randomly freezes on a specific block and stops protecting verifiers. Deleting the stuck block and restarting fixes it temporarily, until it gets stuck again after a few hours. I can confirm that the issue is happening on the latest version (607).
I have tested all sentinel versions from 602 to 607, and all exhibit the same issue. It happens even when sentinels are managing in-cycle verifiers only. A fix would be greatly appreciated.
I'm now running version 587 of the sentinel. It doesn't solve the problem by any means but seems to at least give me several days of uptime at a time. The behavior is quite strange:
@MyAltcoins @NeoDimi |
Update: After days of debugging and log analysis, I was able to make significant progress and will propose a fix very soon.
This issue may be fixed in version 608, if the issue that others are seeing is the same issue we have seen. Our experience with Nyzo is actually more limited than many others in the cycle. With the joining of Argo 2080, we have only 10 in-cycle verifiers, and we have a total of 5 sentinels (including quark.nyzo.co). We did not experience this problem with any of our sentinels until doing some testing on an early version of 608 a few months ago. That doesn't mean this hasn't been a big problem -- we just haven't been able to experience the problem the way others have because of our limited setup.

We've seen a total of 3 times that a verifier became inactive. All 3 times, an invalid block was frozen. And the invalid blocks all had the same qualities in common:
After adding the signature check to the in-development 608 code, we added logging to see when a block with an invalid signature was received. This happened one time after adding the logging: an invalid block was received on quark.nyzo.co from Argo 752 for height 11509752. That block was verified by verifier 86eb...1898 (nickname A1200O4898). The account for 86eb...1898 had been emptied at block 11448867. Due to the signature check, the block was discarded, fetched again, and the sentinel continued without interruption.

The previous instances of stalls happened at blocks 11373452, 11438953, and 11455852. If you look at these, you'll see the same pattern: they all contain meta transactions, and the accounts of the verifiers had been emptied shortly before the blocks were verified. So, it makes sense to us that there's some protection causing the transactions to be removed from blocks because the accounts are mistakenly considered empty.

We added additional logging to try to figure out where the transactions were being stripped -- on the sender or in reassembly on the sentinel -- but we stopped receiving invalid blocks, so we were unable to track further. We could have easily tested by deliberately producing invalid blocks, but this would have only further confirmed that the signature checks were performing as intended. It would not have gotten us any closer to understanding why invalid blocks were being received. The lack of sentinel signature checks could have also been exploited maliciously, although not very effectively.

Again, we're not sure if we are even seeing the same issue that others are seeing, so this may not be helpful at all. If you are having problems with a particular sentinel, however, upgrading that sentinel to 608 might be worth considering.
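As a minimal sketch of the kind of signature check described here (not the actual nyzoVerifier/nyzoSentinel code): assuming that a Nyzo verifier identifier is a raw 32-byte Ed25519 public key and that the verifier signature covers the serialized block content, a sentinel can verify each received block before freezing it and refetch any block whose signature does not validate. The class and method names below are hypothetical, and Java 15+ is assumed for the built-in Ed25519 provider.

```java
import java.math.BigInteger;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.EdECPoint;
import java.security.spec.EdECPublicKeySpec;
import java.security.spec.NamedParameterSpec;

// Hypothetical sketch of a sentinel-side block signature check. The real sentinel
// applies its check inside its own block-handling path; here the block content and
// signature are just byte arrays.
public class BlockSignatureCheck {

    // Decode a raw 32-byte Ed25519 public key (RFC 8032 encoding: little-endian y
    // coordinate, with the sign bit of x in the most significant bit of the last byte).
    static PublicKey publicKeyFromIdentifier(byte[] identifier) throws Exception {
        byte[] bigEndianY = new byte[identifier.length];
        for (int i = 0; i < identifier.length; i++) {
            bigEndianY[i] = identifier[identifier.length - 1 - i];
        }
        boolean xIsOdd = (bigEndianY[0] & 0x80) != 0;
        bigEndianY[0] &= 0x7f;
        EdECPublicKeySpec spec = new EdECPublicKeySpec(NamedParameterSpec.ED25519,
                new EdECPoint(xIsOdd, new BigInteger(1, bigEndianY)));
        return KeyFactory.getInstance("Ed25519").generatePublic(spec);
    }

    // Returns true only if the signature matches the signed block bytes and the
    // verifier identifier. A block with stripped transactions has different signed
    // bytes, so this returns false and the block can be discarded and refetched.
    static boolean signatureIsValid(byte[] signature, byte[] signedBlockBytes,
                                    byte[] verifierIdentifier) {
        try {
            Signature verifier = Signature.getInstance("Ed25519");
            verifier.initVerify(publicKeyFromIdentifier(verifierIdentifier));
            verifier.update(signedBlockBytes);
            return verifier.verify(signature);
        } catch (Exception e) {
            return false;  // undecodable keys or malformed signatures count as invalid
        }
    }
}
```

If a transaction were stripped from a block in transit or during reassembly, the signed bytes would no longer match the signature, so a check of this kind would reject the block and request it from another source instead of freezing it -- which matches the behavior observed on quark.nyzo.co above.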
This is coherent with what I traced down myself on sentinels that were stuck very frequently, and it explains more precisely why some blocks may be invalid. I also believe the fix in 608 solves the issue indirectly, by preventing blocks with missing transactions from being frozen.
Can confirm that EggPool's changes completely fixed the sentinel issues.
I'm updating one of my sentinels, which often needed restarts, to v608. I'll give feedback in a few days. Thx for the fix!
It's been almost three weeks and I haven't had to restart a single sentinel using this version. Kudos and thx!
Numerous people have reported that the latest sentinel versions aren't working properly, which could be the cause of recent cycle dropouts.
I have just verified this myself on a fresh sentinel installation created yesterday: the sentinel gets stuck on a certain block and doesn't move. Downgrading to v587 fixes the issue.
Here is a log:
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
waiting for message queue to clear from thread [Thread-8], size is 1
requesting block with votes for height 8947804
[1599462571.662 (2020-09-07 07:09:31.662 UTC)]: trying to fetch BlockWithVotesRequest37 from e168...3ea5
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
[1599462571.751 (2020-09-07 07:09:31.751 UTC)]: block-with-votes response is [BlockWithVotesResponse(block=[Block: v=2, height=8947804, hash=bf18...b752, id=a132...726e], votes=0)]
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
requesting block with votes for height 8947804
[1599462574.716 (2020-09-07 07:09:34.716 UTC)]: trying to fetch BlockWithVotesRequest37 from acc0...f75a
[1599462574.861 (2020-09-07 07:09:34.861 UTC)]: block-with-votes response is [BlockWithVotesResponse(block=[Block: v=2, height=8947804, hash=bf18...b752, id=a132...726e], votes=0)]
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
froze block [Block: v=2, height=8947803, hash=7d87...dace, id=5228...5e93], efficiency: 11.0%
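For operators who want to catch this state before a verifier drops out, here is a small monitoring sketch that flags the pattern shown above: the same "froze block ... height=N" line repeating instead of the height advancing. The class name, default log path, and repetition threshold are assumptions, not part of the sentinel.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical monitoring check, not part of the sentinel: count how many consecutive
// "froze block ... height=N" lines at the end of the log report the same height.
// The log path and the repetition threshold are assumptions.
public class SentinelStallCheck {

    private static final int STALL_THRESHOLD = 100;  // assumed cutoff, tune as needed

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "sentinel.log";  // assumed default path
        Pattern frozePattern = Pattern.compile("froze block \\[Block: .*?height=(\\d+)");

        long lastHeight = -1L;
        int repeats = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = frozePattern.matcher(line);
                if (matcher.find()) {
                    long height = Long.parseLong(matcher.group(1));
                    repeats = (height == lastHeight) ? repeats + 1 : 1;
                    lastHeight = height;
                }
            }
        }

        // A tracking sentinel moves on to new heights; a long run of identical heights,
        // as in the excerpt above, indicates the stuck state reported in this issue.
        System.out.println(repeats > STALL_THRESHOLD
                ? "possible stall: height " + lastHeight + " frozen " + repeats + " times in a row"
                : "no stall detected; last frozen height was " + lastHeight);
    }
}
```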