Suspending system crashes lbrycrd #313

Open
nikooo777 opened this issue Sep 5, 2019 · 8 comments
Labels
hacktoberfest (Welcome to Hacktoberfest)
level: 2 (Some knowledge of the existing code is recommended)

Comments

@nikooo777

On my Kubuntu 18.04 system

[niko:~/work/repositories/ansible] master(+107/-95)+ 10m13s ± uname -a
Linux nikubuntu 4.20.17-042017-generic #201903190933 SMP Tue Mar 19 13:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I left lbrycrd running and then suspended the system to RAM. After resuming, I realized lbrycrd had crashed and found this in the logs:

2019-09-04T03:08:49Z UpdateTip: new best=bef7f20382b621d936d07298ceac08288b777bf5496191c31e7d57dd94dcc332 height=627855 version=0x20000000 log2_work=73.213445 tx=5695597 date='2019-09-04T03:08:25Z' progress=0.997478 cache=592.0MiB(641103txo)
2019-09-04T12:07:41Z socket receive timeout: 32204s
2019-09-04T12:07:41Z socket receive timeout: 32210s
2019-09-04T12:07:41Z socket receive timeout: 32225s
2019-09-04T12:07:41Z socket receive timeout: 32203s
2019-09-04T12:07:41Z socket receive timeout: 32201s
2019-09-04T12:07:41Z socket receive timeout: 32201s
2019-09-04T12:07:41Z socket receive timeout: 32200s
2019-09-04T12:07:41Z socket receive timeout: 32223s
2019-09-04T12:07:41Z 
************************
EXCEPTION: N5boost10wrapexceptINS_15condition_errorEEE       
boost::condition_variable::do_wait_until failed in pthread_cond_timedwait: Invalid argument       
lbrycrd in scheduler       
************************
EXCEPTION: N5boost10wrapexceptINS_15condition_errorEEE       
boost::condition_variable::do_wait_until failed in pthread_cond_timedwait: Invalid argument       
lbrycrd in scheduler       
terminate called after throwing an instance of 'boost::wrapexcept<boost::condition_error>'
  what():  boost::condition_variable::do_wait_until failed in pthread_cond_timedwait: Invalid argument
Aborted (core dumped)

The expected behavior would be that it continues operating normally.

I had to reindex the whole chain to be able to start it again.
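
For readers of the log above: "Invalid argument" is the EINVAL error from `pthread_cond_timedwait`. One documented cause of EINVAL there is an absolute deadline whose nanosecond field is negative or at least one billion. Nothing in this thread establishes that this is what happened here, so the following is only a minimal sketch of how a deadline can be normalized before it reaches a timed wait. `HasValidNanos` and `ToTimespec` are hypothetical names, not lbrycrd or Boost functions.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

constexpr long kNanosPerSecond = 1000000000L;

// True if the nanosecond field is in the range a timed wait accepts.
inline bool HasValidNanos(const timespec& ts)
{
    return ts.tv_nsec >= 0 && ts.tv_nsec < kNanosPerSecond;
}

// Hypothetical helper: converts a signed nanosecond count since the epoch
// into a timespec, flooring the seconds so tv_nsec lands in [0, 1e9).
inline timespec ToTimespec(std::int64_t total_nanos)
{
    std::int64_t secs = total_nanos / kNanosPerSecond;
    std::int64_t nanos = total_nanos % kNanosPerSecond;
    if (nanos < 0) {
        // C++ integer division truncates toward zero, so a negative input
        // leaves a negative remainder; borrow one second to fix it up.
        nanos += kNanosPerSecond;
        secs -= 1;
    }
    timespec ts{};
    ts.tv_sec = static_cast<time_t>(secs);
    ts.tv_nsec = static_cast<long>(nanos);
    assert(HasValidNanos(ts));
    return ts;
}
```

A naive split of a negative count with `/` and `%` alone would leave a negative `tv_nsec`, which is exactly the kind of value a timed wait rejects.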

@BrannonKing
Member

Was this version 17.2.1? And was it an official build or a custom one?

@bvbfan
Collaborator

bvbfan commented Sep 5, 2019

Do you use a wireless or a wired connection? On suspend the network is suspended as well, and after wake-up it can take a while for the connection to come back up (a long while if it goes through some kind of VPN). If the network takes longer than the wait we have, it will throw. In any case, we should flush data on an exception as well; without that, the data ends up corrupt.

@BrannonKing
Member

We cannot flush the disk buffers when any arbitrary exception kills the process. The exception may have come from the disk flush itself. However, if there is a specific one that we know doesn't affect the data on disk -- we could catch that one and restart that component or shut-down cleanly.
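
The idea of catching only one known-harmless exception can be sketched as follows. This is an illustration under stated assumptions, not lbrycrd code: `WaitError` is a hypothetical stand-in for `boost::condition_error` (so the sketch builds without Boost), and `g_shutdown_requested` and `RunGuarded` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <stdexcept>
#include <system_error>

// Hypothetical stand-in for boost::condition_error.
struct WaitError : std::system_error {
    using std::system_error::system_error;
};

// Hypothetical flag the rest of the node would poll to shut down cleanly.
static std::atomic<bool> g_shutdown_requested{false};

// Runs one component's service loop. Only WaitError is treated as safe:
// it is caught and converted into a clean-shutdown request. Every other
// exception propagates, since it may have come from the disk flush itself.
// Returns true if the loop returned normally.
inline bool RunGuarded(const std::function<void()>& service_loop)
{
    assert(service_loop); // an empty std::function cannot be called
    try {
        service_loop();
        return true;
    } catch (const WaitError&) {
        g_shutdown_requested.store(true);
        return false;
    }
}
```

The caller decides what "restart that component or shut down cleanly" means; the sketch only shows that the catch clause is narrow, so an unknown failure still terminates the process without touching the disk buffers.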

@BrannonKing
Member

I'm unable to reproduce this with kill -STOP/-CONT, or with a few random suspensions during sync. I like @bvbfan's theory about the slow network startup time. @nikooo777, if this is easily reproducible for you, I have some things we can try: builds with a few different versions of boost compiled in, and a debug build to get the core dump, so that we know the full stack for the error.

@nikooo777
Author

Sorry for the delayed answer. My PC is wired, so I am unsure why this would have happened. It was also the first time I have seen this.
Is this related? It looks like it: bitcoin/bitcoin#14200

@bvbfan
Collaborator

bvbfan commented Sep 19, 2019

I've tested it at least 3 times and can't reproduce it with my settings:
WiFi with auto-connect (priority 0), no VPN or proxy, stored password, no explicit user input.

@BrannonKing added the "level: 2" (Some knowledge of the existing code is recommended) and "hacktoberfest" (Welcome to Hacktoberfest) labels on Sep 26, 2019

@nikooo777
Author

I had this happen another time a couple of weeks ago, but it's rather sporadic and probably not worth investigating further, as a node is unlikely to be suspended every day.
The issue can be closed if you agree.

@BrannonKing
Member

BrannonKing commented Sep 7, 2020

I have a theory that this is fixed here: bitcoin/bitcoin#18284. I'm going to bring it into the v19 build.
