global write-lock held for two minutes by RPC next_receiving_address. #223
The persist_wallet() call is seeming more and more likely to be the culprit. I just ran:

The output includes this warning:

The code at the indicated line is:

So here we have an isolated instance where flushing the databases took 2 seconds inside the RPC server.
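In other words, the suspect shape is roughly the following. This is a purely illustrative sketch with placeholder names (`Wallet`, `next_address`, `persist`), not the actual neptune-core code; the point is that the write guard stays alive across the slow persist/flush, so every other task waiting on the lock stalls for that whole time.

```rust
use std::time::Duration;
use tokio::sync::RwLock;

// Stub standing in for the real wallet state (hypothetical).
struct Wallet;

impl Wallet {
    fn next_address(&mut self) -> String {
        "addr".to_string() // cheap derivation (placeholder)
    }
    async fn persist(&self) {
        // stands in for flushing the wallet databases, which can take seconds
        tokio::time::sleep(Duration::from_secs(2)).await;
    }
}

// Hypothetical handler shape: derive and persist while holding the write guard.
async fn next_receiving_address(state: &RwLock<Wallet>) -> String {
    let mut wallet = state.write().await; // global write-lock acquired
    let address = wallet.next_address();
    wallet.persist().await;               // lock still held during the flush
    address                               // guard dropped only when this returns
}
```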
related:
Note that tarpc cancels tasks on the server when a client-provided deadline is exceeded. By default that deadline is 10 seconds; a sketch of how a client can raise it is at the end of this comment.

Observations:
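For reference, here is a minimal sketch of how a tarpc client could raise that 10-second default before issuing a slow call. This is an assumption-laden illustration: the type of the `deadline` field has changed across tarpc releases (`SystemTime` in older versions, `Instant` more recently), and `SystemTime` is assumed here.

```rust
use std::time::{Duration, SystemTime};
use tarpc::context;

// Build a request context whose deadline is two minutes out instead of the
// ~10-second default. (Field type assumed to be SystemTime; adjust per the
// tarpc version actually in use.)
fn long_deadline_context() -> context::Context {
    let mut ctx = context::current();
    ctx.deadline = SystemTime::now() + Duration::from_secs(120);
    ctx
}
```

Of course, raising the client deadline only masks the symptom; the underlying problem is the lock being held too long.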
I've begun experimenting with logging the duration for which the GlobalState write-lock is held, and I encountered a behavior I can't yet explain.
Anything that holds the write-lock effectively blocks processing for all other tasks until it completes, including the RPC server. So shorter is always better.
Some of the RPC timeout behavior observed and documented in issue #219 made me think that the write-lock is probably being held too long. To find out where, I modified AtomicRw to log the duration and the caller location when the write-lock guard gets dropped, but only if the duration exceeds 100 milliseconds, so that only slow lock holders get logged. A sketch of the approach is below.
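Roughly, the instrumentation looks like this. It is not the actual AtomicRw change, just a self-contained illustration; the real AtomicRw presumably wraps an async lock behind an Arc and would use tracing rather than eprintln!, but the mechanism (record the acquisition time and caller, warn on drop past a threshold) is the same idea.

```rust
use std::ops::{Deref, DerefMut};
use std::panic::Location;
use std::sync::{RwLock, RwLockWriteGuard};
use std::time::{Duration, Instant};

/// Write guard that remembers when and where it was acquired and warns on
/// drop if it was held longer than 100 ms. (Illustrative only.)
struct TimedWriteGuard<'a, T> {
    inner: RwLockWriteGuard<'a, T>,
    acquired_at: Instant,
    caller: &'static Location<'static>,
}

impl<T> Deref for TimedWriteGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.inner
    }
}

impl<T> DerefMut for TimedWriteGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.inner
    }
}

impl<T> Drop for TimedWriteGuard<'_, T> {
    fn drop(&mut self) {
        let held = self.acquired_at.elapsed();
        if held > Duration::from_millis(100) {
            // the real code would emit a tracing warning here
            eprintln!(
                "write-lock held for {:?}, acquired at {}:{}",
                held,
                self.caller.file(),
                self.caller.line()
            );
        }
    }
}

struct TimedRwLock<T>(RwLock<T>);

impl<T> TimedRwLock<T> {
    /// #[track_caller] makes Location::caller() report the call site that
    /// asked for the write-lock, which is what we want in the warning.
    #[track_caller]
    fn write(&self) -> TimedWriteGuard<'_, T> {
        let caller = Location::caller();
        TimedWriteGuard {
            inner: self.0.write().expect("lock poisoned"),
            acquired_at: Instant::now(),
            caller,
        }
    }
}
```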
I then followed the steps in issue #219 and observed an RPC timeout after visiting the Receive screen. Some time later, I found this warning in the log:
The referenced code that acquires the write-lock is:
It is very strange to me that this code is holding the write-lock for 2 minutes.
(Note: after an hour of running a node, mining 2 blocks, and attempting to send a tx, there are fewer than 10 warnings in the log, and the rest are under 1 second, so 2 minutes is a huge anomaly.)
Observations:
For now my plan is to complete development of the lock logging feature and just monitor logs for any slow-write-lock warnings. Maybe this is not repeatable.
Also: I noticed that this particular RPC call logged warnings of up to 0.3 seconds on the other 2 nodes. Maybe deriving the generation address is somehow more expensive than I would think.