SQLite in StorageServer deadlocked after the node was disconnected and resumed. #11578

Open
DuanChangfeng0708 opened this issue Aug 15, 2024 · 4 comments


DuanChangfeng0708 commented Aug 15, 2024

My setup is a 3-node FDB cluster with 3 replicas.
Version: 7.1.27
Steps to reproduce: Node A is disconnected from the network for 10 minutes and then restored. While data movement is in progress, Node B is disconnected from the network for 10 minutes and then restored. After that, the StorageServer on Node B is stuck in an infinite loop. The pstack output is as follows:
[image: pstack]

The backtrace printed in Net2RunLoopTrace is as follows:
[image: 20240815-172850]

I attached gdb and found that the process could not acquire a lock from SQLite. It then retries indefinitely.
The code is at https://github.com/apple/foundationdb/blob/main/contrib/sqlite/sqlite3.amalgamation.c#L37717
[image: screenshot-20240815-173116]

The rc is SQLITE_BUSY.
The lockIdx is 4 and n is 4.
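
One way to read these two values, as a minimal sketch only: assuming they are the `lockIdx` and `n` arguments of a WAL exclusive-lock request, and assuming the bundled SQLite keeps the lock-slot layout of upstream SQLite's `wal.c` (`WAL_WRITE_LOCK = 0`, `WAL_CKPT_LOCK = 1`, `WAL_RECOVER_LOCK = 2`, `WAL_READ_LOCK(i) = 3 + i`, `WAL_NREADER = 5`), the request would cover read-lock slots 1 through 4. Neither assumption has been verified against the FoundationDB copy of the amalgamation, and the helper names below are hypothetical.

```python
# Assumed WAL lock-slot layout, taken from upstream SQLite's wal.c.
# NOT verified against the SQLite copy bundled with FoundationDB.
WAL_WRITE_LOCK = 0
WAL_CKPT_LOCK = 1
WAL_RECOVER_LOCK = 2
WAL_NREADER = 5  # number of read-lock slots in upstream SQLite


def describe_slot(slot):
    """Return the upstream macro name for a single lock slot (hypothetical helper)."""
    fixed = {
        WAL_WRITE_LOCK: "WAL_WRITE_LOCK",
        WAL_CKPT_LOCK: "WAL_CKPT_LOCK",
        WAL_RECOVER_LOCK: "WAL_RECOVER_LOCK",
    }
    if slot in fixed:
        return fixed[slot]
    reader = slot - 3  # inverse of WAL_READ_LOCK(i) = 3 + i
    if 0 <= reader < WAL_NREADER:
        return f"WAL_READ_LOCK({reader})"
    raise ValueError(f"slot {slot} is outside the assumed layout")


def describe_exclusive_request(lock_idx, n):
    """List the slots covered by a request for n slots starting at lock_idx."""
    return [describe_slot(s) for s in range(lock_idx, lock_idx + n)]


# Values reported in this issue: lockIdx = 4, n = 4.
print(describe_exclusive_request(4, 4))
```

If the assumptions hold, the stuck call is waiting for every read-lock slot except slot 0 to be released, so a reader lock that was never released would make each retry return SQLITE_BUSY.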

@DuanChangfeng0708
Author

My CPU: HUAWEI Kunpeng 920 5220
My OS: openEuler 22.03

@giorgiozoppi

SQLite is known for its concurrency problems. Would it be a problem if we removed support for it?

@DuanChangfeng0708
Author

> SQLite is known for its concurrency problems. Would it be a problem if we removed support for it?

Sorry, I didn't understand what you meant. Are you saying that this issue is caused by SQLite?

@giorgiozoppi

Yes. At work we tried to use it for a PersistentQueue, had a lot of headaches, and moved to RocksDB.
