Crash on Flash processChanges #865

Open
keithchew opened this issue Sep 16, 2024 · 2 comments

keithchew commented Sep 16, 2024

Testing on the async_flash branch (though it looks like this bug predates it):

!!! ERROR: Deadlock detected !!!
	73: (0x7f0ddb83111c) StorageCache
	52: (0x7f0ddb8877f0) RocksDBStorageProvider
printing backtrace for thread 73

!!! KeyDB Will Now Crash !!!

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
7:73:M 15 Sep 2024 08:29:06.153 # ------------------------------------------------
7:73:M 15 Sep 2024 08:29:06.153 # !!! Software Failure. Press left mouse button to continue
7:73:M 15 Sep 2024 08:29:06.153 # Guru Meditation: Deadlock detected #fastlock.cpp:268

------ STACK TRACE ------

Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(fastlock_sleep+0x49f) [0x5594a94143df]
/opt/KeyDB/bin/keydb-server *:6379(+0x3746f9) [0x5594a943c6f9]
/opt/KeyDB/bin/keydb-server *:6379(StorageCache::insert(char*, void const*, unsigned long, bool)+0x44) [0x5594a942b8a4]
/opt/KeyDB/bin/keydb-server *:6379(redisDbPersistentData::serializeAndStoreChange(StorageCache*, redisDbPersistentData*, char const*, bool)+0x73) [0x5594a9333e63]
/opt/KeyDB/bin/keydb-server *:6379(redisDbPersistentData::processChanges(bool)+0x279) [0x5594a9334179]
/opt/KeyDB/bin/keydb-server *:6379(beforeSleep(aeEventLoop*)+0x1018) [0x5594a92f9d08]
/opt/KeyDB/bin/keydb-server *:6379(aeProcessEvents+0x400) [0x5594a92ea630]
/opt/KeyDB/bin/keydb-server *:6379(aeMain+0x3e) [0x5594a92eae1e]
/opt/KeyDB/bin/keydb-server *:6379(workerThreadMain(void*)+0x12b) [0x5594a930438b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f0ddcc50609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f0ddcb73353]

Just looking at beforeSleep() in server.cpp, I can see locker.disarm() and locker.arm() wrapped around the db method calls, presumably so the global lock is not held while the db methods take storage locks, e.g.:

        locker.disarm();
        for (redisDb *db : vecdb)
            db->commitChanges();
        locker.arm();

Taking a stab in the dark, do we also need to add them around the processChanges() call? i.e.:

            locker.disarm(); <---------------- THIS
            for (int idb = 0; idb < cserver.dbnum; ++idb) {
                if (g_pserver->db[idb]->processChanges(false))
                    vecdb.push_back(g_pserver->db[idb]);
            }
            locker.arm(); <---------------- AND THIS

Will keep investigating, and report back here if I find anything else helpful.

@keithchew (Author)

I think I have found it!

In StorageCache.cpp count(), we have:

    std::unique_lock<fastlock> ul(m_lock, std::defer_lock);
    bool fLocked = ul.try_lock(); <-- LOCK A
    size_t count = m_spstorage->count(); <-- LOCK B

And in db.cpp processChanges(), we have:

m_spstorage->beginWriteBatch(); <-- LOCK B
...
serializeAndStoreChange(); <-- LOCK A

If two different threads call these two methods concurrently, they acquire the same two locks in opposite order, which can deadlock. In addition, it looks like processChanges() is protected by the global lock, but count() is not.
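For illustration, here is a minimal, self-contained sketch of that lock-order inversion. This is not KeyDB code: std::mutex stands in for fastlock and for the RocksDBStorageProvider's internal lock, and the function names are placeholders for the two code paths above. When the try_lock in the count path succeeds, the two threads block on each other exactly as the deadlock detector reported (so running this will usually hang, which is the point):

    // ABBA deadlock sketch: lockA stands in for StorageCache::m_lock,
    // lockB for the RocksDBStorageProvider's internal lock.
    #include <mutex>
    #include <thread>

    std::mutex lockA, lockB;

    void countPath()             // mimics StorageCache::count()
    {
        // LOCK A, via try_lock as in the real code; assume it succeeds
        std::unique_lock<std::mutex> a(lockA, std::try_to_lock);
        // LOCK B, taken inside m_spstorage->count(); blocks here
        std::lock_guard<std::mutex> b(lockB);
    }

    void processChangesPath()    // mimics processChanges()
    {
        // LOCK B, taken by beginWriteBatch()
        std::lock_guard<std::mutex> b(lockB);
        // LOCK A, taken by serializeAndStoreChange() -> StorageCache::insert(); blocks here
        std::lock_guard<std::mutex> a(lockA);
    }

    int main()
    {
        std::thread t1(countPath), t2(processChangesPath);
        t1.join();
        t2.join();
    }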

Will keep looking...

@keithchew (Author)

Locally, I have made this one-line change in StorageCache.cpp:

size_t StorageCache::count() const
{
    size_t count = m_spstorage->count(); <--- MOVED from bottom
    std::unique_lock<fastlock> ul(m_lock, std::defer_lock);
    bool fLocked = ul.try_lock();
    // size_t count = m_spstorage->count(); <--- MOVED up

The above ensures the StorageCache and RocksDBStorageProvider locks never overlap on this path, but I am unsure about the side effects. If it still causes issues, I will see whether I can take the global lock in count() as well.
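As a hedged aside (standard C++17, not KeyDB's fastlock API): when two locks genuinely must be held together and both are under your control, std::scoped_lock acquires them with a deadlock-avoidance algorithm, so acquiring them in different textual orders on different paths is still safe. It does not apply directly here, since the RocksDBStorageProvider lock is internal to the provider, which is why reordering count() is the practical fix, but it shows the general alternative:

    #include <mutex>

    std::mutex cacheLock, providerLock; // hypothetical stand-ins for the two locks

    void readPath()
    {
        // Acquires both locks atomically; cannot deadlock against writePath()
        std::scoped_lock lk(cacheLock, providerLock);
        // ... read cache and storage consistently ...
    }

    void writePath()
    {
        // Same two locks in the opposite textual order; still deadlock-free
        std::scoped_lock lk(providerLock, cacheLock);
        // ... begin write batch and serialize ...
    }

    int main()
    {
        readPath();
        writePath();
    }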
