I was testing locks using my testcase.
I believe there is a bug in the lock_info handling of locks_server and locks_agent, which can cause a deadlock.
My testcase has 3 concurrent clients/agents, namely C1, C2, and C3, and 3 locks, [1], [2], and [3]. The clients request the locks in the following orders (a code sketch follows the list):
C1 requests locks in the order of [[1], [2], [3]]
C2 requests locks in the order of [[2], [3], [1]]
C3 requests locks in the order of [[3], [1], [2]]
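In code, the scenario looks roughly like this (a simplified sketch, not my exact testcase; I'm assuming the public API locks:begin_transaction/0, locks:lock/2 and locks:end_transaction/1, so the exact calls and return values may differ):

```erlang
%% Simplified sketch of the scenario, not the exact testcase.
%% Assumes the locks application is already started and the public API
%% locks:begin_transaction/0, locks:lock/2, locks:end_transaction/1.
scenario() ->
    Orders = [[[1], [2], [3]],   %% C1
              [[2], [3], [1]],   %% C2
              [[3], [1], [2]]],  %% C3
    Refs = [element(2, spawn_monitor(fun() -> client(Order) end))
            || Order <- Orders],
    %% Wait for all three clients; a hang here is the observed deadlock.
    [receive {'DOWN', Ref, process, _, Reason} -> Reason end || Ref <- Refs].

client(Order) ->
    {Agent, _} = locks:begin_transaction(),
    lists:foreach(fun(OID) -> locks:lock(Agent, OID) end, Order),
    locks:end_transaction(Agent).
```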
Here is how the bug happened (in sketch):
C1, C2, and C3 competed for the locks.
Due to the deadlock-resolution algorithm, C1 and C2 eventually acquired all the locks and finished.
During the resolution process, C3 received the lock_info of [2] (via locks_agent:send_indirects/1)
even though it had not yet reached the point of requesting [2], which means C3 was not in [2]'s queue.
The locks_server then removed its local lock_info entry for [2], since the queue was empty at that point.
This effectively reset the vsn of that lock_info.
When C3 later started requesting [2], the locks_server responded with lock_info that had
a lower vsn than the one C3 had already been told about. Thus C3 got stuck.
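The way I read it, C3 stalls because the agent treats lock_info with a vsn lower than what it has already recorded as stale and ignores it, so the post-reset info from the locks_server never takes effect. A hypothetical illustration of that check (my paraphrase, not the actual locks_agent code):

```erlang
%% Hypothetical illustration of the staleness check, not the real
%% locks_agent code. Known is a map of lock id => #lock{} seen so far.
-record(lock, {id, vsn, queue = []}).

maybe_accept(#lock{id = Id, vsn = NewVsn} = New, Known) ->
    case maps:find(Id, Known) of
        {ok, #lock{vsn = OldVsn}} when OldVsn >= NewVsn ->
            %% The info C3 received indirectly for [2] carries a higher vsn
            %% than the reset vsn the locks_server now hands out, so the
            %% server's reply is dropped as stale and C3 waits forever.
            Known;
        _ ->
            maps:put(Id, New, Known)
    end.
```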
I tried to fix this by not removing lock_info entries in locks_server, but with that change the test fails in other ways. Maybe keeping the entries breaks the algorithm?
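For reference, the change I tried was along these lines (a hypothetical sketch against a made-up table layout, not the actual locks_server code):

```erlang
%% Hypothetical sketch of the attempted fix; the real locks_server is
%% structured differently. Tab is assumed to be an ETS set created with
%% {keypos, #entry.oid}.
-record(entry, {oid, vsn, queue = []}).

on_queue_empty(Tab, OID) ->
    case ets:lookup(Tab, OID) of
        [#entry{queue = []} = E] ->
            %% Current behaviour (as I understand it): the entry is deleted,
            %% so the vsn restarts from its initial value on the next request:
            %%   ets:delete(Tab, OID)
            %% Attempted fix: keep the (empty-queue) entry so its vsn stays
            %% monotonic across periods with no waiters.
            ets:insert(Tab, E),
            ok;
        _ ->
            ok
    end.
```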