Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Data Race CUDTSocket::removeFromGroup vs CUDT::processData #2975

Closed
maxsharabayko opened this issue Jul 9, 2024 · 3 comments
Closed
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@maxsharabayko
Copy link
Collaborator

WARNING: ThreadSanitizer: data race (pid=5444)
  Write of size 8 at 0x7ba80005a068 by thread T1 (mutexes: write M226, write M223):
    #0 srt::CUDTSocket::removeFromGroup(bool) /srt/srtcore/api.cpp:3484 (srt-xtransmit+0x297177)
    #1 srt::CUDTUnited::garbageCollect(void*) /srt/srtcore/api.cpp:3336 (srt-xtransmit+0x28f3eb)

  Previous read of size 8 at 0x7ba80005a068 by thread T4:
    #0 srt::CUDT::processData(srt::CUnit*) /srt/srtcore/core.cpp:10422 (srt-xtransmit+0x378165)
    #1 srt::CRcvQueue::worker_ProcessAddressedPacket(int, srt::CUnit*, srt::sockaddr_any const&) /srt/srtcore/queue.cpp:1475 (srt-xtransmit+0x3f01f6)
    #2 srt::CRcvQueue::worker(void*) /srt/srtcore/queue.cpp:1254 (srt-xtransmit+0x3ecfb7)

SRT v1.5.3+ (54c002f).

To Reproduce

Build srt-xtransmit with Thread Sanitizer (-DCMAKE_CXX_FLAGS='-fsanitize=thread' -DWITH_COMPILER_TYPE=clang).
The error is reported on the receiver.

srt-xtransmit generate srt://172.27.16.247:4200 srt://172.27.16.247:5200 --enable-metrics --loglevel note --sendrate 5Mbps --num 2000 --reconnect --concurrent-streams 2 --maxconns 6 

srt-xtransmit receive srt://:4200 srt://:5200 --loglevel note --enable-metrics --reconnect --maxconns 6 --concurrent-streams 2

Epic #1610 – Core Synchronization Issues.

@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels Jul 9, 2024
@maxsharabayko maxsharabayko added this to the v1.5.4 milestone Jul 9, 2024
@ethouris
Copy link
Collaborator

ethouris commented Jul 9, 2024

removeFromGroup is getting executed under a lock of m_GlobControlLock. Theoretically packData checks the m_bClosing flag in the beginning, but no lock is applied there. Actually removal from the group is theoretically allowed on a socket that is processing an incoming packet, but then we need it either this field atomic, or we need to find a way to just skip the GC cycle by a socket that is under processing - and that was part of what this SocketKeeper was for. This skipping will not happen forever. You can have just one case of calling a processData function while m_bClosing field is not yet set. If the simultaneous call for removeFromGroup has happened, it means that this field is set at this time, so processData will not execute the next time.

@ethouris
Copy link
Collaborator

This problem should be now addressed in #2893. The busy counter is added in the function that dispatches the packet to the given socket. This socket will not be deleted as long as processing isn't finished, but if the socket will be moved to trash in the meantime, any next dispatching will not be possible, so no more busy-acquisition will be done.

@maxsharabayko
Copy link
Collaborator Author

Can't reproduce any more even with the stated SRT commit ID.

72303d7 (Jun 12) - not reproducible

54c002f (Jul 9) - not reproducible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants