[BUG] Connection failure if handshake conclusion response is lost #1648

Open
maxsharabayko opened this issue Nov 9, 2020 · 1 comment · May be fixed by #1690
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior

maxsharabayko commented Nov 9, 2020

Short Description

In a caller-listener handshake, if the first HS Conclusion response (from listener to caller) is lost on the way, the connection fails to be established.

Affected Versions: SRT v1.4.2 and likely prior versions.

Problem Overview

Scenario

  1. Caller sends HS Induction request.
  2. Listener receives HS Induction REQ and replies with HS Induction response.
  3. Caller receives HS Induction RSP and sends HS Conclusion Request. Listener receives it, accepts the new connection and replies with HS Conclusion response.
  4. The HS Conclusion RSP is lost, so the caller does not receive it and repeats the HS Conclusion Request.
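The four steps can be replayed with a small, self-contained C++ model. It is only a sketch: `Hs`, `Event` and `runScenario` are hypothetical names, not SRT types, and the model deliberately does not say *how* the listener answers a repeated request (the two cases below differ exactly in that). It only shows the packet sequence when the link drops the first HS Conclusion response.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical handshake message kinds (not SRT's real types).
enum class Hs { InductionReq, InductionRsp, ConclusionReq, ConclusionRsp };

// One transmission and whether the link dropped it.
struct Event {
    Hs kind;
    bool lost;
};

// Replays steps 1-4. `lost` holds the 0-based indices (in send order) of the
// transmissions the link drops. The caller repeats the Conclusion request
// until a Conclusion response gets through or `maxTries` requests were sent.
// Assumption: the listener answers every request it receives; this model does
// not retry the induction phase.
std::vector<Event> runScenario(const std::set<int>& lost, int maxTries) {
    assert(maxTries >= 1);
    std::vector<Event> trace;
    auto send = [&](Hs kind) {
        const bool dropped = lost.count(static_cast<int>(trace.size())) > 0;
        trace.push_back(Event{kind, dropped});
        return !dropped; // true if the peer receives it
    };

    // Steps 1-2: induction request and response.
    if (!send(Hs::InductionReq) || !send(Hs::InductionRsp))
        return trace;

    // Steps 3-4: conclusion request, repeated while no response gets through.
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        if (send(Hs::ConclusionReq) && send(Hs::ConclusionRsp))
            break;
    }
    return trace;
}
```

With `runScenario({3}, 3)` the fourth transmission (the first Conclusion response) is dropped, so the trace continues with a repeated Conclusion request, which is the situation analysed in Case 1 and Case 2.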

Case 1. A listener socket is not closed after a connection was accepted

The application gets a handle to the accepted socket (SRT socket ID) but does not close the listener socket, either because it keeps processing further connection requests or because the socket is about to be closed but has not been closed yet.

  1. Listener receives the repeated Conclusion request, but the connection has already been established. It replies with an abnormal HS Conclusion response that has no extensions (see CUDT::processConnectRequest).

  2. Caller receives the abnormal HS Conclusion RSP without extensions, treats it as an error, and closes the connection by sending a SHUTDOWN packet.

Result: The connection is not established.

[Screenshot: Listener-abnormal-hs]
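The decision logic of Case 1 can be sketched as follows. This is a simplified, hypothetical model of the behaviour described above, not the actual `CUDT::processConnectRequest` code; `ConclusionRsp`, `listenerRespond`, `callerHandle` and `runCase1` are illustrative names.

```cpp
#include <cassert>

// Hypothetical model of Case 1; not the real SRT implementation.
struct ConclusionRsp {
    bool hasExtensions; // true for a regular Conclusion response
};

enum class CallerOutcome { Connected, ShutdownSent };

// Listening side (listener socket still open): the first Conclusion request
// creates the connection and gets a regular response; a repeated request for
// the already-established connection gets the abnormal response without
// extensions.
ConclusionRsp listenerRespond(bool& established) {
    if (!established) {
        established = true;
        return ConclusionRsp{true};
    }
    return ConclusionRsp{false};
}

// Caller side: a Conclusion response without extensions is treated as an
// error, so the caller sends SHUTDOWN instead of completing the connection.
CallerOutcome callerHandle(const ConclusionRsp& rsp) {
    return rsp.hasExtensions ? CallerOutcome::Connected : CallerOutcome::ShutdownSent;
}

// Case 1: the first response is lost, so only the reply to the repeated
// request reaches the caller.
CallerOutcome runCase1() {
    bool established = false;
    listenerRespond(established);                              // lost on the way
    const ConclusionRsp reply = listenerRespond(established);  // reply to the repeat
    assert(established && !reply.hasExtensions);
    return callerHandle(reply);
}
```

Each side behaves consistently on its own; the failure comes from combining them once the first response is lost.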

Case 2. A listener socket was closed after a connection was accepted

Once a connection has been accepted on the listening side, the listener socket may be closed if no further connections are expected.

3.1. The application closes the listener socket but still holds the accepted socket handle.

  1. Same as steps 1 to 4 of the scenario above.

  2. The listening peer receives the repeated Conclusion request, but the connection has already been established and the listener socket has been closed, so the request is ignored.

Result: The caller does not know whether the connection has been established, so it cannot start sending. If the listener is the sender, the caller might still receive data (not yet verified).

The listening side starts sending KEEPALIVE packets every 1 s.

[Screenshot: Listener-closed]
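A minimal sketch of what distinguishes Case 2 from Case 1, under the assumptions of this report: with the listener socket closed, a repeated Conclusion request gets no reply at all (instead of the abnormal reply of Case 1), while the accepted socket on the listening side keeps emitting KEEPALIVE packets every 1 s. `Reply`, `onRepeatedConclusion` and `keepaliveTimesMs` are hypothetical names, and the model assumes an otherwise idle connection.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of Case 2; not the real SRT implementation.
enum class Reply { Ignored, Abnormal };

// In this model, a repeated Conclusion request for an already-accepted
// connection is handled through the listener socket. Open: abnormal reply
// (Case 1). Closed: nothing answers it (Case 2).
Reply onRepeatedConclusion(bool listenerOpen) {
    return listenerOpen ? Reply::Abnormal : Reply::Ignored;
}

// Times (ms after acceptance) at which the accepted socket sends KEEPALIVE,
// using the 1 s interval observed in the capture and assuming no data traffic.
std::vector<int> keepaliveTimesMs(int durationMs, int intervalMs = 1000) {
    assert(intervalMs > 0);
    std::vector<int> times;
    for (int t = intervalMs; t <= durationMs; t += intervalMs)
        times.push_back(t);
    return times;
}
```

From the caller's point of view, the only traffic after the lost response is therefore the periodic KEEPALIVE, with no Conclusion response that would complete its handshake.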

To Reproduce

Use two separate srt-live-transmit builds, one for the listener and one for the caller, each with the modifications described below.
Use Wireshark to capture the exchanged packets.
Run the listener first, then the caller, as described below.

Note that in the examples below both sides expect to receive packets and neither sends any data. This does not matter, because the issue concerns connection establishment.

Listener

To reproduce Case 1, comment out the line in SrtCommon::AcceptNewClient() of srt-live-transmit that closes the listener socket:

//srt_close(m_bindsock);

Start listener:

./srt-live-transmit srt://:4200 udp://127.0.0.1:4201 -logfa:que-recv,conn -loglevel:debug -v

Caller

In CUDT::processAsyncConnectRequest, add two more send operations, each preceded by a short delay. This makes the SRT caller repeat the HS Conclusion Request two times:

HLOGC(cnlog.Debug, log << "processAsyncConnectRequest: setting REQ-TIME HIGH, SENDING HS:" << m_ConnReq.show());
m_tsLastReqTime = steady_clock::now();
// Original send of the handshake request.
m_pSndQueue->sendto(serv_addr, request);
// Added for the reproduction: send the same request two more times,
// 500 microseconds apart, so the listener receives repeated requests.
srt::sync::this_thread::sleep_for(srt::sync::microseconds_from(500));
m_pSndQueue->sendto(serv_addr, request);
srt::sync::this_thread::sleep_for(srt::sync::microseconds_from(500));
m_pSndQueue->sendto(serv_addr, request);
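The patch relies on SRT's internal `srt::sync` helpers. The same send, pause, send pattern can be sketched with the standard library alone; `sendRepeated` is a hypothetical helper and its `send` callback merely stands in for `m_pSndQueue->sendto(serv_addr, request)`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Calls `send` `copies` times, sleeping `gap` between consecutive calls.
// With copies = 3 and gap = 500 us this mirrors the three sends in the patch.
template <typename Send>
void sendRepeated(Send&& send, int copies, std::chrono::microseconds gap) {
    assert(copies >= 1);
    for (int i = 0; i < copies; ++i) {
        if (i > 0)
            std::this_thread::sleep_for(gap); // blocks for at least `gap`
        send();
    }
}
```

For example, `sendRepeated([&] { /* send the handshake packet */ }, 3, std::chrono::microseconds(500));` issues three sends spaced at least 500 µs apart. The short gap keeps the copies close together, so they arrive while the listener already holds the accepted connection.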

Start caller:

./srt-live-transmit srt://127.0.0.1:4200 udp://127.0.0.1:4201 -v
maxsharabayko added the Type: Bug and [core] labels on Nov 9, 2020

mbakholdina commented Nov 13, 2020

On the test application side, this can be reproduced by:

  • using srt-xtransmit;
  • emulating the following network settings on both ends: 10% packet loss and burst packet loss of up to 2 packets per drop event (or up to 5 packets per drop event for faster reproduction).
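The loss settings can be approximated with a simple burst-loss model to get a feel for how often a single handshake packet is hit. This is an assumption, not LanForge's documented algorithm: here "10%" is read as the probability that a packet outside a burst starts a drop event, and the burst length is drawn uniformly from 1 to the maximum. `BurstLoss` and `lossRate` are hypothetical names.

```cpp
#include <cassert>
#include <random>

// Hypothetical burst-loss model: a packet outside a burst starts a drop event
// with probability `p`; the event drops 1..maxBurst consecutive packets.
class BurstLoss {
public:
    BurstLoss(double p, int maxBurst, unsigned seed)
        : start_(p), burst_(1, maxBurst), rng_(seed) {
        assert(p >= 0.0 && p <= 1.0 && maxBurst >= 1);
    }

    // Returns true if the next packet is dropped.
    bool drop() {
        if (remaining_ > 0) {
            --remaining_;
            return true;
        }
        if (start_(rng_)) {
            remaining_ = burst_(rng_) - 1; // this packet is the first one lost
            return true;
        }
        return false;
    }

private:
    std::bernoulli_distribution start_;
    std::uniform_int_distribution<int> burst_;
    std::mt19937 rng_;
    int remaining_ = 0;
};

// Fraction of `n` packets dropped by the model.
double lossRate(double p, int maxBurst, int n, unsigned seed) {
    assert(n > 0);
    BurstLoss link(p, maxBurst, seed);
    int lost = 0;
    for (int i = 0; i < n; ++i)
        lost += link.drop() ? 1 : 0;
    return static_cast<double>(lost) / n;
}
```

Under this model, on average 9 packets pass before a drop event of mean length 1.5, so the long-run loss is 1.5 / (9 + 1.5), about 14% for a maximum burst of 2. Any individual packet, including the first HS Conclusion response, is lost with roughly that probability. Because a connection exchanges only a few handshake packets, many connection attempts are needed to hit the bug, which is why repeated experiments help.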

Here is the screenshot from LanForge:

[Screenshot: lanforge-burstloss]

Commands to reproduce:

  1. Manually

    Receiver
    tshark -i em2 -f "udp port 4200" -s 1500 -w ./projects/maria/tmp/1648_rcv_2.pcapng
    bin/srt-xtransmit receive srt://:4200 -v

    Sender
    tshark -i enp3s0f1 -f "udp port 4200" -s 1500 -w ./projects/maria/tmp/1648_snd_2.pcapng
    bin/srt-xtransmit generate --sendrate 10Mbps --duration 60 srt://192.168.2.2:4200 -v

  2. Using the lib-srt-utils library, which runs a set of experiments and makes the bug easier to catch.
