[Bug]: Dtls handshake is not thread safe #1848

Open
2 tasks done
wei-zhang-simplisafe opened this issue Nov 20, 2023 · 2 comments
Labels
1.7.2 bug Something isn't working needs-triage

Comments

wei-zhang-simplisafe commented Nov 20, 2023

Please confirm you have already done the following

  • I have searched the repository for related/existing bug reports
  • I have all the details the issue requires

Describe the bug

The KVS SDK supports multiple connections in parallel. Connection requests (offers) can arrive at any time, so two requests can arrive at almost the same time.
In that case, both connections perform the DTLS handshake concurrently. In testing, two parallel DTLS handshakes fail frequently.
The following are a few examples of the DTLS failures:

  1. The first connection succeeds, but the second connection fails to finish peerConnection creation. Error message:
    2023-11-18 17:04:59 ERROR createDtlsSession(): operation returned status code: 0x59000001
    2023-11-18 17:04:59 ERROR createPeerConnection(): operation returned status code: 0x59000001
    2023-11-18 17:04:59 ERROR iceAgentShutdown(): operation returned status code: 0x00000001
    2023-11-18 17:04:59 ERROR freePeerConnection(): operation returned status code: 0x00000001
    2023-11-18 17:04:59 ERROR freePeerConnection(): operation returned status code: 0x00000001
    2023-11-18 17:04:59 ERROR freePeerConnection(): operation returned status code: 0x00000001

2a. The first connection fails, and the second connection successfully finishes the DTLS handshake. The first connection retries the DTLS handshake many times and always times out. Error message:
2023-11-16 16:25:06 INFO onInboundPacket(): SS_Debug_Patch Dtls incoming packet
2023-11-16 16:25:06 WARN dtlsSessionProcessPacket(): SSL_read failed with error:24067044:random number generator:rand_pool_add:internal error
2023-11-16 16:25:06 WARN dtlsSessionProcessPacket(): SSL_read failed with error:02015016:system library:ioctl:Invalid argument
2023-11-16 16:25:06 WARN dtlsSessionProcessPacket(): SSL_read failed with error:2406B070:random number generator:RAND_DRBG_generate:generate error
2023-11-16 16:25:06 WARN dtlsSessionProcessPacket(): SSL_read failed with error:100A7003:elliptic curve routines:ec_GFp_simple_point_get_affine_coordinates:BN lib
2023-11-16 16:25:06 WARN dtlsSessionProcessPacket(): SSL_read failed with error:100FA010:elliptic curve routines:ossl_ecdsa_verify_sig:EC lib
2023-11-16 16:25:06 WARN dtlsSessionProcessPacket(): SSL_read failed with error:1416D07B:SSL routines:tls_process_key_exchange:bad signature

2b. The first connection fails, and the second connection successfully finishes the DTLS handshake. The first connection retries the DTLS handshake many times and always times out. Error message:
2023-11-19 23:24:24 INFO onInboundPacket(): SS_Debug_Patch Dtls incoming packet
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:24067044:random number generator:rand_pool_add:internal error
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:02015016:system library:ioctl:Invalid argument
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:2406B070:random number generator:RAND_DRBG_generate:generate error
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:1011C088:elliptic curve routines:ec_scalar_mul_ladder:ladder post failure
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:100F8010:elliptic curve routines:ECDSA_sign_setup:EC lib
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:100F902A:elliptic curve routines:ossl_ecdsa_sign_sig:ECDSA lib
2023-11-19 23:24:24 WARN dtlsSessionProcessPacket(): SSL_read failed with error:141F0006:SSL routines:tls_construct_cert_verify:EVP lib

2c. The first connection fails, and the second connection successfully finishes the DTLS handshake. The first connection retries the DTLS handshake many times and always times out. Error message:
2023-11-20 00:03:35 WARN dtlsSessionProcessPacket(): SSL_read failed with error:24067044:random number generator:rand_pool_add:internal error
2023-11-20 00:03:35 WARN dtlsSessionProcessPacket(): SSL_read failed with error:02015016:system library:ioctl:Invalid argument
2023-11-20 00:03:35 WARN dtlsSessionProcessPacket(): SSL_read failed with error:2406B070:random number generator:RAND_DRBG_generate:generate error
2023-11-20 00:03:35 WARN dtlsSessionProcessPacket(): SSL_read failed with error:14195041:SSL routines:tls_construct_cke_ecdhe:malloc failure

  3. The first connection fails, and the second connection successfully finishes the DTLS handshake. The first connection never retries the DTLS handshake, and there is no timeout. Error message:
    2023-11-19 23:35:48 WARN dtlsSessionStart(): SSL_do_handshake failed with error:24067044:random number generator:rand_pool_add:internal error
    2023-11-19 23:35:48 WARN dtlsSessionStart(): SSL_do_handshake failed with error:02015016:system library:ioctl:Invalid argument
    2023-11-19 23:35:48 WARN dtlsSessionStart(): SSL_do_handshake failed with error:2406B070:random number generator:RAND_DRBG_generate:generate error
    2023-11-19 23:35:48 WARN dtlsSessionStart(): SSL_do_handshake failed with error:141E7044:SSL routines:tls_construct_client_hello:internal error
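
For anyone comparing the traces above, here is a minimal sketch of how the hex codes in the `error:XXXXXXXX:...` strings can be split into fields. It assumes the codes use the OpenSSL 1.1.x packing (8-bit library number, 12-bit function number, 12-bit reason number); OpenSSL 3.x packs errors differently. The names `PackedSslError` and `decodeSslError` are hypothetical helpers, not part of OpenSSL or the SDK.

```c
#include <stdio.h>
#include <stdlib.h>

/* Fields of a packed error code, assuming the OpenSSL 1.1.x layout. */
typedef struct {
    unsigned int lib;    /* library number, bits 24..31 */
    unsigned int func;   /* function number, bits 12..23 */
    unsigned int reason; /* reason number, bits 0..11 */
} PackedSslError;

/* Split a hex error code such as "24067044" into its three fields. */
PackedSslError decodeSslError(const char* hex)
{
    unsigned long code = strtoul(hex, NULL, 16);
    PackedSslError e;
    e.lib = (unsigned int) ((code >> 24) & 0xFF);
    e.func = (unsigned int) ((code >> 12) & 0xFFF);
    e.reason = (unsigned int) (code & 0xFFF);
    return e;
}

/* Print one decoded code in a compact form. */
void printSslError(const char* hex)
{
    PackedSslError e = decodeSslError(hex);
    printf("%s -> lib=%u func=%u reason=%u\n", hex, e.lib, e.func, e.reason);
}
```

Calling `printSslError("24067044")` and `printSslError("02015016")` on the first two codes of each trace gives library numbers 36 and 2, which matches the "random number generator" and "system library" labels in the log text. Under that reading, every failing trace in cases 2a, 2b, 2c and 3 begins in the random number generator, not in the TLS state machine itself.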

Expected Behavior

Two simultaneous Dtls handshakes should be safe and both succeed.

Current Behavior

There is a more than 90% chance that one of the two simultaneous DTLS handshakes will fail.

Reproduction Steps

Send two offers to the KVS master within a short time window, ideally within 100 ms.

WebRTC C SDK version being used

1.7.2

Compiler and Version used

g++

Operating System and version

Linux 1.18

Platform being used

Linux

@wei-zhang-simplisafe wei-zhang-simplisafe added bug Something isn't working needs-triage labels Nov 20, 2023
wei-zhang-simplisafe (Author) commented

As an experiment, I added a global mutex. Each of the following functions locks the mutex on entry and unlocks it before returning. With this change, the errors in this ticket no longer appear:
STATUS createDtlsSession(PDtlsSessionCallbacks, TIMER_QUEUE_HANDLE, INT32, BOOL, PRtcCertificate, PDtlsSession*);
STATUS freeDtlsSession(PDtlsSession*);
STATUS dtlsSessionStart(PDtlsSession, BOOL);
STATUS dtlsSessionProcessPacket(PDtlsSession, PBYTE, PINT32);
STATUS dtlsSessionIsInitFinished(PDtlsSession, PBOOL);
STATUS dtlsSessionPopulateKeyingMaterial(PDtlsSession, PDtlsKeyingMaterial);
STATUS dtlsSessionGetLocalCertificateFingerprint(PDtlsSession, PCHAR, UINT32);
STATUS dtlsSessionVerifyRemoteCertificateFingerprint(PDtlsSession, PCHAR);
STATUS dtlsSessionPutApplicationData(PDtlsSession, PBYTE, INT32);
STATUS dtlsSessionShutdown(PDtlsSession);
STATUS dtlsSessionOnOutBoundData(PDtlsSession, UINT64, DtlsSessionOutboundPacketFunc);
STATUS dtlsSessionOnStateChange(PDtlsSession, UINT64, DtlsSessionOnStateChange);
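
The global-mutex experiment described above could look roughly like the sketch below. This is not the actual patch: `unsafeDtlsStep` is a hypothetical stand-in for any of the listed DTLS functions, the `STATUS` typedef and `STATUS_SUCCESS` value are simplified placeholders for the SDK's own definitions, and a shared counter stands in for the library state that is not safe to touch concurrently.

```c
#include <pthread.h>
#include <stdio.h>

/* Simplified placeholders for the SDK's STATUS type and success code. */
typedef unsigned int STATUS;
#define STATUS_SUCCESS 0x00000000u

#define STEPS_PER_THREAD 100000

/* One process-wide lock shared by every guarded DTLS entry point. */
static pthread_mutex_t gDtlsLock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for shared library state that is not thread safe. */
static long gSharedCounter = 0;

/* Hypothetical unguarded call: a read-modify-write on shared state. */
STATUS unsafeDtlsStep(void)
{
    long v = gSharedCounter;
    gSharedCounter = v + 1;
    return STATUS_SUCCESS;
}

/* Guarded wrapper: lock on entry, unlock before returning. */
STATUS lockedDtlsStep(void)
{
    STATUS retStatus;
    pthread_mutex_lock(&gDtlsLock);
    retStatus = unsafeDtlsStep();
    pthread_mutex_unlock(&gDtlsLock);
    return retStatus;
}

/* Each thread plays the role of one connection doing its handshake. */
static void* handshakeThread(void* arg)
{
    (void) arg;
    for (int i = 0; i < STEPS_PER_THREAD; i++) {
        lockedDtlsStep();
    }
    return NULL;
}

/* Run two "handshakes" concurrently and return the final counter value. */
long runTwoHandshakes(void)
{
    pthread_t a, b;
    gSharedCounter = 0;
    pthread_create(&a, NULL, handshakeThread, NULL);
    pthread_create(&b, NULL, handshakeThread, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return gSharedCounter;
}
```

Because every update goes through `lockedDtlsStep`, `runTwoHandshakes()` returns exactly `2 * STEPS_PER_THREAD`. A single global lock serializes all DTLS work across connections, so it is useful as a diagnostic but may cost throughput when many peers connect at once.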


This is a very old issue. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.
