DNS over TLS is extremely slow (~40+ms) because of slow TLS handshake #1202

Closed
frebib opened this issue Dec 8, 2024 · 4 comments

frebib commented Dec 8, 2024

Describe the bug
Simply running dig @my.dns.server#853 +tls some.query consistently takes 40-45ms.
Configuring nginx to perform the TLS termination instead reduces this to 20-30ms initially, then to zero for subsequent requests (reusing the TLS session?).

To reproduce
Steps to reproduce the behavior:

  1. Configure unbound for TLS with a few simple options:
interface: 2a02:8010:64b4::d45a@853
tls-port: 853
tls-service-key: path/to/private.key
tls-service-pem: path/to/fullchain.cer
tls-ciphers: "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256"
tls-ciphersuites: "TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256"

Note that setting tls-ciphers here seems to make no difference.
I'm using ECDSA certificates from Let's Encrypt here (ecdsa-with-SHA384 / secp384r1 / P-384).

  2. Perform the dig
  3. Observe the TLS handshake taking forever
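
To separate the TCP connect, the TLS handshake, and the query round trip when reproducing this, a small script can time each phase on its own. This is only a rough sketch, not part of the original report: the function names `build_dot_query` and `time_dot_query` are made up for illustration, the host is a placeholder, and it assumes the server certificate validates against the system trust store.

```python
import socket
import ssl
import struct
import time


def build_dot_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a DNS query with the 2-byte length prefix used on TCP and DoT."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    message = header + question
    return struct.pack("!H", len(message)) + message


def time_dot_query(host: str, port: int = 853, name: str = "example.com"):
    """Return (connect_ms, handshake_ms, query_ms) for one DoT query."""
    ctx = ssl.create_default_context()
    t0 = time.perf_counter()
    raw = socket.create_connection((host, port), timeout=5)
    t1 = time.perf_counter()
    # Defer the handshake so it can be timed separately from the TCP connect
    with ctx.wrap_socket(
        raw, server_hostname=host, do_handshake_on_connect=False
    ) as tls:
        tls.do_handshake()
        t2 = time.perf_counter()
        tls.sendall(build_dot_query(name))
        tls.recv(2)  # block until the first bytes of the response arrive
        t3 = time.perf_counter()
    return tuple(
        round((end - start) * 1000, 2)
        for start, end in ((t0, t1), (t1, t2), (t2, t3))
    )
```

Calling `time_dot_query("my.dns.server")` once against Unbound and once against the nginx front end should show which phase accounts for the difference.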

Expected behavior
Unbound should be able to be (nearly) as fast as nginx

System:

  • Unbound version: 1.20.0
  • OS: Vyos 1.5.x (although unbound is running in an alpinelinux/unbound container)
  • unbound -V output:
/etc/unbound # unbound -V
Version 1.20.0

Configure line: --build=x86_64-alpine-linux-musl --host=x86_64-alpine-linux-musl --prefix=/usr --sysconfdir=/etc --mandir=/usr/share/man --localstatedir=/var --with-username=unbound --with-run-dir= --with-pidfile= --with-rootkey-file=/usr/share/dnssec-root/trusted-key.key --with-libevent --with-pthreads --disable-static --disable-rpath --enable-dnstap --with-ssl --without-pythonmodule --with-pyunbound
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 3.3.2 3 Sep 2024
Linked modules: dns64 respip validator iterator

BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues

Additional information
Here are two pcap files showing the difference between nginx and unbound. In both cases the queries were performed using dig. There's also a dig +tcp in each one for contrast. The captures were taken from the same host on which the unbound process is running.

unbound-pcaps.zip

This is the nginx configuration I'm using

worker_processes 1;

events {
    worker_connections 4096;
    multi_accept on;
    use epoll;
}

stream {
    upstream dns {
        zone dns 64k;
        server [2a02:8010:64b4::d45a]:53;
    }

    server {
        # listen 853 ssl;
        listen [2a02:8010:64b4::d45a]:853 ssl;

        ssl_certificate_key path/to/private.key;
        ssl_certificate path/to/fullchain.cer;

        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ecdh_curve secp384r1:secp521r1:prime256v1:X25519;
        ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
        ssl_prefer_server_ciphers on;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;
        ssl_session_tickets off;

        proxy_pass dns;
    }
}

gthess self-assigned this Dec 9, 2024

Member

gthess commented Dec 9, 2024

What I see in the Unbound pcap is that packets 12 and 29 consistently take ~40ms, but these packets are sent from dig to Unbound. This looks like a timeout could be involved. I'll check whether this is consistent across other SSL versions.
I assume OpenSSL 3.3.2 is used for all the software involved (Unbound, dig, nginx)?
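
One way to spot such delays without reading the capture by eye is to scan the packet timestamps for large gaps and note who sent the late packet. The sketch below is illustrative only: `find_slow_gaps` is a hypothetical helper, and the sample tuples are made-up values, not data from the attached pcaps. It assumes the packet number, timestamp, and sender have already been exported from the capture.

```python
def find_slow_gaps(packets, threshold_ms=30.0):
    """Return (number, sender, gap_ms) for packets that arrive late.

    packets: list of (number, timestamp_seconds, sender) in capture order.
    A packet is reported when it arrives more than threshold_ms after
    the packet before it.
    """
    slow = []
    for prev, cur in zip(packets, packets[1:]):
        gap_ms = (cur[1] - prev[1]) * 1000.0
        if gap_ms > threshold_ms:
            slow.append((cur[0], cur[2], round(gap_ms, 1)))
    return slow


# Made-up sample: packet 3 is sent by the client about 42 ms after packet 2.
sample = [
    (1, 0.000, "dig"),
    (2, 0.001, "unbound"),
    (3, 0.043, "dig"),
    (4, 0.044, "unbound"),
]
slow_packets = find_slow_gaps(sample)
```

A late packet whose sender is the client, right after a small write from the server, points at the transport layer rather than at cryptographic work on the server.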

Author

frebib commented Dec 9, 2024

> but these packets are from dig to Unbound

I did notice that and I found it odd. I wondered if it was something that unbound had sent that made the client do some heavy computation or something. I don't know enough about how TLS works specifically to say though.

The nginx container is using

root@zeus:/# openssl version
OpenSSL 3.0.15 3 Sep 2024 (Library: OpenSSL 3.0.15 3 Sep 2024)

I was running dig from my computer, which has OpenSSL 3.4.0: https://archlinux.org/packages/core/x86_64/openssl/

Member

gthess commented Dec 10, 2024

This also seems to be related to #1045. I reran the environments for that and I can confirm that using kdig (because the dig package on ubuntu20_04 has no support for TLS) produces the same results on ubuntu24_04. The main difference I see now is OpenSSL 1.1.1 versus OpenSSL 3.x; I am using the latest Unbound version in both environments.
I'll look further to see if I can spot something.
Previously this was not obvious to us because those ~40ms delays when sending data do not show up with our own client test program for DoT, streamtcp.

Member

gthess commented Jan 10, 2025

The cause was identified as the TCP_NODELAY socket option. More information can be found in PR #1214.
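
For readers unfamiliar with the option: TCP_NODELAY disables Nagle's algorithm, which otherwise may hold back a small write until earlier data has been acknowledged. Combined with delayed ACKs on the peer, that can add stalls of tens of milliseconds to an exchange of several small messages, such as a TLS handshake. That explanation is general TCP background and an assumption about this case, not a description of the actual change in #1214. The sketch below is in Python rather than Unbound's C, and `enable_nodelay` is a hypothetical helper name.

```python
import socket


def enable_nodelay(sock: socket.socket) -> bool:
    """Disable Nagle's algorithm on a TCP socket and confirm it took effect."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back; a non-zero value means small writes go out at once
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# The option can be set before the socket is connected or accepted.
demo_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
nodelay_enabled = enable_nodelay(demo_sock)
demo_sock.close()
```

A server would typically apply this to each accepted connection before starting the TLS handshake on it.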

gthess closed this as completed Jan 10, 2025