Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable DoT connection reuse while requests are in flight #269

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

kosekmi
Copy link

@kosekmi kosekmi commented Sep 17, 2022

Currently, a DoT connection is put back into the pool when the DNS response is received (or an error occurs). Hence, a new DoT connection is established if a second request should be served while a first request is in flight. While this behaviour effectively counters TCP HoLB, it is more beneficial to reuse the already established connection for the second request, since the overhead of the DoT connection establishment is pretty high (1 RTT for TCP, 0-2 RTTs for TLS (depending on TLS version, session resumption, and 0-RTT)).

This PR puts the connection right back into the pool to allow it to be reused while requests are in flight. The behaviour is now identical to DNS over HTTPS/2 (DoH2), where a single DoH2 connection is also used when multiple requests are in flight.

Attached you will find 2 pcaps showcasing the behaviour before and after the change: dot.inflight.reuse.pcaps.zip

@ameshkov
Copy link
Member

This change won't work properly when you process multiple queries through a single upstream.
The responses will be received out of order so the risk of having a DNS ID mismatch is very high.

The idea is correct, though. We should allow processing multiple queries in parallel through a single TCP/TLS connection.
But the implementation should be much more complicated and I doubt we can/should simply reuse the current TLSPool.

@kosekmi
Copy link
Author

kosekmi commented Sep 20, 2022

This change won't work properly when you process multiple queries through a single upstream.

Good point. We thought that this was already handled since it's required by the DoT RFC (and the implementation requirements RFC):

Since pipelined responses can arrive out of order, clients MUST match
responses to outstanding queries on the same TLS connection using the
Message ID. If the response contains a Question Section, the client
MUST match the QNAME, QCLASS, and QTYPE fields. Failure by clients
to properly match responses to outstanding queries can have serious
consequences for interoperability

We did some additional testing, and can confirm that out of order responses result in an id mismatch, hence closing the connection. We will develop a proper fix and update this PR.

42SK and others added 3 commits September 30, 2022 14:15
When processing multiple queries through a single DoT upstream, we might get
a DNS ID mismatch if responses are received out of order (cf. PR AdguardTeam#269).

We fix this by storing all responses with unknown IDs in a map from
which they can be retrieved later on.
… query

In order to deal with response reordering, RFC 7766 [1] requires that we
match the QNAME, QCLASS, and QTYPE fields in the response to the query.

[1] https://www.rfc-editor.org/rfc/rfc7766#section-7
DoT inflight connection reuse with handling for out-of-order responses as specified by RFC 7766
@kosekmi
Copy link
Author

kosekmi commented Sep 30, 2022

We just updated this PR with a fix for out-of-order responses, following the requirements the DoT RFC (and the implementation requirements RFC):

Since pipelined responses can arrive out of order, clients MUST match
responses to outstanding queries on the same TLS connection using the
Message ID. If the response contains a Question Section, the client
MUST match the QNAME, QCLASS, and QTYPE fields. Failure by clients
to properly match responses to outstanding queries can have serious
consequences for interoperability

We now store all responses with unknown IDs in a map from which they can be retrieved later on, and we explicitly check the QNAME, QCLASS, and QTYPE fields in the responses.

@@ -53,14 +53,16 @@ func (p *dnsOverTLS) Exchange(m *dns.Msg) (reply *dns.Msg, err error) {
}

p.RLock()
poolConn, err := p.pool.Get()
poolConnAndStore, err := p.pool.Get()
// Put the connection right back in to allow the connection to be reused while requests are in flight
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally the same as using a single connection and not using pool at all. Why keeping it then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting rid of the connections pool and using a single connection is possible, but it does have visible downsides. Namely, head-of-line blocking will be a huge problem when the connection is not stable.

However, if we operate locally (for instance, dnsproxy proxies a recursor on the same server) using pipelining is in theory better.

Here's what I suggest:

  1. Try implementing a different type of upstream that uses a single connection (in a separate file).
  2. Write tests for it and check that pipelining actually works and that it keeps that single connection alive.
  3. IMPORTANT: we need to be ready that the server can close the connection any time. In this case we should first try re-establishing the connection and repeating the request.

Then I'll examine the implementation and if it's okay we can merge it with existing DoT and DNS-over-TCP upstreams and allow enabling pipelining mode via upstream.Options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants