Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(netxlite): add HTTP/HTTPS proxy dialer #1162

Open
wants to merge 353 commits into
base: master
Choose a base branch
from

Conversation

Murphy-OrangeMud
Copy link

@Murphy-OrangeMud Murphy-OrangeMud commented Jun 24, 2023

Checklist

Description

Add support for HTTP/HTTPS proxy dialing in netxlite.

Even though we can already pass --proxy with HTTP/HTTPS proxies to ooniprobe, e.g.,.

./ooniprobe run all --proxy http://127.0.0.1:7890

The implementation of this functionality added as part of ooni/probe#2531 (comment) does not allow to have per-domain specific proxies or to perform proxy hopping, as explained in ooni/probe#2552. By adding an HTTP/HTTPS proxy dialer to netxlite, instead, we would make these kind of advanced behavior possible.

@Murphy-OrangeMud Murphy-OrangeMud marked this pull request as draft June 28, 2023 03:26
@Murphy-OrangeMud Murphy-OrangeMud marked this pull request as ready for review June 29, 2023 04:53
@bassosimone
Copy link
Contributor

bassosimone commented Jul 4, 2023

Thanks for contributing! I'll take a look at merging this diff when preparing v3.19.x and in the meanwhile I will add this diff to my queue! Thanks again!

bassosimone added a commit to ooni/probe-android that referenced this pull request Jul 12, 2023
Unfortunately, because of ooni/probe#2406, we
are going to see crashes when using these proxies.

This diff is part of ooni/probe#2500. Since we're
increasingly being blocked, it makes sense to exposes all the possible
proxies we can feature.

We're going to touch upon the same files again once we land the
ooni/probe-cli#1162 pull request.
Copy link
Contributor

@bassosimone bassosimone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented HTTP/HTTPS proxies by brutally reusing the standard library functionality as documented at ooni/probe#2531 (comment), because I was working on related code and that seemed very easy to do.

The solution I implemented has some drawbacks that suggest that it would still be useful to have a way of connecting to a proxy along the lines you sketched out here. In particular, I believe this functionality would allow us to specify proxies in a more fine grained way and it would also allow us to dynamically switch proxy. For this reason, I am still very interested in seeing this code landing in the master branch sometime in the future.

However, I think this pull request is probably trying to do too many things for the objectives that we have in OONI and I have provided some initial recommendations to simplify and streamline the code. Please, start by applying the changes I recommended, such that we can move a step forward and can gradually make this branch ready for landing.

Thank you for your work and effort! 🙏 🙂

for reflect.TypeOf(conn).ConvertibleTo(reflect.TypeOf(&dialerErrWrapperConn{})) {
conn = conn.(*dialerErrWrapperConn).Conn
}
conn = &dialerErrWrapperConn{Conn: conn}
Copy link
Contributor

@bassosimone bassosimone Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is very fragile and assumes that we're creating a specific chain of wrappers. I'd rather try to rewrite the pull request such that we avoid using this code entirely. My understanding is that you don't want to wrap the connection more than once. Wrapping multiple times should be fine, but probably it's also possible to avoid wrapping by using a model.UnderlyingNetwork, which provides a dialer that, in the common case, is the standard library dialer.

return d.Dialer.(proxy.ContextDialer).DialContext(ctx, network, address)
}
return d.dial(ctx, child, network, address)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments here:

  1. in terms of functionality, the PR claims support for "https" proxies, but here I only see support for "http"

  2. I would be happier if we could use a map with key the URL scheme and value the function to invoke depending on the URL scheme (I like this pattern more than using a chain of if)

  3. additionally, FYI, I like code to avoid using a else if and else when the previous if clause terminates with a return.

internal/netxlite/httpproxy_test.go Show resolved Hide resolved

func TestHTTPProxyDialer(t *testing.T) {
// REMINDER: This test need a http proxy running locally
dialer := NewHTTPDialer("tcp", "localhost:7890")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recently added support for proxies in the ./internal/testingx package. I think you may want to take advantage of that for writing tests that always use a known working HTTP/HTTPS proxy.

internal/netxlite/httpproxy.go Show resolved Hide resolved
internal/netxlite/httpproxy.go Show resolved Hide resolved
internal/netxlite/httpproxy.go Show resolved Hide resolved
internal/netxlite/httpproxy.go Show resolved Hide resolved
internal/netxlite/httpproxy.go Show resolved Hide resolved
internal/netxlite/httpproxy.go Show resolved Hide resolved
@bassosimone bassosimone changed the title Add support for http/https proxy feat(netxlite): add HTTP/HTTPS proxy dialer Oct 9, 2023
It would be a *partial* deatch where we would override what we need for
implementing ooni/probe#2531.

To this end, I have created a new package called `enginenetx` (i.e., the
engine network extensions), which will contain engine-specific code.

For now, I am just forwarding calls to netxlite. More changes will come
as part of subsequent pull requests.
…ooni#1266)

This diff inlines the original implementation of the
netxlite.NewHTTPTransportWithLoggerResolverAndOptionalProxyURL function
inside the enginex package.

With this diff, we continue to partially detach the engine networking
from netxlite, to implement the beacons as described by
ooni/probe#2531.
This diff splits http.go and http_test.go into multiple files to create
the logical and mental space required to implement new construction
routines suitable for the engine package.

The history of this diff is simple. I started working on forking http.go
for the `./internal/enginenetx` package. Then, I realized it would have
been possible to have those changes directly inside netxlite, if only I
did refactor the http implementation to have a bit more clarity and
file-based modularity. And so I did.

Part of ooni/probe#2531
This diff isolates and annotates netxlite quirky functions such that
ooni/probe#2534 will be easier.

This work is also useful to ooni/probe#2531.

While there, commit workaround for issue
ooni/probe#2535.
Now that we've clearly labeled and packaged technical debt, we can copy
existing technical-debt-ridden code and modify it to obtain a flexible
factory for creating HTTP transports, `NewHTTPTransportWithOptions`.

In particular, this factory uses sensible defaults for measuring and
there are options that you can pass it to customize details such as the
backend proxy that we previously unconditionally configured.

More in detail, this is a side-by-side comparison between new code's
defaults and old code:

| Setting Name | httpquirks.go (old code) | httpfactory.go (new code) |
| ------------------- | ------------------------ |
-------------------------- |
| .Proxy | nil | nil |
| .MaxConnsPerHost | 1 | ooni/oohttp's default |
| .DisableCompression | true | true |
| .ForceAttemptHTTP2 | true | true |

So, basically, the biggest change is that we've removed the limitation
of the max number of connections per host being set to 1. In any case,
the new code allows to configure each of these fields.

This factory will be the starting point for having custom network
functions for the engine in line with
ooni/probe#2531.

While there, acknowledge that `NewHTTPClient` contains technical debt
because it does unconditionally disable cookies, to document that and
move it inside of the proper file (`httpquirks.go`).
This diff has been extracted from
ooni#1271 to reduce the overall diff
size.

Reference issue: ooni/probe#2531.
We want more comprehensive testing of how we use proxies during the
bootstrap. Tests should encompass both SOCKS5 and HTTP(S) proxies. Tests
should support using both the host network and using netem.

This diff starts paving the way for improving proxy testing, by
introducing in `./internal/testingx/httpproxy.go` an HTTP(S) proxy
implementation supporting both the case of HTTP over HTTP(S) proxy
(where you use the host header) and HTTPS over HTTP(S) proxy (where the
client issues a `CONNECT` request to the remote endpoint).

The previous implementation (in `./internal/testingx/httptestx.go`) is
still there, for now. We used the previous implementation, which only
supported the host header, as a starting point for writing the new one.

More in detail, this diff introduces the new proxy and its tests.
Because testing the proxy functionality is a bit complex, I chose to use
a separate package and also write tests for the tests. Obviously, we're
still using `netxlite` as the underlying library, so there is some
circularity in testing, but `netxlite` also has its own tests, so we
should probably be fine.

The separate package with tests is `./internal/testingproxy`.

While working on this package, I realized the need to forward the CA
used by the proxy. This happens in
`./internal/testingproxy/hosthttps.go`. If we do not do this, we see the
following failure:

```
=== RUN   TestWorkingAsIntended/fetching_https://www.example.com/_using_the_host_network_and_an_HTTPS_proxy
2023/09/15 15:28:39 debug > GET https://www.example.com/
2023/09/15 15:28:39 debug >                        

2023/09/15 15:28:39 dialerWithAssertions: verified that the network is tcp as expected
2023/09/15 15:28:39 dialerWithAssertions: verified that the address is 127.0.0.1 as expected

2023/09/15 15:28:39 debug dial 127.0.0.1:61772/tcp...
2023/09/15 15:28:39 debug dial_address 127.0.0.1:61772/tcp...
2023/09/15 15:28:39 debug dial_address 127.0.0.1:61772/tcp... ok in 212.083µs
2023/09/15 15:28:39 debug dial 127.0.0.1:61772/tcp... ok in 222.208µs

2023/09/15 15:28:39 debug tls {sni=127.0.0.1 next=[]}...
2023/09/15 15:28:39 debug tls {sni=127.0.0.1 next=[]}... ssl_unknown_authority in 3.687875ms

[...]
```

In other words, the TLS config we're using does not know the proxy CA.
So, in this diff you also see the required work to forward the proxy CA,
including the related netem changes in
ooni/netem#38.

The reference issue is ooni/probe#2531.
I'm glad I did this, because it allowed me to discover
ooni/probe#2536.

Apart from that, business as usual: adapt existing test cases for the
previous simpler HTTP proxy to use netem.

Reference issue: ooni/probe#2531

Overall objective: have better testing for the boostrap, which is
important to validate new beacons code.
This diff contains more tests for `testingx.NewHTTProxyHandler`.

Refefence issue: ooni/probe#2531.

Overall objective: much better testing coverage of bootstrap with
proxies, to implement beacons with confidence.
This diff replaces `testingx.HTTPHandlerProxy` with
`testingx.NewHTTPProxyHandler` as the proxy used for implementing netemx
scenarios and removes `testingx.HTTPHandlerProxy`.

We introduced `testingx.NewHTTPProxyHandler` in
ooni#1274. It is a more comprehensive
proxy because it supports both proxying via HTTP header and CONNECT
proxying.

While there, emit more clear debug messages during the TLS handshake.

While there, fix how we're skipping tests in `testingx` and
`testingproxy` because otherwise we end up skipping the netem tests that
we should not be skipping. (Noticed of this because the coverage
dropped!)

Reference issue: ooni/probe#2531

Overall objective: good testing for proxying for when we introduce
beacons.
The lack of this support already created some difficulties inside the
testingx package and I am increasinglty sick of it.

While there, see to increase the testing coverage of the netxlite
package.

While there acknowledge and commit workaround for
ooni/probe#2537.

Added while looking into moving forward with making beacons fully
testable using netem, per ooni/probe#2531.
This diff adds the ListenTCP function to the *Netx struct.

With this addition, based on
ooni#1278, we can remove some techdebt
inside the testingx package.

This is yak shaving while trying to add support for testing socks5 in
the context of ooni/probe#2531.
This diff imports a fork of github.com/armon/go-socks5 that has been
adapted to use netem and simplified to suit our testing needs.

With this functionality in tree, we can start thinking about writing
better netem based tests for the ooniprobe bootstrap.

The overall idea is to be well positioned to improve the bootstrap and
introduce dynamic support for beacons.

While there, discover that we were using `log.SetLevel(log.DebugLevel)`
in a racy way in tests, so remove all the instances of this call from
tests given that we can always add it when needed and we don't want to
keep commented out code as a general policy anyway.

Reference issue: ooni/probe#2531
I am making progress with ooni/probe#2531 and
I want to reactor model/netx.go such that the TLSHandshaker returns a
model.TLSConn rather than a net.Conn.

Returning a net.Conn and documenting it is a model.TLSConn is bad
compared to returning a model.TLSConn directly.

Note that we cannot apply the same transformation to netxlite's
TLSDialer.DialTLSContext because such a method must be assignable to
net/http and github.com/ooni/oohttp's Transport function also called
DialTLSContext.

The fact that we need code to be assignable to the Transport function is
what historically led the TLSHandshaker to return a net.Conn as well.
But it was quite clear from the get go that this choice led to some
quirks (and, in fact, this behavior was explicitly documented as such).

While there, slightly refactor `internal/experiment/echcheck/utls.go` to
avoid storing the conn inside the handshaker and make sure the test
coverage does not drop for this experiment.

While there, note that ooni/probe#2538 exists
and commit a mitigation.
This diff completes the work we have been doing for a few days now and
provides HTTP and HTTPS proxy support, in addition to SOCKS5 support,
for the engine-specific network.

We did this work in the context of
ooni/probe#2531 and
ooni/probe#1955.

BTW, the fact that tests used `measurexlite` and tracing is very nice
here. It means the idea to write `measurexlite` based on context and
tracing was good and could be used beyond its original design goals.
This commit introduces a configurable HTTPS dialer that we will use for
implementing beacons, and possibly also other functionality.

We need to perform TCP connect and TLS handshake as part of the same
goroutine, because we cannot consider a dialing attempt successful after
a successful TCP connect. Due to network interference, the dialing may
also fail later during the TLS handshake.

To support several possible HTTPS dialing strategies, we need to extend
what happens during a LookupHost operation. Generally, one would like to
have addresses to dial. Rather, here we have tactics, where a single IP
address MAY be included into more than a single tactic, if we need to
try different tricks with the same address.

Also, tactics include a delay, which is useful to (a) avoid performing
all operations in parallel, which is not gentle towards otherwise
perfectly functioning networks and (b) give penalty to tactics that
utilize circumvention, such that we don't even attempt them unless we
need to.

In turn, the DNS resolver is wrapped by a policy for configuring TLS
dialing. Basically, the policy observes the IP addresses returned by an
underlying resolver and then it will decide which tactics to produce
based on that. Note that the policy could also extend the set of
returned IP addresses when the domain for which we connect is such that
we have known IP addresses in advance for such a domain.

The default policy we introduce in this commit behaves as follows:

1. it asks the engine to create 16 goroutines for dialing;

2. it uses the DNS lookup results w/o adding any extra IP addr;

3. it produces a tactic for each IP address where we use the domain as
the SNI and we add a 300 millisecond delay to the second tactic, 600 to
the third, and so on--which is similar to implementing happy eyeballs.

It's also worth noting that, tactics MAY override the TLS handshaker
being used (for example, to use uTLS) and, also, because they may use
different SNIs, the TLS verification is performed AFTER the TLS
handshake. (Because I set `InsecureSkipVerify`, I expected--or rather
*demand*--GitHub to produce a security warning for this commit, which I
will mark as a false positive, because TLS verification is performed a
few lines of code after that.)

As far as the algorithm for verification is concerned, it comes from the
Go examples and it matches closely the code used by the standard library
to perform verification. More details about this in comments in the
commit code.

Part of ooni/probe#2531
This feature will be useful for extra tests I would like to write for
ooni/probe#2531.
…1285)

This uses the code introduced in the previous commit, i.e.,
ooni@ac13e53.

I think it's good to have additional confidence that the `HTTPSDialer`
does not leak connections. (Ideally, we would like to verify that we're
not leaking connections everywhere, but for now just doing it for new
code would do okay.)

While there, repair a test that was flaky because of nondeterministic
channel selection.

Part of ooni/probe#2531.
This diff modifies the tactics callbacks to take in input a context.

We need the context to know whether the operation failed in itself or
was canceled externally through the context.

With this information, we can store better statistics about which
tactics are working and which tactics are not working.

While there, fix a couple of typos (thanks @robertodauria!)

Part of ooni/probe#2531
bassosimone and others added 30 commits February 12, 2024 21:50
This diff moves the `StreamAllContext` implementation from
`webconnectivitylte` to `netxlite`.

We flagged this diff as BREAKING CHANGE because we also fixed a bug
where, in case of context timeout, `StreamAllContext` was suppressing
the error rather than reporting it. Re-reading the code, that felt
incorrect because it did not allow us to clearly know that we timed out
reading the body, which could be caused by throttling.

The new behavior seems more accurate.

Because of this change, I also felt it was time to add explicit tests
for cases where we download fails because it takes too much time for us
to fetch the whole response body. I am not 100% sure we are correctly
flagging this case, but an `http-failure` probably seems fine at the
moment and we can always revisit this as we learn more about throttling.

Part of ooni/probe#2654.
We used the run experiment for the DoT and DoH blocking paper, i.e.,
https://censorbib.nymity.ch/#Basso2021a.

Beyond that, the run experiment has never been advertised and its
functionality duplicates some OONI Run v2 functionality.

Therefore, it's safe to remove this experiment, which will make
ooni/probe#2667 easier to implement.
While approaching ooni/probe#2667, I
recognized that the approach we're using to generate `SummaryKeys` is
redundant, verbose, and fragile. This diff attempts to improve our
implementation.

We define new types for generating summary keys and
`engine.MeasurementSummaryKeys` to either return the proper summary
keys, if the experiment `TestKeys` support that or return a default
value.

While there, disable the flaky `throttlingWithHTTP` QA check (see
ooni/probe#2668).
Closes ooni/probe#2667.

While there, repair flaky unit test and explain why it was flaky.
)

There's no need to use the older NewHTTPTransport factory for creating a
new HTTP transport, because this codebase doesn't need to use any quirk
implemented by such a transport.

While there, move TODOs around the codebase.

Part of ooni/probe#2669.
…1496)

This diff refactors webconnectivitylte by moving some algorithms inside
the new webconnectivityalgo package.

In subsequent commits, we'll seize the opportunity of adding tests for
these algorithms, refactor the code, and add specific tests.

Part of ooni/probe#2669.

While there, recognize that the webconnectivityqa package does not
belong to internal/experiment but to internal.
Because the singleton is always active, we need to expire the cache
otherwise we don't catch changes in the client network.

Part of ooni/probe#2669

Closes ooni/probe#2671
Using one resolver at random from a pool of some has been requested by
users.

While there link all the remaining TODOs to existing open issues.

Closes ooni/probe#2669.

Here are three measurements showcasing this new feature:

1. [using
1.0.0.1:53](https://explorer.ooni.org/m/20240208153440.990674_IT_webconnectivity_646b76338342a1a8)
2. [using
1.1.1.1:53](https://explorer.ooni.org/m/20240208154552.863516_IT_webconnectivity_97c3ed1a6bbebd5e)
3. [using
9.9.9.9:53](https://explorer.ooni.org/m/20240208154616.549959_IT_webconnectivity_2515d794df2ebd34)
While there, notice that we can increase the coverage in
webconnectivityqa.

Closes ooni/probe#2674
We add dns.nextdns.io to the QA suite, such that we don't get a wrong
NXDOMAIN when trying to use it as the resolver.

We optionally sort the test keys to ensure there's less churn in diffs
produced when regenerating minipipeline test data.

We fix ./script/updateminipipeline.bash to use ./script/go.bash for
building as opposed to using go.

Also, include endpoint information in `tls_handshake_{start,stop}` and
`http_transaction_{start,stop}` events such that they are sorted
correctly when we sort events to reduce churn. This is necessary because
we're using the endpoint information to sort the network events.

Part of ooni/probe#2677; the churn is still
too high, and I need more changes.
I realized it was not the best idea to modify the measurement algorithm
for producing a better test suite. With a small refactor, I can make it
such we only perform these changes when we need. Then, I need to
regenerate all the files, which comes with some churn. At this point,
it's sad but also fine. I'll try to improve things a bit more from now
one.

Part of ooni/probe#2677
We use scope for endpoint IDs. This helps to reduce churn when running
`./script/updateminipipeline.bash`.

Part of ooni/probe#2677
…#1508)

Using a singleton makes tests non-deterministic. Instead use an instance
for each measurer.

Helps with ooni/probe#2677.
Zero out times and zero LTE measurement results not used by
minipipeline.

Helps with ooni/probe#2677.
It seems unnecessary in light of what we're currently testing. By doing
this, we allow for much smaller commits.

Tangentially related to ooni/probe#2677.
Arguably IDs starting from 10k for getaddrinfo more obviously have a
scope.

Related to ooni/probe#2677.
We just need a single IP address in both cases. Helps with
ooni/probe#2677.
This diff removes the remaining causes of churn in qatool (modulo flaky
tests).

Closes ooni/probe#2677.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants