Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Talk to ClickHouse over the native protocol #6584

Merged
merged 6 commits into from
Sep 20, 2024
Merged

Conversation

bnaecker
Copy link
Collaborator

Adds a module in oximeter_db that speaks the native ClickHouse protocol, a message-oriented protocol directly over TCP. This includes the core messaging and serde needed for making a handshake with the server, and then running most standard SQL queries.

This uses the codecs from tokio_util to turn the TCP stream into a sink / stream of messges, which is much more convenient to operate on. The client and server exchange a handshake, after which the client may run SQL queries to select data.

There are a number of tests in the module, including low-level serialization checks and asserting the results of actual SQL queries. In addition, this adds a barebones SQL shell -- this is not intended to replace the feature-rich official CLI, but is very useful for testing the protocol implementation by running arbitrary queries.

This module is not consumed anywhere beyond the shell itself. A follow-up commit will integrate it into the existing oximeter_db::Client object, with the primary goal of making OxQL queries much more flexible and efficient.

Adds a module in `oximeter_db` that speaks the native ClickHouse
protocol, a message-oriented protocol directly over TCP. This includes
the core messaging and serde needed for making a handshake with the
server, and then running most standard SQL queries.

This uses the codecs from `tokio_util` to turn the TCP stream into a
sink / stream of messges, which is much more convenient to operate on.
The client and server exchange a handshake, after which the client may
run SQL queries to select data.

There are a number of tests in the module, including low-level
serialization checks and asserting the results of actual SQL queries. In
addition, this adds a barebones SQL shell -- this is not intended to
replace the feature-rich official CLI, but is very useful for testing
the protocol implementation by running arbitrary queries.

This module is not consumed anywhere beyond the shell itself. A
follow-up commit will integrate it into the existing
`oximeter_db::Client` object, with the primary goal of making OxQL
queries much more flexible and efficient.
oximeter/db/src/native/io/packet/server.rs Outdated Show resolved Hide resolved
oximeter/db/src/native/block.rs Show resolved Hide resolved
oximeter/db/src/native/block.rs Outdated Show resolved Hide resolved
oximeter/db/src/native/block.rs Show resolved Hide resolved
oximeter/db/src/native/block.rs Outdated Show resolved Hide resolved
@bnaecker
Copy link
Collaborator Author

Ok, I've done a bunch more testing myself, and also added a benchmark in 79533ff. That runs a query that is limited almost entirely by serialization time: SELECT number FROM system.numbers LIMIT <N>, for a range of N. I ran this with a simple version of the JSON-over-HTTP client that we already use (the query interface is the same), and compared that with the new native connection interface. Here are the results:

Benchmarking json-over-http/SELECT number FROM system.numbers LIMIT 100: Collecting 100 samples in estimated 6.0744 s
json-over-http/SELECT number FROM system.numbers LIMIT 100
                        time:   [382.44 µs 385.30 µs 388.20 µs]
                        change: [-5.9751% -3.1118% -1.0529%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Benchmarking json-over-http/SELECT number FROM system.numbers LIMIT 1000: Collecting 100 samples in estimated 5.2961 s
json-over-http/SELECT number FROM system.numbers LIMIT 1000
                        time:   [525.32 µs 530.20 µs 538.24 µs]
                        change: [-4.6887% -3.6179% -2.5775%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
Benchmarking json-over-http/SELECT number FROM system.numbers LIMIT 10000: Collecting 100 samples in estimated 5.1763
json-over-http/SELECT number FROM system.numbers LIMIT 10000
                        time:   [2.3776 ms 2.3828 ms 2.3887 ms]
                        change: [-7.3186% -5.4842% -3.7319%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
Benchmarking json-over-http/SELECT number FROM system.numbers LIMIT 100000: Collecting 100 samples in estimated 7.3365
json-over-http/SELECT number FROM system.numbers LIMIT 100000
                        time:   [24.549 ms 24.651 ms 24.760 ms]
                        change: [-1.6598% -0.8056% -0.0573%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Benchmarking json-over-http/SELECT number FROM system.numbers LIMIT 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 22.8s, or reduce sample count to 20.
Benchmarking json-over-http/SELECT number FROM system.numbers LIMIT 1000000: Collecting 100 samples in estimated 22.84
json-over-http/SELECT number FROM system.numbers LIMIT 1000000
                        time:   [237.46 ms 238.68 ms 239.93 ms]
                        change: [-4.7021% -3.7840% -2.8389%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Benchmarking native/SELECT number FROM system.numbers LIMIT 100: Collecting 100 samples in estimated 5.1390 s (25k ite
native/SELECT number FROM system.numbers LIMIT 100
                        time:   [199.15 µs 199.83 µs 200.68 µs]
                        change: [-0.7099% +0.1300% +1.0113%] (p = 0.82 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Benchmarking native/SELECT number FROM system.numbers LIMIT 1000: Collecting 100 samples in estimated 5.0935 s (25k it
native/SELECT number FROM system.numbers LIMIT 1000
                        time:   [199.58 µs 199.96 µs 200.39 µs]
                        change: [-12.142% -8.8727% -6.0084%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
Benchmarking native/SELECT number FROM system.numbers LIMIT 10000: Collecting 100 samples in estimated 5.3486 s (25k i
native/SELECT number FROM system.numbers LIMIT 10000
                        time:   [208.91 µs 209.56 µs 210.20 µs]
                        change: [-1.1181% -0.3457% +0.3577%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking native/SELECT number FROM system.numbers LIMIT 100000: Collecting 100 samples in estimated 6.4985 s (20k
native/SELECT number FROM system.numbers LIMIT 100000
                        time:   [317.84 µs 318.51 µs 319.29 µs]
                        change: [-6.6129% -4.8696% -3.3060%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
Benchmarking native/SELECT number FROM system.numbers LIMIT 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
Benchmarking native/SELECT number FROM system.numbers LIMIT 1000000: Collecting 100 samples in estimated 5.9062 s (505
native/SELECT number FROM system.numbers LIMIT 1000000
                        time:   [1.0681 ms 1.0778 ms 1.0881 ms]
                        change: [-3.3796% -1.5962% +0.1441%] (p = 0.07 > 0.05)
                        No change in performance detected.

The performance is actually closer than I expected for small queries, about a factor of 2. But the gap gets substantially larger for bigger queries, topping out at about a factor of 200 when selecting 1M elements.

@bnaecker
Copy link
Collaborator Author

Test failed here. There's an existing server listening on the TCP port, which makes sense now that I think about it. We're randomizing the HTTP port, but always listening for TCP connections on the default port of 900.

I'm going to change the test ClickHouseInstance object to also accept an optional TCP port, and pull it out of the log files if the OS chooses for us.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnaecker I gave this a fairly quick read through, concentrating on parts I thought warranted it. The code is easy enough to follow and low risk to merge so I'm a +1.

This is an epic undertaking and I really appreciate all the effort you went through to decipher their barely documented native protocol and implement it for us. It will be much more efficient to use than the HTTP interface. The only real downside is that we'll likely have to keep up with any undocumented changes. We're gonna want to smoke test any new clickhouse versions before we upgrade once we start using this heavily.

oximeter/db/src/shells/native.rs Show resolved Hide resolved
oximeter/db/src/native/block.rs Outdated Show resolved Hide resolved
oximeter/db/src/native/block.rs Show resolved Hide resolved
oximeter/db/src/native/connection.rs Show resolved Hide resolved
oximeter/db/src/native/connection.rs Outdated Show resolved Hide resolved
oximeter/db/src/native/connection.rs Outdated Show resolved Hide resolved
}

// Chop the prefix out of the source buffer.
*$src = rest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comments above!. I read the documentation for the called methods and I believe everything you did above looks correct. However, I'm not quite sure about this line. I think it's correct, because I don't see how you could write this type of code without doing it this way, but I'm not sure. Since it's a borrow and a pointer, I assume that the owned type will free all the memory since it knows when the original borrow goes out of scope, and that it won't just free some of it, or free it at the wrong time. I'd have someone like @hawkw or @cbiffle take a look, since they have expertise with unsafe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew! To clarify about this line -- src has type &mut &[u8], so it's a mutable reference to an immutable slice of bytes. What we're doing here is effectively moving the byte slice that src points to, so that it now points to the first byte after the memcpyd data. That's what rest is assigned to a few lines above.

Since all these are borrows onto the bytes, I don't think anything is dropped at all when they go out of scope. The references are dropped, but they can't drop anything since they don't own anything.

I would certainly love as many eyes on this as required to prove that, though :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that type out. That all makes sense to me.

oximeter/db/src/native/mod.rs Show resolved Hide resolved
@bnaecker
Copy link
Collaborator Author

Thanks for the review Andrew! I take your point about keeping up with undocumented changes in the server. One blessing is that the server code does seem to be pretty careful about checking the client's advertised protocol value whenever it sends data. It only pushes things if the client says they support them.

I have not added any support for a range of versions here. Instead, we just bail out of the handshake if we detect a server that is not exactly what Omicron runs today. I do think we should probably track this interface in the table that Dave started writing up. If we do change the server versions, we will need to add that support in here, including possibly supporting a version range. But in the meantime, any of the tests that use this interface will fail loudly if we update ClickHouse today.

@andrewjstone
Copy link
Contributor

Thanks for the review Andrew! I take your point about keeping up with undocumented changes in the server. One blessing is that the server code does seem to be pretty careful about checking the client's advertised protocol value whenever it sends data. It only pushes things if the client says they support them.

Oh that's good to know! Thanks.

@bnaecker
Copy link
Collaborator Author

I'm going to rebase this on #6603 when that's merged, which should fix the test flake here.

- Better panic message
- Remove unused data from query method
- Add negative tests for DataType parsing
- Fill out `Connection::cancel()` and make it private, but add tests
  proving that it works as intended.
@bnaecker bnaecker merged commit ade19ee into main Sep 20, 2024
18 checks passed
@bnaecker bnaecker deleted the clickhouse-native branch September 20, 2024 23:02
hawkw pushed a commit that referenced this pull request Sep 21, 2024
Adds a module in `oximeter_db` that speaks the native ClickHouse
protocol, a message-oriented protocol directly over TCP. This includes
the core messaging and serde needed for making a handshake with the
server, and then running most standard SQL queries.

This uses the codecs from `tokio_util` to turn the TCP stream into a
sink / stream of messges, which is much more convenient to operate on.
The client and server exchange a handshake, after which the client may
run SQL queries to select data.

There are a number of tests in the module, including low-level
serialization checks and asserting the results of actual SQL queries. In
addition, this adds a barebones SQL shell -- this is not intended to
replace the feature-rich official CLI, but is very useful for testing
the protocol implementation by running arbitrary queries.

This module is not consumed anywhere beyond the shell itself. A
follow-up commit will integrate it into the existing
`oximeter_db::Client` object, with the primary goal of making OxQL
queries much more flexible and efficient.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants