-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Talk to ClickHouse over the native protocol #6584
Conversation
Adds a module in `oximeter_db` that speaks the native ClickHouse protocol, a message-oriented protocol directly over TCP. This includes the core messaging and serde needed for making a handshake with the server, and then running most standard SQL queries. This uses the codecs from `tokio_util` to turn the TCP stream into a sink / stream of messges, which is much more convenient to operate on. The client and server exchange a handshake, after which the client may run SQL queries to select data. There are a number of tests in the module, including low-level serialization checks and asserting the results of actual SQL queries. In addition, this adds a barebones SQL shell -- this is not intended to replace the feature-rich official CLI, but is very useful for testing the protocol implementation by running arbitrary queries. This module is not consumed anywhere beyond the shell itself. A follow-up commit will integrate it into the existing `oximeter_db::Client` object, with the primary goal of making OxQL queries much more flexible and efficient.
d1c894e
to
d94bb47
Compare
Ok, I've done a bunch more testing myself, and also added a benchmark in 79533ff. That runs a query that is limited almost entirely by serialization time:
The performance is actually closer than I expected for small queries, about a factor of 2. But the gap gets substantially larger for bigger queries, topping out at about a factor of 200 when selecting 1M elements. |
Test failed here. There's an existing server listening on the TCP port, which makes sense now that I think about it. We're randomizing the HTTP port, but always listening for TCP connections on the default port of 900. I'm going to change the test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bnaecker I gave this a fairly quick read through, concentrating on parts I thought warranted it. The code is easy enough to follow and low risk to merge so I'm a +1.
This is an epic undertaking and I really appreciate all the effort you went through to decipher their barely documented native protocol and implement it for us. It will be much more efficient to use than the HTTP interface. The only real downside is that we'll likely have to keep up with any undocumented changes. We're gonna want to smoke test any new clickhouse versions before we upgrade once we start using this heavily.
} | ||
|
||
// Chop the prefix out of the source buffer. | ||
*$src = rest; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great comments above!. I read the documentation for the called methods and I believe everything you did above looks correct. However, I'm not quite sure about this line. I think it's correct, because I don't see how you could write this type of code without doing it this way, but I'm not sure. Since it's a borrow and a pointer, I assume that the owned type will free all the memory since it knows when the original borrow goes out of scope, and that it won't just free some of it, or free it at the wrong time. I'd have someone like @hawkw or @cbiffle take a look, since they have expertise with unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Andrew! To clarify about this line -- src
has type &mut &[u8]
, so it's a mutable reference to an immutable slice of bytes. What we're doing here is effectively moving the byte slice that src
points to, so that it now points to the first byte after the memcpy
d data. That's what rest
is assigned to a few lines above.
Since all these are borrows onto the bytes, I don't think anything is dropped at all when they go out of scope. The references are dropped, but they can't drop anything since they don't own anything.
I would certainly love as many eyes on this as required to prove that, though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that type out. That all makes sense to me.
Thanks for the review Andrew! I take your point about keeping up with undocumented changes in the server. One blessing is that the server code does seem to be pretty careful about checking the client's advertised protocol value whenever it sends data. It only pushes things if the client says they support them. I have not added any support for a range of versions here. Instead, we just bail out of the handshake if we detect a server that is not exactly what Omicron runs today. I do think we should probably track this interface in the table that Dave started writing up. If we do change the server versions, we will need to add that support in here, including possibly supporting a version range. But in the meantime, any of the tests that use this interface will fail loudly if we update ClickHouse today. |
Oh that's good to know! Thanks. |
I'm going to rebase this on #6603 when that's merged, which should fix the test flake here. |
- Better panic message - Remove unused data from query method - Add negative tests for DataType parsing - Fill out `Connection::cancel()` and make it private, but add tests proving that it works as intended.
Adds a module in `oximeter_db` that speaks the native ClickHouse protocol, a message-oriented protocol directly over TCP. This includes the core messaging and serde needed for making a handshake with the server, and then running most standard SQL queries. This uses the codecs from `tokio_util` to turn the TCP stream into a sink / stream of messges, which is much more convenient to operate on. The client and server exchange a handshake, after which the client may run SQL queries to select data. There are a number of tests in the module, including low-level serialization checks and asserting the results of actual SQL queries. In addition, this adds a barebones SQL shell -- this is not intended to replace the feature-rich official CLI, but is very useful for testing the protocol implementation by running arbitrary queries. This module is not consumed anywhere beyond the shell itself. A follow-up commit will integrate it into the existing `oximeter_db::Client` object, with the primary goal of making OxQL queries much more flexible and efficient.
Adds a module in
oximeter_db
that speaks the native ClickHouse protocol, a message-oriented protocol directly over TCP. This includes the core messaging and serde needed for making a handshake with the server, and then running most standard SQL queries.This uses the codecs from
tokio_util
to turn the TCP stream into a sink / stream of messges, which is much more convenient to operate on. The client and server exchange a handshake, after which the client may run SQL queries to select data.There are a number of tests in the module, including low-level serialization checks and asserting the results of actual SQL queries. In addition, this adds a barebones SQL shell -- this is not intended to replace the feature-rich official CLI, but is very useful for testing the protocol implementation by running arbitrary queries.
This module is not consumed anywhere beyond the shell itself. A follow-up commit will integrate it into the existing
oximeter_db::Client
object, with the primary goal of making OxQL queries much more flexible and efficient.