Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Flight SQL server support #6023

Open
wants to merge 91 commits into
base: main
Choose a base branch
from

Conversation

devinrsmith
Copy link
Member

@devinrsmith devinrsmith commented Sep 5, 2024

This adds Arrow Flight SQL support to the Deephaven server, with the goal of supporting various SQL drivers built on top of Flight SQL (JDBC, ADBC, ODBC). It is limited to query statements (ie, no UPDATEs). The implementation supports ad-hoc query statements and ad-hoc looking prepared query statements, but not parameterized prepared statements. Queries are able to reference tables from the global scope by name; we might be able to expand support to other resolvers in the future, potentially with catalogs and namespaces. The scope of supported queries is guided by our Sql engine io.deephaven.engine.sql.Sql which is based on a Calcite query parser.

It is important to note that while the Flight SQL implementation respects io.deephaven.server.session.TicketResolver.Authorization.transform it is not hooked up to io.deephaven.auth.ServiceAuthWiring nor io.deephaven.server.table.validation.ColumnExpressionValidator. So while Flight SQL does not allow users script execution, it does not limit the table operations a user may perform, nor does it restrict the calling of arbitrary functions from a formula. As such, the security posture of Flight SQL sits somewhere between "full access" and "limited access".

A cookie-based authentication extension has been added to support some Flight SQL clients which don't operate via the normal Flight authentication "Authorization" header (yes, it's a misnomer), and instead expect the server to send "Set-Cookie" to the clients, and for the clients to echo back the cookie(s) via the "Cookie" header. The server will only send the authentication token as a cookie when the client sends the header and value "x-deephaven-auth-cookie-request=true". To support this, the io.grpc.Context has been captured and preserved during the export submission logic.

The full Flight SQL action and command spectrum has been skeleton-ed out with appropriate "unimplemented" messages in anticipation that we will want to expand the scope of the Flight SQL implementation in the future. It also serves as a more explicit guide to readers of the code for what is and is not supported.

Fixes #5339

chipkent
chipkent previously approved these changes Sep 11, 2024
Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

py dir LGTM

@devinrsmith devinrsmith changed the title feat: Add FlightSQL server support feat: Add Flight SQL server support Oct 29, 2024
chipkent
chipkent previously approved these changes Oct 30, 2024
Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

py dir LGTM.

Copy link
Member

@niloc132 niloc132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings in tests in java 11, will be failures in 17+:

> Task :extensions-flight-sql:jdbcTest
1 compiler directives added

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by cfjd.org.apache.calcite.avatica.com.google.protobuf.UnsafeUtil (file:/home/colin/.gradle/caches/modules-2/files-2.1/org.apache.arrow/flight-sql-jdbc-driver/13.0.0/179d3073933291f6b7b1584b56d8b97f0ae1f740/flight-sql-jdbc-driver-13.0.0.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of cfjd.org.apache.calcite.avatica.com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Comment on lines 10 to 14
compileClasspath += sourceSets.main.output
compileClasspath += sourceSets.test.output

runtimeClasspath += sourceSets.main.output
runtimeClasspath += sourceSets.test.output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this seems to be necessary - both in that jdbcTest doesnt depend on test at all (should be using the java-test-fixtures plugin to share like that), and that the remaining lines can be a oneliner dependency below:

    jdbcTestImplementation project(':extensions-flight-sql')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think java-test-fixture is necessarily the correct model... that said, I've divorced the jdbcTest source set from the test source set. There may be better ways to do this... but I think what I have is functionally correct. Also added some comments about the differences between test and jdbcTest.

Comment on lines +499 to +500
// This general interface does not make sense; resolution should always be done against a _ticket_. Nothing
// calls io.deephaven.server.session.TicketRouter.resolve(SessionState, FlightDescriptor, String)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate/remove it? Did we previously have a plan for it that didn't pan out or otherwise was unnecessary/untenable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to remove it here or in follow-up, cc @nbauernfeind

Comment on lines 1163 to 1170
{
final QueryScope scope = ExecutionContext.getContext().getQueryScope();
try {
obj = scope.readParamValue(table);
} catch (QueryScope.MissingVariableException e) {
return false;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the nested block? The only local that gets scoped into here is, err, scope, is there a downside of it "escaping" into the method scope?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been leaning into scoping things more explicitly; I find it helps structure code, esp code that is on the longer side. That said, pretty minor here, so will undo.

}
};
for (int i = 0; i < L; ++i) {
final char c = flightSqlPattern.charAt(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...in addition to never having underscores, does this mean tables can never have unicode, or are _ and % not allowed as a later code unit in a codepoint? My unicode memory is not retained well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least right now, the table names are limited to the sorts of names we allow in the query scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That includes oodles of unicode, no? https://stackoverflow.com/a/17043983/860630

* against, which is a common pattern used in Deephaven table names. As mentioned below, it also follows that an
* empty string should only explicitly match against an empty string.
*/
private static Predicate<String> flightSqlFilterPredicate(String flightSqlPattern) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is worth testing, so we can explicitly call out edge cases in the test code, show what isn't expected to work (empty never matches empty, underscore matches underscore and also other things, etc), plus be sure we don't have any other surprises lurking

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some explicit testing.

if (ticket.get() != TICKET_PREFIX) {
// If we get here, it means there is an error with FlightSqlResolver.ticketRoute /
// io.deephaven.server.session.TicketRouter.getResolver
throw new IllegalStateException("Could not resolve Flight SQL ticket '" + logId + "': invalid prefix");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to print the ticket itself too? e.g. ScopeTicketResolver.nameForTicket uses ByteHelper.byteBufToHex(ticket)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm less inclined in this specific case, as it represents an internal server error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And more generally, I'm less inclined with FlightSql tickets specifically b/c they are arbitrary length Any messages.


@Override
public Table of(TicketTable ticketTable) {
// This does not wrap in a nugget like TicketRouter.resolve; is that important?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a pre-merge todo?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a TODO to myself, but has been handled by a higher layer nugget now. Will remove comment.

Comment on lines +15 to +17
compileOnlyApi(project(':util-thread')) {
because 'downstream dagger compile'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you - I ran into this the other day. Want me to submit this in its own PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too worried; if this merges in soon, should be ok here.

Comment on lines 281 to 282
throw e;
// return SessionState.wrapAsFailedExport(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link with a TODO to the issue?

if (!(obj instanceof Table)) {
return false;
}
return authorization.transform((Table) obj) != null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but the call tree into here just to get a "true" or "false" back could make GetDbSchemas, GetTables, GetTableTypes etc quite expensive - and if the table is dropped at the end of the call when the liveness scope ends, the very next flightsql call will need to recompute it. Waiting seconds or minutes for each call and paying the same thing for the next call seems bad.

It sounds like Authorization might need something other than transform if we're going to use it this way? Transform is supposed to be the step just before using the table, not deciding if it can be used. Maybe a default method in the interface that just checks if the result of transform is null, and implementations can specify better behavior if they want to?

* - "_" means to match any one character.
* </pre>
*
* There does not seem to be any potential for escaping, which means that underscores can't explicitly be matched
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not meaning to be critical of the code itself (except the char-by-char thing, i think you want to iterate codepoints, or just scan next index of @/_), but the premise that flightsql is starting with. If tooling out there will use it and expects it to work a certain way, we should support it.

But this would have been so easy to represent for arbitrary tokens, so each lang/runtime could define its own quote and wildcard mechanism, something like

message Pattern {
  repeated Token tokens = 1;
}
message Token {
  oneof value {
    Wildcard wildcard = 1;
    string literal = 2;
  }
}
enum Wildcard {
  NOT_SET = 0;
  ONE = 1;
  AT_LEAST_ONE = 2;
  ANY = 3;
}

}
};
for (int i = 0; i < L; ++i) {
final char c = flightSqlPattern.charAt(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That includes oodles of unicode, no? https://stackoverflow.com/a/17043983/860630

Comment on lines 814 to 823
public final TicketHandlerReleasable initialize(C command) {
try {
return initializeImpl(command);
} catch (Throwable t) {
release();
throw t;
}
}

private synchronized QueryBase<C> initializeImpl(C command) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these called initialize when they wrap the actual call to execute() (granted, which itself has to delegate further to executeImpl)?

The naming seems to suggest that initialization happens before execution, but really initialization is execution (plus a guard for exceptions, calling more than once, and scheduling cleanup).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I iterated on different naming. I will update the interface to "execute".

* initialize.
*/
static abstract class CommandHandlerFixedBase<T extends Message> implements CommandHandler<T> {
private final Class<T> clazz;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field is unused - but we require it to be non-null, and require subtypes to pass a type that ostensibly matches it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

execute(command);
if (table == null) {
throw new IllegalStateException(
"QueryBase implementation has a bug, should have set table");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguable QueryBase should be getClass(), as it was the impl's execute() that failed to call executeImpl?

Also do we want to use Assert for these sorts of "we broke our own contract" assertions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to Assert.

}

private void onSessionClosed() {
log.debug().append("onSessionClosed: removing prepared statement ").append(handleId.toString()).endl();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicky, so ignore if you want (but I think this might be the kind of thing you'd flag if you spotted it):
This is a little gross since this is the utf8 of a uuid string (36 bytes), and we pay for the toString() of the ByteString whether or not we log the debug...

Technically I think handleId.asReadOnlyByteBuffer would at least not copy it every time, and the bytebuffer is holding ascii instead of the actual bytes of the uuid, so that would print nicely too.

But do we make handleId by design to hold ascii? Could use UuidCreator.toBytes(UUID) and have 16 real bytes instead of 8 smeared across the LSBs of 32ish bytes? At that point you might consider just using SecureRandom.nextBytes(16) since we don't actually use the uuid api at all past this point.

Also worth noting that the length-36 ByteStrings going to get hashed by multiplying each by 31 and adding the next... my math isn't great here, but packing so many zeros isn't going to make for nice hashes right. Won't matter except under load, but then it might matter if we are collision heavy.

Could easily wait until it does matter, but it seems cheap to either extract the bytes (or generate the bytes) and live with ByteString's toString and wrap in an if debug check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in latest push.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Arrow Flight SQL
5 participants