Connection for Dremio Lakehouse #427

Open
jdbodyfelt opened this issue Jul 29, 2024 · 5 comments

@jdbodyfelt

jdbodyfelt commented Jul 29, 2024

Please Describe The Problem To Be Solved

The Problem: Other than Snowflake, there is a lack of connectors to other lakehouse solutions. While a Databricks connector would be nice for many corporate production runs, in the interest of open source, a Dremio connector might be more appreciated by the community. This request is to build a Dremio connector for Quary.

Optional: Suggest A Solution

Looking at the code architecture, the bulk of the connectors seem to be maintained in rust/quary-databases/src/databases_<flavor>.rs and rust/core/src/database_<flavor>.rs, and a common interface is already designed across both. Dremio offers a number of protocols, including REST, JDBC, and ODBC. However, with a Rust build it may be advantageous to use the Arrow Flight protocol, which Dremio supports well and which can reportedly give up to a 20x speed-up over JDBC. In fact, this issue could even be extended to a generic "Arrow Flight Connector" type.

A possible plan includes:

  1. Review the full connection interface by building a "skeleton" version of rust/quary-databases/src/databases_dremio.rs.
  2. Review the Dremio docs, although this may not be needed if it is just functional Arrow Flight SQL.
  3. Get feedback on any other requirements for implementing a Dremio connector; I thought I saw something else in the SQL interfacing code.
  4. Design for any other feedback (e.g. any other *_dremio.rs files that are needed).
  5. Unit test.
  6. Review & release into the wild.
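For step 1, a skeleton could start out as something like the following. This is only a sketch: the `Connector` trait, `ConnectorError`, and `DremioSkeleton` are hypothetical names invented here to illustrate the shape, not Quary's actual interface, and the methods are synchronous to keep the example dependency-free.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConnectorError {
    NotImplemented(&'static str),
}

impl fmt::Display for ConnectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConnectorError::NotImplemented(what) => {
                write!(f, "{} is not implemented yet", what)
            }
        }
    }
}

impl std::error::Error for ConnectorError {}

/// Hypothetical stand-in for the shared connector interface (not Quary's real trait).
pub trait Connector {
    fn name(&self) -> &'static str;
    fn list_tables(&self) -> Result<Vec<String>, ConnectorError>;
    fn query(&self, sql: &str) -> Result<Vec<Vec<String>>, ConnectorError>;
    fn exec(&self, sql: &str) -> Result<(), ConnectorError>;
}

/// Skeleton Dremio connector: every operation reports that it is still missing.
pub struct DremioSkeleton {
    pub host: String,
    pub port: u16,
}

impl Connector for DremioSkeleton {
    fn name(&self) -> &'static str {
        "dremio"
    }

    fn list_tables(&self) -> Result<Vec<String>, ConnectorError> {
        Err(ConnectorError::NotImplemented("list_tables"))
    }

    fn query(&self, _sql: &str) -> Result<Vec<Vec<String>>, ConnectorError> {
        Err(ConnectorError::NotImplemented("query"))
    }

    fn exec(&self, _sql: &str) -> Result<(), ConnectorError> {
        Err(ConnectorError::NotImplemented("exec"))
    }
}
```

Filling in the methods one at a time against the real interface would then show which parts need Dremio-specific handling.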

Happy to help on this to build out my Rust expertise...

@benfdking
Collaborator

Thanks for this! We'll have a quick look into this today!

@benfdking
Collaborator

benfdking commented Aug 1, 2024

Here's a first draft: #446. It doesn't work yet and still needs quite a bit of filling in, but I think it should give you the general structure.

There are a few things to add:

  • Proper integration tests could be done with dremio/dremio-oss.
  • I don't know Dremio well, but there are quite a few auth methods, so we need to make sure we cover the ones that matter.
  • Flight seems to distinguish between read queries and write queries, which may make things a little more complicated.
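On the read/write distinction: in Arrow Flight SQL, statements that return rows and statements that update data go through different commands (a query command fetched with DoGet versus an update command sent with DoPut), so the connector has to decide which path a statement takes. A minimal sketch of one way to make that decision follows; `StatementKind` and `classify` are hypothetical helpers, and the keyword list is an assumption rather than a complete grammar.

```rust
/// Which Flight SQL path a statement would take in this sketch.
#[derive(Debug, PartialEq)]
pub enum StatementKind {
    /// Returns rows, e.g. SELECT.
    Read,
    /// Changes data or schema, e.g. CREATE VIEW.
    Write,
}

/// Naive classifier based on the leading keyword only.
/// Assumption: anything not in the read list is treated as a write.
/// It does not handle leading comments or parenthesised queries.
pub fn classify(sql: &str) -> StatementKind {
    let keyword: String = sql
        .trim_start()
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_uppercase())
        .collect();

    match keyword.as_str() {
        "SELECT" | "WITH" | "SHOW" | "DESCRIBE" | "EXPLAIN" | "VALUES" => StatementKind::Read,
        _ => StatementKind::Write,
    }
}
```

A real implementation would probably let the caller state the intent explicitly (separate query and exec methods) instead of inspecting the SQL text.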

I mostly did it out of curiosity for:

  1. Dremio which is cool!
  2. Arrow Flight: we have some translation layers, and I am wondering whether Quary's internal format should just be Arrow.

@jdbodyfelt
Author

Love the idea of Arrow as the internal format - it looks very sweet!

@jdbodyfelt
Author

I'll pull the branch and try to get some feedback by Wed

@benfdking
Collaborator

There's a first draft with this in it being pushed at the moment; it works with username/password and no SSL.

            // Connection settings that are always required.
            let host = env::var("DREMIO_HOST")
                .map_err(|_| "DREMIO_HOST must be set to connect to Dremio".to_string())?;
            let port = env::var("DREMIO_PORT")
                .map_err(|_| "DREMIO_PORT must be set to connect to Dremio".to_string())?;
            // Report a malformed DREMIO_USE_SSL value as an error instead of panicking.
            let use_ssl = env::var("DREMIO_USE_SSL")
                .map_err(|_| "DREMIO_USE_SSL must be set to connect to Dremio".to_string())?
                .parse::<bool>()
                .map_err(|_| "DREMIO_USE_SSL must be either true or false".to_string())?;
            let username = env::var("DREMIO_USER")
                .map_err(|_| "DREMIO_USER must be set to connect to Dremio".to_string())?;

            // Prefer a personal access token when one is set; the password is
            // only required as the fallback.
            let auth = if let Ok(personal_access_token) = env::var("DREMIO_PERSONAL_ACCESS_TOKEN") {
                DremioAuth::UsernamePersonalAccessToken(username, personal_access_token)
            } else {
                let password = env::var("DREMIO_PASSWORD")
                    .map_err(|_| "DREMIO_PASSWORD must be set to connect to Dremio".to_string())?;
                DremioAuth::UsernamePassword(username, password)
            };

            let database = crate::databases_dremio::Dremio::new(
                config,
                auth,
                use_ssl,
                host,
                port,
            )
            .await?;
            Ok(Box::new(database))

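This outlines the variables you need: DREMIO_HOST, DREMIO_PORT, DREMIO_USE_SSL, DREMIO_USER, and DREMIO_PASSWORD (DREMIO_PERSONAL_ACCESS_TOKEN is optional and takes precedence over the password when set). They can be stored in a .env file: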

DREMIO_HOST=localhost
DREMIO_PORT=32010
DREMIO_USE_SSL=false
DREMIO_USER=admin
DREMIO_PASSWORD=fht4jyx9HAY!jxk1ydg
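As a rough illustration of how values in this format could be loaded and validated up front, here is a small standard-library-only sketch. `parse_dotenv`, `dremio_env`, and `DremioEnv` are hypothetical names, and parsing the port into a `u16` is a choice made for this sketch, not something taken from the draft.

```rust
use std::collections::HashMap;

/// Hypothetical holder for the settings listed above.
#[derive(Debug, PartialEq)]
pub struct DremioEnv {
    pub host: String,
    pub port: u16,
    pub use_ssl: bool,
    pub user: String,
    pub password: String,
}

/// Parse KEY=VALUE lines, skipping blank lines and `#` comments.
/// Only the first `=` splits the line, so values may contain `=`.
pub fn parse_dotenv(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Build the settings, reporting the first missing or malformed variable.
pub fn dremio_env(vars: &HashMap<String, String>) -> Result<DremioEnv, String> {
    let get = |key: &str| {
        vars.get(key)
            .cloned()
            .ok_or_else(|| format!("{} must be set to connect to Dremio", key))
    };
    Ok(DremioEnv {
        host: get("DREMIO_HOST")?,
        port: get("DREMIO_PORT")?
            .parse::<u16>()
            .map_err(|e| format!("DREMIO_PORT must be a port number: {}", e))?,
        use_ssl: get("DREMIO_USE_SSL")?
            .parse::<bool>()
            .map_err(|e| format!("DREMIO_USE_SSL must be true or false: {}", e))?,
        user: get("DREMIO_USER")?,
        password: get("DREMIO_PASSWORD")?,
    })
}
```

Validating everything in one place means a typo such as DREMIO_USE_SSL=no surfaces as a readable error before any connection is attempted.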

The .env values above are what I got working locally, with this setup:
// It should be running on the following ports
// docker run -p 9047:9047 -p 31010:31010 -p 32010:32010 -p 45678:45678 dremio/dremio-oss
// 1. Create test space
// 2. Create test folder inside the test space
// 3. Create the samples source

With the config

dremio:
  dremio_space: test
  dremio_space_folder: test
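To make the two config keys concrete, here is a sketch of how a space and folder might be combined into a fully qualified object name. It assumes that objects are addressed as space.folder.name and that identifiers are quoted with double quotes; `quote_ident` and `qualified_name` are hypothetical helpers, not functions from the draft.

```rust
/// Quote one identifier part, doubling any embedded double quotes.
/// Assumption: double-quoted identifiers are accepted by the target SQL dialect.
pub fn quote_ident(part: &str) -> String {
    format!("\"{}\"", part.replace('"', "\"\""))
}

/// Join space, folder, and object name into a dotted, quoted path.
pub fn qualified_name(space: &str, folder: &str, name: &str) -> String {
    [space, folder, name]
        .iter()
        .map(|part| quote_ident(part))
        .collect::<Vec<_>>()
        .join(".")
}
```

With the config above, a model called orders would come out as "test"."test"."orders" under these assumptions.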
