[Feature] Support running SQL models on Google Cloud Dataproc Serverless #1353

Closed
3 tasks done
gddezero opened this issue Sep 20, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@gddezero

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Context

Google Cloud Dataproc Serverless lets you run Spark workloads without requiring you to provision and manage your own Dataproc cluster. Use the Google Cloud console, Google Cloud CLI, or Dataproc API to submit a batch workload to the Dataproc Serverless service. The service will run the workload on a managed compute infrastructure, autoscaling resources as needed.

Dataproc Serverless is widely used by GCP customers to build data pipelines. A typical use case is submitting Spark SQL jobs to Dataproc Serverless to transform data and build a data warehouse.
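For illustration, the Google Cloud CLI route for Spark SQL can be scripted. The following is a minimal sketch, not part of dbt or any Google library: `build_spark_sql_batch_command` is a hypothetical helper that assembles a `gcloud dataproc batches submit spark-sql` invocation for a SQL file stored in Cloud Storage. It assumes the gcloud CLI is installed and authenticated; the bucket, region, batch ID, and variable names are placeholders.

```python
import shlex
import subprocess
from typing import List, Mapping, Optional


def build_spark_sql_batch_command(
    sql_file_uri: str,
    region: str,
    batch_id: Optional[str] = None,
    variables: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: build the argv for a Dataproc Serverless Spark SQL batch."""
    cmd = [
        "gcloud", "dataproc", "batches", "submit", "spark-sql",
        sql_file_uri,  # SQL script, e.g. a file in a GCS bucket
        f"--region={region}",
    ]
    if batch_id:
        # Optional explicit batch ID; otherwise the service generates one.
        cmd.append(f"--batch={batch_id}")
    if variables:
        # --vars takes comma-separated NAME=VALUE pairs used as query variables.
        cmd.append("--vars=" + ",".join(f"{k}={v}" for k, v in variables.items()))
    return cmd


def submit(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command (dry run) or execute it, raising on a non-zero exit."""
    if dry_run:
        print(shlex.join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder bucket, region, and batch ID.
    command = build_spark_sql_batch_command(
        "gs://my-bucket/models/orders.sql",
        "us-central1",
        batch_id="orders-001",
        variables={"run_date": "2024-09-20"},
    )
    submit(command, dry_run=True)
```

With `dry_run=True` the helper only prints the shell-quoted command; with `dry_run=False` it runs it through `subprocess.run` and raises `CalledProcessError` if gcloud exits non-zero.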

Current Status

dbt currently supports running only Python models on Dataproc Serverless, as a companion service to BigQuery:
https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc

Request

Support running SQL models on Dataproc Serverless

Describe alternatives you've considered

No response

Who will this benefit?

Customers using Google Cloud

Are you interested in contributing this feature?

No response

Anything else?

No response

gddezero added the enhancement (New feature or request) and triage labels on Sep 20, 2024
gddezero changed the title from "[Feature] Support Google Cloud Serverless" to "[Feature] Support running SQL models on Google Cloud Dataproc Serverless" on Sep 20, 2024
dbeatty10 transferred this issue from dbt-labs/dbt-core on Sep 23, 2024
@amychen1776

Hello @gddezero, could you provide more context about why you prefer Dataproc for SQL rather than running it directly on BigQuery?

@amychen1776

At this time, we will not be prioritizing this work (and if we did, it wouldn't be done in the BigQuery adapter), so I'm closing this issue for now.

amychen1776 closed this as not planned on Oct 28, 2024
@gddezero (author)

@amychen1776 With Serverless Spark, users do not need to deploy a Spark cluster or a Thrift server, which greatly reduces infrastructure management effort. I should have created this feature request in dbt-spark rather than dbt-bigquery.

@amychen1776

amychen1776 commented Oct 29, 2024

@gddezero That would make the most sense :) For dbt-spark, it will require supporting a new connection method, since dbt-spark currently expects a Thrift server, an ODBC driver, or an HTTP endpoint.

@gddezero (author)

@amychen1776 Thanks for your advice. I created the feature request in dbt-spark: dbt-labs/dbt-spark#1131
