Is this your first time submitting a feature request?
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-spark functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Implement a Livy connection method that connects to a Livy server to submit Spark SQL and read back the results for SQL models.
Today, dbt-spark opens one connection to the Spark server per configured thread (threads: 1, see https://docs.getdbt.com/docs/running-a-dbt-project/using-threads) so that it can run as many models concurrently as there are threads. When connecting to Spark on Kubernetes, however, Kubernetes launches one driver pod per thread, which is undesirable because each pod (driver/executor) consumes separate resources for every model.
We should support using a single connection for all threads, so that the same Spark driver can be used to submit all SQL models.
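For concreteness, the sketch below (Python, using the requests library) shows roughly how such a method could create a Livy session and submit Spark SQL through Livy's public REST API. The server URL, helper names, and polling interval are assumptions for illustration only, not the adapter's actual code.

```python
import time

import requests

LIVY_URL = "http://livy-server:8998"  # assumed Livy endpoint for this sketch


def create_session() -> int:
    """Create a Livy session, i.e. a single long-lived Spark driver."""
    resp = requests.post(f"{LIVY_URL}/sessions", json={})
    resp.raise_for_status()
    session_id = resp.json()["id"]
    # Wait until the driver is up and ready to accept statements.
    while requests.get(f"{LIVY_URL}/sessions/{session_id}/state").json()["state"] != "idle":
        time.sleep(1)
    return session_id


def run_sql(session_id: int, sql: str) -> dict:
    """Submit one SQL statement to the session and poll for its result."""
    resp = requests.post(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        json={"code": sql, "kind": "sql"},
    )
    resp.raise_for_status()
    statement_id = resp.json()["id"]
    while True:
        statement = requests.get(
            f"{LIVY_URL}/sessions/{session_id}/statements/{statement_id}"
        ).json()
        if statement["state"] in ("available", "error", "cancelled"):
            return statement["output"]
        time.sleep(1)
```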
Describe alternatives you've considered
Cloudera has already implemented a dbt-spark adapter with a Livy connection, dbt-spark-livy, but that implementation is built against dbt-core version 1.3.1 (see its version and setup.py files), which does not have Iceberg format support.
We therefore used the Livy code developed by Cloudera as a reference and updated it to a newer dbt-core version.
Who will this benefit?
All users who want to use a Livy connection. If the Livy server is deployed on Kubernetes, users additionally benefit from single-connection support across all dbt-core threads.
Are you interested in contributing this feature?
Yes. Based on the Cloudera implementation, Livy support has been added to the dbt-spark adapter, and locking has been implemented so that a single connection can be shared by all threads (see the sketch below).
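As a rough illustration of the locking approach (class and method names here are hypothetical, not the actual PR code), a single Livy session can be created once and guarded by a lock so that dbt's worker threads submit their SQL to the same Spark driver one at a time:

```python
import threading


class SharedLivyConnection:
    """One Livy session shared by every dbt thread; locks serialize access."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self, session_id: int):
        self.session_id = session_id
        self._submit_lock = threading.Lock()

    @classmethod
    def get(cls, create_session):
        # Create the shared session exactly once, regardless of thread count,
        # so Kubernetes only ever launches a single driver pod.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls(create_session())
        return cls._instance

    def execute(self, run_sql, sql: str):
        # Serialize statement submission: one model's SQL at a time on the
        # shared driver.
        with self._submit_lock:
            return run_sql(self.session_id, sql)
```

Each dbt worker thread would then call SharedLivyConnection.get(create_session).execute(run_sql, sql) instead of opening its own connection.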
Anything else?
The following PR was created for this implementation:
#984