-
Notifications
You must be signed in to change notification settings - Fork 15
Add sql2dbx: LLM-powered SQL to Databricks notebook converter #399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@nakazax please sign your commits - PR can't be merged until this condition is fulfilled
|
@alexott Thanks for your comment. I've added a verified signature to the commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds the sql2dbx tool to the databrickslabs/sandbox repository, which automates the conversion of SQL files into Databricks notebooks using an LLM-powered, prompt-based approach. Key changes include:
- Adding multiple notebooks for various SQL dialects (PostgreSQL, Oracle, Netezza, MySQL) that are auto-converted from SQL scripts.
- Implementing batch processing workflows and error handling patterns in each notebook for converting SQL into .py notebooks.
- Including sample YAML-based system prompts and documentation in the README to guide users.
Reviewed Changes
Copilot reviewed 71 out of 82 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
sql2dbx/examples/postgresql/output/postgresql_example2_stored_procedure.py | Creates a notebook with transaction-like logic using separate MERGE operations for backup and value capping. |
sql2dbx/examples/postgresql/output/postgresql_example1_multi_statement_transformation.py | Sets up a products table with discount-based price updates via MERGE and deletion rules. |
sql2dbx/examples/oracle/output/oracle_example2_stored_procedure.py | Implements stored procedure logic with parameter widgets and threshold updates using MERGE. |
sql2dbx/examples/oracle/output/oracle_example1_multi_statement_transformation.py | Demonstrates multi-statement transformations and discount operations on product data. |
sql2dbx/examples/netezza/output/netezza_example2_stored_procedure.py | Provides a stored procedure for adjusting thresholds with rollback simulation. |
sql2dbx/examples/netezza/output/netezza_example1_multi_statement_transformation.py | Deals with multi-statement data transformations and discount applications. |
sql2dbx/examples/mysql/output/mysql_example2_stored_procedure.py | Performs threshold checks and updates with rollback simulation in a MySQL context. |
sql2dbx/examples/mysql/output/mysql_example1_multi_statement_transformation.py | Implements multi-statement order transformations with discount adjustments and cleanup. |
sql2dbx/README.md | Offers documentation on tool usage and setup instructions for integrating sql2dbx with Databricks notebooks. |
Files not reviewed (11)
- sql2dbx/.gitignore: Language not supported
- sql2dbx/examples/mysql/input/mysql_example1_multi_statement_transformation.sql: Language not supported
- sql2dbx/examples/mysql/input/mysql_example2_stored_procedure.sql: Language not supported
- sql2dbx/examples/netezza/input/netezza_example1_multi_statement_transformation.sql: Language not supported
- sql2dbx/examples/netezza/input/netezza_example2_stored_procedure.sql: Language not supported
- sql2dbx/examples/oracle/input/oracle_example1_multi_statement_transformation.sql: Language not supported
- sql2dbx/examples/oracle/input/oracle_example2_stored_procedure.sql: Language not supported
- sql2dbx/examples/postgresql/input/postgresql_example1_multi_statement_transformation.sql: Language not supported
- sql2dbx/examples/postgresql/input/postgresql_example2_stored_procedure.sql: Language not supported
- sql2dbx/examples/redshift/input/redshift_example1_multi_statement_transformation.sql: Language not supported
- sql2dbx/examples/redshift/input/redshift_example2_stored_procedure.sql: Language not supported
Comments suppressed due to low confidence (1)
sql2dbx/examples/oracle/output/oracle_example2_stored_procedure.py:16
- [nitpick] Consider renaming 'p_multiplier' to 'multiplier' (or a similarly clear name) to align with the widget name and improve code clarity.
p_multiplier = float(dbutils.widgets.get("multiplier"))
# Update forecast table: first backup original values | ||
# Using MERGE instead of UPDATE FROM since Databricks doesn't support UPDATE FROM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider evaluating whether the two separate MERGE statements (one for backing up original values and another for capping forecast values) can be consolidated to reduce duplicate logic and improve maintainability, provided the business logic permits.
# Update forecast table: first backup original values | |
# Using MERGE instead of UPDATE FROM since Databricks doesn't support UPDATE FROM | |
# Update forecast table: backup original values and cap forecast values | |
# Using a single MERGE statement to reduce duplication |
Copilot uses AI. Check for mistakes.
Add sql2dbx tool to databrickslabs/sandbox
Overview
This PR adds sql2dbx to the databrickslabs/sandbox repository. sql2dbx is an automation tool designed to convert SQL files into Databricks notebooks. It leverages Large Language Models (LLMs) to perform the conversion based on system prompts tailored for various SQL dialects. sql2dbx consists of a series of Databricks notebooks.
Features
Sample SQL Dialect Prompts
The tool includes sample YAML-based conversion prompts for:
Each prompt file contains a system message and few-shot examples tailored to the specific SQL dialect's syntax and semantics.
Documentation
The main notebook (00_main) serves as the entry point with documentation on the conversion workflow and instructions for creating custom dialect prompts or extending the existing samples.