Add sql2dbx: LLM-powered SQL to Databricks notebook converter #399


Open
wants to merge 1 commit into base: main

Conversation

@nakazax nakazax commented Apr 25, 2025

Add sql2dbx tool to databrickslabs/sandbox

Overview

This PR adds sql2dbx to the databrickslabs/sandbox repository. sql2dbx is an automation tool that converts SQL files into Databricks notebooks, using Large Language Models (LLMs) guided by system prompts tailored to specific SQL dialects. The tool itself is implemented as a series of Databricks notebooks.

Features

  • Batch processing workflow for SQL file conversion
  • Extensible prompt-based architecture for SQL dialect handling
  • LLM-powered conversion with syntax validation
  • Automatic error correction and cell splitting
  • Direct output as ready-to-use Databricks notebook files (.py format)
  • Support for multiple language models (Claude, Azure OpenAI, etc.)
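
The features above imply a convert-validate-retry loop: the LLM produces notebook code, a syntax check runs, and any errors are fed back for automatic correction. A minimal sketch of that control flow, with entirely hypothetical names (`convert_sql_to_notebook`, `ask_llm`, and `validate` are illustrations, not sql2dbx's actual API):

```python
# Hypothetical sketch only: these names are NOT sql2dbx's actual API.
def convert_sql_to_notebook(sql_text, ask_llm, validate, max_retries=3):
    """Convert SQL text to notebook source, retrying while validation fails."""
    result = ask_llm(f"Convert this SQL to a Databricks notebook:\n{sql_text}")
    for _ in range(max_retries):
        errors = validate(result)
        if not errors:
            return result
        # Feed validation errors back to the model for automatic correction.
        result = ask_llm(f"Fix these errors:\n{errors}\n\nCode:\n{result}")
    raise RuntimeError("conversion still invalid after retries")

# Toy stand-ins that show the control flow without a real LLM or SQL parser:
def fake_llm(prompt):
    return "# Databricks notebook source\nspark.sql('SELECT 1')"

def fake_validate(code):
    return [] if code.startswith("# Databricks notebook source") else ["missing header"]

notebook = convert_sql_to_notebook("SELECT 1;", fake_llm, fake_validate)
```

In the real tool, the model call would go to whichever endpoint is configured (Claude, Azure OpenAI, etc.), but the retry-on-validation-error shape stays the same.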

Sample SQL Dialect Prompts

The tool includes sample YAML-based conversion prompts for:

  • T-SQL (SQL Server, Azure Synapse)
  • Oracle
  • Teradata
  • MySQL/MariaDB
  • PostgreSQL
  • Snowflake
  • Redshift
  • Netezza

Each prompt file contains a system message and few-shot examples tailored to the specific SQL dialect's syntax and semantics.
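
As a rough illustration (not sql2dbx's actual file schema), a prompt specification of that shape could be expanded into a chat-completion message list, with each few-shot example interleaved as a user/assistant turn ahead of the SQL to convert:

```python
# Invented structure for illustration; the real YAML schema may differ.
prompt_spec = {
    "system_message": "You are an expert at converting T-SQL to Databricks SQL.",
    "few_shots": [
        {"input": "SELECT TOP 5 * FROM t", "output": "SELECT * FROM t LIMIT 5"},
    ],
}

def build_messages(spec, sql_text):
    """Expand a dialect prompt spec into a chat-completion message list."""
    messages = [{"role": "system", "content": spec["system_message"]}]
    for shot in spec["few_shots"]:
        # Few-shot pairs become prior user/assistant turns.
        messages.append({"role": "user", "content": shot["input"]})
        messages.append({"role": "assistant", "content": shot["output"]})
    messages.append({"role": "user", "content": sql_text})
    return messages

msgs = build_messages(prompt_spec, "SELECT TOP 10 name FROM users")
```

This layout is why the prompts are extensible: adding a new dialect only means writing a new system message and example pairs, with no code changes.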

Documentation

The main notebook (00_main) serves as the entry point with documentation on the conversion workflow and instructions for creating custom dialect prompts or extending the existing samples.

@nakazax nakazax requested a review from a team as a code owner April 25, 2025 09:42
@nakazax nakazax requested a review from grusin-db April 25, 2025 09:42
@alexott (Contributor) commented Apr 25, 2025

@nakazax please sign your commits; the PR can't be merged until this condition is fulfilled.

Commits must have verified signatures.

@nakazax (Author) commented Apr 26, 2025

@alexott Thanks for your comment. I've added a verified signature to the commit.

@alexott alexott requested a review from Copilot May 7, 2025 11:33
@Copilot (Copilot AI) left a comment
Pull Request Overview

This pull request adds the sql2dbx tool to the databrickslabs/sandbox repository, which automates the conversion of SQL files into Databricks notebooks using an LLM-powered, prompt-based approach. Key changes include:

  • Adding multiple notebooks for various SQL dialects (PostgreSQL, Oracle, Netezza, MySQL) that are auto-converted from SQL scripts.
  • Implementing batch processing workflows and error handling patterns in each notebook for converting SQL into .py notebooks.
  • Including sample YAML-based system prompts and documentation in the README to guide users.

Reviewed Changes

Copilot reviewed 71 out of 82 changed files in this pull request and generated 1 comment.

Reviewed files:

  • sql2dbx/examples/postgresql/output/postgresql_example2_stored_procedure.py: Creates a notebook with transaction-like logic using separate MERGE operations for backup and value capping.
  • sql2dbx/examples/postgresql/output/postgresql_example1_multi_statement_transformation.py: Sets up a products table with discount-based price updates via MERGE and deletion rules.
  • sql2dbx/examples/oracle/output/oracle_example2_stored_procedure.py: Implements stored procedure logic with parameter widgets and threshold updates using MERGE.
  • sql2dbx/examples/oracle/output/oracle_example1_multi_statement_transformation.py: Demonstrates multi-statement transformations and discount operations on product data.
  • sql2dbx/examples/netezza/output/netezza_example2_stored_procedure.py: Provides a stored procedure for adjusting thresholds with rollback simulation.
  • sql2dbx/examples/netezza/output/netezza_example1_multi_statement_transformation.py: Handles multi-statement data transformations and discount applications.
  • sql2dbx/examples/mysql/output/mysql_example2_stored_procedure.py: Performs threshold checks and updates with rollback simulation in a MySQL context.
  • sql2dbx/examples/mysql/output/mysql_example1_multi_statement_transformation.py: Implements multi-statement order transformations with discount adjustments and cleanup.
  • sql2dbx/README.md: Offers documentation on tool usage and setup instructions for integrating sql2dbx with Databricks notebooks.
Files not reviewed (11)
  • sql2dbx/.gitignore: Language not supported
  • sql2dbx/examples/mysql/input/mysql_example1_multi_statement_transformation.sql: Language not supported
  • sql2dbx/examples/mysql/input/mysql_example2_stored_procedure.sql: Language not supported
  • sql2dbx/examples/netezza/input/netezza_example1_multi_statement_transformation.sql: Language not supported
  • sql2dbx/examples/netezza/input/netezza_example2_stored_procedure.sql: Language not supported
  • sql2dbx/examples/oracle/input/oracle_example1_multi_statement_transformation.sql: Language not supported
  • sql2dbx/examples/oracle/input/oracle_example2_stored_procedure.sql: Language not supported
  • sql2dbx/examples/postgresql/input/postgresql_example1_multi_statement_transformation.sql: Language not supported
  • sql2dbx/examples/postgresql/input/postgresql_example2_stored_procedure.sql: Language not supported
  • sql2dbx/examples/redshift/input/redshift_example1_multi_statement_transformation.sql: Language not supported
  • sql2dbx/examples/redshift/input/redshift_example2_stored_procedure.sql: Language not supported
Comments suppressed due to low confidence (1)

sql2dbx/examples/oracle/output/oracle_example2_stored_procedure.py:16

  • [nitpick] Consider renaming 'p_multiplier' to 'multiplier' (or a similarly clear name) to align with the widget name and improve code clarity.
p_multiplier = float(dbutils.widgets.get("multiplier"))

Comment on lines +54 to +55
# Update forecast table: first backup original values
# Using MERGE instead of UPDATE FROM since Databricks doesn't support UPDATE FROM
Copilot AI commented May 7, 2025

[nitpick] Consider evaluating whether the two separate MERGE statements (one for backing up original values and another for capping forecast values) can be consolidated to reduce duplicate logic and improve maintainability, provided the business logic permits.

Suggested change
# Update forecast table: first backup original values
# Using MERGE instead of UPDATE FROM since Databricks doesn't support UPDATE FROM
# Update forecast table: backup original values and cap forecast values
# Using a single MERGE statement to reduce duplication
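
For context on the pattern this comment discusses: Databricks SQL does not support `UPDATE ... FROM` with a join against another table, so the generated notebooks express join-based updates as `MERGE INTO`. A hedged sketch of the rewrite, with invented table and column names:

```python
# Illustration of the UPDATE ... FROM -> MERGE INTO rewrite the generated
# notebooks rely on. Table/column names (forecast, caps, value, cap) are
# invented for this example; run the SQL through spark.sql(...) on Databricks.

update_from_sql = """
-- Join-based UPDATE, not supported by Databricks SQL:
UPDATE f
SET f.value = c.cap
FROM forecast f
JOIN caps c ON f.id = c.id
WHERE f.value > c.cap
"""

merge_sql = """
-- Equivalent MERGE on Databricks:
MERGE INTO forecast AS f
USING caps AS c
ON f.id = c.id AND f.value > c.cap
WHEN MATCHED THEN UPDATE SET f.value = c.cap
"""
```

Whether the two MERGE statements in this notebook can actually be collapsed into one depends on whether both target the same table; the suggestion above is conditional on the business logic permitting it.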

