Apache Superset – DuckDB

This repo presents a minimal setup for reading Parquet, JSON, and CSV files with Apache Superset. Superset connects to an in-memory DuckDB database, which reads the files directly from the filesystem at query time.

Custom Docker image

Execute the steps below to set up a local Apache Superset instance with DuckDB support using Docker.

Build the image

docker build -t jorritsandbrink/superset-duckdb docker

Run the container

docker run -d -p 8080:8088 \
    -e "SUPERSET_SECRET_KEY=your_secret_key" \
    --mount type=bind,source=/$(pwd)/data,target=/data \
    --name superset-duckdb \
    jorritsandbrink/superset-duckdb

Note: the data folder in the current working directory is mounted at /data so that the data files are accessible from within the container.
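The example queries later in this README read from /data/parquet_table, /data/json_table, and /data/csv_table inside the container, so the mounted folder presumably needs matching subfolders. Below is a minimal sketch for creating that layout and seeding the CSV and JSON folders with tiny sample files. The file names and columns (`sample.csv`, `sample.json`, `id`, `name`) are made up for illustration and are not part of this repo. Parquet files are binary and would have to be produced by another tool.

```shell
#!/bin/sh
# Sketch: create the folder layout that the example queries expect.
# DATA_DIR defaults to ./data, the folder mounted into the container as /data.
DATA_DIR="${DATA_DIR:-./data}"

mkdir -p "$DATA_DIR/parquet_table" "$DATA_DIR/json_table" "$DATA_DIR/csv_table"

# Hypothetical sample rows; replace these with real data files.
printf 'id,name\n1,alpha\n2,beta\n' > "$DATA_DIR/csv_table/sample.csv"
printf '{"id": 1, "name": "alpha"}\n{"id": 2, "name": "beta"}\n' > "$DATA_DIR/json_table/sample.json"
```

Run it from the directory in which the container is started, so that `./data` is the same folder that `$(pwd)/data` refers to in the run command.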

Set up Superset

./docker/setup.sh

This includes creating an admin user and configuring a DuckDB database connection.

Navigate to UI

Go to http://localhost:8080/login/ and log in with username=admin and password=admin.

Check database connection

Go to Database Connections (http://localhost:8080/databaseview/list/) to verify that the database connection has been created:

[Screenshot: overview of database connections in Superset UI]

Click the Edit button to see the connection details:

[Screenshot: DuckDB database connection configuration in Superset UI]

SQLAlchemy URI:

duckdb:///:memory:

Click TEST CONNECTION and make sure you see this popup message:

[Screenshot: popup message indicating a good connection]

Querying files from Superset using DuckDB

Go to SQL Lab (http://localhost:8080/sqllab/) to query Parquet, JSON, or CSV files as follows:

[Screenshot: Apache Superset DuckDB SQL Lab]

The queries use glob syntax to read multiple files at once, as documented at https://duckdb.org/docs/data/multiple_files/overview.html.
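Before running a glob query, it can help to check on the host which files a pattern would pick up. The sketch below counts the files matched by a simple `*.ext` pattern in each table folder. The helper name `count_matches` is hypothetical, and shell globbing is only an approximation of DuckDB's matching: the two should agree for a plain `*` within one folder, but DuckDB supports additional patterns that POSIX shells do not.

```shell
#!/bin/sh
# Sketch: count the host files that a simple '*' glob would match.
# count_matches DIR EXT -> prints the number of files named *.EXT directly in DIR
count_matches() {
    dir=$1
    ext=$2
    n=0
    for f in "$dir"/*."$ext"; do
        # An unmatched glob stays literal in POSIX sh, so check that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Files under ./data on the host correspond to /data inside the container.
for t in parquet json csv; do
    echo "${t}_table: $(count_matches "./data/${t}_table" "$t") file(s)"
done
```

A count of 0 for a folder suggests the corresponding query would find no files to read.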

Parquet

SELECT *
FROM '/data/parquet_table/*.parquet'

JSON

SELECT *
FROM '/data/json_table/*.json'

CSV

SELECT *
FROM '/data/csv_table/*.csv'
