This repo presents a minimal setup to read Parquet, JSON, and CSV files with Apache Superset. An in-memory DuckDB database provides a live data connection between Superset and the local filesystem.
Execute the steps below to set up a local Apache Superset instance with DuckDB support using Docker.
Build the Docker image:
docker build -t jorritsandbrink/superset-duckdb docker
Run the container:
docker run -d -p 8080:8088 \
-e "SUPERSET_SECRET_KEY=your_secret_key" \
--mount type=bind,source=/$(pwd)/data,target=/data \
--name superset-duckdb \
jorritsandbrink/superset-duckdb
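Once the container is up, a quick sanity check (not strictly required, just a convenient way to confirm Superset is reachable; it can take a short while to start):
docker ps --filter name=superset-duckdb   # container should show up as running
curl http://localhost:8080/health         # returns "OK" once Superset is up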
Note: the repo's local data folder is bind-mounted at /data inside the container so that the data files are accessible to DuckDB.
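The exact contents of the data folder are up to you; based on the example queries further down, the expected layout is along these lines:
data/
├── parquet_table/   (one or more .parquet files)
├── json_table/      (one or more .json files)
└── csv_table/       (one or more .csv files)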
Run the setup script to finish configuration:
./docker/setup.sh
This creates an admin user and configures a DuckDB database connection in Superset.
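The script in the repo is authoritative; as a rough, hypothetical sketch, a setup like this usually wraps the standard Superset bootstrap commands (the flags are regular Superset CLI flags, and the admin details are placeholders matching the admin/admin login used below):
docker exec superset-duckdb superset db upgrade     # initialize Superset's metadata database
docker exec superset-duckdb superset fab create-admin --username admin --firstname Superset --lastname Admin --email admin@example.com --password admin
docker exec superset-duckdb superset init           # load default roles and permissions
The DuckDB connection itself (duckdb:///:memory:) can be registered through the Superset UI or REST API.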
Go to http://localhost:8080/login/ and log in with username admin and password admin.
Go to Database Connections (http://localhost:8080/databaseview/list/) to validate that the database connection has been created.
Click the Edit button to see the connection details. The SQLAlchemy URI should be:
duckdb:///:memory:
Click TEST CONNECTION and make sure a popup message appears confirming the connection works.
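As an extra check that queries really run against DuckDB, you can run a trivial probe once SQL Lab is open in the next step (version() is a built-in DuckDB function that reports the engine version):
SELECT version()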
Go to SQL Lab (http://localhost:8080/sqllab/) to query Parquet, JSON, or CSV files as follows:
The queries use glob syntax to read multiple files at once, as documented at https://duckdb.org/docs/data/multiple_files/overview.html.
SELECT *
FROM '/data/parquet_table/*.parquet'
SELECT *
FROM '/data/json_table/*.json'
SELECT *
FROM '/data/csv_table/*.csv'
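Beyond bare file globs, DuckDB also exposes explicit reader functions (read_parquet, read_csv_auto, read_json_auto) that accept options when the defaults need overriding. Two illustrative variants, assuming the same /data layout as above:
SELECT *
FROM read_csv_auto('/data/csv_table/*.csv', header = true)

SELECT *
FROM read_parquet('/data/parquet_table/**/*.parquet')
The ** pattern recurses into subdirectories, which is handy for partitioned datasets.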