Sonnet Scripts is a collection of pre-built data architecture patterns that you can quickly spin up on a local machine, along with examples of real-world data that you can use with them.
One of the challenges of making content and tutorials on data is the lack of established data infrastructure and real-world datasets. I have often found myself repeating this setup work over and over again, so we decided to create an open-source repo to expedite the process.
According to the Academy of American Poets, a "...sonnet is a fourteen-line poem written in iambic pentameter, employing one of several rhyme schemes, and adhering to a tightly structured thematic organization." Through the constraints of a particular sonnet format, poets throughout the centuries have pushed their creativity to express themselves, William Shakespeare being one of the most well-known. I've similarly seen data architectures fill the same role as a sonnet, where their specific patterns push data practitioners to think of creative ways to solve business problems.
Welcome to Sonnet Scripts – a fully containerized environment designed for data analysts, analytics engineers, and data engineers to experiment with databases, queries, and ETL pipelines. This repository provides a pre-configured sandbox where users can ingest data, transform it using SQL/Python, and test integrations with PostgreSQL, DuckDB, MinIO and more!
This project is ideal for:
- Data Engineers who want a lightweight environment for testing data pipelines.
- Analytics Engineers experimenting with dbt and SQL transformations.
- Data Analysts looking for a structured PostgreSQL + DuckDB setup.
- Developers working on data APIs using Python.
Before setting up the environment, ensure you have the following installed:
- Docker & Docker Compose
- Make (for automation)
  - Linux/macOS: Comes pre-installed
  - Windows: Install via Chocolatey → choco install make
- Python (3.12+)
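As a quick sanity check before running the setup, the prerequisites above can be verified with a short script. This is a minimal sketch and not part of the repository: the function name missing_prerequisites is hypothetical, and it only checks that docker and make are on the PATH and that the interpreter is at least 3.12. Docker Compose is typically available as a docker subcommand, so it is not looked up separately here; running docker compose version by hand should confirm it.

```python
import shutil
import sys

REQUIRED_TOOLS = ("docker", "make")  # executables expected on PATH
MIN_PYTHON = (3, 12)                 # minimum version listed in the prerequisites


def missing_prerequisites(which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `which` and `version` are injectable so the check can be exercised
    without the real tools being installed.
    """
    version = sys.version_info if version is None else version
    problems = [
        f"{tool} not found on PATH"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.12")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, the tools it checks for were found.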
git clone https://github.com/onthemarkdata/sonnet-scripts.git
cd sonnet-scripts
make setup
This will:
- Build the Docker images
- Start the PostgreSQL, DuckDB, and other containers
- Ensure dependencies are installed
make load-db
make verify-db
make test
make exec-pythonbase
make exec-postgres
make exec-duckdb
📂 sonnet-scripts
│── 📂 pythonbase/ # Python-based processing container
│── 📂 linuxbase/ # Base container for Linux dependencies
│── 🐳 docker-compose.yml # Container orchestration
│── 🛠 Makefile # Automation commands
│── 📜 README.md # You are here!
GitHub Actions automates builds, tests, and environment validation. The pipeline:
- Builds Docker images (pythonbase, linuxbase)
- Starts all services using docker compose
- Runs unit & integration tests (make test)
- Shuts down containers after tests pass.
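The four steps above can be approximated locally with a small driver. This is a sketch under assumptions, not the repository's actual workflow: make test comes from this README, while the docker compose build, up -d, and down subcommands are a guess at how the workflow builds, starts, and stops the services. The function name run_pipeline is hypothetical.

```python
import subprocess

# `make test` is taken from this README; the `docker compose` subcommands are
# assumptions about how the services are built, started, and stopped.
BUILD = ["docker", "compose", "build"]
START = ["docker", "compose", "up", "-d"]
TEST = ["make", "test"]
STOP = ["docker", "compose", "down"]


def run_pipeline(run=subprocess.run):
    """Build, start, test, then shut down. Returns True if the tests passed."""
    run(BUILD, check=True)
    try:
        run(START, check=True)
        result = run(TEST, check=False)
        return result.returncode == 0
    finally:
        # Runs even if starting or testing raised, so containers are not left behind.
        run(STOP, check=False)
```

One deliberate difference from the list: this sketch shuts the containers down whether or not the tests pass, which avoids leaving services running after a failed local run.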
The pipeline is triggered by:
- Push to main or feature/*
- Pull Requests to main
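To make the branch rules concrete, here is a small sketch of which events would start the pipeline. It assumes the rules are GitHub Actions branch filters, where `*` matches any run of characters except `/`, so feature/* covers feature/my-branch but not a nested name such as feature/etl/my-branch. The names should_run and _matches are hypothetical and exist only for illustration.

```python
import re

PUSH_BRANCHES = ("main", "feature/*")  # from the trigger list above
PULL_REQUEST_TARGETS = ("main",)       # PRs must target main


def _matches(branch, pattern):
    """Match a branch name against a pattern where `*` means 'anything except /'."""
    regex = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, branch) is not None


def should_run(event, branch):
    """`event` is 'push' or 'pull_request'; `branch` is the pushed branch or the PR target."""
    if event == "push":
        patterns = PUSH_BRANCHES
    elif event == "pull_request":
        patterns = PULL_REQUEST_TARGETS
    else:
        return False
    return any(_matches(branch, p) for p in patterns)
```

For example, should_run("push", "feature/new-loader") is true, while should_run("push", "develop") and should_run("pull_request", "feature/new-loader") are both false.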
Want to improve Sonnet Scripts? Here's how:
- Fork the repository.
- Make your changes and test them locally.
- Submit a pull request (PR) for review.
For major changes, please open an issue first to discuss your proposal.
If you have any questions, feel free to open an issue or reach out! 🚀 Happy data wrangling!