Before installing starbake, make sure the following tools are installed on your system at these minimum versions:
- JDK: 11 or higher
- Python: 3.8 or higher
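The prerequisites above can be checked from the shell before going further. The following is a minimal sketch, not part of Starlake: the helper names `version_ge`, `extract_version`, and `check_prereqs` are made up for this example, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD/macOS) and that the first number printed by `java -version` is the version.

```shell
# Sketch only: hypothetical helpers, not shipped with Starlake.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings; if the minimum comes out first, $1 >= $2.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads text on stdin and prints the first dotted number found, e.g. 17.0.2.
extract_version() {
  grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1
}

# Checks the java and python3 found on PATH against the minimum versions.
# Note: this only inspects the java runtime, it does not prove a full JDK is installed.
check_prereqs() {
  command -v java >/dev/null 2>&1 || { echo "java not found on PATH" >&2; return 1; }
  command -v python3 >/dev/null 2>&1 || { echo "python3 not found on PATH" >&2; return 1; }
  # java prints its version banner on stderr, hence the 2>&1.
  java_v="$(java -version 2>&1 | extract_version)"
  python_v="$(python3 --version 2>&1 | extract_version)"
  version_ge "${java_v:-0}" 11 || { echo "JDK 11 or higher required, found ${java_v:-none}" >&2; return 1; }
  version_ge "${python_v:-0}" 3.8 || { echo "Python 3.8 or higher required, found ${python_v:-none}" >&2; return 1; }
  echo "Prerequisites OK: JDK $java_v, Python $python_v"
}
```

After sourcing the snippet, calling `check_prereqs` prints a one-line summary or an error naming the tool that is missing or too old.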
- Install Starlake
sh <(curl https://raw.githubusercontent.com/starlake-ai/starlake/master/distrib/setup.sh) --target=.
- Create a virtual environment (optional)
python3 -m pip install virtualenv
python3 -m venv .venv
- Activate the virtual environment (optional)
source .venv/bin/activate
- Generate dummy files
python3 -m pip install -r _scripts/requirements.txt
python3 _scripts/dummy_data_generator.py
We're good to go
- Import from incoming to pending
./starlake import
- Load data to local dir as parquet
./starlake load
- Run the transformations in order
./starlake transform --name Customers.CustomerLifetimeValue
./starlake transform --name Customers.HighValueCustomers
./starlake transform --name Products.ProductProfitability
./starlake transform --name Products.MostProfitableProducts
./starlake transform --name Products.ProductPerformance
./starlake transform --name Products.TopSellingProducts
./starlake transform --name Products.TopSellingProfitableProducts
- Run the transformations recursively (each command also runs the tasks it depends on)
./starlake transform --name Customers.HighValueCustomers --recursive
./starlake transform --name Products.TopSellingProfitableProducts --recursive
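The seven ordered `transform` commands above can also be run from a small loop that stops at the first failure. This is a sketch under stated assumptions: `STARLAKE_CMD`, `TRANSFORMS`, and `run_transforms` are names introduced here for illustration, and the task list is simply copied from the commands above in the same order.

```shell
# Sketch only: STARLAKE_CMD, TRANSFORMS and run_transforms are hypothetical names.

# Path to the starlake launcher; override it to point elsewhere.
STARLAKE_CMD="${STARLAKE_CMD:-./starlake}"

# Same tasks, in the same order, as the individual commands in this README.
TRANSFORMS="
Customers.CustomerLifetimeValue
Customers.HighValueCustomers
Products.ProductProfitability
Products.MostProfitableProducts
Products.ProductPerformance
Products.TopSellingProducts
Products.TopSellingProfitableProducts
"

# Runs every transformation in order and stops at the first one that fails.
run_transforms() {
  # Task names contain no spaces, so plain word splitting is safe here.
  for name in $TRANSFORMS; do
    echo ">>> transform $name"
    "$STARLAKE_CMD" transform --name "$name" || {
      echo "transform failed: $name" >&2
      return 1
    }
  done
}
```

After sourcing the snippet from the project directory, `run_transforms` executes the sequence. Stopping early keeps a downstream task from running against inputs that an upstream task failed to produce.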
- Install the dagster webserver
python3 -m pip install dagster-webserver
- Install the starlake dagster libraries for shell
python3 -m pip install 'starlake-dagster[shell]'
- Generate DAGs
./starlake dag-generate --clean
- Load the DAGs with dagster
DAGSTER_HOME=${PWD} dagster dev \
  -f metadata/dags/generated/load/starbake.py \
  -f metadata/dags/generated/transform/CustomerLifetimeValue.py \
  -f metadata/dags/generated/transform/HighValueCustomers.py \
  -f metadata/dags/generated/transform/ProductPerformance.py \
  -f metadata/dags/generated/transform/ProductProfitability.py \
  -f metadata/dags/generated/transform/MostProfitableProducts.py \
  -f metadata/dags/generated/transform/TopSellingProducts.py \
  -f metadata/dags/generated/transform/TopSellingProfitableProducts.py
- Browse the Dagster UI
http://localhost:3000/locations
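The long `dagster dev` command above lists every generated file by hand. As a hedged alternative, the `-f` arguments can be assembled from whatever `dag-generate` wrote to disk. The helper name `dag_file_args` is made up for this sketch, and it assumes that every `.py` file under the given directory should be loaded, that the order of the `-f` options does not matter, and that paths contain no spaces.

```shell
# Sketch only: dag_file_args is a hypothetical helper, not part of Starlake or Dagster.

# Prints one "-f <path>" line per Python file found under directory $1,
# sorted so the resulting command line is stable between runs.
dag_file_args() {
  find "$1" -type f -name '*.py' | LC_ALL=C sort | while IFS= read -r f; do
    printf '%s %s\n' -f "$f"
  done
}
```

With the helper sourced, the server could then be started with `DAGSTER_HOME=${PWD} dagster dev $(dag_file_args metadata/dags/generated)`. The unquoted command substitution relies on word splitting, which is why paths with spaces are excluded above.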