Commit 9d960df (1 parent: 273fe7e)

* update brickflow example with brickflow 0.10.0
* update examples readme file
* include create notebooks dir command

Co-authored-by: pariksheet <[email protected]>

Showing 8 changed files with 285 additions and 19 deletions.
@@ -0,0 +1,10 @@ (new file)

```yaml
# DO NOT MODIFY THIS FILE - IT IS AUTO GENERATED BY BRICKFLOW AND RESERVED FOR FUTURE USAGE
projects:
  brickflow-demo:
    brickflow_version: auto
    deployment_mode: bundle
    enable_plugins: true
    name: brickflow-demo
    path_from_repo_root_to_project_root: .
    path_project_root_to_workflows_dir: workflows
version: v1
```

@@ -1,2 +1,124 @@
# brickflow-examples
This repository contains examples for brickflow.

## Getting Started

### Prerequisites
1. Install brickflows

```shell
pip install brickflows
```
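
Optionally, verify that the `brickflow` entry point used later for deployment is on your `PATH` (just a sanity check):

```shell
brickflow --help
```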

2. Install [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html)

```shell
curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sudo sh
```
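
You can confirm the Databricks CLI is installed and on your `PATH`:

```shell
databricks --version
```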

3. Configure the Databricks CLI with a workspace token. This configures your `~/.databrickscfg` file.

```shell
databricks configure --token
```
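
After the prompts complete, `~/.databrickscfg` contains a default profile roughly like the sketch below; the host and token shown are placeholders, not real values:

```shell
cat ~/.databrickscfg
# [DEFAULT]
# host  = https://<your-workspace>.cloud.databricks.com
# token = <your-personal-access-token>
```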

### Clone the repository

```shell
git clone https://github.com/Nike-Inc/brickflow.git
cd brickflow/examples/brickflow_examples
```

### Hello World workflow
- Create your first workflow using brickflow
- Create a new file `hello_world_workflow.py` in the `workflows` directory
- Add the following code to the file
```python
from brickflow import (
    Cluster,
    Workflow,
    NotebookTask,
)
from brickflow.context import ctx
from airflow.operators.bash import BashOperator


cluster = Cluster(
    name="job_cluster",
    node_type_id="m6gd.xlarge",
    spark_version="13.3.x-scala2.12",
    min_workers=1,
    max_workers=2,
)

wf = Workflow(
    "hello_world_workflow",
    default_cluster=cluster,
    tags={
        "product_id": "brickflow_demo",
    },
    common_task_parameters={
        "catalog": "<uc-catalog-name>",
        "database": "<uc-schema-name>",
    },
)

@wf.task
# this task does nothing but explains the use of context object
def start():
    print(f"Environment: {ctx.env}")

@wf.notebook_task
# this task runs a databricks notebook
def example_notebook():
    return NotebookTask(
        notebook_path="notebooks/example_notebook.py",
        base_parameters={
            "some_parameter": "some_value",  # in the notebook access these via dbutils.widgets.get("some_parameter")
        },
    )


@wf.task(depends_on=[start, example_notebook])
# this task runs a bash command
def list_lending_club_data_files():
    return BashOperator(
        task_id=list_lending_club_data_files.__name__,
        bash_command="ls -lrt /dbfs/databricks-datasets/samples/lending_club/parquet/",
    )

@wf.task(depends_on=list_lending_club_data_files)
# this task runs the pyspark code
def lending_data_ingest():
    ctx.spark.sql(
        f"""
        CREATE TABLE IF NOT EXISTS
        {ctx.dbutils_widget_get_or_else(key="catalog", debug="development")}.\
        {ctx.dbutils_widget_get_or_else(key="database", debug="dummy_database")}.\
        {ctx.dbutils_widget_get_or_else(key="brickflow_env", debug="local")}_lending_data_ingest
        USING DELTA -- this is default just for explicit purpose
        SELECT * FROM parquet.`dbfs:/databricks-datasets/samples/lending_club/parquet/`
        """
    )
```
_Note: Modify the values of catalog/database for common_task_parameters._
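
The `example_notebook` task above points at `notebooks/example_notebook.py`. A minimal sketch of that notebook source is shown below; the widget default value and the print statement are illustrative assumptions, but the parameter is read exactly as the inline comment in the task describes:

```python
# Databricks notebook source
# notebooks/example_notebook.py (illustrative sketch)

# Register the widget and read the value passed via base_parameters on the NotebookTask.
# When the workflow runs, "some_value" from base_parameters overrides the default below.
dbutils.widgets.text("some_parameter", "default_value")
some_parameter = dbutils.widgets.get("some_parameter")

print(f"some_parameter = {some_parameter}")
```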

### Update demo_wf.py
- demo_wf.py explains the various tasks and options available for the tasks
- You can remove demo_wf.py if you just want to run hello_world_workflow.py
- If you want to run demo_wf.py, update the params below with your values (a sketch follows this list)
    - default_cluster
    - common_task_parameters
    - permissions
    - default_task_settings
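
For orientation, the filled-in parameters in demo_wf.py end up looking roughly like the sketch below; the cluster id, catalog/schema names, and user emails are placeholders you must replace with values from your own workspace:

```python
from brickflow import (
    Cluster,
    EmailNotifications,
    TaskSettings,
    User,
    Workflow,
    WorkflowPermissions,
)

wf = Workflow(
    "brickflow-demo",
    # id of an existing all-purpose cluster in your workspace
    default_cluster=Cluster.from_existing_cluster("<all-purpose-cluster-id>"),
    common_task_parameters={
        "catalog": "<unity-catalog-name>",
        "database": "<unity-schema-name>",
    },
    # emails must belong to existing users on your Databricks workspace
    permissions=WorkflowPermissions(
        can_manage_run=[User("[email protected]")],
        can_view=[User("[email protected]")],
        can_manage=[User("[email protected]")],
    ),
    default_task_settings=TaskSettings(
        email_notifications=EmailNotifications(
            on_start=["[email protected]"],
        ),
    ),
)
```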

### Deploy the workflow to databricks

```shell
brickflow projects deploy --project brickflow-demo -e local
```
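
When you are done experimenting, brickflow also ships a matching teardown command; the flags are assumed to mirror the deploy command above, so check `brickflow projects --help` on your installed version:

```shell
brickflow projects destroy --project brickflow-demo -e local
```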

### Run the demo workflow
- Log in to the Databricks workspace
- Go to Workflows and select the workflow
![img.png](../../docs/img/workflow.png)
- Click on the Run button

@@ -0,0 +1,4 @@ (new file)

```yaml
project_roots:
  brickflow-demo:
    root_yaml_rel_path: .
version: v1
```

@@ -1,4 +1,3 @@
import resolver
from datetime import timedelta

from airflow.operators.bash import BashOperator

@@ -17,22 +16,25 @@

wf = Workflow(
    "brickflow-demo",
    default_cluster=Cluster.from_existing_cluster("YOUR_CLUSTER_ID"),
    # replace <all-purpose-cluster-id> with your cluster id
    default_cluster=Cluster.from_existing_cluster("<all-purpose-cluster-id>"),
    # Optional parameters below
    schedule_quartz_expression="0 0/20 0 ? * * *",
    tags={
        "product_id": "brickflow_demo",
        "slack_channel": "YOUR_SLACK_CHANNEL",
    },
    common_task_parameters={
        "catalog": "development",
        "database": "your_database",
        "catalog": "<unity-catalog-name>",
        "database": "<unity-schema-name>",
    },
    # replace <emails> with existing users' email on databricks
    permissions=WorkflowPermissions(
        can_manage_run=[User("[email protected]"), User("[email protected]")],
        can_view=[User("[email protected]")],
        can_manage=[User("[email protected]")],
    ),
    # replace <emails> with existing users' email on databricks
    default_task_settings=TaskSettings(
        email_notifications=EmailNotifications(
            on_start=["[email protected]"],

One file in this commit was deleted; its contents are not shown.