Merge branch 'branch-24.08' into hashjoin-2408
nvliyuan authored Jul 25, 2024
2 parents aa90f82 + 00d6ec2 commit 09b69a0
Showing 4 changed files with 111 additions and 140 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/auto-merge.yml
@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
 on:
   pull_request_target:
     branches:
-      - branch-24.06
+      - branch-24.08
     types: [closed]

 jobs:
@@ -29,14 +29,14 @@ jobs:
     steps:
       - uses: actions/checkout@v4
         with:
-          ref: branch-24.06 # force to fetch from latest upstream instead of PR ref
+          ref: branch-24.08 # force to fetch from latest upstream instead of PR ref

       - name: auto-merge job
         uses: ./.github/workflows/auto-merge
         env:
           OWNER: NVIDIA
           REPO_NAME: spark-rapids-examples
-          HEAD: branch-24.06
-          BASE: branch-24.08
+          HEAD: branch-24.08
+          BASE: branch-24.10
           AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
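The branch bump follows the even-month release train the versions in this diff suggest: each cycle, the auto-merge source (HEAD) and target (BASE) advance by one release, so 24.06 → 24.08 merges become 24.08 → 24.10. A minimal Python sketch of that naming convention, purely illustrative (`next_branch` is a hypothetical helper, not part of this repository):

```python
def next_branch(branch: str) -> str:
    """Advance a branch-YY.MM name by one even-month release cycle."""
    yy, mm = map(int, branch.removeprefix("branch-").split("."))
    mm += 2                      # releases land every two months
    if mm > 12:                  # December wraps to February of the next year
        yy, mm = yy + 1, mm - 12
    return f"branch-{yy:02d}.{mm:02d}"

assert next_branch("branch-24.06") == "branch-24.08"
assert next_branch("branch-24.08") == "branch-24.10"
```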

11 changes: 7 additions & 4 deletions tools/databricks/README.md
@@ -13,8 +13,11 @@ top of the notebook. After that, select *Run all* to execute the tools for the

 ## Limitations
 1. Currently local, S3 or DBFS event log paths are supported.
-2. S3 path is only supported on Databricks AWS using [instance profiles](https://docs.databricks.com/en/connect/storage/tutorial-s3-instance-profile.html).
-3. DBFS path must use the File API Format. Example: `/dbfs/<path-to-event-log>`.
-4. Multiple event logs must be comma-separated.
+1. S3 path is only supported on Databricks AWS using [instance profiles](https://docs.databricks.com/en/connect/storage/tutorial-s3-instance-profile.html).
+1. Eventlog path must follow the formats `/dbfs/path/to/eventlog` or `dbfs:/path/to/eventlog` for logs stored in DBFS.
+1. Use wildcards for nested lookup of eventlogs.
+   - For example: `/dbfs/path/to/clusterlogs/*/*`
+1. Multiple event logs must be comma-separated.
+   - For example: `/dbfs/path/to/eventlog1,/dbfs/path/to/eventlog2`

-**Latest Tools Version Supported** 24.06.0
+**Latest Tools Version Supported** 24.06.1
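Taken together, the updated limitations let a single argument mix `/dbfs/` paths, `dbfs:/` URIs, wildcards, and comma-separated lists. A minimal sketch of how such an argument could be normalized before being handed to the tools, assuming glob-style wildcard expansion; `normalize_eventlogs` is a hypothetical helper, not part of the tools package:

```python
import glob

def normalize_eventlogs(arg: str) -> list[str]:
    """Split a comma-separated eventlog argument into concrete paths."""
    paths = []
    for raw in arg.split(","):
        # dbfs:/ URIs and /dbfs/ mounts name the same files; the notebook
        # converts the former to the File API format shown in the README.
        path = raw.strip().replace("dbfs:/", "/dbfs/")
        # Wildcards allow nested lookup, e.g. /dbfs/path/to/clusterlogs/*/*
        paths.extend(glob.glob(path) if "*" in path else [path])
    return paths

print(normalize_eventlogs("/dbfs/path/to/eventlog1, dbfs:/path/to/eventlog2"))
# ['/dbfs/path/to/eventlog1', '/dbfs/path/to/eventlog2']
```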
tools/databricks/[RAPIDS Accelerator for Apache Spark] Profiling Tool Notebook Template.ipynb
@@ -4,10 +4,7 @@
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"cellMetadata": {},
"inputWidgets": {},
"nuid": "df33c614-2ecc-47a0-8600-bc891681997f",
"showTitle": false,
@@ -22,8 +19,11 @@
"### Note\n",
"- Currently, local, S3 or DBFS event log paths are supported.\n",
"- S3 path is only supported on Databricks AWS using [instance profiles](https://docs.databricks.com/en/connect/storage/tutorial-s3-instance-profile.html).\n",
"- DBFS path must use the File API format. Example: `/dbfs/<path-to-event-log>`.\n",
"- Multiple event logs must be comma-separated.\n",
"- Eventlog path must follow the formats `/dbfs/path/to/eventlog` or `dbfs:/path/to/eventlog` for logs stored in DBFS.\n",
"- Use wildcards for nested lookup of eventlogs. \n",
" - For example: `/dbfs/path/to/clusterlogs/*/*`\n",
"- Multiple event logs must be comma-separated. \n",
" - For example: `/dbfs/path/to/eventlog1,/dbfs/path/to/eventlog2`\n",
"\n",
"### Per-Job Profile\n",
"\n",
@@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -50,13 +50,13 @@
},
"outputs": [],
"source": [
"TOOLS_VER = \"24.06.0\"\n",
"TOOLS_VER = \"24.06.1\"\n",
"print(f\"Using Tools Version: {TOOLS_VER}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -76,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -97,12 +97,23 @@
"import os\n",
"import pandas as pd\n",
"\n",
"# Initialize variables from widgets\n",
"dbutils.widgets.dropdown(\"Cloud Provider\", \"aws\", [\"aws\", \"azure\"])\n",
"CSP=dbutils.widgets.get(\"Cloud Provider\")\n",
"\n",
"def convert_dbfs_path(path):\n",
" return path.replace(\"dbfs:/\", \"/dbfs/\")\n",
" \n",
"# Detect cloud provider from cluster usage tags\n",
"valid_csps = [\"aws\", \"azure\"]\n",
"CSP=spark.conf.get(\"spark.databricks.clusterUsageTags.cloudProvider\", \"\").lower()\n",
"if CSP not in valid_csps:\n",
" print(f\"ERROR: Cannot detect cloud provider from cluster usage tags. Using '{valid_csps[0]}' as default. \")\n",
" CSP = valid_csps[0]\n",
"else:\n",
" print(f\"Detected Cloud Provider from Spark Configs: '{CSP}'\")\n",
"\n",
"# Initialize variables from widgets\n",
"dbutils.widgets.text(\"Eventlog Path\", \"/dbfs/user1/profiling_logs\")\n",
"EVENTLOG_PATH=dbutils.widgets.get(\"Eventlog Path\")\n",
"EVENTLOG_PATH=convert_dbfs_path(EVENTLOG_PATH)\n",
"\n",
"dbutils.widgets.text(\"Output Path\", \"/tmp\")\n",
"OUTPUT_PATH=dbutils.widgets.get(\"Output Path\")\n",
@@ -122,7 +133,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -138,17 +149,14 @@
"outputs": [],
"source": [
"%sh\n",
"spark_rapids profiling --platform databricks-$CSP --eventlogs $EVENTLOG_PATH -o $OUTPUT_PATH > $CONSOLE_OUTPUT_PATH 2> $CONSOLE_ERROR_PATH"
"spark_rapids profiling --platform databricks-$CSP --eventlogs \"$EVENTLOG_PATH\" -o \"$OUTPUT_PATH\" --verbose > \"$CONSOLE_OUTPUT_PATH\" 2> \"$CONSOLE_ERROR_PATH\""
]
},
{
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"cellMetadata": {},
"inputWidgets": {},
"nuid": "f83af6c8-5a79-4a46-965b-38a4cb621877",
"showTitle": false,
@@ -157,14 +165,12 @@
},
"source": [
"## Console Output\n",
"Console output shows the recommended configurations for each app\n",
"\n",
"**Note**: Use the `--verbose` flag in the command above for more detailed output.\n"
"Console output shows the recommended configurations for each app\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -188,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -198,7 +204,7 @@
"inputWidgets": {},
"nuid": "f3c68b28-fc62-40ae-8528-799f3fc7507e",
"showTitle": true,
"title": "Show Error Log"
"title": "Show Logs"
},
"jupyter": {
"source_hidden": true
@@ -212,7 +218,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -282,7 +288,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -374,10 +380,7 @@
"cell_type": "markdown",
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"cellMetadata": {},
"inputWidgets": {},
"nuid": "bbe50fde-0bd6-4281-95fd-6a1ec6f17ab2",
"showTitle": false,
@@ -394,7 +397,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
@@ -452,7 +455,7 @@
"stack": true
},
"nuid": "91c1bfb2-695a-4e5c-8a25-848a433108dc",
"origId": 825198511668411,
"origId": 1075819839476955,
"title": "Executive View",
"version": "DashboardViewV1",
"width": 1600
@@ -466,7 +469,7 @@
"stack": true
},
"nuid": "62243296-4562-4f06-90ac-d7a609f19c16",
"origId": 825198511668412,
"origId": 1075819839476956,
"title": "App View",
"version": "DashboardViewV1",
"width": 1920
@@ -476,81 +479,57 @@
"language": "python",
"notebookMetadata": {
"mostRecentlyExecutedCommandWithImplicitDF": {
"commandId": 825198511668406,
"commandId": 203373918309288,
"dataframes": [
"_sqldf"
]
},
"pythonIndentUnit": 2,
"widgetLayout": [
{
"breakBefore": false,
"name": "Cloud Provider",
"width": 183
},
{
"breakBefore": false,
"name": "Eventlog Path",
"width": 728
"width": 778
},
{
"breakBefore": false,
"name": "Output Path",
"width": 232
"width": 302
}
]
},
"notebookName": "[RAPIDS Accelerator for Apache Spark] Profiling Tool Notebook Template",
"widgets": {
"Cloud Provider": {
"currentValue": "aws",
"nuid": "8dddcaf7-104e-4247-b811-ff7a133b28d4",
"typedWidgetInfo": null,
"widgetInfo": {
"defaultValue": "aws",
"label": null,
"name": "Cloud Provider",
"options": {
"autoCreated": null,
"choices": [
"aws",
"azure"
],
"widgetType": "dropdown"
},
"widgetType": "dropdown"
}
},
"Eventlog Path": {
"currentValue": "/dbfs/user1/profiling_logs",
"nuid": "1272501d-5ad9-42be-ab62-35768b2fc384",
"typedWidgetInfo": null,
"widgetInfo": {
"widgetType": "text",
"defaultValue": "/dbfs/user1/profiling_logs",
"label": "",
"label": null,
"name": "Eventlog Path",
"options": {
"autoCreated": false,
"validationRegex": null,
"widgetType": "text"
},
"widgetType": "text"
"widgetType": "text",
"autoCreated": null,
"validationRegex": null
}
}
},
"Output Path": {
"currentValue": "/tmp",
"nuid": "ab7e082c-1ef9-4912-8fd7-51bf985eb9c1",
"typedWidgetInfo": null,
"widgetInfo": {
"widgetType": "text",
"defaultValue": "/tmp",
"label": null,
"name": "Output Path",
"options": {
"widgetType": "text",
"autoCreated": null,
"validationRegex": null,
"widgetType": "text"
},
"widgetType": "text"
"validationRegex": null
}
}
}
}