Commit: Adding Databricks tool demo notebooks for qualification and profiling (#249)

* Adding Databricks tool demo notebooks for basic qualification and profiling

Signed-off-by: Matt Ahrens <[email protected]>

* Adding README updates for the databricks tool notebooks

Signed-off-by: Matt Ahrens <[email protected]>

* Fixing typo in OUTPUT_DIR in qual notebook

Signed-off-by: Matt Ahrens <[email protected]>

Signed-off-by: Matt Ahrens <[email protected]>
mattahrens authored Nov 21, 2022
1 parent 41f25c7 commit 5f77070
Showing 4 changed files with 15 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -10,6 +10,7 @@ There are broadly four categories of examples in this repo:
2. [Spark XGBoost](./examples/XGBoost-Examples)
3. [Deep Learning/Machine Learning](./examples/ML+DL-Examples)
4. [RAPIDS UDF](./examples/UDF-Examples)
5. [Databricks Tools demo notebooks](./tools/databricks)

For more information on each of the examples please look into respective categories.

12 changes: 12 additions & 0 deletions tools/databricks/README.md
@@ -0,0 +1,12 @@
# Databricks Tools Demo Notebooks

The RAPIDS Accelerator for Apache Spark includes two key tools: one for understanding the potential benefits of GPU acceleration (qualification) and one for analyzing GPU Spark jobs (profiling). For customers on Databricks, these demo notebooks offer a simple interface for running the tools against a set of Spark event logs from CPU (qualification) or GPU (profiling) application runs.

To use a demo notebook, import it into the Databricks Notebook UI via *File > Import Notebook*.

Once the notebook is imported, attach it to an available compute cluster. Then enter the event log path in the text widget at the top of the notebook and select *Run all* to execute the tools against the logs at that path.
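Both demo notebooks work the same way under the hood: they download the tools JAR from Maven and invoke it with `java` via `subprocess`, passing the widget's log path as the final argument. A minimal sketch of that command construction, using a hypothetical event log path in place of the widget value:

```python
import shlex

# Hypothetical stand-ins for values the notebooks read from a widget/constant.
OUTPUT_DIR = "/tmp"
eventlog_path = "/dbfs/logs/eventlog"  # assumed example path, not from the source

# Mirrors the notebooks' pattern: build one command string, then split it
# with shlex so it can be passed to subprocess.run without a shell.
cmd = (
    "java -Xmx10g -cp /tmp/rapids-4-spark-tools.jar:/databricks/jars/* "
    "com.nvidia.spark.rapids.tool.qualification.QualificationMain "
    "-o {} ".format(OUTPUT_DIR) + eventlog_path
)
args = shlex.split(cmd)
print(args[0])   # java
print(args[-1])  # /dbfs/logs/eventlog
```

Note that `shlex.split` keeps the classpath (including its `*`) as a single argument, which is what `subprocess.run` expects when invoked without a shell.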
1 change: 1 addition & 0 deletions (Databricks notebook "[RAPIDS Accelerator for Apache Spark] Profiling Tool Notebook Template")
@@ -0,0 +1 @@

# Welcome to the Profiling Tool for the RAPIDS Accelerator for Apache Spark

To run the tool, you need to enter a log path that represents the DBFS location for your Spark GPU event logs. Then you can select "Run all" to execute the notebook. After the notebook completes, you will see various output tables show up below.

**GPU Job Tuning Recommendations**: general suggestions for tuning your applications to run optimally on GPUs.

**Per-Job Profile**: the profiler output includes information about the application, data sources, executors, SQL stages, Spark properties, and key application metrics at the job and stage levels.

```python
import json
import requests
import base64
import shlex
import subprocess
import pandas as pd

TOOL_JAR_URL = 'https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.10.0/rapids-4-spark-tools_2.12-22.10.0.jar'
TOOL_JAR_LOCAL_PATH = '/tmp/rapids-4-spark-tools.jar'

# Profiling tool output directory.
OUTPUT_DIR = '/tmp'

response = requests.get(TOOL_JAR_URL)
open(TOOL_JAR_LOCAL_PATH, "wb").write(response.content)
```

```python
dbutils.widgets.text("log_path", "")
```

```python
eventlog_string = dbutils.widgets.get("log_path")

q_command_string = ("java -Xmx10g -cp /tmp/rapids-4-spark-tools.jar:/databricks/jars/* "
                    "com.nvidia.spark.rapids.tool.profiling.ProfileMain --csv --auto-tuner "
                    "-o {} ".format(OUTPUT_DIR) + eventlog_string)
args = shlex.split(q_command_string)
cmd_out = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

if cmd_out.returncode != 0:
    dbutils.notebook.exit("Profiling Tool failed with stderr: " + cmd_out.stderr)
```

```python
import os

app_df = pd.DataFrame(columns=['appId', 'appName'])

for x in os.scandir(OUTPUT_DIR + "/rapids_4_spark_profile/"):
    tmp_df = pd.read_csv(x.path + "/application_information.csv")
    app_df = app_df.append(tmp_df[['appId', 'appName']])
```

## GPU Job Tuning Recommendations

```python
app_list = app_df["appId"].tolist()
app_recommendations = pd.DataFrame(columns=['app', 'recommendations'])

for app in app_list:
    app_file = open(OUTPUT_DIR + "/rapids_4_spark_profile/" + app + "/profile.log")
    recommendations_start = 0
    recommendations_str = ""
    for line in app_file:
        if recommendations_start == 1:
            recommendations_str = recommendations_str + line
        if "### D. Recommended Configuration ###" in line:
            recommendations_start = 1
    app_recommendations = app_recommendations.append(
        {'app': app, 'recommendations': recommendations_str}, ignore_index=True)

display(app_recommendations)
```

## Per-App Profile

```python
for x in os.scandir(OUTPUT_DIR + "/rapids_4_spark_profile/"):
    print("APPLICATION ID = " + str(x))
    log = open(x.path + "/profile.log")
    print(log.read())
```
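The profiling notebook accumulates per-application rows with `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. On a runtime with newer pandas, the same accumulation can be written with `pd.concat`; a sketch with made-up rows standing in for the per-app `application_information.csv` files:

```python
import pandas as pd

# Made-up stand-ins for the appId/appName columns read from each
# application_information.csv; real values come from the profiler output.
frames = [
    pd.DataFrame({"appId": ["app-001"], "appName": ["etl-job"]}),
    pd.DataFrame({"appId": ["app-002"], "appName": ["scoring-job"]}),
]

# pd.concat is the supported replacement for the removed DataFrame.append.
app_df = pd.concat(frames, ignore_index=True)
print(app_df["appId"].tolist())  # ['app-001', 'app-002']
```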
1 change: 1 addition & 0 deletions (Databricks notebook "[RAPIDS Accelerator for Apache Spark] Qualification Tool Notebook Template")
@@ -0,0 +1 @@

# Welcome to the Qualification Tool for the RAPIDS Accelerator for Apache Spark

To run the tool, you need to enter a log path that represents the DBFS location for your Spark CPU event logs. Then you can select "Run all" to execute the notebook. After the notebook completes, you will see various output tables show up below.

## Summary Output

The report represents the entire app execution, including unsupported operators and non-SQL operations. By default, the applications and queries are sorted in descending order by the following fields:
- Recommendation
- Estimated GPU Speed-up
- Estimated GPU Time Saved
- End Time

## Stages Output

For each stage used in SQL operations, the Qualification tool generates the following information:
1. App ID
2. Stage ID
3. Average Speedup Factor: the average estimated speed-up of all the operators in the given stage.
4. Stage Task Duration: amount of time spent in tasks of SQL Dataframe operations for the given stage.
5. Unsupported Task Duration: sum of task durations for the unsupported operators. For more details, see Supported Operators.
6. Stage Estimated: True or False, indicating whether the stage duration had to be estimated.

## Execs Output

The Qualification tool generates a report of each "Exec" in the "SparkPlan" or "Executor Nodes" along with the estimated acceleration on the GPU. Please refer to the Supported Operators guide for more details on limitations on UDFs and unsupported operators.
1. App ID
2. SQL ID
3. Exec Name: for example, Filter or HashAggregate
4. Expression Name
5. Task Speedup Factor: the average acceleration of the operators, based on the original CPU duration of the operator divided by the GPU duration. The tool uses historical queries and benchmarks to estimate a speed-up at an individual operator level to calculate how much a specific operator would accelerate on GPU.
6. Exec Duration: wall-clock time measured from when the operator starts until it is completed.
7. SQL Node Id
8. Exec Is Supported: whether the Exec is supported by RAPIDS or not. Please refer to the Supported Operators section.
9. Exec Stages: an array of stage IDs
10. Exec Children
11. Exec Children Node Ids
12. Exec Should Remove: whether the Op is removed from the migrated plan.

```python
import json
import requests
import base64
import shlex
import subprocess
import pandas as pd

TOOL_JAR_URL = 'https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-tools_2.12/22.10.0/rapids-4-spark-tools_2.12-22.10.0.jar'
TOOL_JAR_LOCAL_PATH = '/tmp/rapids-4-spark-tools.jar'

# Qualification tool output directory.
OUTPUT_DIR = '/tmp/'

response = requests.get(TOOL_JAR_URL)
open(TOOL_JAR_LOCAL_PATH, "wb").write(response.content)
```

```python
dbutils.widgets.text("log_path", "")
eventlog_string = dbutils.widgets.get("log_path")

q_command_string = ("java -Xmx10g -cp /tmp/rapids-4-spark-tools.jar:/databricks/jars/* "
                    "com.nvidia.spark.rapids.tool.qualification.QualificationMain "
                    "-o {} ".format(OUTPUT_DIR) + eventlog_string)
args = shlex.split(q_command_string)
cmd_out = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

if cmd_out.returncode != 0:
    dbutils.notebook.exit("Qualification Tool failed with stderr: " + cmd_out.stderr)
```

## Summary Output

```python
summary_output = pd.read_csv(OUTPUT_DIR + "rapids_4_spark_qualification_output/rapids_4_spark_qualification_output.csv")
display(summary_output)
```

## Stages Output

```python
stages_output = pd.read_csv(OUTPUT_DIR + "rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_stages.csv")
display(stages_output)
```

## Execs Output

```python
execs_output = pd.read_csv(OUTPUT_DIR + "rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_execs.csv")
display(execs_output)
```
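Once loaded, the qualification outputs are plain pandas DataFrames and can be explored beyond `display`. As an illustration, ranking applications by estimated speed-up; the column names below are invented stand-ins, not necessarily the exact headers of the qualification CSV:

```python
import pandas as pd

# Invented example rows; the real summary CSV's column names may differ.
summary_output = pd.DataFrame({
    "App ID": ["app-001", "app-002", "app-003"],
    "Estimated GPU Speedup": [1.2, 3.4, 2.1],
})

# Sort so the strongest GPU-acceleration candidates appear first.
ranked = summary_output.sort_values("Estimated GPU Speedup", ascending=False)
print(ranked["App ID"].tolist())  # ['app-002', 'app-003', 'app-001']
```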
