TPC-DS Scale Factor 10 (GiB) - CPU Spark vs GPU Spark

TPC-DS is a decision support benchmark often used to evaluate the performance of OLAP databases and big data systems.

The notebook in this folder runs a user-specified subset of the TPC-DS queries on the Scale Factor 10 (GiB) dataset. It uses TPCDS PySpark to execute the queries with SparkSQL on both GPU and CPU, capturing the metrics as a Pandas dataframe. It then plots a bar chart comparing the two runs, showing the GPU acceleration that RAPIDS Spark achieved for the queries that were run.
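The comparison step can be sketched roughly as follows. This is a minimal illustration, not the notebook's own code: it uses plain Python instead of Pandas and a plotting library, the function names `speedups` and `text_bar_chart` are hypothetical, and the timings are placeholder values rather than measured results.

```python
from statistics import geometric_mean


def speedups(cpu_seconds, gpu_seconds):
    """Return {query: cpu_time / gpu_time} for queries present in both runs."""
    common = sorted(cpu_seconds.keys() & gpu_seconds.keys())
    return {q: cpu_seconds[q] / gpu_seconds[q] for q in common}


def text_bar_chart(ratios, width=40):
    """Render one line per query, with bar length proportional to speedup."""
    if not ratios:
        return ""
    longest = max(ratios.values())
    lines = []
    for query, ratio in ratios.items():
        bar = "#" * max(1, round(width * ratio / longest))
        lines.append(f"{query:<6} {bar} {ratio:.2f}x")
    return "\n".join(lines)


# Placeholder timings in seconds (assumed for illustration, NOT measurements).
cpu = {"q1": 8.0, "q2": 6.0, "q3": 9.0}
gpu = {"q1": 2.0, "q2": 3.0, "q3": 3.0}

ratios = speedups(cpu, gpu)
print(text_bar_chart(ratios))
print(f"geometric mean speedup: {geometric_mean(list(ratios.values())):.2f}x")
```

A ratio above 1 means the GPU run was faster for that query. The geometric mean is used for the summary because it is less skewed by a single outlier query than an arithmetic mean of ratios.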

This notebook can be opened and executed using standard tools:

  • Jupyter(Lab)
  • VS Code with the Jupyter extension

It can also be opened and run in hosted notebook environments. Use the link below to launch it on Google Colab and connect it to a GPU instance.

Open In Colab

Here is the bar chart from a recent execution on Google Colab's T4 High RAM instance using RAPIDS Spark 24.12.0 with Apache Spark 3.5.0:

tpcds-speedup