
high-performance-spark-examples

Examples for High Performance Spark

We are in the process of updating this for Spark 3.5+ and the 2nd edition of our book!

Building

Most of the examples can be built with sbt; the C and Fortran components depend on gcc, g77, and cmake.
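As a sketch (assuming sbt and the native toolchain are already on your PATH), a typical build might look like:

```shell
# Build the Scala/JVM examples (assumes sbt is installed)
sbt compile

# The C and Fortran components are built separately with cmake;
# see the repo's build scripts and GitHub workflows for the exact
# invocation, which may vary between releases.
```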

Tests

The full test suite depends on having the C and Fortran components built, as well as a local R installation.

The most accurate way to see how we run the tests is to look at the .github workflows.

History Server

The history server can be a great way to figure out what's going on.

By default, event logs are written to /tmp/spark-events, so you'll need to create that directory if it doesn't already exist:

mkdir -p /tmp/spark-events

The scripts for running the examples generally run with the event log enabled.

You can set SPARK_EVENTLOG=true before running the Scala tests and you'll get event logs for the history server too!

e.g.

SPARK_EVENTLOG=true sbt test

If you want to run just a specific test, you can use sbt's testOnly task.
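For example (the suite name below is a placeholder; substitute the fully qualified class name of the test you want to run):

```shell
# Run a single suite via sbt's testOnly task.
# "com.highperformancespark.examples.MySuite" is a hypothetical name.
sbt "testOnly com.highperformancespark.examples.MySuite"
```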

Then, to view the history server, launch it with ${SPARK_HOME}/sbin/start-history-server.sh and open your local history server in a browser.
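As a sketch, assuming SPARK_HOME points at a local Spark installation:

```shell
# Make sure the default event log directory exists
mkdir -p /tmp/spark-events

# Launch the history server (requires a local Spark install):
#   "${SPARK_HOME}/sbin/start-history-server.sh"
# then browse to http://localhost:18080, the default UI port.
```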