As a personal project, I'm converting all the sample Python code in Apache Spark In 24 Hours to Scala. You can use the code in the book's Try It Yourself sections. It runs on Apache Spark 2.2.0.
In the book, the code uses RDDs. However, for structured data the recommended approach is to use Datasets and DataFrames, so I followed that best practice when writing the Scala versions.
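To show the difference, here is a minimal sketch that computes the same per-key total twice: once with the RDD API and once with a Dataset. It assumes Spark 2.2 with `spark-sql` on the classpath. The `Sale` case class, the `RddVsDataset` object, and the sample data are hypothetical illustrations and do not come from the book.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for illustration.
// Defined at top level so Spark can derive an Encoder for it.
case class Sale(region: String, amount: Double)

object RddVsDataset {

  // Returns (RDD result, Dataset result) so the two can be compared.
  def totals(spark: SparkSession, sales: Seq[Sale]): (Map[String, Double], Map[String, Double]) = {
    import spark.implicits._

    // RDD style: build key/value pairs by hand and reduce them.
    // Spark treats the lambdas as opaque, so it cannot optimize them.
    val rddTotals = spark.sparkContext
      .parallelize(sales)
      .map(s => (s.region, s.amount))
      .reduceByKey(_ + _)
      .collect()
      .toMap

    // Dataset/DataFrame style: a declarative aggregation that the
    // Catalyst optimizer can plan. The result has columns region, sum(amount).
    val dsTotals = sales.toDS()
      .groupBy($"region")
      .sum("amount")
      .collect()
      .map(row => row.getString(0) -> row.getDouble(1))
      .toMap

    (rddTotals, dsTotals)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddVsDataset")
      .master("local[*]") // local mode, assumed for trying this out
      .getOrCreate()

    val sales = Seq(Sale("east", 10.0), Sale("west", 5.0), Sale("east", 2.5))
    val (rddTotals, dsTotals) = totals(spark, sales)
    println(s"RDD:     $rddTotals")
    println(s"Dataset: $dsTotals")

    spark.stop()
  }
}
```

Both versions produce the same totals. The Dataset version states *what* to compute rather than *how*, which gives Spark room to optimize it. In Scala, `DataFrame` is simply a type alias for `Dataset[Row]`, so the same aggregation also works on an untyped DataFrame.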