"... This book will be a great resource for both readers looking to implement existing algorithms in a scalable fashion and readers who are developing new, custom algorithms using Spark. ..." Dr. Matei Zaharia Original Creator of Apache Spark FOREWORD by Dr. Matei Zaharia |
This directory contains the source code for all chapters of "Data Algorithms with Spark".
- Chapter 01: Introduction to Data Algorithms
- Chapter 02: Transformations in Action
- Chapter 03: Mapper Transformations
- Chapter 04: Reductions in Spark
- Chapter 05: Partitioning Data
- Chapter 06: Graph Algorithms
- Chapter 07: Interacting with External Data Sources
- Chapter 08: Ranking Algorithms
- Chapter 09: Fundamental Data Design Patterns
- Chapter 10: Common Data Design Patterns
- Chapter 11: Join Design Patterns
- Chapter 12: Feature Engineering in PySpark
The following directories are bonus chapters:
Bonus Chapter | Description |
---|---|
Word Count | Multiple solutions to the word count problem using the reduceByKey() and groupByKey() reducers (see the first sketch after this table). |
Anagrams | Find words that are anagrams: multiple solutions using the reduceByKey(), groupByKey(), and combineByKey() reducers (see the second sketch after this table). |
Lambda Expressions | How to use lambda expressions in PySpark programs. |
TF-IDF | Term Frequency-Inverse Document Frequency. |
K-mers | K-mers for DNA sequences. |
Correlation | All-vs-all correlation. |
mapPartitions() Transformation | Complete example of the mapPartitions() transformation. |
UDF | User-defined function (UDF) example. |
DataFrames Transformations | Examples of creating and transforming DataFrames. |
DataFrames Tutorials | DataFrames tutorials: building DataFrames from collections and CSV text files. |
Join Operations | Examples of joining RDDs. |
PySpark Tutorial 101 | Examples of using PySpark RDDs and DataFrames. |
Physical Data Partitioning | Tutorial on physical data partitioning. |
Monoid: Design Principle | The monoid as a design principle. |
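
The Word Count bonus chapter contrasts two reducers. As a rough illustration only (not the book's code), a minimal PySpark sketch of both approaches might look like this, assuming a local SparkSession and a small hypothetical in-memory sample:

```python
# Minimal word count sketch: reduceByKey() vs. groupByKey().
# The input lines are a hypothetical in-memory sample, not the book's data.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["fox jumped over the fence", "the fox is red"])
pairs = lines.flatMap(lambda line: line.split()).map(lambda w: (w, 1))

# Solution 1: reduceByKey() combines counts per key on each partition
# before the shuffle, so less data moves across the network.
counts_reduce = pairs.reduceByKey(lambda a, b: a + b)

# Solution 2: groupByKey() shuffles all (word, 1) pairs first and then
# sums the grouped values; simpler to read, but more data is shuffled.
counts_group = pairs.groupByKey().mapValues(sum)

print(counts_reduce.collect())
print(counts_group.collect())
```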
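
Similarly, the Anagrams bonus chapter groups words that share the same letters. A hedged sketch of the combineByKey() variant, assuming the SparkContext sc from the previous example and a hypothetical in-memory word list:

```python
# Minimal anagram-grouping sketch with combineByKey().
words = sc.parallelize(["listen", "silent", "enlist", "google", "banana"])

# All anagrams share the same key: the word's letters in sorted order.
keyed = words.map(lambda w: ("".join(sorted(w)), w))

# Build a set of words per key without materializing full groups.
groups = keyed.combineByKey(
    lambda w: {w},             # create a combiner from the first word
    lambda acc, w: acc | {w},  # add a word to a partition-local set
    lambda a, b: a | b,        # merge sets from different partitions
)

# Keep only keys that map to more than one distinct word.
print(groups.filter(lambda kv: len(kv[1]) > 1).collect())
```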