🎥 Introduction to Batch Processing
Follow these instructions to install Spark:
And follow this to run PySpark in Jupyter
🎥 (Optional) Preparing Yellow and Green Taxi Data
Script to prepare the dataset: download_data.sh
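The script itself is not reproduced here. As a rough illustration of the kind of folder layout a preparation step like this can produce, the sketch below enumerates one directory per taxi type, year, and month. The `data/raw/<taxi_type>/<year>/<month>` layout, the `monthly_dirs` helper, and the years used are assumptions for illustration, not taken from `download_data.sh`.

```python
from pathlib import Path

# Hypothetical local layout: data/raw/<taxi_type>/<year>/<month>/
RAW_ROOT = Path("data/raw")


def monthly_dirs(taxi_types, years, root=RAW_ROOT):
    """Return one directory per (taxi type, year, month) combination."""
    return [
        root / taxi_type / str(year) / f"{month:02d}"
        for taxi_type in taxi_types
        for year in years
        for month in range(1, 13)
    ]


dirs = monthly_dirs(["yellow", "green"], [2020, 2021])
print(len(dirs))  # 2 taxi types x 2 years x 12 months = 48
```

Partitioning raw files by type, year, and month keeps each Spark read scoped to a predictable folder; for example, a glob path such as `data/raw/green/2021/*/` would pick up one year of green taxi data under this assumed layout.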
Note: Apart from pandas, another way to infer the schema of the CSV files is to set the `inferSchema` option to `true` when reading the files in Spark.
Coming soon
See here for more details
Did you take notes? You can share them here.
- Notes by Alvaro Navas
- Sandy's DE Learning Blog
- Add your notes here (above this line)