The notebooks in this repository cover introductory Spark concepts. I used Databricks Community Edition as my Spark environment, and most of the code is Python or SQL. The topics covered are:
- Intro to SparkSQL
- Data Preparation
- Executing Queries in Spark
- Data Schemas in Spark
- Window Functions in Spark
- User-Defined Functions (UDFs)
I use financial data for the exercises in these notebooks because I've always enjoyed studying the financial markets, and finance is an industry that will likely continue to see growth in the use of Big Data frameworks like Spark.
The data sets included in this repo are:
- Price data from yfinance including company overviews
- Fictionalized client data for an asset management firm
I've included two screenshots that show the built-in visualization capabilities of Databricks. These are created using the display command and the Databricks GUI.
The first visualization shows the returns for each stock in the sample portfolio plotted for one year.
The second visualization shows moving averages plotted alongside the closing price for technical analysis.
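For readers who want to see the arithmetic behind these two charts, here is a minimal sketch in plain Python. It is not the notebook code: in the notebooks this kind of calculation would more naturally be done with Spark window functions. The function names, the 3-day window, and the sample prices are all made up for illustration.

```python
from collections import deque


def daily_returns(closes):
    """Simple return between consecutive closes: (p_t / p_{t-1}) - 1."""
    return [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]


def moving_average(closes, window):
    """Trailing simple moving average; None until the window is full."""
    buf = deque(maxlen=window)  # holds the most recent `window` prices
    total = 0.0
    out = []
    for price in closes:
        if len(buf) == window:
            total -= buf[0]  # this price is evicted by the append below
        buf.append(price)
        total += price
        out.append(total / window if len(buf) == window else None)
    return out


# Hypothetical closing prices, not taken from the repo's data sets.
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
print(daily_returns(closes))
print(moving_average(closes, 3))
```

The moving average here stays empty until a full window of prices is available. A Spark window defined with a trailing row frame would instead average over whatever rows exist so far, so the first few values may differ between the two approaches.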
Notes: This code uses public data and is for educational purposes only. I am not a finance expert, and any opinions or investments showcased in this repository are not recommendations and do not constitute financial advice. Please note that some data may be fictionalized.