
Series of notebooks that introduce Spark for Financial & Customer Data.


ggsmith842/Data-Analysis-Using-Spark-SQL


Data-Analysis-Using-Spark-SQL

The notebooks in this repository cover introductory Spark concepts. I used Databricks Community Edition as my Spark environment, and most of the code is Python or SQL.

Topics Covered:

  1. Intro to SparkSQL
  2. Data Preparation
  3. Executing Queries in Spark
  4. Data Schemas in Spark
  5. Window Functions in Spark
  6. User-Defined Functions (UDFs)

Data Theme

I use financial data for the exercises in these notebooks. I've always enjoyed studying the financial markets, and finance is an industry that will continue to see growth in the use of Big Data frameworks like Spark.

The data sets included in this repo are:

  1. Price data from yfinance including company overviews
  2. Fictionalized client data for an Asset Management firm

Selected Screenshots From Databricks

I've included two screenshots that show the built-in visualization capabilities of Databricks. These were created using the display command and the Databricks GUI.

The first visualization shows the returns for each stock in the sample portfolio plotted for one year.

[Screenshot: Stock returns]

The second visual shows how we can plot the moving averages along with the closing price for technical analysis.

[Screenshot: Technical Indicators]
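A moving average like the one in the second visual is commonly computed with a SQL window function. The sketch below is not taken from the notebooks: the table name prices, the columns ticker, trade_date and close_price, the 3-row window, and the sample rows are all hypothetical. To keep it runnable without a cluster, it uses Python's built-in sqlite3 module as a stand-in for Spark (window functions need SQLite 3.25 or newer). The query uses standard window syntax, so it should also work when passed to spark.sql after registering a temp view with the same column names, though that is an assumption rather than something verified here.

```python
import sqlite3

# Hypothetical sample rows: (ticker, trade_date, close_price).
# These are made-up values, not data from the repository.
rows = [
    ("AAA", "2023-01-02", 10.0),
    ("AAA", "2023-01-03", 11.0),
    ("AAA", "2023-01-04", 12.0),
    ("AAA", "2023-01-05", 13.0),
    ("BBB", "2023-01-02", 20.0),
    ("BBB", "2023-01-03", 22.0),
]

# Trailing 3-row moving average, computed separately for each ticker.
# PARTITION BY restarts the window per ticker; ORDER BY fixes row order
# inside each partition; ROWS BETWEEN sets the window size.
MOVING_AVG_SQL = """
SELECT ticker,
       trade_date,
       close_price,
       AVG(close_price) OVER (
           PARTITION BY ticker
           ORDER BY trade_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS ma_3
FROM prices
ORDER BY ticker, trade_date
"""


def moving_averages(price_rows):
    """Load rows into an in-memory table and run the window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE prices (ticker TEXT, trade_date TEXT, close_price REAL)"
        )
        conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", price_rows)
        return conn.execute(MOVING_AVG_SQL).fetchall()
    finally:
        conn.close()


result = moving_averages(rows)
for row in result:
    print(row)
```

The first rows of each ticker average over fewer than three prices because the window only looks backward. A longer window, such as 50 or 200 rows, only requires changing the PRECEDING bound.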

Notes

This code uses public data and is for educational purposes only. I am not a finance expert, and any opinions or investments showcased in this repository are not recommendations and do not constitute financial advice. Please note that some data may be fictionalized.
