This project extracts fintech data from a MySQL database, loads it into Amazon Redshift, and prepares it for further analysis and visualization in Power BI. A Python script uploads the extracted data to an S3 bucket and then loads it into a Redshift database.
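As a rough illustration of the S3-then-Redshift flow described above, the sketch below serializes rows to CSV, uploads the object to S3, and issues a Redshift `COPY`. It is not the project's actual script: the function names, the table name, the bucket and key, and the use of an IAM role for `COPY` are all assumptions made for the example.

```python
import csv
import io


def rows_to_csv_bytes(header, rows):
    """Serialize a header and an iterable of row tuples to UTF-8 CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def build_copy_statement(table, bucket, key, iam_role_arn):
    """Build a Redshift COPY statement for a CSV object with one header row."""
    # Inputs are assumed to come from trusted configuration, not user input.
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


def upload_and_copy(header, rows, table, bucket, key, iam_role_arn, conn_params):
    """Upload rows to S3 as CSV, then COPY the object into Redshift."""
    # Imported lazily so the helpers above work without the AWS libraries.
    import boto3
    import psycopg2

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=rows_to_csv_bytes(header, rows)
    )
    conn = psycopg2.connect(**conn_params)
    try:
        # "with conn" commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            cur.execute(build_copy_statement(table, bucket, key, iam_role_arn))
    finally:
        conn.close()
```

Here `conn_params` is a hypothetical dict of `psycopg2.connect` keyword arguments (`host`, `port`, `dbname`, `user`, `password`). Loading through `COPY` from S3 is generally preferred over row-by-row `INSERT` statements for bulk data in Redshift.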
- unicorn_data_loading_redshift.py: This script handles the connection to AWS services (S3 and Redshift), creates necessary database schema and tables, and performs data loading operations.
- .env: A dotenv file to store sensitive credentials like AWS access keys, Redshift database credentials, etc. (Note: This file should not be checked into version control.)
- README.md: Provides project documentation.
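A minimal sketch of how a script might read credentials from the `.env` file and fail early when one is missing. The variable names in `REQUIRED_KEYS` are hypothetical; substitute whatever keys the project's `.env` actually defines.

```python
import os

# Hypothetical variable names; use the keys the project's .env defines.
REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "REDSHIFT_HOST",
    "REDSHIFT_DB",
    "REDSHIFT_USER",
    "REDSHIFT_PASSWORD",
)


def read_settings(env=None):
    """Return the required settings, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def load_settings_from_dotenv(path=".env"):
    """Load the .env file into the process environment, then validate it."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv(dotenv_path=path)
    return read_settings()
```

Calling `load_settings_from_dotenv()` at the start of the script surfaces a missing credential immediately, rather than partway through an upload.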
- AWS CLI
- Boto3
- IAM
- VPC
- Amazon Redshift Cluster
- Amazon S3 Bucket
- Lambda
- Power BI for visualization (Upcoming)
- Clone the repository to your local machine.
- Ensure Python 3.x is installed.
- Install required Python packages:
pip install pandas boto3 psycopg2-binary python-dotenv
The data upload and initial processing are functioning correctly. However, there are still tasks under development:
- Data Analysis: Detailed analysis of the data is in the planning stages.
- Incremental Load: A Lambda function for incremental loads.
- Additional Sources: Extracting data from other sources, such as PostgreSQL.
- Automation: Execute SQL scripts for regular, scheduled transformations.
- Regular Backups: Configure and ensure regular backups of the Redshift cluster to safeguard against data loss.
- Dashboard Development: A Power BI dashboard is currently under development to visualize and interact with the dataset.
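The incremental load is still planned, so the following is only one possible approach, not the project's design: a watermark-based extract that selects rows changed since the last run. The column name `updated_at`, the table name, and both function names are hypothetical. The `%s` placeholder style matches common MySQL drivers for Python.

```python
import re

# Table and column names cannot be bound as query parameters, so they are
# validated against a conservative identifier pattern instead.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_incremental_query(table, watermark_column, last_watermark):
    """Return (sql, params) selecting rows newer than the last watermark."""
    if not (_IDENTIFIER.match(table) and _IDENTIFIER.match(watermark_column)):
        raise ValueError("table and column must be plain SQL identifiers")
    if last_watermark is None:
        # First run: no watermark yet, so extract everything.
        return f"SELECT * FROM {table} ORDER BY {watermark_column}", ()
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > %s "
        f"ORDER BY {watermark_column}"
    )
    return sql, (last_watermark,)


def next_watermark(rows, watermark_index, previous=None):
    """Return the highest watermark seen in rows, or previous if rows is empty."""
    values = [row[watermark_index] for row in rows]
    return max(values) if values else previous
```

A scheduled job (for example, the planned Lambda function) could store the value returned by `next_watermark` after each run and pass it back as `last_watermark` on the next one.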