XLSX Assembler – ETL Tool for Merging Excel Data

Built With

This project was built using these technologies.

  • Python
  • Airflow
  • Cron
  • Redis
  • Pandas
  • Openpyxl
  • PyQt5
  • Docker

Features

🚀 Efficient ETL Process

Automates the extraction, transformation, and loading (ETL) of data from multiple Excel files using Airflow.
Only Excel files with a specific structure are supported.

📊 Advanced Data Processing

Leverages the power of Pandas and Openpyxl for fast and accurate data reading, processing, and styling.
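
As an illustration of the merge step, here is a minimal sketch rather than the project's actual code. It assumes every input workbook has the same column layout on its first sheet (this README only says that a specific structure is required). The function names merge_frames and merge_excel_files are hypothetical.

```python
from pathlib import Path
from typing import Iterable

import pandas as pd


def merge_frames(frames: Iterable[pd.DataFrame]) -> pd.DataFrame:
    """Stack frames row-wise after checking that they share the same columns."""
    frames = list(frames)
    if not frames:
        raise ValueError("no input frames")
    expected = list(frames[0].columns)
    for i, frame in enumerate(frames[1:], start=1):
        if list(frame.columns) != expected:
            raise ValueError(
                f"frame {i} has columns {list(frame.columns)}, expected {expected}"
            )
    # ignore_index=True renumbers the rows 0..n-1 in the merged result.
    return pd.concat(frames, ignore_index=True)


def merge_excel_files(paths: Iterable[str], output: str) -> None:
    """Read the first sheet of each workbook and write one merged workbook."""
    # Assumption: the data sits on the first sheet with a single header row.
    frames = [pd.read_excel(Path(p), engine="openpyxl") for p in paths]
    merge_frames(frames).to_excel(output, index=False, engine="openpyxl")
```

Checking the columns up front makes a workbook with the wrong structure fail with a clear error instead of producing a merged file with misaligned data.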

💻 Intuitive GUI with PyQt5

Includes a user-friendly graphical interface for selecting files and tracking real-time progress.

⚡ Performance Optimization

Uses Redis to reduce system load and speed up data processing, so large datasets are handled efficiently.

Getting Started

Prerequisites:

  • Python and Docker installed on your machine
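
A quick way to confirm the prerequisites is a short check script. This is only a sketch: the helper name missing_tools is hypothetical, and the tool names it looks for (docker, docker-compose) are taken from the commands used in this README.

```python
import shutil
from typing import List, Sequence


def missing_tools(tools: Sequence[str] = ("docker", "docker-compose")) -> List[str]:
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

Calling missing_tools() returns an empty list when both commands are available, and otherwise lists the ones that still need to be installed.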

🛠 Installation and Setup Instructions

  1. Clone the repository: git clone https://github.com/NickLitwinow/XLSXAssembler_Public.git

  2. Navigate into the src directory: cd src/

  3. (Terminal 1) Run the ETL client: python app.py

  4. (Terminal 2) Build the Docker image (sudo may be required): docker build . --tag extending_airflow:latest

  5. (Terminal 2) Start the Docker services: docker-compose up -d

  6. (Terminal 2) (Optional) Stop the Docker services and remove their volumes: docker-compose down -v

Running python app.py launches the PyQt5 GUI in development mode, where you can select multiple Excel files and begin the ETL process.

Usage Example

  1. In the ETL client, click the Add File button and select files from the example files (you can add files again later if needed).

  2. (Optional) To remove a file from the selection, click its path (element) in the black selection window, then click Remove File.

  3. Click Merge Files to name the output file and choose its destination. The ETL process starts afterwards.

  4. To view the Airflow DAG run:

  • Open http://localhost:8080/home in your browser.
  • Log in with Login: airflow and Password: airflow.
  • (Info) If you have just run docker-compose up -d, Airflow may take some time to load.

  5. To view the Redis database:

  • Open http://localhost:8001/ in your browser.
  • Accept "EULA and Privacy Settings".
  • Click I already have a database.
  • Click Connect to a Redis Database and enter Host: redis, Port: 6379, Name: redis-local.
  • Click ADD REDIS DATABASE.
  • Select the redis-local database.
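
Because the services can take a while to come up, the two checks above can also be scripted. The following is a minimal sketch using only the standard library, under several assumptions: the Airflow webserver exposes a /health endpoint (as Airflow 2.x does), the compose file publishes Redis port 6379 to the host so that localhost works from outside Docker, and Redis requires no password. The function names are hypothetical.

```python
import http.client
import socket
import time
import urllib.request
from typing import Callable


def airflow_is_up(url: str = "http://localhost:8080/health", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts, and refused connections are all OSError.
        return False


def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and return True if the server replies +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        return False


def wait_for(check: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay)
    return False
```

For example, wait_for(airflow_is_up) blocks for up to about a minute with the defaults and returns True as soon as the webserver responds, and wait_for(redis_is_up) does the same for Redis.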

Show your support

Give a ⭐ if you like this project!