pyjavo/update_csv_pipeline

An ETL pipeline to update a DB with modifications made in a CSV file


About The Project

Inspired by https://github.com/bereketkibru/Data_engineering_sensor_data

Using a docker-compose file, this project sets up a completely dockerized ELT pipeline with MySQL for data storage, Airflow for automation and orchestration, DBT for data transformation, and a Redash dashboard connected to the MySQL database.
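
For orientation, the sketch below shows what a minimal Airflow DAG for such a pipeline could look like. It is an illustration only: the DAG id, script paths, and DBT project directory are assumptions, not code taken from this repository.

    # A minimal sketch, not the repository's actual DAG: the dag_id, file
    # paths, and DBT project location are assumptions for illustration.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="csv_to_mysql_pipeline",  # assumed name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Extract + load: push the raw CSV from /data into MySQL.
        load_csv = BashOperator(
            task_id="load_csv",
            bash_command="python /opt/scripts/load_csv.py /data/archivo.csv",  # assumed script
        )

        # Transform: run the DBT models against the loaded tables.
        run_dbt = BashOperator(
            task_id="run_dbt",
            bash_command="dbt run --project-dir /opt/dbt",  # assumed project dir
        )

        load_csv >> run_dbt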

Built With

The tech stack used in this project:

  • Docker and Docker Compose
  • MySQL
  • Apache Airflow
  • DBT
  • Redash

Getting Started

Prerequisites

Make sure you have Docker installed on your local machine.

  • Docker
  • Docker Compose

Installation

  1. Clone the repo

    git clone https://github.com/pyjavo/update_csv_pipeline
  2. Create a /data directory at the root of the project.

  3. Save the archivo.csv file inside the /data directory.

  4. Build

    docker-compose build
  5. Create the database for the server service

    docker-compose run --rm server create_db
  6. Run

    docker-compose up
  7. Open the Airflow web UI

    Navigate to `http://localhost:8000/` in your browser
    use `admin` for username
    use `admin` for password
  8. Access the Redash dashboard

    Navigate to `http://localhost:5000/` in your browser
  9. Access your MySQL database using Adminer

    Navigate to `http://localhost:8080/` in your browser
    choose MySQL as the database system
    use `root` for username
    use `root` for password
    use `mysqldb` for database
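
If you prefer to verify the loaded data from Python instead of Adminer, a quick sanity check could look like the sketch below. It assumes the compose file publishes MySQL on the default port 3306 and that the mysql-connector-python package is installed locally; both are assumptions, not documented facts about this repository.

    # Sanity check: list the tables in the pipeline's database.
    # Assumes MySQL is published on localhost:3306 (check docker-compose.yml)
    # and that mysql-connector-python is installed.
    import mysql.connector

    conn = mysql.connector.connect(
        host="localhost",
        port=3306,  # assumed default port
        user="root",
        password="root",
        database="mysqldb",
    )
    cursor = conn.cursor()
    cursor.execute("SHOW TABLES")
    for (table_name,) in cursor:
        print(table_name)
    cursor.close()
    conn.close()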

Documentation

The recommended docstring format is the Google style.
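
For example, a function documented in the Google style looks like the sketch below (the function is a hypothetical illustration, not code from this repository):

    def update_row(row_id: int, values: dict) -> bool:
        """Updates a single database row with values taken from the CSV.

        Args:
            row_id: Primary key of the row to update.
            values: Mapping of column names to their new values.

        Returns:
            True if the row was modified, False otherwise.
        """
        ...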

License

Distributed under the MIT License. See LICENSE for more information.
