Skip to content

ETL workflow and data analysis. ETL-workflow using prefect and pygrametl (SCD, slow changing dimension). Product classification based on product name.

Notifications You must be signed in to change notification settings

rajrohan/ETL-workflow

Repository files navigation

ETL-workflow

ETL workflow and data analysis. ETL-workflow using prefect and pygrametl (SCD, slow changing dimension). Product classification based on product name.

How to guide:

Please run the sql script to create db or please create manually database with name :product_sales.

CREATE DATABASE IF NOT EXISTS product_sales ;

I had security inplace with user:root and password root123. Please update the code accordingly.

All the tables will be created from the input CSV file. CSV file can found in the data repository.

Run this Program

  1. ETL-workflow.py:

    Dependency: Input filename:./data/Clothing_Sales_Data_Unique_category.csv Sql Database should exist.

Output: 4 distinct csv files for staging purpose of star schema and 1 csv for transformed file which will be created after all the transformation file can found in . ./data/stage/*

  1. Slowly-changing-dimension.py: Dependency: sql connection if the Databse and table is there then you can run this script. output update one column.

  2. Knn.py Dependency: dataframe-utils.py vectorize.py Output will be one csv file with result comparison. ./data/results/TFIDF_distinctProduct_result.csv

Refrences:

1. Dataset Related information https://demos.componentone.com/aspnet/adventureworks/Products.aspx

2. Prefect Library https://docs.prefect.io/core/tutorials/etl.html

3. Pygrametl  https://chrthomsen.github.io/pygrametl/doc/examples/dimensions.html

4. Classification https://github.com/gallib2/product-categorization

About

ETL workflow and data analysis. ETL-workflow using prefect and pygrametl (SCD, slow changing dimension). Product classification based on product name.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published