wasit7/mastering_data_preparation

Mastering Data Preparation


Lecture Notes Repository for Data Science Course

Welcome to the Lecture Notes Repository! This repository contains a collection of Jupyter Notebooks and supplementary materials for a data science course that covers key concepts such as data cleaning, validation, real-time data processing, Change Data Capture (CDC), monadic transformations, and more.

Table of Contents

  1. Introduction
  2. Lecture Notebooks Overview
    • Week 1: Introduction to Data Science
    • Week 2: Data Cleaning with Pandas
    • Week 3: Real-Time Data Processing
    • Week 4: Change Data Capture (CDC)
    • Week 5: Data Validation and Quality Assurance
    • Week 6: Pathumthani Platform Data Integration
    • Week 7: Conclusion and Future Work
    • Week 8: Monadsquishy Data Transformation Tutorial
  3. Installation
  4. Usage
  5. Contributing
  6. License

1. Introduction

This repository provides lecture notes, example code, and practical exercises designed to help students master key data preparation techniques. Each week builds on the previous one, moving from data processing and transformation in Python with pandas to more advanced topics such as monadic transformations with the Monadsquishy tool.


2. Lecture Notebooks Overview

Week 1: Introduction to Data Science

This notebook covers the basics of data science, the data lifecycle, and introduces common tools used in the field.

Week 2: Data Cleaning with Pandas

Learn how to clean and preprocess datasets using pandas. This includes handling missing values, standardizing formats, and eliminating duplicates.
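
As a rough illustration of the three steps named above (missing values, format standardization, duplicate removal), here is a minimal sketch. The sample data, the column names, and the `clean` helper are hypothetical and are not taken from the course notebook.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
raw = pd.DataFrame({
    "name": [" Alice", "bob ", "bob ", None],
    "visit_date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "amount": [10.0, None, None, 7.5],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Standardize formats: trim whitespace, title-case names, parse dates.
    out["name"] = out["name"].str.strip().str.title()
    out["visit_date"] = pd.to_datetime(out["visit_date"])
    # Handle missing values: fill missing amounts with 0 (an assumed policy).
    out["amount"] = out["amount"].fillna(0.0)
    # Drop rows without a name, then remove exact duplicate rows.
    out = out.dropna(subset=["name"]).drop_duplicates()
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Filling missing amounts with zero is only one possible policy; depending on the dataset, dropping the row or imputing a median may be more appropriate.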

Week 3: Real-Time Data Processing

This notebook focuses on real-time data streaming and chunk processing with pandas, covering techniques to process large datasets efficiently.
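
One common way to process a dataset that does not fit in memory is to read it in chunks and keep only a running aggregate. The sketch below assumes a CSV with hypothetical `sensor` and `value` columns; an in-memory buffer stands in for a large file, and `total_by_sensor` is an illustrative helper rather than code from the notebook.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a large file or stream.
csv_data = io.StringIO(
    "sensor,value\n"
    "a,1.0\n"
    "b,2.0\n"
    "a,3.0\n"
    "b,4.0\n"
    "a,5.0\n"
)

def total_by_sensor(source, chunksize: int = 2) -> dict:
    totals = {}
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        partial = chunk.groupby("sensor")["value"].sum()
        # Fold each chunk's partial sums into the running totals.
        for sensor, value in partial.items():
            totals[sensor] = totals.get(sensor, 0.0) + float(value)
    return totals

totals = total_by_sensor(csv_data)
print(totals)
```

Only one chunk is held in memory at a time, so peak memory depends on `chunksize` rather than on the size of the file.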

Week 4: Change Data Capture (CDC)

Explore how to implement Change Data Capture to detect which records were inserted, updated, or deleted, so that downstream steps process only the changed data instead of the full dataset.
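
A simple way to approximate CDC without database logs is to compare two snapshots of the same table. The sketch below uses a snapshot-diff approach built on `DataFrame.merge` with `indicator=True`; the tables, the `id` and `price` columns, and the `detect_changes` helper are assumptions for illustration, and the notebook may use a different technique.

```python
import pandas as pd

# Hypothetical snapshots of the same table at two points in time.
previous = pd.DataFrame({"id": [1, 2, 3], "price": [10, 20, 30]})
current = pd.DataFrame({"id": [2, 3, 4], "price": [20, 35, 40]})

def detect_changes(old, new, key="id", value="price"):
    # The outer join keeps every key; `_merge` records which side each row came from.
    merged = old.merge(
        new, on=key, how="outer", suffixes=("_old", "_new"), indicator=True
    )
    inserted = merged.loc[merged["_merge"] == "right_only", key].tolist()
    deleted = merged.loc[merged["_merge"] == "left_only", key].tolist()
    # Rows present in both snapshots count as updated when the value differs.
    both = merged[merged["_merge"] == "both"]
    changed = both[f"{value}_old"] != both[f"{value}_new"]
    updated = both.loc[changed, key].tolist()
    return {"inserted": inserted, "deleted": deleted, "updated": updated}

changes = detect_changes(previous, current)
print(changes)
```

Snapshot comparison still reads both tables in full. Log-based or timestamp-based CDC avoids that cost but needs support from the source system.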

Week 5: Data Validation and Quality Assurance

Understand how to apply validation rules to datasets, ensuring data quality through automated checks and corrections.
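
Validation rules can be expressed as named functions that each return a boolean mask over the rows. The following is a minimal sketch under that assumption; the `records` data, the rule names, and the `validate` helper are hypothetical, and it covers only the checking step, not automated corrections.

```python
import pandas as pd

# Hypothetical records containing some deliberately invalid rows.
records = pd.DataFrame({
    "age": [25, -3, 140],
    "email": ["a@example.com", "not-an-email", "c@example.com"],
})

# Each rule maps a DataFrame to a boolean Series; True means the row passes.
RULES = {
    "age_in_range": lambda df: df["age"].between(0, 120),
    "email_has_at": lambda df: df["email"].str.contains("@", regex=False),
}

def validate(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    # One boolean column per rule, aligned with the input rows.
    return pd.DataFrame({name: rule(df) for name, rule in rules.items()})

report = validate(records, RULES)
valid_rows = records[report.all(axis=1)]   # rows that pass every rule
failures = (~report).sum()                 # number of failing rows per rule
print(report)
print(failures)
```

Keeping the per-rule report separate from the filtering step makes it possible to log why a row was rejected before deciding whether to drop or repair it.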

Week 6: Pathumthani Platform Data Integration

A practical final project that demonstrates how to integrate tourism, agriculture, and sports data into a functional platform for analysis and insights.

Week 7: Conclusion and Future Work

Summarizes the lessons learned from the course and suggests potential areas for expanding the platform with advanced analytics, new data sources, and user-friendly interfaces.

Week 8: Monadsquishy Data Transformation Tutorial

This notebook explains the Monadsquishy tool, which applies monadic transformations to simplify data pipelines. Learn how to chain transformations, handle errors, and manage missing data using functional programming principles.
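
To give a feel for the general idea, here is a small monad-style sketch in plain Python. It is not the Monadsquishy API: the `Result` class, its `then` method, and the example functions are hypothetical, written only to show how chained steps can carry an error forward instead of raising.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class Result:
    """Holds either a value or an error message (hypothetical helper)."""
    value: Any = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None

    def then(self, fn: Callable[[Any], Any]) -> "Result":
        # If an earlier step failed, skip this one and pass the error along.
        if not self.ok:
            return self
        try:
            return Result(value=fn(self.value))
        except Exception as exc:
            # Record which step failed instead of raising.
            return Result(error=f"{fn.__name__}: {exc}")

def parse_amount(text: str) -> float:
    return float(text.strip())

def require_positive(amount: float) -> float:
    if amount <= 0:
        raise ValueError("amount must be positive")
    return amount

def pipeline(raw: Optional[str]) -> Result:
    # Missing input becomes an explicit error rather than a crash later on.
    if raw is None:
        return Result(error="missing value")
    return Result(value=raw).then(parse_amount).then(require_positive)

for raw in [" 12.5 ", "abc", None, "-1"]:
    print(repr(raw), "->", pipeline(raw))
```

Because each step returns a `Result`, the pipeline reads as a flat chain and every input ends in either a value or a message naming the step that failed.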


3. Installation

To run the notebooks locally, you need to have the following dependencies installed:

  • Python 3.x
  • Jupyter Notebook
  • Pandas
  • Other required libraries (specified in individual notebooks)

Step-by-step Instructions:

  1. Clone the repository:

    git clone https://github.com/wasit7/mastering_data_preparation.git
  2. Navigate to the project directory:

    cd mastering_data_preparation
  3. Install the required libraries:

    pip install -r requirements.txt
  4. Launch Jupyter Notebook:

    jupyter notebook

4. Usage

To start learning, navigate to the desired notebook and run the cells interactively. Each notebook is self-contained and includes explanations, code examples, and practice exercises.

Example Usage:

  • Open the 02_data_cleaning.ipynb notebook.
  • Follow along with the data cleaning steps, and try out different transformations on your dataset.

5. Contributing

Contributions are welcome! If you'd like to contribute to this repository:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-branch
  3. Commit your changes:
    git commit -m "Added new feature"
  4. Push to the branch:
    git push origin feature-branch
  5. Create a Pull Request.

6. License

This project is licensed under the MIT License. See the LICENSE file for more details.

