Patient Data ETL Pipeline

Developed by Sushant Sinha as part of the Bombardier Assessment.

This repository contains a Python ETL (Extract, Transform, Load) pipeline that cleans and processes hospital patient data. The pipeline removes protected health information (PHI), handles missing and invalid values, normalizes the data, and writes the cleaned result to a CSV file. Unit tests check the ETL functions for correctness and data integrity.

Features

Removes PHI (names, addresses, etc.) from the dataset.
Handles missing values and invalid data (e.g., NaN, inf, negative values).
Normalizes and cleans the data.
Adds columns for average glucose levels and diabetes diagnosis.
Excludes outliers when calculating mean values.
Stores the cleaned data into a CSV file.
Includes comprehensive unit tests.
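
The transform steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in diabetesDiagnosis.py: the column names, the 1.5 * IQR outlier rule, the use of the outlier-free mean to fill missing readings, and the diagnosis cutoff are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real dataset's schema is not shown in this README.
PHI_COLUMNS = ["name", "address"]
GLUCOSE_COLUMNS = ["glucose_t1", "glucose_t2"]
# Illustrative cutoff only; the threshold the pipeline actually uses is not documented here.
DIABETES_THRESHOLD = 126.0


def mean_without_outliers(values: pd.Series) -> float:
    """Mean of the values that fall inside the 1.5 * IQR fences (an assumed outlier rule)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    kept = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
    return float(kept.mean())


def clean_patient_data(df: pd.DataFrame) -> pd.DataFrame:
    # Remove PHI columns; columns that are absent are silently skipped.
    cleaned = df.drop(columns=PHI_COLUMNS, errors="ignore").copy()

    # Treat +/-inf and negative readings as missing.
    readings = cleaned[GLUCOSE_COLUMNS].replace([np.inf, -np.inf], np.nan)
    readings = readings.where(readings >= 0)

    # Fill missing readings with the column mean computed without outliers.
    for col in GLUCOSE_COLUMNS:
        fill_value = mean_without_outliers(readings[col].dropna())
        readings[col] = readings[col].fillna(fill_value)

    cleaned[GLUCOSE_COLUMNS] = readings
    # Derived columns: per-patient average and a boolean diagnosis flag.
    cleaned["average_glucose"] = readings.mean(axis=1)
    cleaned["diabetes_diagnosis"] = cleaned["average_glucose"] >= DIABETES_THRESHOLD
    return cleaned


# Small made-up sample to show the call; not real patient data.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "address": ["x", "y", "z", "u", "v"],
    "glucose_t1": [90.0, 100.0, np.nan, 110.0, 1000.0],
    "glucose_t2": [95.0, -5.0, 105.0, np.inf, 100.0],
})
print(clean_patient_data(sample))
```

In this sketch the outlier is excluded only from the mean used for filling; the outlying reading itself stays in the data. Whether the actual pipeline drops, clips, or keeps such rows is not stated here.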

Requirements

Python 3.6+
pandas
numpy
unittest (part of the Python standard library; used to run the tests)

Installation

Clone the repository:

```
git clone https://github.com/sushant-sinha/Bombardier-Assessment.git
cd Bombardier-Assessment
```

Install the required dependencies:
```
pip install pandas numpy
```

Usage

Place your input CSV file (e.g., patient_data.csv) in the project directory.

Update the file paths in diabetesDiagnosis.py:

```
file_path = 'path_to_your_file/patient_data.csv'
output_file_path = 'path_to_your_output/diabetes_diagnosis_data.csv'
```

Run the ETL script:
```
python diabetesDiagnosis.py
```
The processed data will be saved to the specified output file path.

Testing

To run the unit tests, use the following command:
```
python testDiabetesDiagnosis.py
```
The tests will verify the correctness of the ETL functions and ensure data integrity.
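
For orientation, a test in this style might look like the sketch below. The function `replace_invalid_values` is a hypothetical stand-in defined inline so the example is self-contained; the real function names and test cases in testDiabetesDiagnosis.py may differ.

```python
import unittest

import numpy as np
import pandas as pd


def replace_invalid_values(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for a pipeline step: inf and negative values become NaN."""
    numeric = df.replace([np.inf, -np.inf], np.nan)
    return numeric.where(numeric >= 0)


class TestReplaceInvalidValues(unittest.TestCase):
    def test_inf_and_negative_become_nan(self):
        df = pd.DataFrame({"glucose": [90.0, -5.0, np.inf]})
        result = replace_invalid_values(df)
        self.assertEqual(result["glucose"].isna().tolist(), [False, True, True])

    def test_valid_values_are_unchanged(self):
        df = pd.DataFrame({"glucose": [90.0, 120.5]})
        result = replace_invalid_values(df)
        pd.testing.assert_frame_equal(result, df)
```

Saved in a module, a test case like this can be run with `python -m unittest`, or directly as a script if the file ends with the usual `unittest.main()` guard.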
