This project provides an automated data pipeline for extracting, transforming, and analysing health data from a DHIS2 instance. It queries health indicators at the facility level, enriches the data with organisational details (districts and provinces), exports the results to structured CSV files, and loads them into PostgreSQL.

malambomutila/DHIS2-Pipeline

DHIS2 Analytics Pipeline

Overview

The DHIS2 Analytics Pipeline automates the extraction, transformation, and analysis of health data from a DHIS2 instance. This project focuses on retrieving facility-level weekly data, enriching it with organisational unit details, and exporting the processed data in a structured format for further reporting and visualisation.

Features

  • Automated Data Extraction: Queries the DHIS2 API to fetch datasets, data elements, and organisational units.
  • Facility-Level Aggregation: Aggregates health data weekly, categorised by facilities, districts, and provinces.
  • Enrichment with Organisational Details: Merges facility data with district and province information, so each record can be reported at any of the three levels.
  • CSV Export: Outputs processed data to CSV files for easy access and analysis.
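
As an illustration of the enrichment step, the sketch below follows each facility's parent link to find its district and province. It assumes organisation units shaped like DHIS2 metadata records (`id`, `name`, `level`, `parent`) and that facilities sit at level 4, directly below districts. `FACILITY_LEVEL`, `parent_of`, and `enrich_facilities` are hypothetical names and are not taken from the notebook.

```python
FACILITY_LEVEL = 4  # assumption: adjust to the hierarchy of your DHIS2 instance


def parent_of(unit, by_id):
    """Return the parent record of an organisation unit, or None if unknown."""
    parent_ref = unit.get("parent") or {}
    return by_id.get(parent_ref.get("id"))


def enrich_facilities(org_units, facility_level=FACILITY_LEVEL):
    """Attach district and province names to every facility-level unit."""
    by_id = {unit["id"]: unit for unit in org_units}
    rows = []
    for unit in org_units:
        if unit.get("level") != facility_level:
            continue  # skip provinces, districts, and any other level
        district = parent_of(unit, by_id)
        province = parent_of(district, by_id) if district else None
        rows.append({
            "facility_id": unit["id"],
            "facility": unit["name"],
            "district": district["name"] if district else None,
            "province": province["name"] if province else None,
        })
    return rows


# Made-up sample data, for illustration only.
sample_units = [
    {"id": "p1", "name": "Province A", "level": 2, "parent": {"id": "root"}},
    {"id": "d1", "name": "District A", "level": 3, "parent": {"id": "p1"}},
    {"id": "f1", "name": "Clinic A", "level": 4, "parent": {"id": "d1"}},
]
enriched = enrich_facilities(sample_units)
```

Facilities whose parent cannot be resolved keep `None` for the district and province rather than being dropped, which makes gaps in the hierarchy easy to spot in the exported CSV.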

Installation

  1. Clone the repository:
    git clone https://github.com/malambomutila/DHIS2-Pipeline.git
  2. Navigate to the project directory:
    cd DHIS2-Pipeline
  3. Install required dependencies:
    pip install -r requirements.txt
  4. Configure DHIS2 credentials in the 00_Local/01_Configs/credentials.txt file:
    DHIS2_URL=your_dhis2_instance_url
    USERNAME=your_username
    PASSWORD=your_password
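
One possible way to read this file from Python is sketched below. It assumes the file contains plain `KEY=value` lines exactly as shown above; `parse_credentials` and `load_credentials` are hypothetical helpers, and the notebook may read the file differently.

```python
from pathlib import Path


def parse_credentials(text):
    """Parse KEY=value lines into a dict, ignoring blanks and # comments."""
    credentials = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so passwords may contain "=".
        key, value = line.split("=", 1)
        credentials[key.strip()] = value.strip()
    return credentials


def load_credentials(path="00_Local/01_Configs/credentials.txt"):
    """Read and parse the credentials file at the path used in this README."""
    return parse_credentials(Path(path).read_text(encoding="utf-8"))
```

Keep `credentials.txt` out of version control, since it holds a plain-text password.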
    

Usage

  1. Run the Jupyter Notebook to fetch and process data:
    querydata.ipynb
  2. Processed data will be saved in the 00_Local/02_Data/ directory as CSV files.
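
If you prefer to run the notebook without opening it, a minimal sketch using `jupyter nbconvert` follows. It assumes Jupyter and nbconvert are installed and that the command is run from the project directory; `build_notebook_command` and `run_notebook` are hypothetical helpers, not part of the repository.

```python
import subprocess


def build_notebook_command(notebook="querydata.ipynb"):
    """Return the nbconvert command that executes a notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_notebook(notebook="querydata.ipynb"):
    """Execute the notebook; raises CalledProcessError if a cell fails."""
    subprocess.run(build_notebook_command(notebook), check=True)
```

Calling `run_notebook()` executes every cell and writes the outputs back into the same notebook file.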

Output Files

The following CSV files are generated by the pipeline:

  • datasets.csv: List of available datasets from DHIS2.
  • data_elements.csv: Metadata for available data elements.
  • organisation_units.csv: Details of all organisation units.
  • districts.csv: District-level organisation units.
  • provinces.csv: Province-level organisation units.
  • facility_january_2024_with_org_details.csv: Processed health data by facility.
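
After a run, a quick check such as the sketch below can confirm that every file listed above was written. The file names and the `00_Local/02_Data` directory come from this README; `EXPECTED_OUTPUTS` and `missing_outputs` are hypothetical names introduced for illustration.

```python
from pathlib import Path

EXPECTED_OUTPUTS = [
    "datasets.csv",
    "data_elements.csv",
    "organisation_units.csv",
    "districts.csv",
    "provinces.csv",
    "facility_january_2024_with_org_details.csv",
]


def missing_outputs(data_dir="00_Local/02_Data"):
    """Return the expected CSV files that are not present in data_dir."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_OUTPUTS if not (folder / name).is_file()]
```

An empty list from `missing_outputs()` means all six files exist; it does not validate their contents.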

Contributions

Contributions are welcome! Feel free to submit issues, feature requests, or pull requests to improve the project.
