
Udacity Data Science: Natural Language Processing pipeline for disaster response. An ETL process prepares the dataset for the machine learning modeling pipeline. The end result is a model that predicts disaster response categories from a message in real time.


austin047/udacity-datascience-disaster-res-pipeline


Table of Contents

  1. Description
  2. Dependencies
  3. Installation
  4. File Descriptions
  5. Results
  6. Licensing, Authors, and Acknowledgements

Description

This project, in collaboration with Figure Eight, is part of the fulfillment of the Udacity Data Science Nanodegree. The dataset is provided by Figure Eight and contains pre-labelled tweets and messages from real-life disaster events. The aim of this project is to build a Natural Language Processing (NLP) model that categorizes messages according to predefined categories.

The project is separated into three main parts:

  • ETL pipeline for gathering the datasets and preparing them for the machine learning modeling step
  • ML pipeline for training an NLP model on the prepared data and exporting it to a pickle file
  • Flask App for providing end user interactivity with the model and visualizations.
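
The first part, the ETL pipeline, can be sketched with pandas and the standard-library `sqlite3` module. This is a minimal sketch, not the code in `process_data.py`: it assumes the categories file has an `id` column and a `categories` column holding strings such as `related-1;request-0;water-0` (the usual layout of the Figure Eight files, but not stated in this README), and the function names and the table name `messages` are hypothetical.

```python
import sqlite3

import pandas as pd


def split_categories(categories: pd.DataFrame) -> pd.DataFrame:
    """Expand 'related-1;request-0;...' strings into one 0/1 column per category.

    Assumes every row lists the same categories in the same order.
    """
    split = categories["categories"].str.split(";", expand=True)
    # Category names are the text before the '-' in the first row.
    split.columns = [value.split("-")[0] for value in split.iloc[0]]
    # Keep only the digit after the '-' and store it as an integer.
    for column in split.columns:
        split[column] = split[column].str.split("-").str[-1].astype(int)
    split.insert(0, "id", categories["id"])
    return split


def run_etl(messages_csv, categories_csv, conn: sqlite3.Connection,
            table: str = "messages") -> pd.DataFrame:
    """Load both CSV files, merge them on 'id', deduplicate, and write to SQLite."""
    messages = pd.read_csv(messages_csv)
    categories = split_categories(pd.read_csv(categories_csv))
    df = messages.merge(categories, on="id").drop_duplicates()
    # 'table' is a hypothetical name; the real script may use another one.
    df.to_sql(table, conn, index=False, if_exists="replace")
    # Commit so the table persists when a file-backed connection is closed.
    conn.commit()
    return df
```

Called with the same three paths as the ETL command in the installation steps (wrapping the database path in `sqlite3.connect(...)`), this would produce one row per message with one integer column per category.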

Dependencies

- Python 3.x
- Machine Learning & ETL: Pandas, NumPy, Scikit-Learn
- Natural Language Processing: NLTK
- Database: SQLAlchemy (SQLite database)
- Model Persistence: Pickle
- Web App and Data Visualization: Flask, Plotly

Installation

1. Clone the repository: `git clone https://github.com/austin047/udacity-datascience-disaster-res-pipeline.git`

2. Run the ETL pipeline:
   `python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`

3. Run the ML pipeline:

```bash
python models/train_classifier.py data/DisasterResponse.db data/classifier.pkl
```

4. Run the web app:

```bash
cd app
python run.py
```

5. Open `http://0.0.0.0:3001/` in your browser.
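
The training step run by `train_classifier.py` can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the author's model: it reads a table (hypothetically named `messages`) from the SQLite database, treats every column other than `id`, `message` and `genre` as a 0/1 category label, and uses `TfidfVectorizer`'s built-in tokenizer in place of the NLTK tokenization the project lists as a dependency. The random-forest classifier and its settings are illustrative choices.

```python
import pickle
import sqlite3

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Columns assumed to hold inputs or metadata rather than category labels.
NON_LABEL_COLUMNS = ["id", "message", "genre"]


def load_data(conn: sqlite3.Connection, table: str = "messages"):
    """Return the message texts, the label matrix, and the category names."""
    # 'table' is trusted, hypothetical input here; it is not user-supplied.
    df = pd.read_sql(f"SELECT * FROM {table}", conn)
    X = df["message"]
    Y = df.drop(columns=NON_LABEL_COLUMNS)
    return X, Y, list(Y.columns)


def build_model() -> Pipeline:
    """TF-IDF features feeding one random forest per category (multi-label)."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(stop_words="english")),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=50, random_state=42))),
    ])


def save_model(model: Pipeline, path: str) -> None:
    """Persist the fitted pipeline with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

A custom tokenizer function (for example, one built on NLTK) also works, but pickle stores functions by reference, so that function must be importable wherever the pickle is loaded, including the web app.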

File Descriptions

There are three main folders:

  • data folder: Datasets and ETL pipeline files
    • ETL Pipeline Preparation.ipynb: ETL pipeline notebook
    • process_data.py: Contains the ETL pipeline code that prepares the data for the ML pipeline
  • models folder: Contains the machine learning files
    • ML Pipeline Preparation.ipynb: ML pipeline notebook
    • train_classifier.py: Contains the Python code for running the ML pipeline
  • app folder: Web app and visualizations
    • run.py: Main Flask app
    • templates folder: Contains the template files
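
The code of `run.py` is not shown in this README. A minimal, framework-independent sketch of what a prediction request involves: load the pickled model once at startup, then turn one row of 0/1 predictions into a category-to-label mapping. The helper names `load_model`, `classify_message` and `active_categories` are hypothetical, and the sketch assumes the pickle holds a fitted scikit-learn-style estimator whose `predict` accepts a list of strings and returns one row of labels per message.

```python
import pickle
from typing import Dict, List, Sequence


def load_model(path: str):
    """Load the pickled model once, e.g. when the web app starts.

    Only unpickle files you trust: unpickling can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def classify_message(model, category_names: Sequence[str], message: str) -> Dict[str, int]:
    """Predict every category for one message and pair each label with its name."""
    row = model.predict([message])[0]
    return {name: int(label) for name, label in zip(category_names, row)}


def active_categories(result: Dict[str, int]) -> List[str]:
    """Names of the categories predicted as 1, e.g. to highlight in the UI."""
    return [name for name, label in result.items() if label == 1]
```

In a Flask app these helpers would be called from a route handler; starting the server with `app.run(host="0.0.0.0", port=3001)` would match the address given in the installation steps.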

Results

  1. A trained model for running predictions. After following the installation steps above, the model is persisted to a pickle file, 'classifier.pkl', in the data folder.

  2. After the model was evaluated, it obtained an average F1-score of 0.94 (94%).

  3. A web interface for testing the model, shown in the screenshots below.

Fig1 - Home Screen


Fig2 - Enter Question


Fig3 - Predicted Results


Fig4 - Distribution of Categories

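
The README reports an average F1-score of 0.94 but does not say how the average is taken. One common choice for a multi-label model is to compute the binary F1-score, F1 = 2PR / (P + R) with precision P and recall R, separately for each category and then take the unweighted mean (a macro average). This stdlib-only sketch illustrates that metric; it is not the evaluation code used in this project.

```python
from typing import List, Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1).

    Returns 0.0 when there are no true positives, which also covers the
    undefined 0/0 case where a category never occurs and is never predicted.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Average the per-category F1 scores; rows are messages, columns are categories."""
    n_categories = len(y_true[0])
    scores = [
        binary_f1([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_categories)
    ]
    return sum(scores) / n_categories
```

scikit-learn's `f1_score` with `average="macro"` computes the same kind of average on a multi-label indicator matrix; `average="weighted"` or `average="micro"` can give noticeably different numbers, so the averaging method matters when comparing scores.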

Licensing, Authors, and Acknowledgements

Credits to Figure Eight for the data.

Author

Fuh Austin
