Multi-Task House Prediction

Project Objective

The goal of this project is to build a multi-task learning model using PyTorch Lightning that predicts both house prices (a regression task) and house categories (a classification task). This involves using a shared bottom neural network architecture with task-specific top layers, applying advanced machine learning techniques and model management features provided by PyTorch Lightning.

Dataset

The project utilizes the "House Prices - Advanced Regression Techniques" dataset from kaggle, which can be downloaded here https://www.kaggle.com/c/house-prices-advanced-regression-techniques. A new variable 'House Category' is created from features such as 'House Style', 'Bldg Type', 'Year Built', and 'Year Remod/Add', to classify houses into distinct categories alongside predicting their prices.

Installation

Prerequisites

Python 3.8 or higher
pip for managing Python packages

Steps

Clone the repository:

git clone https://github.com/yourusername/multi-task-house-prediction.git

Navigate to the project directory:
```
cd multi-task-house-prediction
```
Install required Python packages:
```
pip install -r requirements.txt
```

Usage

Data Preparation

Run the data preprocessing script to prepare your data:

python preprocessing/data_preprocessor.py

Training the Model

To train the model, execute:

python training/train_model.py

Evaluating the Model

To evaluate the model on the test dataset, use:

python evaluation/evaluate_model.py

Project Structure

data/: Contains raw and processed data.
models/: Includes the multi-task learning model definition and .pkl file.
preprocessing/: Scripts for data cleaning and preparation.
training/: Training and validation logic.
evaluation/: Model evaluation scripts.
utils/: Utilities like logging and other common tasks.
notebooks/: Jupyter notebooks for analysis and experimentation.
reports/: Data profile and project reports.
lightning_logs/: Contains MLFlow runs and model checkpoints.
requirements.txt: Project dependencies.
README.md: Documentation of the project.

Advanced Features

This project makes extensive use of PyTorch Lightning's advanced features such as:

Logging: Integration with MLFlow for monitoring training progress and performance.
Callbacks: Use of the ModelCheckpoint callback to save the best model during training.
Trainer API: Utilizes the Trainer class from PyTorch Lightning for efficient model training and testing.

Hyperparameter Tuning

Hyperparameter optimization is performed using PyTorch Lightning's integration with Optuna:

python training/tune_hyperparameters.py

Results

After training and evaluating our multi-task learning model, we achieved promising results on our test dataset. The model's performance is summarized as follows:

Accuracy: The model achieved an accuracy of approximately 83.90% in categorizing houses into one of the 8 categories. This high level of accuracy indicates that the model is quite effective in understanding and classifying the complex patterns associated with the diverse house categories.
RMSE: The Root Mean Square Error (RMSE) for the house price prediction task was around 60,196.13. While there's room for improvement, this value suggests that the model has a decent grasp on predicting house prices, considering the variability and range of house prices in the dataset.

These results demonstrate the capability of our multi-task learning approach in handling the intricacies of predicting multiple outputs simultaneously. Future work will focus on further optimizing the model to improve both accuracy and RMSE, potentially through more advanced feature engineering, model architecture adjustments, and hyperparameter tuning.

View Logged Runs in the MLFlow UI

From the lightning_logs/ directory:

python -m mlflow ui --port 8080 --backend-store-uri sqlite:///mlruns.db

Navigate to http://localhost:8080 in your browser to view the results.

You can also click and run the start_mlflow.bat script on Windows to not only start the MLflow UI but also automatically open the default web browser.

Authors

Ethan Pirso

Acknowledgments

PyTorch Lightning team for providing an excellent framework for rapid prototyping of deep learning models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-Task House Prediction

Project Objective

Dataset

Installation

Prerequisites

Steps

Usage

Data Preparation

Training the Model

Evaluating the Model

Project Structure

Advanced Features

Hyperparameter Tuning

Results

View Logged Runs in the MLFlow UI

Authors

Acknowledgments

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
evaluation		evaluation
lightning_logs		lightning_logs
models		models
notebooks		notebooks
preprocessing		preprocessing
reports		reports
training		training
utils		utils
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

ethanpirso/Multi-Task-House-Prediction

Folders and files

Latest commit

History

Repository files navigation

Multi-Task House Prediction

Project Objective

Dataset

Installation

Prerequisites

Steps

Usage

Data Preparation

Training the Model

Evaluating the Model

Project Structure

Advanced Features

Hyperparameter Tuning

Results

View Logged Runs in the MLFlow UI

Authors

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages