Principles-of-Programming-Languages Project Group-9

Bayesian Housing Price Prediction using Pyro: A Sophisticated Approach to Robust Predictive Modeling

Problem Statement

Original Statement: The goal is to predict house prices using a Bayesian linear regression model implemented in Pyro.

POPL Angle: The problem is approached through probabilistic programming: both the model and its inference are expressed in Pyro, a probabilistic programming library. Because the regression is Bayesian, it yields uncertainty estimates alongside its predictions. The use of Pyro and Bayesian methods for house price prediction is what differentiates this work from traditional linear regression.

Software Architecture

  • The architecture involves data preprocessing using pandas, PyTorch for tensor operations, Pyro for probabilistic modeling, and matplotlib/seaborn for result visualization.
  • No explicit client-server architecture; it's a standalone script.
  • Testing is conducted locally, assessing model performance using R-squared and visualization.
  • No database is involved; the dataset is fetched using scikit-learn. (We have also provided the same dataset as a CSV file in Dataset.)
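The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `frame_to_tensors`, the 80/20 split, the standardization step, and the synthetic column names are all assumptions made for the example. A tiny synthetic frame is used so the sketch runs offline.

```python
import pandas as pd
import torch


def frame_to_tensors(frame, target_col, train_fraction=0.8, seed=0):
    """Shuffle, split, standardize features, and convert to float32 tensors.

    Hypothetical helper for illustration; the repository may preprocess differently.
    """
    shuffled = frame.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * train_fraction)
    train, test = shuffled.iloc[:n_train], shuffled.iloc[n_train:]

    feature_cols = [c for c in frame.columns if c != target_col]
    # Standardize with training statistics only, so no test information leaks in.
    mean = train[feature_cols].mean()
    std = train[feature_cols].std().replace(0.0, 1.0)

    def convert(part):
        scaled = (part[feature_cols] - mean) / std
        X = torch.tensor(scaled.to_numpy(), dtype=torch.float32)
        y = torch.tensor(part[target_col].to_numpy(), dtype=torch.float32)
        return X, y

    return convert(train), convert(test)


# Synthetic stand-in data; the column names and values are made up.
demo = pd.DataFrame({
    "rooms": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "age": [10.0, 20.0, 15.0, 30.0, 5.0, 25.0, 35.0, 40.0, 12.0, 18.0],
    "price": [1.0, 1.4, 1.9, 2.1, 2.8, 3.0, 3.2, 3.9, 4.1, 4.6],
})
(X_train, y_train), (X_test, y_test) = frame_to_tensors(demo, target_col="price")
print(X_train.shape, X_test.shape)
```

With the real data, the same helper could be applied to the frame returned by scikit-learn's `fetch_california_housing(as_frame=True).frame`, whose target column is named `MedHouseVal`.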

POPL Aspects

Overview

  1. Limitations of Simple Linear Regression:

    • Simple linear regression provides only point estimates for coefficients.
    • Bayesian regression instead yields a distribution over each coefficient, from which uncertainty can be quantified.
  2. Ease of Implementation with Pyro:

    • Implementing Bayesian Regression in Python can be challenging.
    • Pyro's built-in functionality streamlines the process and makes it more accessible.
  3. Utilizing Pyro for Bayesian Models:

    • Pyro is equipped with features that simplify the implementation of Bayesian models.
    • We chose Pyro as our framework for mapping and implementing Bayesian models.
  4. Flexibility in Sampling Algorithms:

    • Pyro provides MCMC (Markov chain Monte Carlo) inference, including the NUTS (No-U-Turn Sampler) algorithm.
    • This allows for more robust and accurate probabilistic modeling.
  5. Addressing Dataset Assumptions:

    • Classical linear regression inference assumes normally distributed errors.
    • Pyro lets us specify priors from distributions of our choice, providing flexibility for different datasets.
  6. Incorporating Prior Knowledge:

    • Probabilistic programming with Pyro allows us to include more prior knowledge about our problem.
    • This flexibility surpasses traditional linear regression, contributing to more informed predictions.

Results and Tests

  • Result R^2: All three models (Bayesian regression in both its Gamma and Normal variants, and linear regression) reached the same R^2 value (0.49). In addition, Bayesian regression in Pyro produced the relevant posterior distributions.
  • Dataset: California housing dataset is used, split into training and testing sets.
  • Benchmark: R-squared is calculated to assess model performance. Visualizations include histograms of posterior distributions and scatter plots comparing predicted and true house prices.
  • Validation: The comparison with traditional linear regression acts as a validation point, demonstrating the benefits of the Bayesian approach in capturing uncertainty.
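The benchmark described above can be sketched as follows: R-squared is computed from point predictions, and posterior samples additionally give a credible band around each prediction. The function names, the 90% level, and the synthetic "posterior" draws are assumptions for illustration; in practice the samples would come from the fitted Pyro model.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def predictive_summary(X, weight_samples, bias_samples, level=0.9):
    """Posterior-mean prediction and a central credible band for the regression mean."""
    # preds has shape (num_samples, num_points): one prediction per posterior draw.
    preds = weight_samples @ X.T + bias_samples[:, None]
    lower, upper = np.quantile(preds, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return preds.mean(axis=0), lower, upper


# Synthetic data and stand-in posterior draws (not results from the project).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.5, size=50)
weight_samples = np.array([2.0, -1.0]) + rng.normal(scale=0.05, size=(500, 2))
bias_samples = 0.5 + rng.normal(scale=0.05, size=500)

mean_pred, lower, upper = predictive_summary(X, weight_samples, bias_samples)
print("R^2 of posterior-mean predictions:", r_squared(y, mean_pred))
print("average width of the 90% band:", np.mean(upper - lower))
```

A plain linear regression would supply only the equivalent of `mean_pred`; the band between `lower` and `upper` is the extra information the Bayesian approach contributes.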

Potential for Future Work

  • Hyperparameter Tuning: Explore sensitivity to priors and hyperparameters for better model performance.
  • Feature Engineering: Experiment with additional features or transformations to improve predictive accuracy.
  • Ensemble Methods: Investigate ensemble methods or model averaging to enhance robustness.
  • Online Learning: Explore possibilities for online learning and continuous model improvement.
  • Integration with External Data: Incorporate external data sources for richer feature sets.
  • Deployment: Consider deployment strategies for the model, possibly as a web service or API (Application programming interface).
  • Explainability: Integrate tools or techniques for explaining model predictions to end-users.

Group-Members

  • Aryan Sahu : 2021A7PS2832G

  • Anuj Nethwewala: 2021A7PS2716G

  • Imaad Momin: 2021A7PS2066G

  • Subhradip Maity: 2021A7PS2983G

File Organization

  • Dataset

  • code-external

  • code-orig

  • result

How to Run

  1. Open the Python Notebook:

    • Open the provided Python notebook in any Python environment, ideally Google Colab.
    • For the Bayesian Regression model, run the Bayesian notebook.
    • For the Gamma model, run the Gamma notebook.
  2. Load the Dataset:

    • Download the dataset from Dataset.
    • Load the dataset into the notebook for testing.
  3. Test the Code and Generate Graphs:

    • Run the code cells in the notebook to execute the provided code.
    • Explore the generated graphs and results in result.

Please Note: Make sure to install any required dependencies mentioned in the notebook before running the code.
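One way to check the dependencies before running the notebooks is a small script like the one below. It is only a sketch: the list is assembled from the libraries named in the Software Architecture section, and the notebooks themselves remain the authoritative source for what is required.

```python
import importlib.util

# Import names (not pip names) of the libraries mentioned in Software Architecture.
REQUIRED = ["pandas", "torch", "pyro", "sklearn", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

Note that some pip package names differ from the import names: Pyro is installed as `pyro-ppl` and scikit-learn as `scikit-learn`.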
