Invoice payment time prediction model
Solution to CASSANDRA 22, a data science event
- Team Name - PAV-BHU-JEE (P for Parthasarthii, A for Ayush, V for Vishal)
- Team Members - Ayush Agarwal (me), Vishal Gosain, Parthasarthii Agarwal
- Event Name - CASSANDRA
- Fest Name - UDYAM 22
- Position - 3rd
- Skills - Data Science
- Tools - Standard Data Science and Machine Learning Tools
- Slides [IMPORTANT] - https://github.com/ayush-agarwal-0502/Cassandra22-Data-Science/blob/main/CASSANDRA_PAVBHUJEE'%2022.pdf
Our team secured 3rd position in "CASSANDRA" 2022, the Data Science event held under "Udyam", the Electronics Department fest of IIT BHU. This repository contains our work for the event.
(Link to the leaderboard - https://www.udyamfest.com/leaderboard (the link may not work after 2022))
Link to the code - https://github.com/ayush-agarwal-0502/Cassandra22-Data-Science/blob/main/Cassandra_PAV_BHU_JEE.ipynb (uploaded to this repository)
Link to the competition - https://www.kaggle.com/competitions/cassandra-udyam-2022/overview
Link to the dataset - https://www.kaggle.com/competitions/cassandra-udyam-2022/data (I've also added a copy of the dataset to this repository in case this link doesn't work)
Link to the final presentation slides on Canva - https://www.canva.com/design/DAE9kYtOh4I/ewkPV5L1gdrpSoMIfRGwFA/view#4 (you can refer to the slides in this repository too if the link doesn't work)
For the problem statement (PS), refer to the Kaggle page linked above. Roughly, we were required to predict, based on the data given to us, how many days it would take for an invoice to be paid.
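As a rough illustration of the prediction target, the sketch below computes the number of days between an invoice being issued and being paid. The field names `invoice_date` and `payment_date`, the helper `days_to_payment`, and the sample values are hypothetical and are not taken from the competition dataset; see the Kaggle data page for the real columns.

```python
from datetime import date


def days_to_payment(invoice_date: date, payment_date: date) -> int:
    """Return the number of days between issuing an invoice and its payment."""
    return (payment_date - invoice_date).days


# Hypothetical invoices (made-up values, not from the competition data).
invoices = [
    {"invoice_date": date(2022, 1, 10), "payment_date": date(2022, 2, 9)},
    {"invoice_date": date(2022, 3, 1), "payment_date": date(2022, 3, 16)},
]

# One target value (in days) per invoice; this is what a model would learn to predict.
targets = [days_to_payment(inv["invoice_date"], inv["payment_date"]) for inv in invoices]
print(targets)
```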
For an explanation of our solution, it is best to refer to the slides, since they explain everything about how we solved the problem. I'll add a few important and noteworthy slides to this README too.
The difference feature:
Fraud detection in invoices using this dataset:
Mutual Information scores (and Pandas Profiling report correlation):
Usage of K-Means clustering in predictions:
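For readers unfamiliar with mutual information (MI) scores, here is a minimal sketch of how MI between a discrete feature and a discrete target can be computed. It is not our notebook code: the function `mutual_information` and the toy data are hypothetical, and for a continuous target a library routine such as scikit-learn's `mutual_info_regression` would normally be used instead.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # Adds p(x, y) * log(p(x, y) / (p(x) * p(y))) using raw counts.
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (made up): one feature determines the target, the other is unrelated.
late = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

print(mutual_information(informative, late))    # larger score
print(mutual_information(uninformative, late))  # zero: no shared information
```

A higher score means the feature tells us more about the target, which is why such scores are useful for ranking features before modelling.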
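The slides describe how we actually used clustering. As a generic illustration only, one common pattern is to cluster the rows and feed each row's cluster label to the model as an extra feature. The sketch below is a minimal 1-D version of the K-Means (Lloyd's) algorithm under that assumption; the function `kmeans_1d`, the invoice amounts, and the initial centers are all hypothetical, and a real pipeline would more likely use scikit-learn's `KMeans`.

```python
def kmeans_1d(values, centers, n_iter=10):
    """Lloyd's algorithm on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: each center moves to the mean of its assigned values.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = sum(members) / len(members)
    return labels, centers


# Made-up invoice amounts; the initial centers are an arbitrary choice.
amounts = [100.0, 120.0, 110.0, 900.0, 950.0, 1000.0]
labels, centers = kmeans_1d(amounts, centers=[0.0, 500.0])
print(labels)   # cluster index per invoice, usable as an extra feature
print(centers)  # final cluster centers
```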