Course Project for CSE-343 (Machine Learning) - Monsoon 2023
In response to growing concerns over mental health and rising suicide rates, our project aims to detect and address suicidal ideation by analyzing social media conversations. Using machine learning, we developed a model that identifies individuals at heightened risk based on their online activity. Our solution includes a real-world application: a Reddit bot that flags posts showing potential signs of suicidal ideation.
- Medha Hira
- Arnav Goel
- Siddharth Rajput
The project addresses the critical need for effective suicide prevention strategies by using social media as a platform for the early detection of suicidal ideation. With U.S. suicide rates rising about 36% between 2000 and 2021, our predictive model seeks to enable timely intervention, potentially saving lives, by identifying at-risk individuals through their digital footprints. Given the central role social media plays in modern communication, our system detects suicidal ideation by analyzing Reddit posts, applying machine learning algorithms to a dataset drawn from the r/SuicideWatch subreddit to identify early signs of suicidal thoughts.
We used the University of Maryland Reddit Suicidality Dataset and preprocessed the text to clean and standardize it for analysis. Preprocessing steps included removing non-ASCII characters, URLs, usernames, punctuation, and stopwords, and lowercasing all text.
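The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual pipeline: the step order, the regular expressions (including the assumed `u/name` username format), and the short `STOPWORDS` set are assumptions; a real pipeline would more likely use a full stopword list such as NLTK's.

```python
import re
import string

# Hypothetical, abbreviated stopword list for illustration only;
# a full list (e.g. NLTK's English stopwords) would normally be used.
STOPWORDS = {"a", "an", "the", "i", "me", "my", "is", "am", "are", "and",
             "or", "to", "of", "in", "on", "it", "this", "that", "just"}

# URLs with an explicit scheme or a leading "www."
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Reddit-style user mentions such as "u/name" or "/u/name" (assumed format)
USER_RE = re.compile(r"/?\bu/[A-Za-z0-9_-]+")
# Translation table that deletes every ASCII punctuation character
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Clean a single post; the order of the steps is an assumption."""
    # Drop non-ASCII characters (emoji, accented letters, etc.).
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Remove URLs and usernames before punctuation is stripped,
    # so that "https://..." does not leave fragments behind.
    text = URL_RE.sub(" ", text)
    text = USER_RE.sub(" ", text)
    # Lowercase for standardization, then delete punctuation.
    text = text.lower().translate(PUNCT_TABLE)
    # Remove stopwords; split() also collapses repeated whitespace.
    return " ".join(t for t in text.split() if t not in STOPWORDS)


print(preprocess("I just can't do this anymore... https://example.com u/throwaway123"))
# cant do anymore
```

Deleting punctuation (rather than replacing it with a space) keeps contractions such as "can't" as a single token, "cant", instead of splitting them into "can" and "t".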
Our methodology covers a range of machine learning models, including Logistic Regression, Support Vector Machines (SVM), Naive Bayes, Decision Trees, and Random Forest, among others. We also explored ensemble methods and neural networks to improve predictive performance. Model effectiveness was assessed with metrics such as accuracy, precision, and recall.
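To make the evaluation concrete, the sketch below combines three models' binary predictions with a hard majority vote (one common ensemble strategy; this README does not say which ensemble method was used) and scores the result with accuracy, precision, and recall. The labels and predictions are invented toy values, and treating label `1` as the at-risk class is an assumption.

```python
from collections import Counter


def majority_vote(*model_preds):
    """Hard-voting ensemble: take the most common label at each position."""
    # zip(*...) groups the i-th prediction of every model together.
    # With an odd number of binary classifiers there are no ties.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_preds)]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy data (hypothetical): 1 = at-risk post, 0 = not at risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
lr_pred = [1, 0, 1, 0, 0, 1, 1, 0]
svm_pred = [1, 1, 0, 0, 1, 0, 0, 0]
nb_pred = [0, 1, 1, 0, 0, 0, 0, 1]

ensemble_pred = majority_vote(lr_pred, svm_pred, nb_pred)
print(ensemble_pred)                    # [1, 1, 1, 0, 0, 0, 0, 0]
print(evaluate(y_true, ensemble_pred))  # {'accuracy': 0.875, 'precision': 1.0, 'recall': 0.75}
```

In this toy run the ensemble raises no false alarms (precision 1.0) but misses one at-risk post (recall 0.75). For a screening task like this one, recall is often worth reporting alongside accuracy, since a missed at-risk post may be costlier than a false alarm.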
Our findings indicate that LDA, Logistic Regression, and the SVM classifier performed best, with notable improvements when using Word2Vec embeddings. Ensemble methods and a Multilayer Perceptron (MLP) classifier also showed promising results, indicating that our approach can detect suicidal ideation with high accuracy.
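A common way to turn Word2Vec embeddings into features for classifiers such as Logistic Regression or an SVM is to average the vectors of a post's words. This README does not state how the embeddings were aggregated, so the mean-pooling sketch below is an assumption, and the three-dimensional `toy_wv` vectors are invented for illustration. With gensim, a trained `Word2Vec` model's `wv` attribute (a `KeyedVectors` object) supports the same `in` check and `[]` lookup, so it should work as the `wv` argument.

```python
from typing import List, Mapping, Sequence


def document_vector(tokens: Sequence[str],
                    wv: Mapping[str, Sequence[float]],
                    dim: int) -> List[float]:
    """Mean of the word vectors of in-vocabulary tokens (zeros if none)."""
    sums = [0.0] * dim
    count = 0
    for tok in tokens:
        if tok in wv:  # skip out-of-vocabulary words
            for i, x in enumerate(wv[tok]):
                sums[i] += float(x)
            count += 1
    if count == 0:
        return sums  # all-zero vector for posts with no known words
    return [s / count for s in sums]


# Hypothetical 3-dimensional embeddings, for illustration only.
toy_wv = {
    "hopeless": [1.0, 0.0, 0.5],
    "tired": [0.5, 0.5, 0.0],
    "happy": [0.0, 1.0, 1.0],
}

print(document_vector(["hopeless", "tired", "unknownword"], toy_wv, dim=3))
# [0.75, 0.25, 0.25]
```

Each post becomes one fixed-length vector (the embedding size) regardless of how many words it contains, which is the input shape classical classifiers expect.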
Results for Machine Learning Models:
Results for Ensemble Method and an MLP Classifier:
Reddit Bot Demo: YouTube Link
The culmination of our project is a Reddit bot that integrates our best-performing machine learning model to scan Reddit posts and flag those showing signs of suicidal ideation. The bot aims to bridge the gap between at-risk individuals and timely mental health support.
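The bot's core loop can be sketched as a filter over incoming posts. This is a hypothetical sketch, not the project's bot: `Post`, `flag_posts`, `dummy_scorer`, and the 0.5 threshold are illustrative assumptions, and `dummy_scorer` is a keyword placeholder standing in for the trained model. In a live deployment, `posts` could be the stream returned by PRAW's `reddit.subreddit(...).stream.submissions()`, whose submission objects also expose `id`, `title`, and `selftext`; what the bot does with a flagged post is left open here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class Post:
    """Hypothetical stand-in for a Reddit submission."""
    id: str
    title: str
    selftext: str


def flag_posts(posts: Iterable[Post],
               predict_proba: Callable[[str], float],
               threshold: float = 0.5) -> Iterator[Tuple[Post, float]]:
    """Yield (post, risk_score) for each post scoring at or above threshold."""
    for post in posts:
        # Score the title and body together as a single document.
        text = f"{post.title} {post.selftext}"
        score = predict_proba(text)
        if score >= threshold:
            yield post, score


def dummy_scorer(text: str) -> float:
    """Placeholder for the trained classifier; NOT a real risk model."""
    return 0.9 if "hopeless" in text.lower() else 0.1


posts = [
    Post("p1", "Feeling hopeless lately", "I don't see the point anymore."),
    Post("p2", "Weekend plans", "Going hiking with friends."),
]
for post, score in flag_posts(posts, dummy_scorer):
    print(f"flagged {post.id} (score={score:.2f})")
# flagged p1 (score=0.90)
```

Because `flag_posts` is a generator, it processes posts lazily, so the same function could consume an unbounded live stream without buffering it.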