## Project Overview

Sensitive content refers to material that exposes personal or confidential information and can lead to privacy violations, identity theft, or misuse. This project focuses on detecting sensitive content shared on social media platforms and minimizing its influence. Using machine learning and deep learning techniques, particularly Convolutional Neural Networks (CNNs), it aims to automate the detection of such content and reduce its impact through moderation and filtering.
## Key Features

- Sensitive Content Detection: Identifies sensitive images and documents such as Aadhaar cards, PAN cards, and driver's licenses.
- Psychological Impact Reduction: Aims to reduce the stress and anxiety caused by accidental exposure of personal information.
- Social Impact: Prevents identity theft and privacy breaches by limiting the exposure of sensitive information.
- Influence Minimization: Reduces the dissemination of sensitive content by using AI to flag and filter inappropriate content.
- Awareness: Raises awareness of the dangers of sharing personal documents on social media and encourages stronger privacy controls.
## Table of Contents

- Project Overview
- Key Features
- Objectives
- Technologies Used
- Methodology
- Installation
- How to Use
- Challenges
- Contributors
## Objectives

- Detect Sensitive Content: Automatically detect sensitive content, such as personal documents, in images.
- Reduce Psychological and Social Impact: Implement measures to reduce the stress and anxiety caused by such content.
- Prevent Misuse: Identify and flag sensitive documents that may be used for identity theft or online fraud.
- Educate Users: Promote awareness about the dangers of sharing personal documents online and guide them toward safer practices.
- Minimize Influence: Prevent the widespread dissemination of sensitive content through content warnings and filtering systems.
## Technologies Used

- Python (v3.11.7 or above)
- TensorFlow & Keras for deep learning and CNN implementation
- Google Colab for model training and cloud infrastructure
- Natural Language Processing (NLP) for text analysis and classification
- Convolutional Neural Networks (CNN) for image classification and content filtering
- Anaconda for environment management
- Git for version control
## Methodology

The following steps outline the methodology adopted for this project:
**Data Collection and Labeling**
- Data is collected from social media platforms and other publicly available sources.
- Experts label and annotate the data to ensure diversity and accuracy.
- Sensitive content includes images of personal documents, images flagged for privacy concerns, etc.
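As a hedged illustration of this step (the folder layout and class names below are assumptions for the example, not taken from this repository), labeled images might be organized into one folder per class and turned into `(path, label)` pairs for training:

```python
from pathlib import Path

# Hypothetical class folder names -- the real project may use different labels.
SENSITIVE_CLASSES = {"aadhaar_card", "pan_card", "drivers_license"}

def build_manifest(root: str) -> list[tuple[str, int]]:
    """Walk a dataset laid out as root/<class_name>/<image>.jpg and return
    (path, label) pairs, where label is 1 for sensitive classes and 0 otherwise."""
    manifest = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        label = 1 if class_dir.name in SENSITIVE_CLASSES else 0
        for image_path in sorted(class_dir.glob("*.jpg")):
            manifest.append((str(image_path), label))
    return manifest
```

A manifest like this keeps the expert-assigned labels alongside the image paths, so the same split can be reused for training and evaluation.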
**Model Training and Testing**
- Convolutional Neural Networks (CNNs) are used to detect sensitive content in images.
- NLP models are used to classify sensitive text content.
- Both models are trained on labeled data, tested with real-world scenarios, and iterated upon to improve accuracy.
- Model performance is evaluated through precision, recall, and F1-score metrics.
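The evaluation metrics named above can be computed directly from prediction counts. A minimal sketch for the binary ("sensitive" vs. "not sensitive") case, with illustrative label values:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive ('sensitive') class.
    y_true and y_pred are equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice a library such as scikit-learn provides the same metrics, but the hand-rolled version makes the trade-off explicit: precision penalizes false positives (legitimate content wrongly flagged), recall penalizes false negatives (sensitive content missed).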
**Content Moderation**
- Detected sensitive content is flagged for review or automatically filtered based on user-defined settings.
- Users are provided with tools to report inappropriate content and control their privacy settings.
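The flag-or-filter decision described above can be sketched as a simple thresholding step. This is an illustrative example; the thresholds and the `ModerationResult` type are assumptions, not the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    action: str   # "allow", "flag_for_review", or "filter"
    score: float  # model-predicted probability that the content is sensitive

def moderate(score: float,
             auto_filter_threshold: float = 0.9,
             review_threshold: float = 0.5) -> ModerationResult:
    """Map a sensitivity score to a moderation action. High-confidence
    detections are filtered automatically; borderline cases go to human review."""
    if score >= auto_filter_threshold:
        return ModerationResult("filter", score)
    if score >= review_threshold:
        return ModerationResult("flag_for_review", score)
    return ModerationResult("allow", score)
```

Exposing the two thresholds as parameters is what makes the behavior "user-defined": a stricter user lowers `auto_filter_threshold`, a more permissive one raises it.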
**Influence Minimization**
- Algorithms are adjusted to filter out sensitive material from users' feeds.
- Customization options, such as content filters and warnings, are available to users to control what they see.
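Feed-level filtering with per-user customization could look like the sketch below (the shape of the settings dictionary and the `sensitivity` field are hypothetical examples, not the project's actual data model):

```python
def filter_feed(posts, user_settings):
    """Remove or annotate posts based on the user's sensitivity limits.
    Each post is a dict with a 'sensitivity' score in [0, 1]; user_settings
    carries two thresholds: 'hide_above' and 'warn_above'."""
    visible = []
    for post in posts:
        if post["sensitivity"] >= user_settings["hide_above"]:
            continue  # drop the post from the feed entirely
        if post["sensitivity"] >= user_settings["warn_above"]:
            # keep the post but mark it so the UI shows a content warning
            post = {**post, "content_warning": True}
        visible.append(post)
    return visible
```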
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/SubhoHazra07/Sensitive-Content-Detection-and-Minimization-of-Their-Influence.git
  ```

- Navigate to the project directory:

  ```bash
  cd Sensitive-Content-Detection-and-Minimization-of-Their-Influence
  ```

- Create a conda environment and activate it:

  ```bash
  conda create --name content-detection python=3.11.7
  conda activate content-detection
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Access Dataset and Pre-trained Models: You can download the dataset and pre-trained models required for running this project from the following Google Drive link. After downloading, place the dataset and model files in the `/data` and `/models` directories within the project.
## How to Use

- Data Upload: Upload images or text to be analyzed for sensitive content.
- Run Models: Execute the image and text classification notebooks to get predictions on whether the content is sensitive. Since these are Jupyter notebooks, open them with Jupyter (or convert them to scripts) rather than invoking `python` on them directly:

  ```bash
  jupyter notebook src/image_classification.ipynb
  jupyter notebook src/text_classification.ipynb
  ```
- Customize Content Warnings: Adjust content filters and privacy settings to manage the display of sensitive content.
## Challenges

- False Positives/Negatives: Striking the right balance between flagging sensitive content and allowing legitimate content through remains a challenge.
- Scalability: Processing large volumes of user-generated content in real-time requires significant computational resources.
- Cultural Sensitivity: Different regions may interpret content differently, leading to misclassifications.
## Contributors

- Subho Hazra (GitHub: SubhoHazra07)
- Shweta Das (GitHub: Shweta-Das01)