## Project Overview

Sensitive content refers to material that exposes personal or confidential information and can lead to privacy violations, identity theft, or misuse. This project focuses on detecting sensitive content shared on social media platforms and minimizing its influence. Using machine learning and deep learning techniques, particularly Convolutional Neural Networks (CNNs), it aims to automate the detection of such content and reduce its impact through moderation and filtering.
## Key Features

- Sensitive Content Detection: Identifies sensitive images and documents such as Aadhaar cards, PAN cards, and driver's licenses.
- Psychological Impact Reduction: Aims to reduce the stress and anxiety caused by accidental exposure of personal information.
- Social Impact: Prevents identity theft and privacy breaches by limiting the exposure of sensitive information.
- Influence Minimization: Reduces the dissemination of sensitive content by using AI to flag and filter inappropriate content.
- Awareness: Raises awareness of the dangers of sharing personal documents on social media and encourages stronger privacy controls.
## Table of Contents

- Project Overview
- Key Features
- Objectives
- Technologies Used
- Methodology
- Installation
- How to Use
- Challenges
- Contributors
## Objectives

- Detect Sensitive Content: Automatically detect sensitive content, such as personal documents, in images.
- Reduce Psychological and Social Impact: Implement measures to reduce the stress and anxiety caused by such content.
- Prevent Misuse: Identify and flag sensitive documents that may be used for identity theft or online fraud.
- Educate Users: Promote awareness about the dangers of sharing personal documents online and guide them toward safer practices.
- Minimize Influence: Prevent the widespread dissemination of sensitive content through content warnings and filtering systems.
## Technologies Used

- Python (v3.11.7 or above)
- TensorFlow & Keras for deep learning and CNN implementation
- Google Colab for model training and cloud infrastructure
- Natural Language Processing (NLP) for text analysis and classification
- Convolutional Neural Networks (CNN) for image classification and content filtering
- Anaconda for environment management
- Git for version control
## Methodology

The following steps outline the methodology adopted for this project:
**Data Collection and Labeling**
- Data is collected from social media platforms and other publicly available sources.
- Experts label and annotate the data to ensure diversity and accuracy.
- Sensitive content includes images of personal documents, images flagged for privacy concerns, etc.
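As a hedged illustration of this step (the folder layout and class names below are assumptions for the example, not taken from this repository), labeled images might be organized into one folder per class and turned into `(path, label)` pairs for training:

```python
from pathlib import Path

# Hypothetical class folder names -- the real project may use different labels.
SENSITIVE_CLASSES = {"aadhaar_card", "pan_card", "drivers_license"}

def build_manifest(root: str) -> list[tuple[str, int]]:
    """Walk a dataset laid out as root/<class_name>/<image>.jpg and return
    (path, label) pairs, where label is 1 for sensitive classes and 0 otherwise."""
    manifest = []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        label = 1 if class_dir.name in SENSITIVE_CLASSES else 0
        for image_path in sorted(class_dir.glob("*.jpg")):
            manifest.append((str(image_path), label))
    return manifest
```

A manifest like this keeps the expert-assigned labels alongside the image paths, so the same split can be reused for training and evaluation.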
**Model Training and Testing**
- Convolutional Neural Networks (CNNs) are used to detect sensitive content in images.
- NLP models are used to classify sensitive text content.
- Both models are trained on labeled data, tested with real-world scenarios, and iterated upon to improve accuracy.
- Model performance is evaluated through precision, recall, and F1-score metrics.
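The evaluation metrics named above can be computed directly from prediction counts. A minimal sketch for the binary ("sensitive" vs. "not sensitive") case, with illustrative label values:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive ('sensitive') class.
    y_true and y_pred are equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice a library such as scikit-learn provides the same metrics, but the hand-rolled version makes the trade-off explicit: precision penalizes false positives (legitimate content wrongly flagged), recall penalizes false negatives (sensitive content missed).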
**Content Moderation**
- Detected sensitive content is flagged for review or automatically filtered based on user-defined settings.
- Users are provided with tools to report inappropriate content and control their privacy settings.
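The flag-or-filter decision described above can be sketched as a simple thresholding step. This is an illustrative example; the thresholds and the `ModerationResult` type are assumptions, not the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    action: str   # "allow", "flag_for_review", or "filter"
    score: float  # model-predicted probability that the content is sensitive

def moderate(score: float,
             auto_filter_threshold: float = 0.9,
             review_threshold: float = 0.5) -> ModerationResult:
    """Map a sensitivity score to a moderation action. High-confidence
    detections are filtered automatically; borderline cases go to human review."""
    if score >= auto_filter_threshold:
        return ModerationResult("filter", score)
    if score >= review_threshold:
        return ModerationResult("flag_for_review", score)
    return ModerationResult("allow", score)
```

Exposing the two thresholds as parameters is what makes the behavior "user-defined": a stricter user lowers `auto_filter_threshold`, a more permissive one raises it.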
**Influence Minimization**
- Algorithms are adjusted to filter out sensitive material from users' feeds.
- Customization options, such as content filters and warnings, are available to users to control what they see.
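Feed-level filtering with per-user customization could look like the sketch below (the shape of the settings dictionary and the `sensitivity` field are hypothetical examples, not the project's actual data model):

```python
def filter_feed(posts, user_settings):
    """Remove or annotate posts based on the user's sensitivity limits.
    Each post is a dict with a 'sensitivity' score in [0, 1]; user_settings
    carries two thresholds: 'hide_above' and 'warn_above'."""
    visible = []
    for post in posts:
        if post["sensitivity"] >= user_settings["hide_above"]:
            continue  # drop the post from the feed entirely
        if post["sensitivity"] >= user_settings["warn_above"]:
            # keep the post but mark it so the UI shows a content warning
            post = {**post, "content_warning": True}
        visible.append(post)
    return visible
```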
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/SubhoHazra07/Sensitive-Content-Detection-and-Minimization-of-Their-Influence.git
  ```

- Navigate to the project directory:

  ```bash
  cd Sensitive-Content-Detection-and-Minimization-of-Their-Influence
  ```

- Create a conda environment and activate it:

  ```bash
  conda create --name content-detection python=3.11.7
  conda activate content-detection
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Access Dataset and Pre-trained Models: You can download the dataset and pre-trained models required for running this project from the following Google Drive link. After downloading, place the dataset and model files in the `/data` and `/models` directories within the project.
## How to Use

- Data Upload: Upload images or text to be analyzed for sensitive content.
- Run Models: Execute the image and text classification notebooks to get predictions on whether the content is sensitive. Since these are Jupyter notebooks, open them with Jupyter (or convert them to scripts) rather than invoking `python` on them directly:

  ```bash
  jupyter notebook src/image_classification.ipynb
  jupyter notebook src/text_classification.ipynb
  ```
- Customize Content Warnings: Adjust content filters and privacy settings to manage the display of sensitive content.
## Challenges

- False Positives/Negatives: Striking the right balance between flagging sensitive content and allowing legitimate content through remains a challenge.
- Scalability: Processing large volumes of user-generated content in real-time requires significant computational resources.
- Cultural Sensitivity: Different regions may interpret content differently, leading to misclassifications.
## Contributors

- Subho Hazra (GitHub: SubhoHazra07)
- Shweta Das (GitHub: Shweta-Das01)