📊 Overview

This Python application performs real-time sentiment analysis on news headlines, stores the results in a SQLite database, and generates interactive visualizations. It combines multiple sentiment analysis models (VADER, FinBERT, and RoBERTa) to provide nuanced sentiment scoring.

🌟 Key Features

Multi-Model Sentiment Analysis: Combines VADER, FinBERT, and RoBERTa models for robust sentiment scoring
Real-time RSS Feed Processing: Automatically fetches and analyzes news headlines
Interactive Visualizations: Comprehensive dashboards using Plotly
Efficient Data Storage: SQLite database with optimized indexing
Duplicate Detection: Intelligent similarity-based duplicate removal
Comprehensive Analysis: Timeline views, sentiment distributions, and statistical breakdowns
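
As an illustration of multi-model scoring, the sketch below blends per-model scores with a weighted mean. The function name `combine_scores`, the weights, and the assumption that every model's output is already mapped to the range [-1, 1] are all hypothetical; the project's actual combination rule may differ.

```python
from typing import Dict, Optional

# Hypothetical weights for illustration only; not taken from the project.
DEFAULT_WEIGHTS: Dict[str, float] = {"vader": 0.2, "finbert": 0.4, "roberta": 0.4}


def combine_scores(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Return the weighted mean of per-model scores.

    Each score is assumed to lie in [-1, 1]. Models missing from
    `scores` are skipped and the remaining weights are renormalized.
    """
    weights = weights or DEFAULT_WEIGHTS
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no score supplied for any weighted model")
    return sum(scores[name] * w for name, w in used.items()) / total
```

Renormalizing over the models that actually returned a score means a headline still receives a value if one model fails or is unavailable.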

Custom Configuration

from news_analysis import DatabaseManager, SentimentAnalyzer, DataVisualizer

# Initialize components
db = DatabaseManager('custom_database.db')
analyzer = SentimentAnalyzer()
visualizer = DataVisualizer()

# Generate the visualizations from the custom database
visualizer.create_visualizations('custom_database.db')

📊 Visualization Types

Main Dashboard

Daily Entry Counts
Hourly Distribution
Sentiment Timeline
Summary Length Distribution
Sentiment Distribution
Weekly Patterns
Sentiment Moving Average
Headline Length vs Sentiment
Time of Day Sentiment

Headlines Analysis

Recent Headlines Table
Most Positive Headlines
Most Negative Headlines
Statistical Summaries

🗄️ Database Schema

sentiment_scores Table

CREATE TABLE sentiment_scores (
    date TEXT,
    time TEXT,
    title TEXT,
    summary TEXT,
    score REAL
)

Indexes

idx_date: Optimizes date-based queries
idx_title: Facilitates headline searches
idx_score: Improves sentiment-based filtering
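
The schema and indexes above can be recreated with Python's built-in `sqlite3` module. This is a sketch: the indexed columns are inferred from the index names, and the helper names `create_schema` and `top_headlines` and the ISO date format are assumptions rather than the project's own code.

```python
import sqlite3
from typing import List, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the sentiment_scores table and its three indexes."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sentiment_scores (
            date TEXT,
            time TEXT,
            title TEXT,
            summary TEXT,
            score REAL
        );
        -- Indexed columns are inferred from the index names.
        CREATE INDEX IF NOT EXISTS idx_date ON sentiment_scores(date);
        CREATE INDEX IF NOT EXISTS idx_title ON sentiment_scores(title);
        CREATE INDEX IF NOT EXISTS idx_score ON sentiment_scores(score);
        """
    )


def top_headlines(
    conn: sqlite3.Connection, date: str, limit: int = 5
) -> List[Tuple[str, float]]:
    """Return the most positive headlines for one date (assumed 'YYYY-MM-DD')."""
    cursor = conn.execute(
        "SELECT title, score FROM sentiment_scores "
        "WHERE date = ? ORDER BY score DESC LIMIT ?",
        (date, limit),
    )
    return cursor.fetchall()
```

Passing values as `?` parameters instead of formatting them into the SQL string keeps headline text containing quotes from breaking the query.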

🔍 Duplicate Detection

The project includes functions to:

Eliminate duplicate or near-duplicate entries based on a similarity threshold.
Provide analysis and cleanup of the dataset for better performance and accuracy.
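
To show what threshold-based duplicate removal can look like, here is a minimal sketch that uses the standard library's `difflib.SequenceMatcher`. The helper name `dedupe_titles` and the choice of similarity measure are assumptions; the project's `remove_duplicates` may use a different metric and works on the database rather than on a list.

```python
from difflib import SequenceMatcher
from typing import List


def dedupe_titles(titles: List[str], similarity_threshold: float = 0.85) -> List[str]:
    """Keep the first occurrence of each headline and drop near-duplicates.

    Two headlines count as duplicates when their SequenceMatcher ratio
    (0.0 to 1.0, compared case-insensitively) reaches the threshold.
    """
    kept: List[str] = []
    kept_normalized: List[str] = []
    for title in titles:
        normalized = title.lower().strip()
        is_duplicate = any(
            SequenceMatcher(None, normalized, seen).ratio() >= similarity_threshold
            for seen in kept_normalized
        )
        if not is_duplicate:
            kept.append(title)
            kept_normalized.append(normalized)
    return kept
```

This pairwise comparison is quadratic in the number of headlines, so it suits small batches; a higher threshold removes fewer entries, a lower one removes more.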

Configuration

# Adjust similarity threshold (default: 0.85)
remove_duplicates(db_path='news_sentiment.db', similarity_threshold=0.90)

📈 Performance Optimization

Database Optimization

Write-Ahead Logging (WAL) mode
Optimized cache settings
Efficient indexing strategy
Regular VACUUM operations
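
The database settings listed above correspond to standard SQLite pragmas. The sketch below shows one way to apply them with `sqlite3`; the helper names `open_tuned` and `vacuum` and the cache size value are assumptions, not the project's actual configuration.

```python
import sqlite3


def open_tuned(db_path: str) -> sqlite3.Connection:
    """Open a file-backed SQLite database with WAL mode and a larger page cache."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative cache_size is interpreted in KiB; 64000 KiB is an assumed value.
    conn.execute("PRAGMA cache_size=-64000")
    return conn


def vacuum(conn: sqlite3.Connection) -> None:
    """Rebuild the database file to reclaim space left by deleted rows."""
    # VACUUM cannot run inside an open transaction, so commit first.
    conn.commit()
    conn.execute("VACUUM")
```

WAL mode applies only to file-backed databases, and running `vacuum` after a large cleanup (for example, duplicate removal) is usually more useful than running it on every start.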

Processing Optimization

Thread pooling for parallel sentiment analysis
LRU caching for frequently accessed data
Batch processing capabilities
GPU acceleration when available
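
Thread pooling and LRU caching can be combined as in the sketch below. The scorer here is a toy keyword counter standing in for a real model call, and the names `score_headline` and `score_batch` are hypothetical; only the pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Iterable, List

# Toy word lists; a real scorer would call VADER or a transformer model.
POSITIVE = {"gain", "rally", "surge"}
NEGATIVE = {"loss", "slump", "crash"}


@lru_cache(maxsize=4096)
def score_headline(headline: str) -> float:
    """Score one headline in [-1, 1]; repeated headlines are served from the cache."""
    words = headline.lower().split()
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, hits / 3))


def score_batch(headlines: Iterable[str], max_workers: int = 4) -> List[float]:
    """Score a batch of headlines in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_headline, headlines))
```

Threads help most when the scoring call waits on I/O or releases the GIL, as many native model runtimes do; for pure-Python scoring the cache is likely to contribute more than the pool.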

📝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Guidelines

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

NLTK team for VADER sentiment analysis
Hugging Face for transformer models
Plotly team for visualization capabilities
Contributors and maintainers of all dependent libraries
