Date: 12 May 2023
Course: Big Data, Professor Rodriguez
Semester: Spring 2023
In today’s digital world, news headlines play a vital role in shaping public opinion and influencing decision-making processes. The advent of artificial intelligence (AI) and natural language processing (NLP) technologies has revolutionized the way news is created, disseminated, and consumed. ChatGPT, a new “state-of-the-art” language model developed by OpenAI, represents a significant milestone in AI-driven language generation. Launched on November 30th, 2022, ChatGPT has attracted widespread attention and sparked intense discussions across various domains. This final project aims to investigate the sentiment of news headlines related to ChatGPT and how that sentiment has evolved since ChatGPT’s launch. By analyzing the sentiment of these headlines, we can gain valuable insights into public perception and the overall attitude towards surrounding ChatGPT.
This project addresses the sentiment analysis of ChatGPT-related news headlines over time, crucial for understanding public sentiment and its implications for stakeholders. Utilizing the Python Google News API, GNews, we collected and analyzed articles from November 30, 2022, to April 30, 2023, using Kafka and Spark Structured Streaming for data processing.
- Analyze ChatGPT-related news headline sentiment.
- Track sentiment trends post-launch.
- Examine sentiment distribution across different publishers.
- Offer insights and analysis on public sentiment towards ChatGPT.
Sentiment analysis evaluates the emotional tone of text, classifying it as positive, neutral, or negative. This technique provides valuable insights into the expressed opinions, emotions, and attitudes in text data through computational methods.
Our project's data pipeline starts with GNews as the data source, using Kafka for data transportation to Spark, where sentiment analysis is performed. The pipeline integrates various tools, including TextBlob and NLTK, for effective sentiment analysis, with MongoDB, Kafka, and Parquet Files serving as output sinks for the processed data.
This project exemplifies a Big Data endeavor, utilizing advanced tools and methodologies to handle and analyze large datasets efficiently, showcasing the project's alignment with Big Data principles and practices.
- Google Colaboratory: For coding and collaboration.
- GNews API: To fetch relevant articles.
- Apache Kafka & Spark: For data ingestion and processing.
- TextBlob & NLTK: Libraries for sentiment analysis.
- MongoDB: For data storage and analysis.
Visualizations created using MongoDB Atlas reveal the sentiment distribution among ChatGPT-related articles, indicating a predominance of neutral sentiments, followed by positive and negative sentiments.
The analysis highlights stable sentiment ratios over time, despite an increase in article volume, particularly in March and April 2023. Notably, the initial weeks post-launch showed a higher positive sentiment, which stabilized over time.
The most frequent publishers, including Fast Company and Forbes, showed varying sentiment distributions, with Fast Company exhibiting a notably positive sentiment.
Our findings indicate a predominantly neutral sentiment towards ChatGPT in news headlines, with a consistent sentiment distribution over time. This project underscores the capability of Big Data architectures to manage and analyze large datasets efficiently, providing valuable insights into public sentiment on contemporary issues.
- Include references to all sources cited in your analysis, formatted appropriately.
Ali, M. (2023). NLTK Sentiment Analysis Tutorial for Beginners. https://www.datacamp.com/tutorial/text-analytics-beginners-nltk Apache Kafka. (n.d.). Apache Kafka. https://kafka.apache.org/intro Cordon, T. (2022, March 30). Enabling streaming data with Spark Structured Streaming and Kafka. Medium. https://medium.com/data-arena/enabling-streaming-data-with-spark- structured-streaming-and-kafka-93ce91e5b435 Dominguez, H. R. (2022, May 14). Twitter sentiment analysis using Zookeeper, Kafka and PySpark live-streaming on Windows 10 in 2022. Medium. https://medium.com/mcd- unison/twitter-sentiment-analysis-using-zookeeper-kafka-and-pyspark-live-streaming-on- windows-10-in-2022-ada7757097a2 Gongang, L. (2022a, March 12). Apache Spark Structured Streaming - Lorena Gongang - Medium. Medium. https://medium.com/@lorenagongang/apache-spark-structured- streaming-69f06c490d8c Gongang, L. (2022b, March 19). Sentiment analysis on streaming Twitter data using Kafka, Spark Structured Streaming & Python (Part 2). Medium. https://medium.com/@lorenagongang/sentiment-analysis-on-streaming-twitter-data- using-kafka-spark-structured-streaming-python-part-b27aecca697a Gongang, L. (2022c, March 26). Sentiment analysis on streaming Twitter data using Kafka, Spark Structured Streaming & Python (Part 3). Medium. https://medium.com/@lorenagongang/sentiment-analysis-on-streaming-twitter-data- using-kafka-spark-structured-streaming-python-part-eaa9f0af076d