In today’s digital era, the popularity of social media platforms has grown exponentially, leading to an increase in the amount of unreliable information online. Verifying the integrity of a particular piece of news is not easy for an end-user. In this paper, our goal is to design an efficient model to examine the reliability of a news piece. This paper addresses the problem of Fake News classification given only the news content and the author name. We present the Hierarchical Convolutional-Attention Network (HCAN), composed of attention-enhanced word-level and sentence-level encoders and a CNN to capture sequential correlation. Extensive experiments show that HCAN outperforms state-of-the-art baseline models on a Kaggle dataset.
We use the Kaggle Fake News detection dataset for our task. The dataset has three attributes: `author`, `title`, and `text`. We concatenate all three features to make the final predictions, because the credibility of an author plays a crucial role in determining the reliability of a news piece. Further, a news title often has a particular writing style or phrasing, and detecting such patterns makes the classification of an article more reliable. Next, we remove stop words and punctuation to further process the dataset. All experiments are performed on an 80:20 train-test split.
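As a reference, here is a minimal sketch of the preprocessing described above, assuming pandas, NLTK, and scikit-learn; the file name `train.csv` and the column names follow the Kaggle dataset, and the random seed is an arbitrary choice.

```python
import string
import pandas as pd
from nltk.corpus import stopwords           # requires nltk.download("stopwords") once
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv").fillna("")    # Kaggle file name assumed

# Concatenate author, title, and text into a single input string.
df["content"] = df["author"] + " " + df["title"] + " " + df["text"]

# Remove punctuation and stop words.
stop_words = set(stopwords.words("english"))

def clean(text):
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return " ".join(w for w in text.split() if w not in stop_words)

df["content"] = df["content"].apply(clean)

# 80:20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    df["content"], df["label"], test_size=0.2, random_state=42)
```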
You can also download the dataset from this link.
We evaluate several baselines on the collected dataset, as listed in the table below, and measure their performance using F1 score, recall, and precision. The implementation of these baselines has been released. We implement three types of baseline models: simple linear classification models, deep neural network models, and pretrained language models. All baselines are implemented in the notebooks BERT-RoBERTa.ipynb, LR-MNB-DT.ipynb, and NN_Baselines.ipynb.
- LR (Logistic Regression), DT (Decision Trees), and RF (Random Forest). Trained using TF-IDF vectors of the input text; a minimal sketch of this pipeline is given after this list.
- CNN (Convolutional Neural Network). A 1D convolutional layer with kernel size 3, followed by max pooling and a fully connected dense layer. All deep neural models are trained with a maximum input length of 70.
- RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), Bi-RNN (Bidirectional Recurrent Neural Network). The respective recurrent encoder followed by dropout and fully connected layers.
- RCNN (Recurrent Convolutional Neural Network). Uses a bidirectional GRU to encode the GloVe embeddings of the tokens, followed by a 1D convolutional layer, max pooling, and dropout.
- BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT Pretraining Approach). Huggingface implementations of the `bert-base-cased` and `roberta-base` models, fine-tuned using the `AdamW` optimizer with a batch size of 8 for 3 epochs on an NVIDIA Tesla V100 GPU; a hedged fine-tuning sketch is shown below.
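As an illustration of the simple linear classification baselines, here is a minimal sketch of the TF-IDF + Logistic Regression pipeline, assuming scikit-learn; `X_train`/`X_test` are the cleaned text strings from the split above, and the vocabulary size is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

vectorizer = TfidfVectorizer(max_features=50000)        # vocabulary size is an assumption
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)

y_pred = clf.predict(X_test_vec)
precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}  F1: {f1:.4f}")
```

The DT and RF baselines follow the same pipeline with the classifier swapped for `DecisionTreeClassifier` or `RandomForestClassifier`.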
More details about these baselines are mentioned in the paper.
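For the pretrained language model baselines, the following is a hedged sketch of the BERT fine-tuning setup described above (Huggingface `transformers` with PyTorch, batch size 8, 3 epochs, `AdamW`); the learning rate, maximum sequence length, and the `train_texts`/`train_labels` variables are assumptions and not taken from the paper.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2).to(device)

# train_texts / train_labels are assumed to come from the preprocessing step above.
enc = tokenizer(train_texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=8, shuffle=True)    # batch size 8, as in the baseline

optimizer = AdamW(model.parameters(), lr=2e-5)              # learning rate is an assumption
model.train()
for epoch in range(3):                                      # 3 epochs, as in the baseline
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids.to(device),
                    attention_mask=attention_mask.to(device),
                    labels=labels.to(device))
        out.loss.backward()
        optimizer.step()
```

The RoBERTa baseline is identical, using `roberta-base` with the corresponding `Roberta*` classes.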
The developed model architecture is shown in the following figure:
The notebook HCAN.ipynb contains the model implementation.
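For readers who prefer code to the figure, below is a hedged Keras sketch of the HCAN structure as described in this README: an attention-enhanced bi-directional word-level encoder applied per sentence, an attention-enhanced sentence-level encoder, and a 1D CNN over the sentence sequence. The hyperparameters (GRU units, embedding size, document shape) and the exact placement of the CNN are assumptions; the authoritative implementation is in HCAN.ipynb.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

MAX_SENTS, MAX_WORDS = 15, 70       # hypothetical document shape (70 matches the baselines' max length)
VOCAB_SIZE, EMB_DIM = 20000, 100    # assumed vocabulary and embedding sizes

class Attention(layers.Layer):
    """Additive attention that pools a sequence of vectors into a single vector."""
    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(dim, dim), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(dim,), initializer="zeros")
        self.u = self.add_weight(name="u", shape=(dim, 1), initializer="glorot_uniform")

    def call(self, x):
        scores = tf.tanh(tf.tensordot(x, self.W, axes=1) + self.b)             # (batch, steps, dim)
        weights = tf.nn.softmax(tf.tensordot(scores, self.u, axes=1), axis=1)  # (batch, steps, 1)
        return tf.reduce_sum(weights * x, axis=1)                              # (batch, dim)

# Word-level encoder: BiGRU + attention, applied to one sentence at a time.
word_in = layers.Input(shape=(MAX_WORDS,), dtype="int32")
w = layers.Embedding(VOCAB_SIZE, EMB_DIM)(word_in)
w = layers.Bidirectional(layers.GRU(64, return_sequences=True))(w)
word_encoder = models.Model(word_in, Attention()(w))

# Sentence-level encoder: the word encoder runs over every sentence, a BiGRU
# encodes the sentence vectors, a 1D CNN captures sequential correlation, and
# attention pools the result before the final classifier.
doc_in = layers.Input(shape=(MAX_SENTS, MAX_WORDS), dtype="int32")
s = layers.TimeDistributed(word_encoder)(doc_in)
s = layers.Bidirectional(layers.GRU(64, return_sequences=True))(s)
s = layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(s)
s = Attention()(s)
out = layers.Dense(1, activation="sigmoid")(s)

hcan = models.Model(doc_in, out)
hcan.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```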
The table shows that our system HCAN outperforms all the baseline models by a decent margin, achieving an F1 score of 0.9856. We further analyse the effect of the convolutional layer and the hierarchical sentence encoder on model performance. Removing the hierarchy, i.e. keeping only the word-level encoder (an attention-enhanced bi-directional network), degrades accuracy the most. Removing the CNN layer from the model also leads to poorer performance.
We also analyse the effect of the CNN kernel size and word-encoder max-lengths on the testing and training accuracies.
The hierarchical structure of our model, enhanced by a CNN, outperforms even state-of-the-art pretrained language models such as BERT and RoBERTa. This ties back to our original motivation of modelling the more important parts of a news piece when making a prediction, and the experimental results support that motivation.
This project was done as part of the Natural Language Processing (CSE556) 2021 course at IIITD. Please open an issue if you have trouble running the code.