NLP Project

Summary

One of the main challenges of the digital era is assessing the reliability of information sources. This project compares two machine learning models (Logistic Regression and Passive-Aggressive) for detecting fake news in two languages: English and Spanish. The English news comes from the "ISOT Fake News" dataset, made available by the University of Victoria, Canada; the Spanish news was collected by scraping news websites. In both languages, Logistic Regression performed worse than the Passive-Aggressive classifier. Performance is measured with Accuracy and F1 Score, complemented by the confusion matrix and by cross-validation to check for bias during the training and tuning phase.
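As a rough illustration of how the reported metrics relate to the confusion matrix, the sketch below computes Accuracy and F1 Score from the four confusion-matrix counts. It is not the project's code: the function names, the choice of `1` as the fake-news label, and the toy labels are all assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the fake-news label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # fake news correctly flagged
        elif pred == positive:
            fp += 1  # real news wrongly flagged as fake
        elif truth == positive:
            fn += 1  # fake news that was missed
        else:
            tn += 1  # real news correctly passed
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels invented for illustration: 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion counts (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("f1:", f1_score(tp, fp, fn))
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives, which is why the two numbers can diverge when the classes are imbalanced.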

Proposed Method

Screenshot 2023-03-30 165840

Experimental details:

Two experiments were carried out for this project. The first analyses the English news dataset and compares the evaluation metrics obtained when classifying on the title versus the text of each news item. The dataset was divided into training data (70%) and test data (30%), and the selected models were run with default parameters. The second experiment uses a new dataset of news in Spanish to evaluate how well the same models, Logistic Regression and Passive-Aggressive, detect fake news from the text of the news, and compares the outcome with the first experiment. Here the split is 80% training data and 20% test data. The classifiers were first run with default parameters, and hyper-parameter tuning was then performed for the lower-scoring model (Logistic Regression).
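The two splits described here (70/30 for English, 80/20 for Spanish) can be sketched with a small seeded shuffle-and-split helper. This is an illustrative stand-in, not the project's code: `train_test_split` below is a hypothetical stdlib-only function written for this example, and the placeholder article strings are invented.

```python
import random


def train_test_split(samples, test_fraction, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the news articles.
articles = [f"article {i}" for i in range(100)]

# Experiment 1 (English): 70% training, 30% test.
train_en, test_en = train_test_split(articles, test_fraction=0.30)
# Experiment 2 (Spanish): 80% training, 20% test.
train_es, test_es = train_test_split(articles, test_fraction=0.20)

print(len(train_en), len(test_en))
print(len(train_es), len(test_es))
```

Fixing the seed keeps the comparison between the two classifiers fair, since both are then trained and evaluated on exactly the same partition.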

Results Experiment 1:

The results of this experiment are evaluated on two levels: data analysis time and classification quality in predicting fake news.

Screenshot 2023-03-31 111042

Results Experiment 2:

Table 3 provides an overview of the quantitative performance of the models in predicting fake news in Spanish. In this experiment, the Passive-Aggressive algorithm outperformed the Logistic Regression model, with an F1 score of 92% and an Accuracy of 95%.
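For readers unfamiliar with the Passive-Aggressive classifier, the sketch below shows its core update rule (the PA-I variant) on bag-of-words features. It is a from-scratch illustration under stated assumptions, not the project's implementation: the helper names, the whitespace tokenizer, the labels `+1` (fake) and `-1` (real), the aggressiveness parameter `C=1.0`, and the toy headlines are all invented for this example.

```python
def featurize(text):
    """Bag-of-words counts from lowercase whitespace tokens."""
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1
    return counts


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())


def pa_update(weights, features, label, C=1.0):
    """Apply one PA-I step in place; `label` is +1 or -1. Returns the step size."""
    loss = max(0.0, 1.0 - label * dot(weights, features))  # hinge loss
    norm_sq = sum(v * v for v in features.values())
    if loss == 0.0 or norm_sq == 0:
        return 0.0  # passive: margin already satisfied, weights unchanged
    tau = min(C, loss / norm_sq)  # aggressive: step size capped by C
    for k, v in features.items():
        weights[k] = weights.get(k, 0.0) + tau * label * v
    return tau


def predict(weights, text):
    """Return +1 (fake) or -1 (real) for a raw text."""
    return 1 if dot(weights, featurize(text)) >= 0 else -1


# Toy headlines invented for illustration: +1 = fake, -1 = real.
training_data = [
    ("shocking miracle cure doctors hate", 1),
    ("government publishes annual budget report", -1),
    ("miracle pill shocking secret", 1),
    ("council approves budget for schools", -1),
]

weights = {}
for _ in range(5):  # a few passes over the toy data
    for text, label in training_data:
        pa_update(weights, featurize(text), label)

print(predict(weights, "shocking miracle secret"))
print(predict(weights, "annual budget report"))
```

The name reflects the two branches of `pa_update`: the model stays passive when an example is already classified with sufficient margin, and otherwise moves the weights just far enough (up to the cap `C`) to correct it.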

Screenshot 2023-03-31 111316

Furthermore, Table 4 compares the F1 Score between the articles in English (first experiment) and the news in Spanish (second experiment). The results are considerably better when the models are applied to the English dataset.

Screenshot 2023-03-31 111421