The AG's News Topic Classification Dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, so in total there are 120,000 training samples and 7,600 testing samples.
We preprocess the data and calculate the probabilities of N-grams.
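A minimal sketch of this step follows. The README does not say which preprocessing steps or which N are used, so this assumes lowercasing, replacing non-alphanumeric characters with spaces, whitespace tokenization, `<s>`/`</s>` boundary markers, and unsmoothed maximum-likelihood estimates P(w_n | w_1..w_{n-1}) = count(n-gram) / count(history). The function names `preprocess`, `ngrams` and `ngram_probabilities` are hypothetical.

```python
import re
from collections import Counter


def preprocess(text):
    """Assumed preprocessing: lowercase, drop punctuation, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_probabilities(corpus, n=2):
    """Maximum-likelihood N-gram probabilities (no smoothing).

    Each key is an n-gram tuple; its value is count(n-gram) divided by the
    count of its (n-1)-word history among all n-grams in the corpus.
    """
    history_counts = Counter()
    ngram_counts = Counter()
    for doc in corpus:
        # Pad with n-1 start markers and one end marker per document.
        tokens = ["<s>"] * (n - 1) + preprocess(doc) + ["</s>"]
        grams = ngrams(tokens, n)
        ngram_counts.update(grams)
        history_counts.update(g[:-1] for g in grams)
    return {g: c / history_counts[g[:-1]] for g, c in ngram_counts.items()}


if __name__ == "__main__":
    probs = ngram_probabilities(["Stocks rise!", "Stocks fall."], n=2)
    print(probs[("stocks", "rise")], probs[("<s>", "stocks")])
```

On real data, unseen N-grams get probability zero under this estimator, so a smoothing scheme (e.g. add-one) would usually be added before using the probabilities for scoring.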
Finally, we end the project with:
- Feature extraction (apply all three feature-extraction algorithms with the classifier and keep the one that gives the highest model accuracy)
- ML classifier (apply any ML classifier, e.g. SVM, NB, DT, or RF) and evaluation metrics (including the model's accuracy and confusion matrix)
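The classification and evaluation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: the three feature-extraction algorithms are not named in this README, so bag-of-words counts stand in as one hypothetical extractor; Naive Bayes (NB) is picked from the listed classifiers and implemented as multinomial NB with Laplace smoothing; the toy documents are invented, and only the four label names match the actual AG News classes (World, Sports, Business, Sci/Tech).

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Hypothetical feature extractor: token counts after simple cleaning."""
    return Counter(re.sub(r"[^a-z0-9\s]", " ", text.lower()).split())


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0, features=bag_of_words):
        self.alpha = alpha
        self.features = features  # swap in another extractor to compare

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(self.features(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, feats, c):
        # log P(c) + sum over known words of count * log P(word | c)
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for word, count in feats.items():
            if word in self.vocab:  # words never seen in training are ignored
                score += count * math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, docs):
        preds = []
        for doc in docs:
            feats = self.features(doc)
            preds.append(max(self.classes, key=lambda c: self._log_posterior(feats, c)))
        return preds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Invented toy data for illustration only.
train_docs = ["stocks rise on earnings", "team wins the final match",
              "new chip speeds up computers", "election results announced abroad"]
train_labels = ["Business", "Sports", "Sci/Tech", "World"]
test_docs = ["earnings lift stocks", "match ends in a win for the team"]
test_labels = ["Business", "Sports"]

if __name__ == "__main__":
    model = MultinomialNB().fit(train_docs, train_labels)
    preds = model.predict(test_docs)
    print(preds, accuracy(test_labels, preds))
    print(confusion_matrix(test_labels, preds, model.classes))
```

To mirror the selection step in the list, one would fit and evaluate the classifier once per feature extractor (passed via the hypothetical `features` argument) and keep the extractor with the highest test accuracy.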