AlaaElhariry/NLP

NLP

About the dataset

The AG's news topic classification dataset is constructed by choosing the 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples, for a total of 120,000 training samples and 7,600 testing samples.

Task 1

We preprocess the data and calculate the probabilities of N-grams.
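As an illustration of this step, here is a minimal sketch of preprocessing followed by a maximum-likelihood bigram estimate, P(w2 | w1) = count(w1, w2) / count(w1 as a context). The function names (`preprocess`, `ngram_counts`, `bigram_probabilities`), the lowercase-and-tokenize preprocessing, and the sample sentence are assumptions made for this example. The README does not say which preprocessing steps or which N the project uses.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed cleaning step)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bigram_probabilities(tokens):
    """Maximum-likelihood estimate of P(w2 | w1) for every observed bigram."""
    bigrams = ngram_counts(tokens, 2)
    # How often each word appears as the first word of a bigram.
    context = Counter()
    for (w1, _), count in bigrams.items():
        context[w1] += count
    return {bigram: count / context[bigram[0]] for bigram, count in bigrams.items()}


# Made-up sentence, not taken from the dataset.
tokens = preprocess("Stocks rise as markets rally. Stocks fall as markets slip.")
probs = bigram_probabilities(tokens)
print(probs[("stocks", "rise")])   # "stocks" is followed by "rise" once out of two times
print(probs[("as", "markets")])    # "as" is always followed by "markets" here
```

Unseen bigrams get no probability under this estimate, so a real model would usually add smoothing on top of it.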

Task 2

Finally, we complete the project with:

  • Feature extraction (apply all 3 algorithms with the classifier and choose the best according to the model's accuracy)
  • ML classifier (apply any ML classifier: SVM, NB, DT, RF, etc.) and evaluation metrics (including the model's accuracy and confusion matrix)
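
The workflow in the two bullets above can be sketched without dependencies as follows. It is a toy, not the project's code: the README does not name the three feature extraction algorithms, so `unigrams` and `uni_bigrams` are hypothetical stand-ins, and only two are shown. The `NaiveBayes` class (multinomial Naive Bayes with add-one smoothing), the `evaluate` helper, and the four training sentences are likewise assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def unigrams(text):
    """Hypothetical feature extractor 1: lowercase word tokens."""
    return text.lower().split()


def uni_bigrams(text):
    """Hypothetical feature extractor 2: word tokens plus adjacent word pairs."""
    toks = text.lower().split()
    return toks + [a + "_" + b for a, b in zip(toks, toks[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self, extract):
        self.extract = extract

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(self.extract(text))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total)
            for f in self.extract(text):
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def evaluate(model, texts, labels):
    """Return accuracy and a confusion matrix keyed by (true, predicted)."""
    predictions = [model.predict(t) for t in texts]
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return accuracy, Counter(zip(labels, predictions))


# Made-up toy data, not taken from the dataset.
train_texts = ["team wins the final match", "coach praises the team",
               "stocks fall as markets slip", "bank raises interest rates"]
train_labels = ["Sports", "Sports", "Business", "Business"]
test_texts = ["team wins again", "markets fall"]
test_labels = ["Sports", "Business"]

# Train one model per feature extractor and keep the most accurate.
results = {}
for name, extract in [("unigrams", unigrams), ("unigrams+bigrams", uni_bigrams)]:
    model = NaiveBayes(extract).fit(train_texts, train_labels)
    results[name] = evaluate(model, test_texts, test_labels)

best = max(results, key=lambda name: results[name][0])
for name, (accuracy, confusion) in results.items():
    print(name, accuracy, dict(confusion))
print("best feature extractor:", best)
```

On the real dataset, the same loop would run over the 120,000 training samples and score each extractor on the 7,600 test samples. A library classifier such as SVM, DT, or RF could take the place of the hand-written Naive Bayes.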
