marius92mc/document-classification-reuters21578

Document Classification Reuters-21578

Classify documents by topic, using the Reuters-21578 dataset.

Requirements

The required packages are listed in requirements.txt.
To install them, run the following command inside a virtualenv.

$ pip install -r requirements.txt

Training data

The training data is based on the Reuters-21578 files.
They are available in SGML (.sgm) format in

classification/data/ 

The topics of the training data are listed in

classification/data/all-topics-strings.lc.txt
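As a rough illustration of what reading the .sgm files involves, here is a minimal sketch that pulls the topics, train/test split, and body text out of Reuters-21578 style markup with regular expressions. The sample text and the `parse_sgm` helper are hypothetical and are not taken from this repository, which may parse the files differently.

```python
import re

# Hypothetical sample in the Reuters-21578 SGML layout (not copied from the repo).
SAMPLE = """<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" NEWID="1">
<TOPICS><D>cocoa</D></TOPICS>
<TEXT>
<TITLE>BAHIA COCOA REVIEW</TITLE>
<BODY>Showers continued throughout the week in the Bahia cocoa zone.</BODY>
</TEXT>
</REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TEST" NEWID="2">
<TOPICS><D>grain</D><D>wheat</D></TOPICS>
<TEXT>
<TITLE>GRAIN EXPORTS</TITLE>
<BODY>Wheat shipments rose.</BODY>
</TEXT>
</REUTERS>"""

ARTICLE_RE = re.compile(r"<REUTERS(?P<attrs>[^>]*)>(?P<inner>.*?)</REUTERS>", re.DOTALL)
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
TOPIC_RE = re.compile(r"<D>(.*?)</D>")
BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)
SPLIT_RE = re.compile(r'LEWISSPLIT="([^"]*)"')


def parse_sgm(text):
    """Return one dict per <REUTERS> article with its split, topics, and body."""
    articles = []
    for match in ARTICLE_RE.finditer(text):
        inner = match.group("inner")
        topics_block = TOPICS_RE.search(inner)
        body = BODY_RE.search(inner)
        split = SPLIT_RE.search(match.group("attrs"))
        articles.append({
            "split": split.group(1) if split else None,
            "topics": TOPIC_RE.findall(topics_block.group(1)) if topics_block else [],
            "body": body.group(1).strip() if body else "",
        })
    return articles


for article in parse_sgm(SAMPLE):
    print(article["split"], article["topics"])
```

An article can carry several topics, so the topics are kept as a list rather than a single label.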

To train and test, run the following from classification/

Train

$ python train_and_classify_reuters_data.py 

Flags

--no-stemming  # don't use stemming when transforming raw data 
# or
--no-stopwords # don't remove stopwords when transforming raw data
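To make the effect of these two flags concrete, the sketch below shows one way a preprocessing step could switch stemming and stopword removal on or off. Everything here is an assumption for illustration: the `preprocess` and `naive_stem` functions, the tiny stopword list, and the crude suffix stripping (a stand-in for a real stemmer such as Porter's) are not the repository's implementation.

```python
import re

# Tiny illustrative stopword list; a real run would use a much fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def naive_stem(token):
    """Crude suffix stripping, used here only as a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stemming=True, remove_stopwords=True):
    """Lowercase and tokenize, then optionally drop stopwords and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


print(preprocess("The traders exported grains"))
print(preprocess("The traders exported grains", use_stemming=False))
```

In this sketch, passing `use_stemming=False` plays the role of `--no-stemming`, and `remove_stopwords=False` the role of `--no-stopwords`.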

The last flag, if given, selects the classifier

--svm         # use Support Vector Machine classifier 
--naive-bayes # use Naive-Bayes
--perceptron  # use Perceptron
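For readers who want to see how flags like these could be wired up, here is a minimal `argparse` sketch. It is only an assumption about one possible design: the `build_parser` function and the `dest` names are hypothetical, the script may read its arguments differently (for example by position), and the default classifier is left as `None` because it is not documented here.

```python
import argparse


def build_parser():
    """Hypothetical parser for the flags listed in this README."""
    parser = argparse.ArgumentParser(
        description="Train and classify Reuters-21578 documents"
    )
    # Preprocessing switches: both steps are on unless a --no-* flag is given.
    parser.add_argument("--no-stemming", dest="stemming", action="store_false",
                        help="don't use stemming when transforming raw data")
    parser.add_argument("--no-stopwords", dest="remove_stopwords", action="store_false",
                        help="don't remove stopwords when transforming raw data")
    # At most one classifier flag may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--svm", dest="classifier", action="store_const", const="svm")
    group.add_argument("--naive-bayes", dest="classifier", action="store_const",
                       const="naive-bayes")
    group.add_argument("--perceptron", dest="classifier", action="store_const",
                       const="perceptron")
    return parser


args = build_parser().parse_args(["--no-stemming", "--svm"])
print(args.stemming, args.remove_stopwords, args.classifier)
```

A mutually exclusive group makes the parser reject combinations such as `--svm --perceptron`, which matches the idea that only one classifier is chosen per run.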

Learning Methods used

Support Vector Machine
Naive-Bayes
Perceptron
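Of the three methods, the perceptron is the simplest to show end to end. The following is a minimal pure-Python sketch of a multi-class perceptron over bag-of-words counts. It is not the repository's code: the `train_perceptron` and `predict` names and the toy documents are hypothetical, and it assigns a single topic per document, whereas Reuters-21578 articles can carry several.

```python
from collections import Counter, defaultdict


def predict(weights, classes, counts):
    """Return the class whose weight vector scores the token counts highest."""
    def score(label):
        w = weights[label]
        return sum(w.get(tok, 0.0) * c for tok, c in counts.items())
    return max(classes, key=score)


def train_perceptron(docs, labels, epochs=10):
    """docs: list of token lists; labels: one topic string per document."""
    weights = defaultdict(lambda: defaultdict(float))  # label -> token -> weight
    classes = sorted(set(labels))
    for _ in range(epochs):
        for tokens, gold in zip(docs, labels):
            counts = Counter(tokens)
            predicted = predict(weights, classes, counts)
            if predicted != gold:
                # Move the weights toward the gold class and away from the mistake.
                for tok, c in counts.items():
                    weights[gold][tok] += c
                    weights[predicted][tok] -= c
    return weights, classes


# Toy training set, invented for illustration only.
docs = [["cocoa", "harvest"], ["wheat", "export"], ["cocoa", "price"], ["wheat", "crop"]]
labels = ["cocoa", "grain", "cocoa", "grain"]
weights, classes = train_perceptron(docs, labels)
print(predict(weights, classes, Counter(["wheat", "shipment"])))
```

Weights change only when a document is misclassified, so training stops changing anything once every training document is predicted correctly.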
