A class project for CS585: Introduction to Natural Language Processing. A Naive Bayes classifier for sentiment analysis of movie reviews.

morsecodist/CS_585_Naive_Bayes_Sentiment_Classifier

Naive Bayes Bag-of-Words Sentiment Classifier

Description

Trains a Naive Bayes classifier to predict the sentiment (positive or negative) of a movie review. The assignment code has been cleaned up and streamlined to make it easier to read and use. As a result, the complete solution to the assignment is not here, only the parts I deemed most relevant to share.

Instructor Implementations

  • tokenize_doc
  • train
  • report_statistics_after_training

Modifications to Instructor Implementations

  • __init__: Added feature_extractor member that defaults to tokenize_doc
  • tokenize_and_update_model: Switched to use feature_extractor member rather than tokenize_doc
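
The pluggable feature extractor described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the class TinyModel, the whitespace tokenizer, the STOPWORDS set, and tokenize_doc_no_stopwords are hypothetical stand-ins, and only the names feature_extractor, tokenize_doc, and tokenize_and_update_model come from the lists above (their real signatures may differ).

```python
from collections import defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of"}


def tokenize_doc(doc):
    """Bag-of-words: map each lowercased whitespace token to its count."""
    bow = defaultdict(float)
    for token in doc.lower().split():
        bow[token] += 1.0
    return dict(bow)


def tokenize_doc_no_stopwords(doc):
    """Same as tokenize_doc, but drops tokens found in STOPWORDS."""
    return {w: c for w, c in tokenize_doc(doc).items() if w not in STOPWORDS}


class TinyModel:
    """Keeps the extractor as a member so it can be swapped at construction."""

    def __init__(self, feature_extractor=tokenize_doc):
        self.feature_extractor = feature_extractor
        # word_counts[label][word] -> accumulated count
        self.word_counts = defaultdict(lambda: defaultdict(float))

    def tokenize_and_update_model(self, doc, label):
        # Call the configured extractor rather than a hard-coded tokenizer.
        for word, count in self.feature_extractor(doc).items():
            self.word_counts[label][word] += count
```

Storing the extractor on the instance means the training loop never needs to know which tokenizer is in use, so variants such as stopword removal or stemming can be compared without touching the model code.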

Implementations I provided

  • tokenize_doc_stopwords
  • tokenize_doc_stopwords_custom
  • tokenize_doc_stopwords_and_stemming
  • update_model
  • p_word_given_label
  • log_likelihood
  • p_word_given_label_and_psuedocount
  • log_prior
  • unnormalized_log_posterior
  • classify
  • likelihood_ratio
  • evaluate_classifier_accuracy
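
To show how the probability functions listed above typically fit together in a multinomial Naive Bayes model, here is a small sketch. The method names mirror the list, but the class SketchNB, the signatures, the add-alpha smoothing formula, and the label constants are assumptions for illustration, not the repository's implementation.

```python
import math
from collections import Counter

POS, NEG = "pos", "neg"  # assumed label names


class SketchNB:
    def __init__(self):
        self.word_counts = {POS: Counter(), NEG: Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def update_model(self, bow, label):
        """Add one document's bag-of-words counts to the given class."""
        self.word_counts[label].update(bow)
        self.doc_counts[label] += 1
        self.vocab.update(bow)

    def p_word_given_label_and_psuedocount(self, word, label, alpha):
        """Add-alpha smoothed estimate of P(word | label)."""
        total = sum(self.word_counts[label].values())
        return (self.word_counts[label][word] + alpha) / (
            total + alpha * len(self.vocab)
        )

    def log_likelihood(self, bow, label, alpha):
        """Sum of count * log P(word | label) over the document's words."""
        return sum(
            count * math.log(
                self.p_word_given_label_and_psuedocount(w, label, alpha)
            )
            for w, count in bow.items()
        )

    def log_prior(self, label):
        """Log of the fraction of training documents with this label."""
        return math.log(self.doc_counts[label] / sum(self.doc_counts.values()))

    def unnormalized_log_posterior(self, bow, label, alpha):
        return self.log_prior(label) + self.log_likelihood(bow, label, alpha)

    def classify(self, bow, alpha):
        """Pick the label with the larger unnormalized log posterior."""
        return max(
            (POS, NEG),
            key=lambda lab: self.unnormalized_log_posterior(bow, lab, alpha),
        )
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the pseudocount keeps unseen words from driving a class probability to zero.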

Demo

To train a Naive Bayes classifier on the large_movie_review_dataset data using a feature extractor that applies stemming and removes both standard and custom stopwords:

python nb_sentiment_classify.py

This command trains the model with every pseudocount from 1 to 25 (inclusive), plots pseudocount against accuracy, and returns the best pseudocount along with its accuracy.
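
The pseudocount sweep described above might look roughly like the sketch below. The helper sweep_pseudocounts and its evaluate_accuracy parameter are hypothetical names, not part of the repository, and the plotting step is left out to keep the example dependency-free.

```python
def sweep_pseudocounts(evaluate_accuracy, pseudocounts=range(1, 26)):
    """Evaluate each pseudocount and return (best, best_accuracy, results).

    evaluate_accuracy is any callable mapping a pseudocount to an accuracy.
    results is a list of (pseudocount, accuracy) pairs, ready for plotting.
    """
    results = [(alpha, evaluate_accuracy(alpha)) for alpha in pseudocounts]
    # On ties, max keeps the first (smallest) pseudocount.
    best_alpha, best_acc = max(results, key=lambda pair: pair[1])
    return best_alpha, best_acc, results
```

With a trained model, evaluate_accuracy could plausibly be the model's evaluate_classifier_accuracy method, assuming it returns a number that is higher for better accuracy.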

Usage

from nb_sentiment_classify import NaiveBayes

# Initialize model with default feature extractor
nb = NaiveBayes()

# Train model on large_movie_review_dataset
nb.train_model()

# Evaluate accuracy given a pseudocount (1 used in this example)
nb.evaluate_classifier_accuracy(1)
