Quote Classification

This project uses natural language processing to determine whether a given string is a motivational quote or not.

In this project, around 1M text strings, containing both quotes and non-quotes (reviews and blog posts), are cleaned and preprocessed, and part-of-speech (POS) tags are then generated to build feature vectors. Several classifiers are trained on these feature vectors, and their predictions are combined in an ensemble.
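A minimal sketch of the cleaning and feature step, assuming NLTK-style featuresets (one dict of word → bool per text) and assuming the POS tags are used to keep only adjectives, adverbs, and verbs. The helpers `clean_text` and `pos_features`, the tag prefixes, and the regex are hypothetical illustrations, not the code in featureextaction.py.

```python
import re

# Hypothetical choice of Penn Treebank tag prefixes to keep:
# J = adjectives, R = adverbs, V = verbs.
ALLOWED_TAG_PREFIXES = ("J", "R", "V")


def clean_text(text: str) -> list[str]:
    """Lowercase the text, replace everything except letters/apostrophes with spaces, and split into tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z'\s]", " ", text)
    return text.split()


def pos_features(tagged_tokens: list[tuple[str, str]], vocabulary: set[str]) -> dict[str, bool]:
    """Build a boolean bag-of-words feature dict from (word, POS tag) pairs.

    Only words whose tag starts with an allowed prefix count as present;
    every vocabulary word becomes a feature saying whether it occurs.
    """
    present = {word for word, tag in tagged_tokens if tag.startswith(ALLOWED_TAG_PREFIXES)}
    return {word: (word in present) for word in vocabulary}
```

In practice the `(word, tag)` pairs could come from a tagger such as `nltk.pos_tag(clean_text(text))`, and `vocabulary` would typically be the most frequent allowed words across the training set.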

Graph

The following graph helps in understanding the data: it shows the 20 most frequent words in the dataset under different scenarios.
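As a hedged sketch of how such counts might be produced for each scenario (for example, quotes versus non-quotes), the hypothetical helper `top_words` tallies word frequencies with `collections.Counter`. Whitespace tokenization and the optional stopword set are assumptions, not the project's actual preprocessing.

```python
from collections import Counter


def top_words(texts: list[str], n: int = 20, stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words across texts, skipping stopwords."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stopwords
    )
    return counts.most_common(n)
```

Calling `top_words` separately on the quote texts and on the non-quote texts gives one top-20 list per class, which is one possible reading of the "different scenarios" in the graph.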

Usage

Import ensemble.py and featureextaction.py to get predictions, following the example below:

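```python
import ensemble
import featureextaction  # feature extraction helpers from this repository

# ONB_Clf, MNB_Clf, BNB_Clf, LogReg_Clf, SGD_Clf and SVC_clf are the already-trained classifiers
ensemble_clf = ensemble.EnsembleClassifier(ONB_Clf, MNB_Clf, BNB_Clf, LogReg_Clf, SGD_Clf, SVC_clf)

def predict(testing_set):
    # testing_set is a list of (features, label) pairs; only the features are classified
    feature_list = [f[0] for f in testing_set]
    return [ensemble_clf.classify(features) for features in feature_list]
```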
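The internals of ensemble.py are not shown here. Assuming `EnsembleClassifier` combines its classifiers by majority vote (an assumption), a hypothetical `VotingEnsemble` with the same `classify(features)` call shape could look like this:

```python
from collections import Counter
from typing import Any, Protocol


class Classifier(Protocol):
    """Anything with a classify(features) method, e.g. an NLTK-style classifier."""

    def classify(self, features: dict[str, Any]) -> str: ...


class VotingEnsemble:
    """Hypothetical majority-vote ensemble; not the project's ensemble.py."""

    def __init__(self, *classifiers: Classifier) -> None:
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self._classifiers = classifiers

    def votes(self, features: dict[str, Any]) -> list[str]:
        # One predicted label per member classifier.
        return [clf.classify(features) for clf in self._classifiers]

    def classify(self, features: dict[str, Any]) -> str:
        # Most common label wins; ties go to the label voted first.
        return Counter(self.votes(features)).most_common(1)[0][0]

    def confidence(self, features: dict[str, Any]) -> float:
        # Fraction of classifiers that agree with the winning label.
        votes = self.votes(features)
        winner = Counter(votes).most_common(1)[0][0]
        return votes.count(winner) / len(votes)
```

With six classifiers, as in the usage example, a 3-3 split is possible; this sketch breaks such ties in favour of the label voted first, while an odd number of classifiers avoids ties altogether for a two-class problem.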

Note

The models are trained on 10% of the full dataset. Link to Quote Dataset
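How the 10% subset was drawn is not stated. A reproducible uniform random sample is one straightforward option; the helper `sample_fraction` and its default seed are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_fraction(items: Sequence[T], fraction: float = 0.10, seed: int = 42) -> list[T]:
    """Return a reproducible random subset containing roughly `fraction` of items."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(len(items) * fraction)
    # A dedicated Random instance keeps the global random state untouched.
    return random.Random(seed).sample(items, k)
```

Sampling quotes and non-quotes separately with the same fraction would keep the class balance of the full dataset.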
