understanding-nlp-classification

Introduction

This repository contains code for exploring NLP classification, written purely for educational purposes. The Kaggle dataset used in this code is here and is also included under the data directory. It contains 10,000 books, each with a title, author, and description. The goal is to predict a book's genre from its description.

First, I explored the dataset and wrote classes for preprocessing each book's summary. To extract features, I used Word2Vec together with the pretrained Google News vectors (word2vec-google-news-300 in gensim), which bring in broader semantic context than the Kaggle data alone provides. Finally, I used several classification algorithms, first from scikit-learn and then from PyTorch, to predict each book's genre.
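The preprocessing and feature-extraction steps described above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the names preprocess, embed, STOP_WORDS, and toy_vectors are hypothetical, the stop-word list is deliberately tiny, and the 2-dimensional vectors stand in for real Word2Vec embeddings. Averaging word vectors is one common way to turn a description into a fixed-length feature vector, and it is assumed here only for illustration.

```python
import re
from typing import Mapping, Sequence

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is"})


def preprocess(description: str) -> list[str]:
    """Lowercase a description, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def embed(tokens: Sequence[str],
          vectors: Mapping[str, Sequence[float]],
          dim: int) -> list[float]:
    """Average the vectors of all known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) walks the vectors dimension by dimension.
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional vectors standing in for trained or pretrained embeddings.
toy_vectors = {
    "dragon": [1.0, 0.0],
    "wizard": [0.8, 0.2],
    "murder": [0.0, 1.0],
}

tokens = preprocess("The wizard and the dragon.")
features = embed(tokens, toy_vectors, dim=2)
```

Because embed only relies on membership tests and indexing, a gensim KeyedVectors object, for example one obtained with gensim.downloader.load("word2vec-google-news-300"), could be passed in place of toy_vectors with dim=300. The resulting feature vectors are what a scikit-learn or PyTorch classifier would then be trained on.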

This is an ongoing project, and I will be updating the code as I learn more about NLP classification.


adamkurth/understand-nlp-classification
