Build and visualize Word2Vec model with Gensim

This code accompanies the "Build and Visualize Word2Vec Model on Amazon Reviews" blog post.

Word2vec is a popular natural language processing technique that uses a shallow neural network to learn vector representations of words, called "word embeddings", from a text corpus.
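
To make the idea concrete, the sketch below shows the kind of training signal word2vec learns from: (center word, context word) pairs taken from a sliding window, as in the skip-gram variant. It is an illustrative sketch in plain Python, not gensim's internals. The helper name `skipgram_pairs` and the window size are illustration choices. Note that gensim's `Word2Vec` trains with CBOW by default (`sg=0`) and switches to skip-gram only with `sg=1`.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Look at up to `window` neighbours on each side, clipped at the sentence edges.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "this shampoo works great".split()
    for center, context in skipgram_pairs(sentence, window=1):
        print(center, "->", context)
```

The network is trained to predict the context word from the center word (or the reverse, for CBOW), and the learned input weights become the word vectors.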

In this tutorial, we use the excellent word2vec implementation from the gensim package to build our model, and t-Distributed Stochastic Neighbor Embedding (t-SNE) from scikit-learn (sklearn) to visualize the learned embedding vectors.

Requirements

Python 2.7
Jupyter Notebook
gensim
numpy
pandas
scikit-learn (sklearn)
Natural Language Toolkit (nltk)

Dataset

We will use the Amazon review corpus for Health and Personal Care. The dataset is in JSON format and contains 346,355 reviews.
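
A minimal loading sketch follows. It assumes that each line of the file is one JSON object with the review body in a `reviewText` field. Check the keys in your copy of the data. A simple regular-expression tokenizer stands in for nltk here, so the example needs no tokenizer download. The function name `iter_review_tokens` is hypothetical.

```python
import io
import json
import re
from typing import Iterable, Iterator, List

# Lowercase letters, digits and apostrophes; punctuation is dropped.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def iter_review_tokens(lines: Iterable[str], field: str = "reviewText") -> Iterator[List[str]]:
    """Yield one lowercased token list per review, skipping blank lines and empty reviews."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        tokens = TOKEN_RE.findall(record.get(field, "").lower())
        if tokens:
            yield tokens


if __name__ == "__main__":
    # Two made-up records in the assumed one-object-per-line layout.
    sample = io.StringIO(
        '{"reviewText": "Great toothbrush, works well!", "overall": 5.0}\n'
        '{"reviewText": "Blue color faded fast.", "overall": 2.0}\n'
    )
    for tokens in iter_review_tokens(sample):
        print(tokens)
```

On the real file, `with open(path) as f: sentences = list(iter_review_tokens(f))` produces the list of token lists that gensim's `Word2Vec` expects. Here `path` is a placeholder for wherever the dataset is stored.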

Model visualization

We visualize the learned embeddings using t-SNE, a data-visualization technique that reduces data to 2 or 3 dimensions so that it can be plotted easily. It tries to keep points that are close in the original space close in the plot.
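
A minimal sketch of the projection step follows. It assumes scikit-learn's `sklearn.manifold.TSNE`, with random vectors standing in for word vectors. The 30 x 50 matrix, `perplexity=5` and the helper name `tsne_2d` are illustration choices. Perplexity must be smaller than the number of points being embedded.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(vectors: np.ndarray, perplexity: float = 5.0, seed: int = 0) -> np.ndarray:
    """Project an (n_words, dim) matrix to (n_words, 2) coordinates with t-SNE."""
    # PCA initialisation and a fixed random_state make the layout more stable across runs.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(vectors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_vectors = rng.normal(size=(30, 50))  # stand-in for 30 word vectors of size 50
    coords = tsne_2d(fake_vectors)
    print(coords.shape)  # (30, 2)
```

With a trained gensim 4.x model, you could pass something like `model.wv[model.wv.index_to_key[:300]]` (the 300 most frequent words) and then label each point with matplotlib's `plt.scatter` and `plt.annotate`.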

Most similar words

One way to check whether a word2vec model is any good is to look up the words most similar to a given word. gensim's most_similar function does this and, by default, returns the 10 most similar words. Let's find the words most similar to "blue".
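
gensim's `most_similar` ranks words by cosine similarity between word vectors. The numpy sketch below reproduces that ranking on a tiny matrix to show what the function computes. The vocabulary, the 2-dimensional vectors and the helper name are made up for illustration. With a trained gensim 4.x model, the equivalent call is `model.wv.most_similar("blue", topn=10)`.

```python
from typing import Dict, List, Tuple

import numpy as np


def most_similar(word: str, vocab: List[str], vectors: np.ndarray, topn: int = 10) -> List[Tuple[str, float]]:
    """Rank vocabulary words by cosine similarity to `word`, excluding the word itself."""
    index: Dict[str, int] = {w: i for i, w in enumerate(vocab)}
    # L2-normalise each row so that a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[index[word]]
    order = np.argsort(-sims)  # highest similarity first
    results = [(vocab[i], float(sims[i])) for i in order if vocab[i] != word]
    return results[:topn]


if __name__ == "__main__":
    vocab = ["blue", "navy", "red", "shampoo"]
    vectors = np.array([
        [1.0, 0.0],
        [0.9, 0.1],
        [0.5, 0.5],
        [0.0, 1.0],
    ])
    print(most_similar("blue", vocab, vectors, topn=2))
```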

References

How to Make Word Vectors from Game of Thrones (LIVE)

Files

img
README.md
word2vec_model_on_avazon_review.ipynb
word2vec_model_trained_on_Health_and_Personal_Care_5.w2v
word2vec_model_visualization.ipynb

MiguelSteph/word2vec-with-gensim
