NLP-tutorial_psych-methods

NLP tutorial for the psych methods group at Columbia

Installation

i) Clone or download this repository

ii) CD into the repoistory directory and run: pip install -r requirements.txt

iii) For the GloVe vector code, you will need to download this 1gb text file and add it to the repo: http://nlp.stanford.edu/data/glove.6B.zip

How to run

There are 5 Jupyter Notebook tutorials that you can run through Anaconda Navigator: https://docs.anaconda.com/anaconda/navigator/getting-started/

Each tutorial has independent code/variables, but the concepts build on each other so I recommend going through them in order.

Outline

1. NLTK (natural language toolkit) Basics

Part-of-speech tagging
Word lemmatization
Extracting lexical frequencies
Extracting word phonemes
Counting syllables
Rhyme generating with syntactic and syllabic constraints

2. Wikipedia text scraping and sentiment analyses

Scrape text from wikipedia pages
Use three different pretrained sentiment analyzers (vader, affin, sentistrength) to compare positive / negative sentiments of different pages

3. Reddit scraping

Use pushshift.io API to scrape massive amounts of text from reddit using criteria such as keywords, date and time, subreddit, upvotes, etc.

4. GloVe vectors (semantic word embeddings)

Import a dataset of GloVe vectors trained on Wikipedia text and compare the semantic representations of different words, wikipedia pages, and subreddits.
Generate your own set of GloVe vectors with a new text corpus.

5. Generating text with the GPT-2 transformer model

Basic code to predict upcoming words using custom text prompts with GPT-2

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
SentiStrength_Data		SentiStrength_Data
1_NLTK_basics.ipynb		1_NLTK_basics.ipynb
2_wiki-scraping_sentiment-analysis.ipynb		2_wiki-scraping_sentiment-analysis.ipynb
3_reddit-scraping.ipynb		3_reddit-scraping.ipynb
4_glove-vectors.ipynb		4_glove-vectors.ipynb
5_GPT-2_text-generation.ipynb		5_GPT-2_text-generation.ipynb
README.md		README.md
SUBTLEX.xlsx		SUBTLEX.xlsx
SentiStrength.jar		SentiStrength.jar
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NLP-tutorial_psych-methods

Installation

How to run

Outline

1. NLTK (natural language toolkit) Basics

2. Wikipedia text scraping and sentiment analyses

3. Reddit scraping

4. GloVe vectors (semantic word embeddings)

5. Generating text with the GPT-2 transformer model

About

Releases

Packages

Languages

msieg/NLP-tutorial_psych-methods

Folders and files

Latest commit

History

Repository files navigation

NLP-tutorial_psych-methods

Installation

How to run

Outline

1. NLTK (natural language toolkit) Basics

2. Wikipedia text scraping and sentiment analyses

3. Reddit scraping

4. GloVe vectors (semantic word embeddings)

5. Generating text with the GPT-2 transformer model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages