NLP tutorial for the psych methods group at Columbia
i) Clone or download this repository
ii) cd into the repository directory and run: pip install -r requirements.txt
iii) For the GloVe vector code, you will need to download this zip archive (about 1 GB) of GloVe text files and add it to the repo: http://nlp.stanford.edu/data/glove.6B.zip
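As a quick check that the download is usable, here is a minimal sketch of reading the GloVe text format, where each line holds a word followed by its space-separated vector values. The function name `load_glove` and the choice of `glove.6B.50d.txt` (one of the files inside the archive) are assumptions for illustration; the notebooks may load the vectors differently.

```python
import io


def load_glove(lines, vocab=None):
    """Parse GloVe text-format lines into a {word: [float, ...]} dict.

    `lines` is any iterable of strings (an open file works).
    `vocab` is an optional set of words to keep, which saves memory
    when only a handful of words are needed.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-in for the real file (made-up numbers, 3 dimensions).
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.2 0.1 0.4\n")
vectors = load_glove(sample)
print(sorted(vectors), len(vectors["cat"]))

# With the real download (file name is an assumption):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f, vocab={"cat", "dog"})
```

Passing a `vocab` set is optional, but it keeps memory use low if you only want to inspect a few words.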
There are 5 Jupyter Notebook tutorials that you can run through Anaconda Navigator: https://docs.anaconda.com/anaconda/navigator/getting-started/
Each tutorial has independent code/variables, but the concepts build on each other, so I recommend going through them in order.
- Part-of-speech tagging
- Word lemmatization
- Extracting lexical frequencies
- Extracting word phonemes
- Counting syllables
- Generating rhymes with syntactic and syllabic constraints
- Scrape text from Wikipedia pages
- Use three different pretrained sentiment analyzers (VADER, AFINN, SentiStrength) to compare positive / negative sentiments of different pages
- Use the pushshift.io API to scrape massive amounts of text from Reddit using criteria such as keywords, date and time, subreddit, upvotes, etc.
- Import a dataset of GloVe vectors trained on Wikipedia text and compare the semantic representations of different words, Wikipedia pages, and subreddits.
- Generate your own set of GloVe vectors with a new text corpus.
- Basic code to predict upcoming words using custom text prompts with GPT-2
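To give a feel for what comparing semantic representations in the GloVe items above can look like, here is a minimal sketch using cosine similarity. This assumes cosine similarity as the comparison measure and averaging word vectors to represent a longer text (a page or subreddit); both are common choices but may not match what the notebooks do. The helper names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def mean_vector(words, vectors):
    """Average the vectors of the words we have; None if none are known."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


# Toy 2-dimensional vectors (made up, not real GloVe values).
toy = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "tax": [0.0, 1.0]}

print(cosine_similarity(toy["cat"], toy["dog"]))  # close to 1: similar
print(cosine_similarity(toy["cat"], toy["tax"]))  # 0: unrelated

# Represent two short "documents" by their mean word vector and compare.
doc_a = mean_vector(["cat", "dog"], toy)
doc_b = mean_vector(["tax", "unknown"], toy)
print(cosine_similarity(doc_a, doc_b))
```

Words missing from the vocabulary are simply skipped when averaging, so very short or unusual texts may end up represented by only a few vectors.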