Text analysis, digital reading, computation.
Ted Underwood, paceofchange (2015). (GitHub repository sharing code to reproduce the analysis reported in his article "How Quickly Do Literary Standards Change?". The article explicitly explains the research process, from collecting and selecting data to analysis. The project builds a supervised machine learning classifier not to predict, but to interrogate the model itself.)
Thinking with visualization vs. communicating with visualization: analytics <-> communication
New ways of reading: statistics, visualizations, NLP
Visualize text(s):
- Juxta (visual collation, example classroom assignments, Frankenstein editions assignment)
- Jason Davies WordTree (visual concordance, D3 web app, academic explanation)
- Voyant (suite of viz tools, Help, Voyant Docs, and VoyantServer for offline use. Explore Adam's writing: forest, public)
- OverviewDocs (tool designed for journalists to explore huge groups of documents, e.g. the WikiLeaks email dump)
- AntConc (lots of software and publications from Laurence Anthony)
- Poemage ("a visualization system for exploring the sonic topology of a poem")
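To make the idea of a concordance concrete, here is a minimal keyword-in-context (KWIC) sketch in plain Python. It is not how WordTree, Voyant, or AntConc are implemented; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions chosen for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Illustrative sample text only.
sample = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness.")

for line in kwic(sample, "age", width=20):
    print(line)
```

Lining the keyword up in a column is what lets a reader scan the words that tend to come before and after it, which is the same idea the visual tools above present graphically.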
Text Big Data:
Think about the three V's of Big Data (volume, variety, and velocity). Advances are driven by businesses seeking to process unstructured text data (the web, social media) to extract value.
- Hathi Trust Research Center Portal (big text data)
- N-Grams: Bookworm or Google Books Ngram Viewer (see TED talk)
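An n-gram is simply a run of n consecutive words. The sketch below counts n-grams in a short string using only the Python standard library. It illustrates the counting step only, not the year-by-year relative frequencies that Bookworm or the Google Books Ngram Viewer plot; the helper names `tokenize` and `ngram_counts` and the sample text are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    # zip over n staggered copies of the token list to form the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

# Illustrative sample text only.
sample = "it was the best of times it was the worst of times"
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, 2)

for gram, count in bigrams.most_common(3):
    print(gram, count)
```

Dividing each count by the total number of n-grams in a given year's texts would give the kind of relative frequency that the n-gram viewers chart over time.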
Natural Language Processing / Machine Learning:
- Typically classification tasks: entity recognition, part-of-speech (POS) tagging, topic modeling, sentiment analysis, summarization.
- Supervised vs. Unsupervised classification
- Python NLTK (plus simpler TextBlob, and more powerful scikit-learn)
- MALLET topic modeling
- Stanford NLP Group (a library of Java apps, e.g. named entity tagging demo, with academic papers explaining their use)
- Open Calais (NLP API trained on web and newspaper text)
- Watson Natural Language Understanding (NLP API trained on web content)
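As a rough illustration of the supervised side of that distinction, the sketch below trains a tiny naive Bayes classifier on hand-labeled examples and then inspects which words the model leans on. It uses only the standard library rather than NLTK or scikit-learn, and the training sentences, labels, and function names are all made up for illustration.

```python
import math
from collections import Counter

# Made-up training data: (text, label) pairs, purely illustrative.
train = [
    ("a wonderful moving and beautiful poem", "pos"),
    ("beautiful imagery and a wonderful ending", "pos"),
    ("a dull tedious and clumsy novel", "neg"),
    ("clumsy plot and a tedious ending", "neg"),
]

def fit(examples):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def log_likelihood(word, label, word_counts, vocab):
    """Log P(word | label) with add-one (Laplace) smoothing."""
    counts = word_counts[label]
    return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))

def predict(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += log_likelihood(word, label, word_counts, vocab)
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)
print(predict("a beautiful ending", model))

# Interrogating the model: which words most favor "pos" over "neg"?
word_counts, _, vocab = model
log_odds = {
    w: log_likelihood(w, "pos", word_counts, vocab)
    - log_likelihood(w, "neg", word_counts, vocab)
    for w in vocab
}
for word in sorted(log_odds, key=log_odds.get, reverse=True)[:3]:
    print(word, round(log_odds[word], 2))
```

The last loop is the humanities-relevant part: rather than stopping at the prediction, it asks which features separate the two labeled groups. An unsupervised method such as topic modeling would instead look for groupings without any labels supplied in advance.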
Programming as inquiry
OpenRefine Sonnets project
See Fetch and Parse Data with OpenRefine
OpenRefine sentiment analysis project
Visualize the numbers with Rawgraphs
Distant versus close reading. Digital editing and annotating, versus computation.
Gideon Lewis-Kraus, "The Great A.I. Awakening", New York Times Magazine (December 2016).
Ted Underwood, "Seven Ways Humanists are Using Computers to Understand Text" (2015). (intro overview of types of computational analysis)
Ted Underwood, paceofchange (2015). (GitHub repository sharing code to reproduce analysis reported in his article "How Quickly Do Literary Standards Change?". The article explicitly explains the research process, from collecting/selecting data to analysis.)
Ted Underwood, "The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us", in New Literary History (2014).
Jeffrey M. Binder, "Alien Reading: Text Mining, Language Standardization, and the Humanities" in Debates in the Digital Humanities (2016).
Stanford Literary Lab Pamphlets. (ongoing series of publications relating to "computational criticism")
Sunspring (a film script written by AI)
Periscopic data ("socially-conscious data visualization")
Tools directories:
Textbooks:
- Brandon Walsh and Sarah Horowitz, Introduction to Text Analysis: A Coursebook (2016). (Open textbook, Jekyll project hosted on gh-pages, repo)
- Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. (known as the NLTK Book, good intro to using Python and textual analysis concepts, with large sample corpora)
- Stéfan Sinclair & Geoffrey Rockwell, The Art of Literary Text Analysis (2016). (a Python Jupyter notebook based open text)
- Matthew Jockers, Text Analysis with R for Students of Literature (2014).
- Julia Silge and David Robinson, Tidy Text Mining in R (2017). (Bookdown project written in RMarkdown)
- Nick Montfort, Exploratory Programming for the Arts and Humanities (MIT Press, 2016).
- For more see Scott B. Weingart, Teaching Yourself to Code in DH.
Syllabus / assignments:
- Annie Swafford, Sherlock Holmes topic modeling assignment (2015).
- Lincoln Mullen, Text Analysis for Historians (2016).
- Beth Platte, Text analysis using Voyant Tools (2017).
- Pedagogy-Toolkit Voyant Tools assignments.
- Max Kemman, A-Republic-of-Emails (2016). (GitHub repo with an assignment to analyze the WikiLeaks email dump)
- DataBasic. (slick, simple web apps with lessons to introduce text data concepts)
- Programming Historian tutorials.
- Evan's text analysis and data viz notes.
Data:
- Exploring Big Historical Data: The Historian’s Macroscope, data downloads
- DH Toychest: Data Collections and Datasets