This project uses R to mine sentiment from more than 21,000 reviews of resorts in the Republic of Maldives, a South Asian country in the Indian Ocean.
- data - contains the three files used in analysis:
  a. maldives_hotel_reviews.csv - Hotel reviews of resorts in the Republic of Maldives.
  b. negative-lexicon.txt - Negative lexicon used to locate "negative" words.
  c. positive-lexicon.txt - Positive lexicon used to locate "positive" words.
- images - contains visualizations:
  a. body_wordcloud.png - Word cloud showing commonly occurring words in the review body.
  b. header_wordcloud.PNG - Word cloud showing commonly occurring words in the review header.
  c. monthly_sentiment.png - Overall sentiment by month for all hotels in the Republic of Maldives.
  d. reviews_by_year.png - Count of reviews by year.
  e. sentiment_comparison.png - Comparison of negative and positive word counts.
  f. top_12_hotels.png - Sentiment comparison for the top 12 resorts.
- text_mining.Rmd - R Markdown file detailing the text mining process.
- text_mining.pdf - PDF showing the R code and its output, for easy viewing.
- results.pdf - A full write-up comparing text mining in R vs SAS.
R is used for all model building; the results are then compared with those obtained in SAS.
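As a rough illustration of how positive and negative lexicons can be used to score review text, the sketch below counts lexicon matches per review in base R. It is not taken from text_mining.Rmd: the function name `score_sentiment`, the toy word lists, and the sample reviews are all hypothetical stand-ins for the real lexicon files and dataset.

```r
# Hypothetical helper (not from text_mining.Rmd): count lexicon hits per review
# and report a net score of positive minus negative matches.
score_sentiment <- function(reviews, positive_words, negative_words) {
  # Lowercase each review, then split on anything that is not a letter or apostrophe
  tokens <- strsplit(tolower(reviews), "[^a-z']+")
  positive <- vapply(tokens, function(w) sum(w %in% positive_words), integer(1))
  negative <- vapply(tokens, function(w) sum(w %in% negative_words), integer(1))
  data.frame(positive = positive, negative = negative, score = positive - negative)
}

# Toy stand-ins for the lexicon files and review text (assumed, not the real data)
positive_words <- c("beautiful", "friendly", "amazing")
negative_words <- c("dirty", "rude", "expensive")
reviews <- c("Beautiful beach and friendly staff",
             "Rude staff, dirty room and expensive food")

scores <- score_sentiment(reviews, positive_words, negative_words)
print(scores)
```

With the real data, the word vectors would come from the two lexicon files and `reviews` from the review text column of the CSV; simple word matching like this ignores negation ("not clean"), so it is only a first approximation.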
The following packages are used:
    # List of packages used
    packages <- c("tm", "wordcloud", "lubridate", "SnowballC", "ggplot2", "dplyr", "tidyr")

    # Load each package, installing it first if it is not already installed
    for (p in packages) {
      if (!require(p, character.only = TRUE)) {
        install.packages(p)
        library(p, character.only = TRUE)
      }
    }
- Thanks to Dr. Mo Saraee from the University of Salford for the maldives_hotel_reviews.csv dataset.
- Thanks to Bing Liu and Minqing Hu for the negative-lexicon.txt and positive-lexicon.txt files, which were obtained from their website.
MIT License Copyright (c) 2019 Ian Jeffries