hotel-review-text-mining

Index

  1. Summary
  2. File Directory
  3. Language and Packages Used
  4. Credits
  5. License

Summary

This project uses R to mine sentiment from more than 21,000 hotel reviews of resorts in the Republic of Maldives, a South Asian island country in the Indian Ocean.

File Directory

  1. data - contains the three files used in the analysis:
             a. maldives_hotel_reviews.csv - Hotel reviews of resorts in the Republic of Maldives.
             b. negative-lexicon.txt - Negative lexicon used to locate "negative" words.
             c. positive-lexicon.txt - Positive lexicon used to locate "positive" words.

  2. images - contains visualizations:
             a. body_wordcloud.png - Word cloud of commonly occurring words in the review body.
             b. header_wordcloud.PNG - Word cloud of commonly occurring words in the review header.
             c. monthly_sentiment.png - Overall sentiment by month for all hotels in the Republic of Maldives.
             d. reviews_by_year.png - Count of reviews by year.
             e. sentiment_comparison.png - Comparison of negative and positive word counts.
             f. top_12_hotels.png - Sentiment comparison of the top 12 resorts.

  3. text_mining.Rmd - R Markdown detailing the text mining process.

  4. text_mining.pdf - PDF rendering of the R code and its output, for easy viewing.

  5. results.pdf - A full write-up comparing text mining in R vs SAS.
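The positive and negative lexicons support a simple counting approach to sentiment: a review scores +1 for each positive word and -1 for each negative word. The block below is a minimal base-R sketch of that idea, not the code in text_mining.Rmd. It assumes each lexicon file lists one word per line, with blank lines and lines beginning with ";" treated as comments; `read_lexicon()` and `score_review()` are hypothetical helper names, and the tiny inline lexicons are made up for illustration.

```r
# Hypothetical helper: read a lexicon file with one word per line.
# Blank lines and lines starting with ";" are dropped (assumed comment syntax).
read_lexicon <- function(path) {
  words <- trimws(readLines(path, warn = FALSE))
  words[nzchar(words) & !startsWith(words, ";")]
}

# Hypothetical helper: score = (# positive tokens) - (# negative tokens).
# Text is lowercased and split on anything that is not a letter, apostrophe, or hyphen.
score_review <- function(text, positive, negative) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z'-]+"))
  tokens <- tokens[nzchar(tokens)]
  sum(tokens %in% positive) - sum(tokens %in% negative)
}

# Made-up mini lexicons standing in for the files in data/
pos <- c("beautiful", "amazing", "friendly")
neg <- c("terrible", "dirty", "overpriced")

score_review("Beautiful beach and friendly staff, but terrible food.", pos, neg)  # 1
```

With the repository's files, the lexicons would instead be loaded with `read_lexicon("data/positive-lexicon.txt")` and `read_lexicon("data/negative-lexicon.txt")`, assuming the layout above holds.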

Language and Packages Used

R is used for all model building; the results are then compared with SAS in results.pdf.

The following packages are used:

# list of packages used
packages <- c("tm", "wordcloud", "lubridate", "SnowballC", "ggplot2", "dplyr", "tidyr")

# load each package; require() returns FALSE when a package is not installed,
# in which case it is installed and then loaded
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}
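As an illustration of how tm and SnowballC typically fit together, the sketch below cleans a couple of reviews and counts term frequencies, the kind of table a word cloud is drawn from. The two reviews are made up, and the cleaning steps shown are a common tm pipeline rather than necessarily the exact steps used in text_mining.Rmd.

```r
library(tm)        # corpus construction and cleaning transformations
library(SnowballC) # stemmer that tm's stemDocument() relies on

# Made-up reviews standing in for the review body text
reviews <- c("The staff were amazing and the beach was beautiful!",
             "Room was dirty; the food was terrible and overpriced.")

corpus <- VCorpus(VectorSource(reviews))
corpus <- tm_map(corpus, content_transformer(tolower))      # lowercase
corpus <- tm_map(corpus, removePunctuation)                  # drop punctuation
corpus <- tm_map(corpus, removeNumbers)                      # drop digits
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop stop words before stemming
corpus <- tm_map(corpus, stemDocument)                       # reduce words to stems
corpus <- tm_map(corpus, stripWhitespace)                    # collapse leftover spaces

# Term-document matrix -> total frequency of each term across all reviews
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(head(freq))
```

The resulting named vector can be passed to `wordcloud::wordcloud(names(freq), freq)` to draw a cloud like the ones in the images folder.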

Credits

  1. Thanks to Dr. Mo Saraee of the University of Salford for the maldives_hotel_reviews.csv dataset.
  2. Thanks to Bing Liu and Minqing Hu for the negative-lexicon.txt and positive-lexicon.txt files, which were taken from their website.

License

MIT License Copyright (c) 2019 Ian Jeffries